public inbox for linux-hyperv@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] mshv: Refactor memory region management and map pages at creation
@ 2026-03-30 20:04 Stanislav Kinsburskii
  2026-03-30 20:04 ` [PATCH 1/7] mshv: Convert from page pointers to PFNs Stanislav Kinsburskii
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

This series refactors the mshv memory region subsystem in preparation
for mapping populated pages into the hypervisor at movable region
creation time, rather than relying solely on demand faulting.

The primary motivation is to ensure that when userspace passes a
pre-populated mapping for a movable memory region, those pages are
immediately visible to the hypervisor. Previously, all movable regions
were created with HV_MAP_GPA_NO_ACCESS on every page regardless of
whether the backing pages were already present, deferring all mapping
to the fault handler. This added unnecessary fault overhead and
complicated the initial setup of child partitions with pre-populated
memory.

The series takes a bottom-up approach:

- Patches 1-2 lay the groundwork by converting internal data structures
from page pointers to PFNs and teaching the range processing
infrastructure to handle holes (invalid PFNs) uniformly. The PFN
conversion eliminates redundant page_to_pfn()/pfn_to_page() conversions
between the HMM interface (which returns PFNs) and the hypervisor
hypercalls (which consume PFNs). The hole handling enables mapping
regions that contain a mix of present and absent pages, remapping holes
with no-access permissions to preserve hypervisor dirty page tracking
for precopy live migration.

- Patch 3 extends HMM fault handling to support memory regions that span
multiple VMAs with different protection flags, which is required for
flexible guest memory layouts.

- Patch 4 consolidates region setup by moving pinned region preparation
into mshv_regions.c, making five helper functions static, and fixing
a pre-existing bug where mshv_region_map() failures on non-encrypted
partitions were silently ignored.

- Patch 5 is the core functional change: movable regions now collect
already-present PFNs from userspace at creation time and map them
into the hypervisor immediately. A new do_fault parameter controls
whether hmm_range_fault() should fault in missing pages or only
collect those already present.

- Patches 6-7 are cleanups: extracting the MMIO mapping path into its
own function for consistency with the pinned and movable paths, and
adding a tracepoint for GPA mapping hypercalls to aid debugging.

---

Stanislav Kinsburskii (7):
      mshv: Convert from page pointers to PFNs
      mshv: Add support to address range holes remapping
      mshv: Support regions with different VMAs
      mshv: Move pinned region setup to mshv_regions.c
      mshv: Map populated pages on movable region creation
      mshv: Extract MMIO region mapping into separate function
      mshv: Add tracepoint for map GPA hypercall


 drivers/hv/mshv_regions.c      |  580 +++++++++++++++++++++++++++++-----------
 drivers/hv/mshv_root.h         |   29 +-
 drivers/hv/mshv_root_hv_call.c |   53 ++--
 drivers/hv/mshv_root_main.c    |   99 +------
 drivers/hv/mshv_trace.h        |   36 ++
 5 files changed, 503 insertions(+), 294 deletions(-)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/7] mshv: Convert from page pointers to PFNs
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
@ 2026-03-30 20:04 ` Stanislav Kinsburskii
  2026-04-13 21:08   ` Michael Kelley
  2026-03-30 20:04 ` [PATCH 2/7] mshv: Add support to address range holes remapping Stanislav Kinsburskii
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

The HMM interface returns PFNs from hmm_range_fault(), and the
hypervisor hypercalls operate on PFNs. Storing page pointers in
between these interfaces requires unnecessary conversions and
temporary allocations.

Store PFNs directly in memory regions to match the natural data flow.
This eliminates the temporary PFN array allocation in the HMM fault
path and reduces page_to_pfn() conversions throughout the driver.
Convert to page structs via pfn_to_page() only when operations like
unpin_user_page() require them.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c      |  297 ++++++++++++++++++++++------------------
 drivers/hv/mshv_root.h         |   20 +--
 drivers/hv/mshv_root_hv_call.c |   50 +++----
 drivers/hv/mshv_root_main.c    |   30 ++--
 4 files changed, 212 insertions(+), 185 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index fdffd4f002f6..b1a707d16c07 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -18,12 +18,13 @@
 #include "mshv_root.h"
 
 #define MSHV_MAP_FAULT_IN_PAGES				PTRS_PER_PMD
+#define MSHV_INVALID_PFN				ULONG_MAX
 
 /**
  * mshv_chunk_stride - Compute stride for mapping guest memory
  * @page      : The page to check for huge page backing
  * @gfn       : Guest frame number for the mapping
- * @page_count: Total number of pages in the mapping
+ * @pfn_count: Total number of pages in the mapping
  *
  * Determines the appropriate stride (in pages) for mapping guest memory.
  * Uses huge page stride if the backing page is huge and the guest mapping
@@ -32,18 +33,18 @@
  * Return: Stride in pages, or -EINVAL if page order is unsupported.
  */
 static int mshv_chunk_stride(struct page *page,
-			     u64 gfn, u64 page_count)
+			     u64 gfn, u64 pfn_count)
 {
 	unsigned int page_order;
 
 	/*
 	 * Use single page stride by default. For huge page stride, the
 	 * page must be compound and point to the head of the compound
-	 * page, and both gfn and page_count must be huge-page aligned.
+	 * page, and both gfn and pfn_count must be huge-page aligned.
 	 */
 	if (!PageCompound(page) || !PageHead(page) ||
 	    !IS_ALIGNED(gfn, PTRS_PER_PMD) ||
-	    !IS_ALIGNED(page_count, PTRS_PER_PMD))
+	    !IS_ALIGNED(pfn_count, PTRS_PER_PMD))
 		return 1;
 
 	page_order = folio_order(page_folio(page));
@@ -57,60 +58,61 @@ static int mshv_chunk_stride(struct page *page,
 /**
  * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
  *                             in a region.
- * @region     : Pointer to the memory region structure.
- * @flags      : Flags to pass to the handler.
- * @page_offset: Offset into the region's pages array to start processing.
- * @page_count : Number of pages to process.
- * @handler    : Callback function to handle the chunk.
+ * @region    : Pointer to the memory region structure.
+ * @flags     : Flags to pass to the handler.
+ * @pfn_offset: Offset into the region's PFNs array to start processing.
+ * @pfn_count : Number of PFNs to process.
+ * @handler   : Callback function to handle the chunk.
  *
- * This function scans the region's pages starting from @page_offset,
- * checking for contiguous present pages of the same size (normal or huge).
- * It invokes @handler for the chunk of contiguous pages found. Returns the
- * number of pages handled, or a negative error code if the first page is
- * not present or the handler fails.
+ * This function scans the region's PFNs starting from @pfn_offset,
+ * checking for contiguous valid PFNs backed by pages of the same size
+ * (normal or huge). It invokes @handler for the chunk of contiguous valid
+ * PFNs found. Returns the number of PFNs handled, or a negative error code
+ * if the first PFN is invalid or the handler fails.
  *
- * Note: The @handler callback must be able to handle both normal and huge
- * pages.
+ * Note: The @handler callback must be able to handle valid PFNs backed by
+ * both normal and huge pages.
  *
  * Return: Number of pages handled, or negative error code.
  */
-static long mshv_region_process_chunk(struct mshv_mem_region *region,
-				      u32 flags,
-				      u64 page_offset, u64 page_count,
-				      int (*handler)(struct mshv_mem_region *region,
-						     u32 flags,
-						     u64 page_offset,
-						     u64 page_count,
-						     bool huge_page))
+static long mshv_region_process_pfns(struct mshv_mem_region *region,
+				     u32 flags,
+				     u64 pfn_offset, u64 pfn_count,
+				     int (*handler)(struct mshv_mem_region *region,
+						    u32 flags,
+						    u64 pfn_offset,
+						    u64 pfn_count,
+						    bool huge_page))
 {
-	u64 gfn = region->start_gfn + page_offset;
+	u64 gfn = region->start_gfn + pfn_offset;
 	u64 count;
-	struct page *page;
+	unsigned long pfn;
 	int stride, ret;
 
-	page = region->mreg_pages[page_offset];
-	if (!page)
+	pfn = region->mreg_pfns[pfn_offset];
+	if (!pfn_valid(pfn))
 		return -EINVAL;
 
-	stride = mshv_chunk_stride(page, gfn, page_count);
+	stride = mshv_chunk_stride(pfn_to_page(pfn), gfn, pfn_count);
 	if (stride < 0)
 		return stride;
 
 	/* Start at stride since the first stride is validated */
-	for (count = stride; count < page_count; count += stride) {
-		page = region->mreg_pages[page_offset + count];
+	for (count = stride; count < pfn_count ; count += stride) {
+		pfn = region->mreg_pfns[pfn_offset + count];
 
-		/* Break if current page is not present */
-		if (!page)
+		/* Break if current pfn is invalid */
+		if (!pfn_valid(pfn))
 			break;
 
 		/* Break if stride size changes */
-		if (stride != mshv_chunk_stride(page, gfn + count,
-						page_count - count))
+		if (stride != mshv_chunk_stride(pfn_to_page(pfn),
+						gfn + count,
+						pfn_count - count))
 			break;
 	}
 
-	ret = handler(region, flags, page_offset, count, stride > 1);
+	ret = handler(region, flags, pfn_offset, count, stride > 1);
 	if (ret)
 		return ret;
 
@@ -118,70 +120,73 @@ static long mshv_region_process_chunk(struct mshv_mem_region *region,
 }
 
 /**
- * mshv_region_process_range - Processes a range of memory pages in a
- *                             region.
- * @region     : Pointer to the memory region structure.
- * @flags      : Flags to pass to the handler.
- * @page_offset: Offset into the region's pages array to start processing.
- * @page_count : Number of pages to process.
- * @handler    : Callback function to handle each chunk of contiguous
- *               pages.
+ * mshv_region_process_range - Processes a range of PFNs in a region.
+ * @region    : Pointer to the memory region structure.
+ * @flags     : Flags to pass to the handler.
+ * @pfn_offset: Offset into the region's PFNs array to start processing.
+ * @pfn_count : Number of PFNs to process.
+ * @handler   : Callback function to handle each chunk of contiguous
+ *              valid PFNs.
  *
- * Iterates over the specified range of pages in @region, skipping
- * non-present pages. For each contiguous chunk of present pages, invokes
- * @handler via mshv_region_process_chunk.
+ * Iterates over the specified range of PFNs in @region, skipping
+ * invalid PFNs. For each contiguous chunk of valid PFNS, invokes
+ * @handler via mshv_region_process_pfns.
  *
- * Note: The @handler callback must be able to handle both normal and huge
- * pages.
+ * Note: The @handler callback must be able to handle PFNs backed by both
+ * normal and huge pages.
  *
  * Returns 0 on success, or a negative error code on failure.
  */
 static int mshv_region_process_range(struct mshv_mem_region *region,
 				     u32 flags,
-				     u64 page_offset, u64 page_count,
+				     u64 pfn_offset, u64 pfn_count,
 				     int (*handler)(struct mshv_mem_region *region,
 						    u32 flags,
-						    u64 page_offset,
-						    u64 page_count,
+						    u64 pfn_offset,
+						    u64 pfn_count,
 						    bool huge_page))
 {
+	u64 pfn_end;
 	long ret;
 
-	if (page_offset + page_count > region->nr_pages)
+	if (check_add_overflow(pfn_offset, pfn_count, &pfn_end))
+		return -EOVERFLOW;
+
+	if (pfn_end > region->nr_pfns)
 		return -EINVAL;
 
-	while (page_count) {
+	while (pfn_count) {
 		/* Skip non-present pages */
-		if (!region->mreg_pages[page_offset]) {
-			page_offset++;
-			page_count--;
+		if (!pfn_valid(region->mreg_pfns[pfn_offset])) {
+			pfn_offset++;
+			pfn_count--;
 			continue;
 		}
 
-		ret = mshv_region_process_chunk(region, flags,
-						page_offset,
-						page_count,
-						handler);
+		ret = mshv_region_process_pfns(region, flags,
+					       pfn_offset, pfn_count,
+					       handler);
 		if (ret < 0)
 			return ret;
 
-		page_offset += ret;
-		page_count -= ret;
+		pfn_offset += ret;
+		pfn_count -= ret;
 	}
 
 	return 0;
 }
 
-struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
+struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pfns,
 					   u64 uaddr, u32 flags)
 {
 	struct mshv_mem_region *region;
+	u64 i;
 
-	region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
+	region = vzalloc(sizeof(*region) + sizeof(unsigned long) * nr_pfns);
 	if (!region)
 		return ERR_PTR(-ENOMEM);
 
-	region->nr_pages = nr_pages;
+	region->nr_pfns = nr_pfns;
 	region->start_gfn = guest_pfn;
 	region->start_uaddr = uaddr;
 	region->hv_map_flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_ADJUSTABLE;
@@ -190,6 +195,9 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
 	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
 		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
 
+	for (i = 0; i < nr_pfns; i++)
+		region->mreg_pfns[i] = MSHV_INVALID_PFN;
+
 	kref_init(&region->mreg_refcount);
 
 	return region;
@@ -197,15 +205,15 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
 
 static int mshv_region_chunk_share(struct mshv_mem_region *region,
 				   u32 flags,
-				   u64 page_offset, u64 page_count,
+				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
 {
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
 	return hv_call_modify_spa_host_access(region->partition->pt_id,
-					      region->mreg_pages + page_offset,
-					      page_count,
+					      region->mreg_pfns + pfn_offset,
+					      pfn_count,
 					      HV_MAP_GPA_READABLE |
 					      HV_MAP_GPA_WRITABLE,
 					      flags, true);
@@ -216,21 +224,21 @@ int mshv_region_share(struct mshv_mem_region *region)
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
 
 	return mshv_region_process_range(region, flags,
-					 0, region->nr_pages,
+					 0, region->nr_pfns,
 					 mshv_region_chunk_share);
 }
 
 static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
 				     u32 flags,
-				     u64 page_offset, u64 page_count,
+				     u64 pfn_offset, u64 pfn_count,
 				     bool huge_page)
 {
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
 	return hv_call_modify_spa_host_access(region->partition->pt_id,
-					      region->mreg_pages + page_offset,
-					      page_count, 0,
+					      region->mreg_pfns + pfn_offset,
+					      pfn_count, 0,
 					      flags, false);
 }
 
@@ -239,30 +247,30 @@ int mshv_region_unshare(struct mshv_mem_region *region)
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
 
 	return mshv_region_process_range(region, flags,
-					 0, region->nr_pages,
+					 0, region->nr_pfns,
 					 mshv_region_chunk_unshare);
 }
 
 static int mshv_region_chunk_remap(struct mshv_mem_region *region,
 				   u32 flags,
-				   u64 page_offset, u64 page_count,
+				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
 {
 	if (huge_page)
 		flags |= HV_MAP_GPA_LARGE_PAGE;
 
-	return hv_call_map_gpa_pages(region->partition->pt_id,
-				     region->start_gfn + page_offset,
-				     page_count, flags,
-				     region->mreg_pages + page_offset);
+	return hv_call_map_ram_pfns(region->partition->pt_id,
+				    region->start_gfn + pfn_offset,
+				    pfn_count, flags,
+				    region->mreg_pfns + pfn_offset);
 }
 
-static int mshv_region_remap_pages(struct mshv_mem_region *region,
-				   u32 map_flags,
-				   u64 page_offset, u64 page_count)
+static int mshv_region_remap_pfns(struct mshv_mem_region *region,
+				  u32 map_flags,
+				  u64 pfn_offset, u64 pfn_count)
 {
 	return mshv_region_process_range(region, map_flags,
-					 page_offset, page_count,
+					 pfn_offset, pfn_count,
 					 mshv_region_chunk_remap);
 }
 
@@ -270,38 +278,50 @@ int mshv_region_map(struct mshv_mem_region *region)
 {
 	u32 map_flags = region->hv_map_flags;
 
-	return mshv_region_remap_pages(region, map_flags,
-				       0, region->nr_pages);
+	return mshv_region_remap_pfns(region, map_flags,
+				      0, region->nr_pfns);
 }
 
-static void mshv_region_invalidate_pages(struct mshv_mem_region *region,
-					 u64 page_offset, u64 page_count)
+static void mshv_region_invalidate_pfns(struct mshv_mem_region *region,
+					u64 pfn_offset, u64 pfn_count)
 {
-	if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
-		unpin_user_pages(region->mreg_pages + page_offset, page_count);
+	u64 i;
+
+	for (i = pfn_offset; i < pfn_offset + pfn_count; i++) {
+		if (!pfn_valid(region->mreg_pfns[i]))
+			continue;
+
+		if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
+			unpin_user_page(pfn_to_page(region->mreg_pfns[i]));
 
-	memset(region->mreg_pages + page_offset, 0,
-	       page_count * sizeof(struct page *));
+		region->mreg_pfns[i] = MSHV_INVALID_PFN;
+	}
 }
 
 void mshv_region_invalidate(struct mshv_mem_region *region)
 {
-	mshv_region_invalidate_pages(region, 0, region->nr_pages);
+	mshv_region_invalidate_pfns(region, 0, region->nr_pfns);
 }
 
 int mshv_region_pin(struct mshv_mem_region *region)
 {
-	u64 done_count, nr_pages;
+	u64 done_count, nr_pfns, i;
+	unsigned long *pfns;
 	struct page **pages;
 	__u64 userspace_addr;
 	int ret;
 
-	for (done_count = 0; done_count < region->nr_pages; done_count += ret) {
-		pages = region->mreg_pages + done_count;
+	pages = kmalloc_array(MSHV_PIN_PAGES_BATCH_SIZE,
+			      sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	for (done_count = 0; done_count < region->nr_pfns; done_count += ret) {
+		pfns = region->mreg_pfns + done_count;
 		userspace_addr = region->start_uaddr +
 				 done_count * HV_HYP_PAGE_SIZE;
-		nr_pages = min(region->nr_pages - done_count,
-			       MSHV_PIN_PAGES_BATCH_SIZE);
+		nr_pfns = min(region->nr_pfns - done_count,
+			      MSHV_PIN_PAGES_BATCH_SIZE);
 
 		/*
 		 * Pinning assuming 4k pages works for large pages too.
@@ -311,39 +331,44 @@ int mshv_region_pin(struct mshv_mem_region *region)
 		 * with the FOLL_LONGTERM flag does a large temporary
 		 * allocation of contiguous memory.
 		 */
-		ret = pin_user_pages_fast(userspace_addr, nr_pages,
+		ret = pin_user_pages_fast(userspace_addr, nr_pfns,
 					  FOLL_WRITE | FOLL_LONGTERM,
 					  pages);
-		if (ret != nr_pages)
+		if (ret != nr_pfns)
 			goto release_pages;
+
+		for (i = 0; i < ret; i++)
+			pfns[i] = page_to_pfn(pages[i]);
 	}
 
+	kfree(pages);
 	return 0;
 
 release_pages:
 	if (ret > 0)
 		done_count += ret;
-	mshv_region_invalidate_pages(region, 0, done_count);
+	mshv_region_invalidate_pfns(region, 0, done_count);
+	kfree(pages);
 	return ret < 0 ? ret : -ENOMEM;
 }
 
 static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
 				   u32 flags,
-				   u64 page_offset, u64 page_count,
+				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
 {
 	if (huge_page)
 		flags |= HV_UNMAP_GPA_LARGE_PAGE;
 
-	return hv_call_unmap_gpa_pages(region->partition->pt_id,
-				       region->start_gfn + page_offset,
-				       page_count, flags);
+	return hv_call_unmap_pfns(region->partition->pt_id,
+				  region->start_gfn + pfn_offset,
+				  pfn_count, flags);
 }
 
 static int mshv_region_unmap(struct mshv_mem_region *region)
 {
 	return mshv_region_process_range(region, 0,
-					 0, region->nr_pages,
+					 0, region->nr_pfns,
 					 mshv_region_chunk_unmap);
 }
 
@@ -427,8 +452,8 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
 /**
  * mshv_region_range_fault - Handle memory range faults for a given region.
  * @region: Pointer to the memory region structure.
- * @page_offset: Offset of the page within the region.
- * @page_count: Number of pages to handle.
+ * @pfn_offset: Offset of the page within the region.
+ * @pfn_count: Number of pages to handle.
  *
  * This function resolves memory faults for a specified range of pages
  * within a memory region. It uses HMM (Heterogeneous Memory Management)
@@ -437,7 +462,7 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
  * Return: 0 on success, negative error code on failure.
  */
 static int mshv_region_range_fault(struct mshv_mem_region *region,
-				   u64 page_offset, u64 page_count)
+				   u64 pfn_offset, u64 pfn_count)
 {
 	struct hmm_range range = {
 		.notifier = &region->mreg_mni,
@@ -447,13 +472,13 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
 	int ret;
 	u64 i;
 
-	pfns = kmalloc_array(page_count, sizeof(*pfns), GFP_KERNEL);
+	pfns = kmalloc_array(pfn_count, sizeof(*pfns), GFP_KERNEL);
 	if (!pfns)
 		return -ENOMEM;
 
 	range.hmm_pfns = pfns;
-	range.start = region->start_uaddr + page_offset * HV_HYP_PAGE_SIZE;
-	range.end = range.start + page_count * HV_HYP_PAGE_SIZE;
+	range.start = region->start_uaddr + pfn_offset * HV_HYP_PAGE_SIZE;
+	range.end = range.start + pfn_count * HV_HYP_PAGE_SIZE;
 
 	do {
 		ret = mshv_region_hmm_fault_and_lock(region, &range);
@@ -462,11 +487,15 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
 	if (ret)
 		goto out;
 
-	for (i = 0; i < page_count; i++)
-		region->mreg_pages[page_offset + i] = hmm_pfn_to_page(pfns[i]);
+	for (i = 0; i < pfn_count; i++) {
+		if (!(pfns[i] & HMM_PFN_VALID))
+			continue;
+		/* Drop HMM_PFN_* flags to ensure PFNs are valid. */
+		region->mreg_pfns[pfn_offset + i] = pfns[i] & ~HMM_PFN_FLAGS;
+	}
 
-	ret = mshv_region_remap_pages(region, region->hv_map_flags,
-				      page_offset, page_count);
+	ret = mshv_region_remap_pfns(region, region->hv_map_flags,
+				     pfn_offset, pfn_count);
 
 	mutex_unlock(&region->mreg_mutex);
 out:
@@ -476,24 +505,24 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
 
 bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
 {
-	u64 page_offset, page_count;
+	u64 pfn_offset, pfn_count;
 	int ret;
 
 	/* Align the page offset to the nearest MSHV_MAP_FAULT_IN_PAGES. */
-	page_offset = ALIGN_DOWN(gfn - region->start_gfn,
-				 MSHV_MAP_FAULT_IN_PAGES);
+	pfn_offset = ALIGN_DOWN(gfn - region->start_gfn,
+				MSHV_MAP_FAULT_IN_PAGES);
 
 	/* Map more pages than requested to reduce the number of faults. */
-	page_count = min(region->nr_pages - page_offset,
-			 MSHV_MAP_FAULT_IN_PAGES);
+	pfn_count = min(region->nr_pfns - pfn_offset,
+			MSHV_MAP_FAULT_IN_PAGES);
 
-	ret = mshv_region_range_fault(region, page_offset, page_count);
+	ret = mshv_region_range_fault(region, pfn_offset, pfn_count);
 
 	WARN_ONCE(ret,
-		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, page_offset %llu, page_count %llu\n",
+		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, pfn_offset %llu, pfn_count %llu\n",
 		  region->partition->pt_id, region->start_uaddr,
-		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
-		  gfn, page_offset, page_count);
+		  region->start_uaddr + (region->nr_pfns << HV_HYP_PAGE_SHIFT),
+		  gfn, pfn_offset, pfn_count);
 
 	return !ret;
 }
@@ -523,16 +552,16 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
 	struct mshv_mem_region *region = container_of(mni,
 						      struct mshv_mem_region,
 						      mreg_mni);
-	u64 page_offset, page_count;
+	u64 pfn_offset, pfn_count;
 	unsigned long mstart, mend;
 	int ret = -EPERM;
 
 	mstart = max(range->start, region->start_uaddr);
 	mend = min(range->end, region->start_uaddr +
-		   (region->nr_pages << HV_HYP_PAGE_SHIFT));
+		   (region->nr_pfns << HV_HYP_PAGE_SHIFT));
 
-	page_offset = HVPFN_DOWN(mstart - region->start_uaddr);
-	page_count = HVPFN_DOWN(mend - mstart);
+	pfn_offset = HVPFN_DOWN(mstart - region->start_uaddr);
+	pfn_count = HVPFN_DOWN(mend - mstart);
 
 	if (mmu_notifier_range_blockable(range))
 		mutex_lock(&region->mreg_mutex);
@@ -541,12 +570,12 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
 
 	mmu_interval_set_seq(mni, cur_seq);
 
-	ret = mshv_region_remap_pages(region, HV_MAP_GPA_NO_ACCESS,
-				      page_offset, page_count);
+	ret = mshv_region_remap_pfns(region, HV_MAP_GPA_NO_ACCESS,
+				     pfn_offset, pfn_count);
 	if (ret)
 		goto out_unlock;
 
-	mshv_region_invalidate_pages(region, page_offset, page_count);
+	mshv_region_invalidate_pfns(region, pfn_offset, pfn_count);
 
 	mutex_unlock(&region->mreg_mutex);
 
@@ -558,9 +587,9 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
 	WARN_ONCE(ret,
 		  "Failed to invalidate region %#llx-%#llx (range %#lx-%#lx, event: %u, pages %#llx-%#llx, mm: %#llx): %d\n",
 		  region->start_uaddr,
-		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
+		  region->start_uaddr + (region->nr_pfns << HV_HYP_PAGE_SHIFT),
 		  range->start, range->end, range->event,
-		  page_offset, page_offset + page_count - 1, (u64)range->mm, ret);
+		  pfn_offset, pfn_offset + pfn_count - 1, (u64)range->mm, ret);
 	return false;
 }
 
@@ -579,7 +608,7 @@ bool mshv_region_movable_init(struct mshv_mem_region *region)
 
 	ret = mmu_interval_notifier_insert(&region->mreg_mni, current->mm,
 					   region->start_uaddr,
-					   region->nr_pages << HV_HYP_PAGE_SHIFT,
+					   region->nr_pfns << HV_HYP_PAGE_SHIFT,
 					   &mshv_region_mni_ops);
 	if (ret)
 		return false;
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 947dfb76bb19..f1d4bee97a3f 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -84,15 +84,15 @@ enum mshv_region_type {
 struct mshv_mem_region {
 	struct hlist_node hnode;
 	struct kref mreg_refcount;
-	u64 nr_pages;
+	u64 nr_pfns;
 	u64 start_gfn;
 	u64 start_uaddr;
 	u32 hv_map_flags;
 	struct mshv_partition *partition;
 	enum mshv_region_type mreg_type;
 	struct mmu_interval_notifier mreg_mni;
-	struct mutex mreg_mutex;	/* protects region pages remapping */
-	struct page *mreg_pages[];
+	struct mutex mreg_mutex;	/* protects region PFNs remapping */
+	unsigned long mreg_pfns[];
 };
 
 struct mshv_irq_ack_notifier {
@@ -282,11 +282,11 @@ int hv_call_create_partition(u64 flags,
 int hv_call_initialize_partition(u64 partition_id);
 int hv_call_finalize_partition(u64 partition_id);
 int hv_call_delete_partition(u64 partition_id);
-int hv_call_map_mmio_pages(u64 partition_id, u64 gfn, u64 mmio_spa, u64 numpgs);
-int hv_call_map_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
-			  u32 flags, struct page **pages);
-int hv_call_unmap_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
-			    u32 flags);
+int hv_call_map_mmio_pfns(u64 partition_id, u64 gfn, u64 mmio_spa, u64 numpgs);
+int hv_call_map_ram_pfns(u64 partition_id, u64 gpa_target, u64 pfn_count,
+			 u32 flags, unsigned long *pfns);
+int hv_call_unmap_pfns(u64 partition_id, u64 gpa_target, u64 pfn_count,
+		       u32 flags);
 int hv_call_delete_vp(u64 partition_id, u32 vp_index);
 int hv_call_assert_virtual_interrupt(u64 partition_id, u32 vector,
 				     u64 dest_addr,
@@ -329,8 +329,8 @@ int hv_map_stats_page(enum hv_stats_object_type type,
 int hv_unmap_stats_page(enum hv_stats_object_type type,
 			struct hv_stats_page *page_addr,
 			const union hv_stats_object_identity *identity);
-int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
-				   u64 page_struct_count, u32 host_access,
+int hv_call_modify_spa_host_access(u64 partition_id, unsigned long *pfns,
+				   u64 pfns_count, u32 host_access,
 				   u32 flags, u8 acquire);
 int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code, u64 arg,
 				      void *property_value, size_t property_value_sz);
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index cb55d4d4be2e..a95f2cfc5da5 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -188,17 +188,16 @@ int hv_call_delete_partition(u64 partition_id)
 	return hv_result_to_errno(status);
 }
 
-/* Ask the hypervisor to map guest ram pages or the guest mmio space */
-static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
-			       u32 flags, struct page **pages, u64 mmio_spa)
+static int hv_do_map_pfns(u64 partition_id, u64 gfn, u64 pfns_count,
+			  u32 flags, unsigned long *pfns, u64 mmio_spa)
 {
 	struct hv_input_map_gpa_pages *input_page;
 	u64 status, *pfnlist;
 	unsigned long irq_flags, large_shift = 0;
 	int ret = 0, done = 0;
-	u64 page_count = page_struct_count;
+	u64 page_count = pfns_count;
 
-	if (page_count == 0 || (pages && mmio_spa))
+	if (page_count == 0 || (pfns && mmio_spa))
 		return -EINVAL;
 
 	if (flags & HV_MAP_GPA_LARGE_PAGE) {
@@ -227,14 +226,14 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
 		for (i = 0; i < rep_count; i++)
 			if (flags & HV_MAP_GPA_NO_ACCESS) {
 				pfnlist[i] = 0;
-			} else if (pages) {
+			} else if (pfns) {
 				u64 index = (done + i) << large_shift;
 
-				if (index >= page_struct_count) {
+				if (index >= pfns_count) {
 					ret = -EINVAL;
 					break;
 				}
-				pfnlist[i] = page_to_pfn(pages[index]);
+				pfnlist[i] = pfns[index];
 			} else {
 				pfnlist[i] = mmio_spa + done + i;
 			}
@@ -266,37 +265,37 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
 
 		if (flags & HV_MAP_GPA_LARGE_PAGE)
 			unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
-		hv_call_unmap_gpa_pages(partition_id, gfn, done, unmap_flags);
+		hv_call_unmap_pfns(partition_id, gfn, done, unmap_flags);
 	}
 
 	return ret;
 }
 
 /* Ask the hypervisor to map guest ram pages */
-int hv_call_map_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
-			  u32 flags, struct page **pages)
+int hv_call_map_ram_pfns(u64 partition_id, u64 gfn, u64 pfn_count,
+			 u32 flags, unsigned long *pfns)
 {
-	return hv_do_map_gpa_hcall(partition_id, gpa_target, page_count,
-				   flags, pages, 0);
+	return hv_do_map_pfns(partition_id, gfn, pfn_count, flags,
+			      pfns, 0);
 }
 
-/* Ask the hypervisor to map guest mmio space */
-int hv_call_map_mmio_pages(u64 partition_id, u64 gfn, u64 mmio_spa, u64 numpgs)
+int hv_call_map_mmio_pfns(u64 partition_id, u64 gfn, u64 mmio_spa,
+			  u64 pfn_count)
 {
 	int i;
 	u32 flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE |
 		    HV_MAP_GPA_NOT_CACHED;
 
-	for (i = 0; i < numpgs; i++)
+	for (i = 0; i < pfn_count; i++)
 		if (page_is_ram(mmio_spa + i))
 			return -EINVAL;
 
-	return hv_do_map_gpa_hcall(partition_id, gfn, numpgs, flags, NULL,
-				   mmio_spa);
+	return hv_do_map_pfns(partition_id, gfn, pfn_count, flags,
+			      NULL, mmio_spa);
 }
 
-int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
-			    u32 flags)
+int hv_call_unmap_pfns(u64 partition_id, u64 gfn, u64 page_count_4k,
+		       u32 flags)
 {
 	struct hv_input_unmap_gpa_pages *input_page;
 	u64 status, page_count = page_count_4k;
@@ -1009,15 +1008,15 @@ int hv_unmap_stats_page(enum hv_stats_object_type type,
 	return ret;
 }
 
-int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
-				   u64 page_struct_count, u32 host_access,
+int hv_call_modify_spa_host_access(u64 partition_id, unsigned long *pfns,
+				   u64 pfns_count, u32 host_access,
 				   u32 flags, u8 acquire)
 {
 	struct hv_input_modify_sparse_spa_page_host_access *input_page;
 	u64 status;
 	int done = 0;
 	unsigned long irq_flags, large_shift = 0;
-	u64 page_count = page_struct_count;
+	u64 page_count = pfns_count;
 	u16 code = acquire ? HVCALL_ACQUIRE_SPARSE_SPA_PAGE_HOST_ACCESS :
 			     HVCALL_RELEASE_SPARSE_SPA_PAGE_HOST_ACCESS;
 
@@ -1051,11 +1050,10 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
 		for (i = 0; i < rep_count; i++) {
 			u64 index = (done + i) << large_shift;
 
-			if (index >= page_struct_count)
+			if (index >= pfns_count)
 				return -EINVAL;
 
-			input_page->spa_page_list[i] =
-						page_to_pfn(pages[index]);
+			input_page->spa_page_list[i] = pfns[index];
 		}
 
 		status = hv_do_rep_hypercall(code, rep_count, 0, input_page,
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index f2d83d6c8c4f..685e4b562186 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -619,7 +619,7 @@ mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
 
 	hlist_for_each_entry(region, &partition->pt_mem_regions, hnode) {
 		if (gfn >= region->start_gfn &&
-		    gfn < region->start_gfn + region->nr_pages)
+		    gfn < region->start_gfn + region->nr_pfns)
 			return region;
 	}
 
@@ -1221,20 +1221,20 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 					bool is_mmio)
 {
 	struct mshv_mem_region *rg;
-	u64 nr_pages = HVPFN_DOWN(mem->size);
+	u64 nr_pfns = HVPFN_DOWN(mem->size);
 
 	/* Reject overlapping regions */
 	spin_lock(&partition->pt_mem_regions_lock);
 	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
-		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
-		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
+		if (mem->guest_pfn + nr_pfns <= rg->start_gfn ||
+		    rg->start_gfn + rg->nr_pfns <= mem->guest_pfn)
 			continue;
 		spin_unlock(&partition->pt_mem_regions_lock);
 		return -EEXIST;
 	}
 	spin_unlock(&partition->pt_mem_regions_lock);
 
-	rg = mshv_region_create(mem->guest_pfn, nr_pages,
+	rg = mshv_region_create(mem->guest_pfn, nr_pfns,
 				mem->userspace_addr, mem->flags);
 	if (IS_ERR(rg))
 		return PTR_ERR(rg);
@@ -1372,21 +1372,21 @@ mshv_map_user_memory(struct mshv_partition *partition,
 		 * the hypervisor track dirty pages, enabling pre-copy live
 		 * migration.
 		 */
-		ret = hv_call_map_gpa_pages(partition->pt_id,
-					    region->start_gfn,
-					    region->nr_pages,
-					    HV_MAP_GPA_NO_ACCESS, NULL);
+		ret = hv_call_map_ram_pfns(partition->pt_id,
+					   region->start_gfn,
+					   region->nr_pfns,
+					   HV_MAP_GPA_NO_ACCESS, NULL);
 		break;
 	case MSHV_REGION_TYPE_MMIO:
-		ret = hv_call_map_mmio_pages(partition->pt_id,
-					     region->start_gfn,
-					     mmio_pfn,
-					     region->nr_pages);
+		ret = hv_call_map_mmio_pfns(partition->pt_id,
+					    region->start_gfn,
+					    mmio_pfn,
+					    region->nr_pfns);
 		break;
 	}
 
 	trace_mshv_map_user_memory(partition->pt_id, region->start_uaddr,
-				   region->start_gfn, region->nr_pages,
+				   region->start_gfn, region->nr_pfns,
 				   region->hv_map_flags, ret);
 
 	if (ret)
@@ -1424,7 +1424,7 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
 	/* Paranoia check */
 	if (region->start_uaddr != mem.userspace_addr ||
 	    region->start_gfn != mem.guest_pfn ||
-	    region->nr_pages != HVPFN_DOWN(mem.size)) {
+	    region->nr_pfns != HVPFN_DOWN(mem.size)) {
 		spin_unlock(&partition->pt_mem_regions_lock);
 		return -EINVAL;
 	}



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/7] mshv: Add support to address range holes remapping
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
  2026-03-30 20:04 ` [PATCH 1/7] mshv: Convert from page pointers to PFNs Stanislav Kinsburskii
@ 2026-03-30 20:04 ` Stanislav Kinsburskii
  2026-04-13 21:08   ` Michael Kelley
  2026-03-30 20:04 ` [PATCH 3/7] mshv: Support regions with different VMAs Stanislav Kinsburskii
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

Consolidate memory region processing to handle both valid and invalid PFNs
uniformly. This eliminates code duplication across remap, unmap, share, and
unshare operations by using a common range processing interface.

Holes are now remapped with no-access permissions to enable
hypervisor dirty page tracking for precopy live migration.

This refactoring is a precursor to an upcoming change that will map
present pages in movable regions upon region creation, requiring
consistent handling of both mapped and unmapped ranges.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |  108 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 95 insertions(+), 13 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index b1a707d16c07..ed9c55841140 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -119,6 +119,57 @@ static long mshv_region_process_pfns(struct mshv_mem_region *region,
 	return count;
 }
 
+/**
+ * mshv_region_process_hole - Handle a hole (invalid PFNs) in a memory
+ *                            region
+ * @region    : Memory region containing the hole
+ * @flags     : Flags to pass to the handler function
+ * @pfn_offset: Starting PFN offset within the region
+ * @pfn_count : Number of PFNs in the hole
+ * @handler   : Callback function to invoke for the hole
+ *
+ * Invokes the handler function for a contiguous hole with the specified
+ * parameters.
+ *
+ * Return: Number of PFNs handled, or negative error code.
+ */
+static long mshv_region_process_hole(struct mshv_mem_region *region,
+				     u32 flags,
+				     u64 pfn_offset, u64 pfn_count,
+				     int (*handler)(struct mshv_mem_region *region,
+						    u32 flags,
+						    u64 pfn_offset,
+						    u64 pfn_count,
+						    bool huge_page))
+{
+	long ret;
+
+	ret = handler(region, flags, pfn_offset, pfn_count, 0);
+	if (ret)
+		return ret;
+
+	return pfn_count;
+}
+
+static long mshv_region_process_chunk(struct mshv_mem_region *region,
+				      u32 flags,
+				      u64 pfn_offset, u64 pfn_count,
+				      int (*handler)(struct mshv_mem_region *region,
+						     u32 flags,
+						     u64 pfn_offset,
+						     u64 pfn_count,
+						     bool huge_page))
+{
+	if (pfn_valid(region->mreg_pfns[pfn_offset]))
+		return mshv_region_process_pfns(region, flags,
+				pfn_offset, pfn_count,
+				handler);
+	else
+		return mshv_region_process_hole(region, flags,
+				pfn_offset, pfn_count,
+				handler);
+}
+
 /**
  * mshv_region_process_range - Processes a range of PFNs in a region.
  * @region    : Pointer to the memory region structure.
@@ -146,33 +197,47 @@ static int mshv_region_process_range(struct mshv_mem_region *region,
 						    u64 pfn_count,
 						    bool huge_page))
 {
-	u64 pfn_end;
+	u64 start, end;
 	long ret;
 
-	if (check_add_overflow(pfn_offset, pfn_count, &pfn_end))
+	if (!pfn_count)
+		return 0;
+
+	if (check_add_overflow(pfn_offset, pfn_count, &end))
 		return -EOVERFLOW;
 
-	if (pfn_end > region->nr_pfns)
+	if (end > region->nr_pfns)
 		return -EINVAL;
 
-	while (pfn_count) {
-		/* Skip non-present pages */
-		if (!pfn_valid(region->mreg_pfns[pfn_offset])) {
-			pfn_offset++;
-			pfn_count--;
+	start = pfn_offset;
+	end = pfn_offset + 1;
+
+	while (end < pfn_offset + pfn_count) {
+		/*
+		 * Accumulate contiguous pfns with the same validity
+		 * (valid or not).
+		 */
+		if (pfn_valid(region->mreg_pfns[start]) ==
+		    pfn_valid(region->mreg_pfns[end])) {
+			end++;
 			continue;
 		}
 
-		ret = mshv_region_process_pfns(region, flags,
-					       pfn_offset, pfn_count,
-					       handler);
+		ret = mshv_region_process_chunk(region, flags,
+						start, end - start,
+						handler);
 		if (ret < 0)
 			return ret;
 
-		pfn_offset += ret;
-		pfn_count -= ret;
+		start += ret;
 	}
 
+	ret = mshv_region_process_chunk(region, flags,
+					start, end - start,
+					handler);
+	if (ret < 0)
+		return ret;
+
 	return 0;
 }
 
@@ -208,6 +273,9 @@ static int mshv_region_chunk_share(struct mshv_mem_region *region,
 				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
 {
+	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
+		return -EINVAL;
+
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
@@ -233,6 +301,9 @@ static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
 				     u64 pfn_offset, u64 pfn_count,
 				     bool huge_page)
 {
+	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
+		return -EINVAL;
+
 	if (huge_page)
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
@@ -256,6 +327,14 @@ static int mshv_region_chunk_remap(struct mshv_mem_region *region,
 				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
 {
+	/*
+	 * Remap missing pages with no access to let the
+	 * hypervisor track dirty pages, enabling precopy live
+	 * migration.
+	 */
+	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
+		flags = HV_MAP_GPA_NO_ACCESS;
+
 	if (huge_page)
 		flags |= HV_MAP_GPA_LARGE_PAGE;
 
@@ -357,6 +436,9 @@ static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
 				   u64 pfn_offset, u64 pfn_count,
 				   bool huge_page)
 {
+	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
+		return 0;
+
 	if (huge_page)
 		flags |= HV_UNMAP_GPA_LARGE_PAGE;
 



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/7] mshv: Support regions with different VMAs
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
  2026-03-30 20:04 ` [PATCH 1/7] mshv: Convert from page pointers to PFNs Stanislav Kinsburskii
  2026-03-30 20:04 ` [PATCH 2/7] mshv: Add support to address range holes remapping Stanislav Kinsburskii
@ 2026-03-30 20:04 ` Stanislav Kinsburskii
  2026-04-13 21:08   ` Michael Kelley
  2026-03-30 20:04 ` [PATCH 4/7] mshv: Move pinned region setup to mshv_regions.c Stanislav Kinsburskii
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

Allow HMM fault handling across memory regions that span multiple VMAs
with different protection flags. The previous implementation assumed a
single VMA per region, which would fail when guest memory crosses VMA
boundaries.

Iterate through VMAs within the range and handle each separately with
appropriate protection flags, enabling more flexible memory region
configurations for partitions.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c |   72 +++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index ed9c55841140..1bb1bfe177e2 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -492,37 +492,72 @@ int mshv_region_get(struct mshv_mem_region *region)
 }
 
 /**
- * mshv_region_hmm_fault_and_lock - Handle HMM faults and lock the memory region
+ * mshv_region_hmm_fault_and_lock - Handle HMM faults across VMAs and lock
+ *                                  the memory region
  * @region: Pointer to the memory region structure
- * @range: Pointer to the HMM range structure
+ * @start : Starting virtual address of the range to fault
+ * @end   : Ending virtual address of the range to fault (exclusive)
+ * @pfns  : Output array for page frame numbers with HMM flags
  *
  * This function performs the following steps:
  * 1. Reads the notifier sequence for the HMM range.
  * 2. Acquires a read lock on the memory map.
- * 3. Handles HMM faults for the specified range.
- * 4. Releases the read lock on the memory map.
- * 5. If successful, locks the memory region mutex.
- * 6. Verifies if the notifier sequence has changed during the operation.
- *    If it has, releases the mutex and returns -EBUSY to match with
- *    hmm_range_fault() return code for repeating.
+ * 3. Iterates through VMAs in the specified range, handling each
+ *    separately with appropriate protection flags (HMM_PFN_REQ_WRITE set
+ *    based on VMA flags).
+ * 4. Handles HMM faults for each VMA segment.
+ * 5. Releases the read lock on the memory map.
+ * 6. If successful, locks the memory region mutex.
+ * 7. Verifies if the notifier sequence has changed during the operation.
+ *    If it has, releases the mutex and returns -EBUSY to signal retry.
+ *
+ * The function expects the range [start, end] is backed by valid VMAs.
+ * Returns -EFAULT if any address in the range is not covered by a VMA.
  *
  * Return: 0 on success, a negative error code otherwise.
  */
 static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
-					  struct hmm_range *range)
+					  unsigned long start,
+					  unsigned long end,
+					  unsigned long *pfns)
 {
+	struct hmm_range range = {
+		.notifier = &region->mreg_mni,
+	};
 	int ret;
 
-	range->notifier_seq = mmu_interval_read_begin(range->notifier);
+	range.notifier_seq = mmu_interval_read_begin(range.notifier);
 	mmap_read_lock(region->mreg_mni.mm);
-	ret = hmm_range_fault(range);
+	while (start < end) {
+		struct vm_area_struct *vma;
+
+		vma = vma_lookup(current->mm, start);
+		if (!vma) {
+			ret = -EFAULT;
+			break;
+		}
+
+		range.hmm_pfns = pfns;
+		range.start = start;
+		range.end = min(vma->vm_end, end);
+		range.default_flags = HMM_PFN_REQ_FAULT;
+		if (vma->vm_flags & VM_WRITE)
+			range.default_flags |= HMM_PFN_REQ_WRITE;
+
+		ret = hmm_range_fault(&range);
+		if (ret)
+			break;
+
+		start = range.end + 1;
+		pfns += DIV_ROUND_UP(range.end - range.start, PAGE_SIZE);
+	}
 	mmap_read_unlock(region->mreg_mni.mm);
 	if (ret)
 		return ret;
 
 	mutex_lock(&region->mreg_mutex);
 
-	if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
+	if (mmu_interval_read_retry(range.notifier, range.notifier_seq)) {
 		mutex_unlock(&region->mreg_mutex);
 		cond_resched();
 		return -EBUSY;
@@ -546,10 +581,7 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
 static int mshv_region_range_fault(struct mshv_mem_region *region,
 				   u64 pfn_offset, u64 pfn_count)
 {
-	struct hmm_range range = {
-		.notifier = &region->mreg_mni,
-		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
-	};
+	unsigned long start, end;
 	unsigned long *pfns;
 	int ret;
 	u64 i;
@@ -558,12 +590,12 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
 	if (!pfns)
 		return -ENOMEM;
 
-	range.hmm_pfns = pfns;
-	range.start = region->start_uaddr + pfn_offset * HV_HYP_PAGE_SIZE;
-	range.end = range.start + pfn_count * HV_HYP_PAGE_SIZE;
+	start = region->start_uaddr + pfn_offset * PAGE_SIZE;
+	end = start + pfn_count * PAGE_SIZE;
 
 	do {
-		ret = mshv_region_hmm_fault_and_lock(region, &range);
+		ret = mshv_region_hmm_fault_and_lock(region, start, end,
+						     pfns);
 	} while (ret == -EBUSY);
 
 	if (ret)



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/7] mshv: Move pinned region setup to mshv_regions.c
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
                   ` (2 preceding siblings ...)
  2026-03-30 20:04 ` [PATCH 3/7] mshv: Support regions with different VMAs Stanislav Kinsburskii
@ 2026-03-30 20:04 ` Stanislav Kinsburskii
  2026-03-30 20:04 ` [PATCH 5/7] mshv: Map populated pages on movable region creation Stanislav Kinsburskii
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

Move mshv_prepare_pinned_region() from mshv_root_main.c to
mshv_regions.c and rename it to mshv_map_pinned_region(). This
co-locates the pinned region logic with the rest of the memory region
operations.

Make mshv_region_pin(), mshv_region_map(), mshv_region_share(),
mshv_region_unshare(), and mshv_region_invalidate() static, as they are
no longer called outside of mshv_regions.c.

Also fix a bug in the error handling where a mshv_region_map() failure
on a non-encrypted partition would be silently ignored, returning
success instead of propagating the error code.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c   |   79 ++++++++++++++++++++++++++++++++++++++++---
 drivers/hv/mshv_root.h      |    6 +--
 drivers/hv/mshv_root_main.c |   70 +-------------------------------------
 3 files changed, 76 insertions(+), 79 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 1bb1bfe177e2..133ec7771812 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -287,7 +287,7 @@ static int mshv_region_chunk_share(struct mshv_mem_region *region,
 					      flags, true);
 }
 
-int mshv_region_share(struct mshv_mem_region *region)
+static int mshv_region_share(struct mshv_mem_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
 
@@ -313,7 +313,7 @@ static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
 					      flags, false);
 }
 
-int mshv_region_unshare(struct mshv_mem_region *region)
+static int mshv_region_unshare(struct mshv_mem_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
 
@@ -353,7 +353,7 @@ static int mshv_region_remap_pfns(struct mshv_mem_region *region,
 					 mshv_region_chunk_remap);
 }
 
-int mshv_region_map(struct mshv_mem_region *region)
+static int mshv_region_map(struct mshv_mem_region *region)
 {
 	u32 map_flags = region->hv_map_flags;
 
@@ -377,12 +377,12 @@ static void mshv_region_invalidate_pfns(struct mshv_mem_region *region,
 	}
 }
 
-void mshv_region_invalidate(struct mshv_mem_region *region)
+static void mshv_region_invalidate(struct mshv_mem_region *region)
 {
 	mshv_region_invalidate_pfns(region, 0, region->nr_pfns);
 }
 
-int mshv_region_pin(struct mshv_mem_region *region)
+static int mshv_region_pin(struct mshv_mem_region *region)
 {
 	u64 done_count, nr_pfns, i;
 	unsigned long *pfns;
@@ -731,3 +731,72 @@ bool mshv_region_movable_init(struct mshv_mem_region *region)
 
 	return true;
 }
+
+/**
+ * mshv_map_pinned_region - Pin and map memory regions
+ * @region: Pointer to the memory region structure
+ *
+ * This function processes memory regions that are explicitly marked as pinned.
+ * Pinned regions are preallocated, mapped upfront, and do not rely on fault-based
+ * population. The function ensures the region is properly populated, handles
+ * encryption requirements for SNP partitions if applicable, maps the region,
+ * and performs necessary sharing or eviction operations based on the mapping
+ * result.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int mshv_map_pinned_region(struct mshv_mem_region *region)
+{
+	struct mshv_partition *partition = region->partition;
+	int ret;
+
+	ret = mshv_region_pin(region);
+	if (ret) {
+		pt_err(partition, "Failed to pin memory region: %d\n",
+		       ret);
+		goto err_out;
+	}
+
+	/*
+	 * For an SNP partition it is a requirement that for every memory region
+	 * that we are going to map for this partition we should make sure that
+	 * host access to that region is released. This is ensured by doing an
+	 * additional hypercall which will update the SLAT to release host
+	 * access to guest memory regions.
+	 */
+	if (mshv_partition_encrypted(partition)) {
+		ret = mshv_region_unshare(region);
+		if (ret) {
+			pt_err(partition,
+			       "Failed to unshare memory region (guest_pfn: %llu): %d\n",
+			       region->start_gfn, ret);
+			goto invalidate_region;
+		}
+	}
+
+	ret = mshv_region_map(region);
+	if (!ret)
+		return 0;
+
+	if (mshv_partition_encrypted(partition)) {
+		int shrc;
+
+		shrc = mshv_region_share(region);
+		if (!shrc)
+			goto invalidate_region;
+
+		pt_err(partition,
+		       "Failed to share memory region (guest_pfn: %llu): %d\n",
+		       region->start_gfn, shrc);
+		/*
+		 * Don't unpin if marking shared failed because pages are no
+		 * longer mapped in the host, ie root, anymore.
+		 */
+		goto err_out;
+	}
+
+invalidate_region:
+	mshv_region_invalidate(region);
+err_out:
+	return ret;
+}
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index f1d4bee97a3f..d2e65a137bf4 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -368,15 +368,11 @@ extern u8 * __percpu *hv_synic_eventring_tail;
 
 struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
 					   u64 uaddr, u32 flags);
-int mshv_region_share(struct mshv_mem_region *region);
-int mshv_region_unshare(struct mshv_mem_region *region);
-int mshv_region_map(struct mshv_mem_region *region);
-void mshv_region_invalidate(struct mshv_mem_region *region);
-int mshv_region_pin(struct mshv_mem_region *region);
 void mshv_region_put(struct mshv_mem_region *region);
 int mshv_region_get(struct mshv_mem_region *region);
 bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn);
 void mshv_region_movable_fini(struct mshv_mem_region *region);
 bool mshv_region_movable_init(struct mshv_mem_region *region);
+int mshv_map_pinned_region(struct mshv_mem_region *region);
 
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 685e4b562186..c393b5144e0b 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1254,74 +1254,6 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	return 0;
 }
 
-/**
- * mshv_prepare_pinned_region - Pin and map memory regions
- * @region: Pointer to the memory region structure
- *
- * This function processes memory regions that are explicitly marked as pinned.
- * Pinned regions are preallocated, mapped upfront, and do not rely on fault-based
- * population. The function ensures the region is properly populated, handles
- * encryption requirements for SNP partitions if applicable, maps the region,
- * and performs necessary sharing or eviction operations based on the mapping
- * result.
- *
- * Return: 0 on success, negative error code on failure.
- */
-static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
-{
-	struct mshv_partition *partition = region->partition;
-	int ret;
-
-	ret = mshv_region_pin(region);
-	if (ret) {
-		pt_err(partition, "Failed to pin memory region: %d\n",
-		       ret);
-		goto err_out;
-	}
-
-	/*
-	 * For an SNP partition it is a requirement that for every memory region
-	 * that we are going to map for this partition we should make sure that
-	 * host access to that region is released. This is ensured by doing an
-	 * additional hypercall which will update the SLAT to release host
-	 * access to guest memory regions.
-	 */
-	if (mshv_partition_encrypted(partition)) {
-		ret = mshv_region_unshare(region);
-		if (ret) {
-			pt_err(partition,
-			       "Failed to unshare memory region (guest_pfn: %llu): %d\n",
-			       region->start_gfn, ret);
-			goto invalidate_region;
-		}
-	}
-
-	ret = mshv_region_map(region);
-	if (ret && mshv_partition_encrypted(partition)) {
-		int shrc;
-
-		shrc = mshv_region_share(region);
-		if (!shrc)
-			goto invalidate_region;
-
-		pt_err(partition,
-		       "Failed to share memory region (guest_pfn: %llu): %d\n",
-		       region->start_gfn, shrc);
-		/*
-		 * Don't unpin if marking shared failed because pages are no
-		 * longer mapped in the host, ie root, anymore.
-		 */
-		goto err_out;
-	}
-
-	return 0;
-
-invalidate_region:
-	mshv_region_invalidate(region);
-err_out:
-	return ret;
-}
-
 /*
  * This maps two things: guest RAM and for pci passthru mmio space.
  *
@@ -1364,7 +1296,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
 
 	switch (region->mreg_type) {
 	case MSHV_REGION_TYPE_MEM_PINNED:
-		ret = mshv_prepare_pinned_region(region);
+		ret = mshv_map_pinned_region(region);
 		break;
 	case MSHV_REGION_TYPE_MEM_MOVABLE:
 		/*



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/7] mshv: Map populated pages on movable region creation
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
                   ` (3 preceding siblings ...)
  2026-03-30 20:04 ` [PATCH 4/7] mshv: Move pinned region setup to mshv_regions.c Stanislav Kinsburskii
@ 2026-03-30 20:04 ` Stanislav Kinsburskii
  2026-04-13 21:09   ` Michael Kelley
  2026-03-30 20:04 ` [PATCH 6/7] mshv: Extract MMIO region mapping into separate function Stanislav Kinsburskii
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

Map any populated pages into the hypervisor upfront when creating a
movable region, rather than waiting for faults. Previously, movable
regions were created with all pages marked as HV_MAP_GPA_NO_ACCESS
regardless of whether the userspace mapping contained populated pages.

This guarantees that if the caller passes a populated mapping, those
present pages will be mapped into the hypervisor immediately during
region creation instead of being faulted in later.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c   |   65 ++++++++++++++++++++++++++++++++-----------
 drivers/hv/mshv_root.h      |    1 +
 drivers/hv/mshv_root_main.c |   10 +------
 3 files changed, 50 insertions(+), 26 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 133ec7771812..28d3f488d89f 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -519,7 +519,8 @@ int mshv_region_get(struct mshv_mem_region *region)
 static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
 					  unsigned long start,
 					  unsigned long end,
-					  unsigned long *pfns)
+					  unsigned long *pfns,
+					  bool do_fault)
 {
 	struct hmm_range range = {
 		.notifier = &region->mreg_mni,
@@ -540,9 +541,12 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
 		range.hmm_pfns = pfns;
 		range.start = start;
 		range.end = min(vma->vm_end, end);
-		range.default_flags = HMM_PFN_REQ_FAULT;
-		if (vma->vm_flags & VM_WRITE)
-			range.default_flags |= HMM_PFN_REQ_WRITE;
+		range.default_flags = 0;
+		if (do_fault) {
+			range.default_flags = HMM_PFN_REQ_FAULT;
+			if (vma->vm_flags & VM_WRITE)
+				range.default_flags |= HMM_PFN_REQ_WRITE;
+		}
 
 		ret = hmm_range_fault(&range);
 		if (ret)
@@ -567,26 +571,40 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
 }
 
 /**
- * mshv_region_range_fault - Handle memory range faults for a given region.
- * @region: Pointer to the memory region structure.
- * @pfn_offset: Offset of the page within the region.
- * @pfn_count: Number of pages to handle.
+ * mshv_region_collect_and_map - Collect PFNs for a user range and map them
+ * @region    : memory region being processed
+ * @pfn_offset: PFNs offset within the region
+ * @pfn_count : number of PFNs to process
+ * @do_fault  : if true, fault in missing pages;
+ *              if false, collect only present pages
  *
- * This function resolves memory faults for a specified range of pages
- * within a memory region. It uses HMM (Heterogeneous Memory Management)
- * to fault in the required pages and updates the region's page array.
+ * Collects PFNs for the specified portion of @region from the
+ * corresponding userspace VMA and maps them into the hypervisor. The
+ * behavior depends on @do_fault:
  *
- * Return: 0 on success, negative error code on failure.
+ * - true: Fault in missing pages from userspace, ensuring all pages in the
+ *   range are present. Used for on-demand page population.
+ * - false: Collect PFNs only for pages already present in userspace,
+ *   leaving missing pages as invalid PFN markers.
+ *   Used for initial region setup.
+ *
+ * Collected PFNs are stored in region->mreg_pfns[] with HMM bookkeeping
+ * flags cleared, then the range is mapped into the hypervisor. Present
+ * PFNs get mapped with region access permissions; missing PFNs (zero
+ * entries) get mapped with no-access permissions.
+ *
+ * Return: 0 on success, negative errno on failure.
  */
-static int mshv_region_range_fault(struct mshv_mem_region *region,
-				   u64 pfn_offset, u64 pfn_count)
+static int mshv_region_collect_and_map(struct mshv_mem_region *region,
+				       u64 pfn_offset, u64 pfn_count,
+				       bool do_fault)
 {
 	unsigned long start, end;
 	unsigned long *pfns;
 	int ret;
 	u64 i;
 
-	pfns = kmalloc_array(pfn_count, sizeof(*pfns), GFP_KERNEL);
+	pfns = vmalloc_array(pfn_count, sizeof(unsigned long));
 	if (!pfns)
 		return -ENOMEM;
 
@@ -595,7 +613,7 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
 
 	do {
 		ret = mshv_region_hmm_fault_and_lock(region, start, end,
-						     pfns);
+						     pfns, do_fault);
 	} while (ret == -EBUSY);
 
 	if (ret)
@@ -613,10 +631,17 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
 
 	mutex_unlock(&region->mreg_mutex);
 out:
-	kfree(pfns);
+	vfree(pfns);
 	return ret;
 }
 
+static int mshv_region_range_fault(struct mshv_mem_region *region,
+				   u64 pfn_offset, u64 pfn_count)
+{
+	return mshv_region_collect_and_map(region, pfn_offset, pfn_count,
+					   true);
+}
+
 bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
 {
 	u64 pfn_offset, pfn_count;
@@ -800,3 +825,9 @@ int mshv_map_pinned_region(struct mshv_mem_region *region)
 err_out:
 	return ret;
 }
+
+int mshv_map_movable_region(struct mshv_mem_region *region)
+{
+	return mshv_region_collect_and_map(region, 0, region->nr_pfns,
+					   false);
+}
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index d2e65a137bf4..02c1c11f701c 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -374,5 +374,6 @@ bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn);
 void mshv_region_movable_fini(struct mshv_mem_region *region);
 bool mshv_region_movable_init(struct mshv_mem_region *region);
 int mshv_map_pinned_region(struct mshv_mem_region *region);
+int mshv_map_movable_region(struct mshv_mem_region *region);
 
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index c393b5144e0b..91dab2a3bc92 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1299,15 +1299,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
 		ret = mshv_map_pinned_region(region);
 		break;
 	case MSHV_REGION_TYPE_MEM_MOVABLE:
-		/*
-		 * For movable memory regions, remap with no access to let
-		 * the hypervisor track dirty pages, enabling pre-copy live
-		 * migration.
-		 */
-		ret = hv_call_map_ram_pfns(partition->pt_id,
-					   region->start_gfn,
-					   region->nr_pfns,
-					   HV_MAP_GPA_NO_ACCESS, NULL);
+		ret = mshv_map_movable_region(region);
 		break;
 	case MSHV_REGION_TYPE_MMIO:
 		ret = hv_call_map_mmio_pfns(partition->pt_id,



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/7] mshv: Extract MMIO region mapping into separate function
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
                   ` (4 preceding siblings ...)
  2026-03-30 20:04 ` [PATCH 5/7] mshv: Map populated pages on movable region creation Stanislav Kinsburskii
@ 2026-03-30 20:04 ` Stanislav Kinsburskii
  2026-03-30 20:04 ` [PATCH 7/7] mshv: Add tracepoint for map GPA hypercall Stanislav Kinsburskii
  2026-04-13 21:07 ` [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Michael Kelley
  7 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

Extract the MMIO region mapping logic from mshv_map_user_memory() into
a dedicated mshv_map_mmio_region() function. This improves code
organization and consistency with the existing mshv_map_pinned_region()
and mshv_map_movable_region() functions.

The new function encapsulates the hv_call_map_mmio_pfns() call,
making the switch statement in mshv_map_user_memory() more concise
and maintaining a uniform pattern for all region types.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_regions.c   |    9 +++++++++
 drivers/hv/mshv_root.h      |    2 ++
 drivers/hv/mshv_root_main.c |    5 +----
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 28d3f488d89f..6b703b269a4f 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -831,3 +831,12 @@ int mshv_map_movable_region(struct mshv_mem_region *region)
 	return mshv_region_collect_and_map(region, 0, region->nr_pfns,
 					   false);
 }
+
+int mshv_map_mmio_region(struct mshv_mem_region *region,
+			 unsigned long mmio_pfn)
+{
+	struct mshv_partition *partition = region->partition;
+
+	return hv_call_map_mmio_pfns(partition->pt_id, region->start_gfn,
+				     mmio_pfn, region->nr_pfns);
+}
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 02c1c11f701c..1f92b9f85b60 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -375,5 +375,7 @@ void mshv_region_movable_fini(struct mshv_mem_region *region);
 bool mshv_region_movable_init(struct mshv_mem_region *region);
 int mshv_map_pinned_region(struct mshv_mem_region *region);
 int mshv_map_movable_region(struct mshv_mem_region *region);
+int mshv_map_mmio_region(struct mshv_mem_region *region,
+			 unsigned long mmio_pfn);
 
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 91dab2a3bc92..adb09350205a 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1302,10 +1302,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
 		ret = mshv_map_movable_region(region);
 		break;
 	case MSHV_REGION_TYPE_MMIO:
-		ret = hv_call_map_mmio_pfns(partition->pt_id,
-					    region->start_gfn,
-					    mmio_pfn,
-					    region->nr_pfns);
+		ret = mshv_map_mmio_region(region, mmio_pfn);
 		break;
 	}
 



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/7] mshv: Add tracepoint for map GPA hypercall
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
                   ` (5 preceding siblings ...)
  2026-03-30 20:04 ` [PATCH 6/7] mshv: Extract MMIO region mapping into separate function Stanislav Kinsburskii
@ 2026-03-30 20:04 ` Stanislav Kinsburskii
  2026-04-13 21:07 ` [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Michael Kelley
  7 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-30 20:04 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

Add tracing for GPA mapping hypercalls to aid in debugging memory
management issues in child partitions. The tracepoint captures both
successful and failed mapping attempts, including the number of pages
successfully mapped before any failure occurred.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_root_hv_call.c |    3 +++
 drivers/hv/mshv_trace.h        |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index a95f2cfc5da5..7ed623668c8e 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -260,6 +260,9 @@ static int hv_do_map_pfns(u64 partition_id, u64 gfn, u64 pfns_count,
 		done += completed;
 	}
 
+	trace_mshv_map_pfns(partition_id, gfn, pfns_count, page_count,
+			    flags, mmio_spa, done, ret);
+
 	if (ret && done) {
 		u32 unmap_flags = 0;
 
diff --git a/drivers/hv/mshv_trace.h b/drivers/hv/mshv_trace.h
index 6b8fa477fa3b..efd2b5d4ab73 100644
--- a/drivers/hv/mshv_trace.h
+++ b/drivers/hv/mshv_trace.h
@@ -538,6 +538,42 @@ TRACE_EVENT(mshv_handle_gpa_intercept,
 	    )
 );
 
+TRACE_EVENT(mshv_map_pfns,
+	    TP_PROTO(u64 partition_id, u64 gfn, u64 pfn_count, u64 page_count, u32 flags,
+		     u64 mmio_spa, int done, int ret),
+	    TP_ARGS(partition_id, gfn, pfn_count, page_count, flags, mmio_spa, done, ret),
+	    TP_STRUCT__entry(
+		    __field(u64, partition_id)
+		    __field(u64, gfn)
+		    __field(u64, pfn_count)
+		    __field(u64, page_count)
+		    __field(u32, flags)
+		    __field(u64, mmio_spa)
+		    __field(int, done)
+		    __field(int, ret)
+	    ),
+	    TP_fast_assign(
+		    __entry->partition_id = partition_id;
+		    __entry->gfn = gfn;
+		    __entry->page_count = page_count;
+		    __entry->pfn_count = pfn_count;
+		    __entry->flags = flags;
+		    __entry->mmio_spa = mmio_spa;
+		    __entry->done = done;
+		    __entry->ret = ret;
+	    ),
+	    TP_printk("partition_id=%llu gfn=0x%llx pfn_count=%llu page_count=%llu flags=0x%x mmio_spa=0x%llx done=%d ret=%d",
+		    __entry->partition_id,
+		    __entry->gfn,
+		    __entry->pfn_count,
+		    __entry->page_count,
+		    __entry->flags,
+		    __entry->mmio_spa,
+		    __entry->done,
+		    __entry->ret
+	    )
+);
+
 #endif /* _MSHV_TRACE_H_ */
 
 /* This part must be outside protection */



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH 0/7] mshv: Refactor memory region management and map pages at creation
  2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
                   ` (6 preceding siblings ...)
  2026-03-30 20:04 ` [PATCH 7/7] mshv: Add tracepoint for map GPA hypercall Stanislav Kinsburskii
@ 2026-04-13 21:07 ` Michael Kelley
  2026-04-20 16:40   ` Stanislav Kinsburskii
  7 siblings, 1 reply; 20+ messages in thread
From: Michael Kelley @ 2026-04-13 21:07 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> 
> This series refactors the mshv memory region subsystem in preparation
> for mapping populated pages into the hypervisor at movable region
> creation time, rather than relying solely on demand faulting.
> 
> The primary motivation is to ensure that when userspace passes a
> pre-populated mapping for a movable memory region, those pages are
> immediately visible to the hypervisor. Previously, all movable regions
> were created with HV_MAP_GPA_NO_ACCESS on every page regardless of
> whether the backing pages were already present, deferring all mapping
> to the fault handler. This added unnecessary fault overhead and
> complicated the initial setup of child partitions with pre-populated
> memory.
> 

This is a nice set of changes. Independent of the new functionality
for pre-populating, it improves the code organization and makes
it more regular.

See a few comments on individual patches. I noticed that Sashiko
wasn't able to review the series because it wouldn't apply. Hopefully
your v2 will apply. From what I've seen so far of Sashiko, it finds some
good issues. I did run the patch set through Co-Pilot, but that didn't
have the benefit of the AI prompts that Sashiko provides.

Michael

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 1/7] mshv: Convert from page pointers to PFNs
  2026-03-30 20:04 ` [PATCH 1/7] mshv: Convert from page pointers to PFNs Stanislav Kinsburskii
@ 2026-04-13 21:08   ` Michael Kelley
  2026-04-20 16:21     ` Stanislav Kinsburskii
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kelley @ 2026-04-13 21:08 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> 
> The HMM interface returns PFNs from hmm_range_fault(), and the
> hypervisor hypercalls operate on PFNs. Storing page pointers in
> between these interfaces requires unnecessary conversions and
> temporary allocations.
> 
> Store PFNs directly in memory regions to match the natural data flow.
> This eliminates the temporary PFN array allocation in the HMM fault
> path and reduces page_to_pfn() conversions throughout the driver.
> Convert to page structs via pfn_to_page() only when operations like
> unpin_user_page() require them.

General comment for this series:  PFN fields are typed as "unsigned long".
But pfn_offset and pfn_count are "u64".  GFNs are also "u64".  Any
reason not to make PFNs also "u64"? I know that pfn_valid() takes
an "unsigned long" input, but see comment below about pfn_valid().

> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c      |  297 ++++++++++++++++++++++------------------
>  drivers/hv/mshv_root.h         |   20 +--
>  drivers/hv/mshv_root_hv_call.c |   50 +++----
>  drivers/hv/mshv_root_main.c    |   30 ++--
>  4 files changed, 212 insertions(+), 185 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index fdffd4f002f6..b1a707d16c07 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -18,12 +18,13 @@
>  #include "mshv_root.h"
> 
>  #define MSHV_MAP_FAULT_IN_PAGES				PTRS_PER_PMD
> +#define MSHV_INVALID_PFN				ULONG_MAX
> 
>  /**
>   * mshv_chunk_stride - Compute stride for mapping guest memory
>   * @page      : The page to check for huge page backing
>   * @gfn       : Guest frame number for the mapping
> - * @page_count: Total number of pages in the mapping
> + * @pfn_count: Total number of pages in the mapping

Nit: The colons are misaligned after this change.

>   *
>   * Determines the appropriate stride (in pages) for mapping guest memory.
>   * Uses huge page stride if the backing page is huge and the guest mapping
> @@ -32,18 +33,18 @@
>   * Return: Stride in pages, or -EINVAL if page order is unsupported.
>   */
>  static int mshv_chunk_stride(struct page *page,
> -			     u64 gfn, u64 page_count)
> +			     u64 gfn, u64 pfn_count)
>  {
>  	unsigned int page_order;
> 
>  	/*
>  	 * Use single page stride by default. For huge page stride, the
>  	 * page must be compound and point to the head of the compound
> -	 * page, and both gfn and page_count must be huge-page aligned.
> +	 * page, and both gfn and pfn_count must be huge-page aligned.
>  	 */
>  	if (!PageCompound(page) || !PageHead(page) ||
>  	    !IS_ALIGNED(gfn, PTRS_PER_PMD) ||
> -	    !IS_ALIGNED(page_count, PTRS_PER_PMD))
> +	    !IS_ALIGNED(pfn_count, PTRS_PER_PMD))
>  		return 1;
> 
>  	page_order = folio_order(page_folio(page));
> @@ -57,60 +58,61 @@ static int mshv_chunk_stride(struct page *page,
>  /**
>   * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
>   *                             in a region.
> - * @region     : Pointer to the memory region structure.
> - * @flags      : Flags to pass to the handler.
> - * @page_offset: Offset into the region's pages array to start processing.
> - * @page_count : Number of pages to process.
> - * @handler    : Callback function to handle the chunk.
> + * @region    : Pointer to the memory region structure.
> + * @flags     : Flags to pass to the handler.
> + * @pfn_offset: Offset into the region's PFNs array to start processing.
> + * @pfn_count : Number of PFNs to process.
> + * @handler   : Callback function to handle the chunk.
>   *
> - * This function scans the region's pages starting from @page_offset,
> - * checking for contiguous present pages of the same size (normal or huge).
> - * It invokes @handler for the chunk of contiguous pages found. Returns the
> - * number of pages handled, or a negative error code if the first page is
> - * not present or the handler fails.
> + * This function scans the region's PFNs starting from @pfn_offset,
> + * checking for contiguous valid PFNs backed by pages of the same size
> + * (normal or huge). It invokes @handler for the chunk of contiguous valid
> + * PFNs found. Returns the number of PFNs handled, or a negative error code
> + * if the first PFN is invalid or the handler fails.
>   *
> - * Note: The @handler callback must be able to handle both normal and huge
> - * pages.
> + * Note: The @handler callback must be able to handle valid PFNs backed by
> + * both normal and huge pages.
>   *
>   * Return: Number of pages handled, or negative error code.
>   */
> -static long mshv_region_process_chunk(struct mshv_mem_region *region,
> -				      u32 flags,
> -				      u64 page_offset, u64 page_count,
> -				      int (*handler)(struct mshv_mem_region *region,
> -						     u32 flags,
> -						     u64 page_offset,
> -						     u64 page_count,
> -						     bool huge_page))
> +static long mshv_region_process_pfns(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 pfn_offset, u64 pfn_count,
> +				     int (*handler)(struct mshv_mem_region *region,
> +						    u32 flags,
> +						    u64 pfn_offset,
> +						    u64 pfn_count,
> +						    bool huge_page))
>  {
> -	u64 gfn = region->start_gfn + page_offset;
> +	u64 gfn = region->start_gfn + pfn_offset;
>  	u64 count;
> -	struct page *page;
> +	unsigned long pfn;
>  	int stride, ret;
> 
> -	page = region->mreg_pages[page_offset];
> -	if (!page)
> +	pfn = region->mreg_pfns[pfn_offset];
> +	if (!pfn_valid(pfn))
>  		return -EINVAL;
> 
> -	stride = mshv_chunk_stride(page, gfn, page_count);
> +	stride = mshv_chunk_stride(pfn_to_page(pfn), gfn, pfn_count);
>  	if (stride < 0)
>  		return stride;
> 
>  	/* Start at stride since the first stride is validated */
> -	for (count = stride; count < page_count; count += stride) {
> -		page = region->mreg_pages[page_offset + count];
> +	for (count = stride; count < pfn_count ; count += stride) {
> +		pfn = region->mreg_pfns[pfn_offset + count];
> 
> -		/* Break if current page is not present */
> -		if (!page)
> +		/* Break if current pfn is invalid */
> +		if (!pfn_valid(pfn))

pfn_valid() is a relatively expensive test to be doing in a loop
on what may be every single page. It does an RCU lock/unlock
and make other checks that aren't necessary here. Since
mreg_pfns[] is populated from mm calls, the only invalid PFNs
would be MSHV_INVALID_PFN that code in this module has
explicitly put there. Just testing against MSHV_INVALID_PFN
would be a lot faster here and elsewhere in this module. It's
really a "pfn set/not set" test. Defining a pfn_set() macro
here in this module that tests against MSHV_INVALID_PFN
would accomplish the same thing more efficiently.

>  			break;
> 
>  		/* Break if stride size changes */
> -		if (stride != mshv_chunk_stride(page, gfn + count,
> -						page_count - count))
> +		if (stride != mshv_chunk_stride(pfn_to_page(pfn),
> +						gfn + count,
> +						pfn_count - count))
>  			break;
>  	}
> 
> -	ret = handler(region, flags, page_offset, count, stride > 1);
> +	ret = handler(region, flags, pfn_offset, count, stride > 1);
>  	if (ret)
>  		return ret;
> 
> @@ -118,70 +120,73 @@ static long mshv_region_process_chunk(struct mshv_mem_region *region,
>  }
> 
>  /**
> - * mshv_region_process_range - Processes a range of memory pages in a
> - *                             region.
> - * @region     : Pointer to the memory region structure.
> - * @flags      : Flags to pass to the handler.
> - * @page_offset: Offset into the region's pages array to start processing.
> - * @page_count : Number of pages to process.
> - * @handler    : Callback function to handle each chunk of contiguous
> - *               pages.
> + * mshv_region_process_range - Processes a range of PFNs in a region.
> + * @region    : Pointer to the memory region structure.
> + * @flags     : Flags to pass to the handler.
> + * @pfn_offset: Offset into the region's PFNs array to start processing.
> + * @pfn_count : Number of PFNs to process.
> + * @handler   : Callback function to handle each chunk of contiguous
> + *              valid PFNs.
>   *
> - * Iterates over the specified range of pages in @region, skipping
> - * non-present pages. For each contiguous chunk of present pages, invokes
> - * @handler via mshv_region_process_chunk.
> + * Iterates over the specified range of PFNs in @region, skipping
> + * invalid PFNs. For each contiguous chunk of valid PFNS, invokes
> + * @handler via mshv_region_process_pfns.
>   *
> - * Note: The @handler callback must be able to handle both normal and huge
> - * pages.
> + * Note: The @handler callback must be able to handle PFNs backed by both
> + * normal and huge pages.
>   *
>   * Returns 0 on success, or a negative error code on failure.
>   */
>  static int mshv_region_process_range(struct mshv_mem_region *region,
>  				     u32 flags,
> -				     u64 page_offset, u64 page_count,
> +				     u64 pfn_offset, u64 pfn_count,
>  				     int (*handler)(struct mshv_mem_region *region,
>  						    u32 flags,
> -						    u64 page_offset,
> -						    u64 page_count,
> +						    u64 pfn_offset,
> +						    u64 pfn_count,
>  						    bool huge_page))
>  {
> +	u64 pfn_end;

In Patch 2 of this series, "pfn_end" is changed to just "end", and
the references are adjusted. Patch 2 could be a few lines smaller if it
was named "end" here and Patch 2 didn't have to change it.

>  	long ret;
> 
> -	if (page_offset + page_count > region->nr_pages)
> +	if (check_add_overflow(pfn_offset, pfn_count, &pfn_end))
> +		return -EOVERFLOW;
> +
> +	if (pfn_end > region->nr_pfns)
>  		return -EINVAL;
> 
> -	while (page_count) {
> +	while (pfn_count) {
>  		/* Skip non-present pages */
> -		if (!region->mreg_pages[page_offset]) {
> -			page_offset++;
> -			page_count--;
> +		if (!pfn_valid(region->mreg_pfns[pfn_offset])) {
> +			pfn_offset++;
> +			pfn_count--;
>  			continue;
>  		}
> 
> -		ret = mshv_region_process_chunk(region, flags,
> -						page_offset,
> -						page_count,
> -						handler);
> +		ret = mshv_region_process_pfns(region, flags,
> +					       pfn_offset, pfn_count,
> +					       handler);
>  		if (ret < 0)
>  			return ret;
> 
> -		page_offset += ret;
> -		page_count -= ret;
> +		pfn_offset += ret;
> +		pfn_count -= ret;
>  	}
> 
>  	return 0;
>  }
> 
> -struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> +struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pfns,
>  					   u64 uaddr, u32 flags)
>  {
>  	struct mshv_mem_region *region;
> +	u64 i;
> 
> -	region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
> +	region = vzalloc(sizeof(*region) + sizeof(unsigned long) * nr_pfns);

Use struct_size(region, mreg_pfns, nr_pfns) instead of open coding the arithmetic?

>  	if (!region)
>  		return ERR_PTR(-ENOMEM);
> 
> -	region->nr_pages = nr_pages;
> +	region->nr_pfns = nr_pfns;
>  	region->start_gfn = guest_pfn;
>  	region->start_uaddr = uaddr;
>  	region->hv_map_flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_ADJUSTABLE;
> @@ -190,6 +195,9 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
>  		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
> 
> +	for (i = 0; i < nr_pfns; i++)
> +		region->mreg_pfns[i] = MSHV_INVALID_PFN;
> +
>  	kref_init(&region->mreg_refcount);
> 
>  	return region;
> @@ -197,15 +205,15 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> 
>  static int mshv_region_chunk_share(struct mshv_mem_region *region,
>  				   u32 flags,
> -				   u64 page_offset, u64 page_count,
> +				   u64 pfn_offset, u64 pfn_count,
>  				   bool huge_page)
>  {
>  	if (huge_page)
>  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> 
>  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -					      region->mreg_pages + page_offset,
> -					      page_count,
> +					      region->mreg_pfns + pfn_offset,
> +					      pfn_count,
>  					      HV_MAP_GPA_READABLE |
>  					      HV_MAP_GPA_WRITABLE,
>  					      flags, true);
> @@ -216,21 +224,21 @@ int mshv_region_share(struct mshv_mem_region *region)
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
> 
>  	return mshv_region_process_range(region, flags,
> -					 0, region->nr_pages,
> +					 0, region->nr_pfns,
>  					 mshv_region_chunk_share);
>  }
> 
>  static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
>  				     u32 flags,
> -				     u64 page_offset, u64 page_count,
> +				     u64 pfn_offset, u64 pfn_count,
>  				     bool huge_page)
>  {
>  	if (huge_page)
>  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> 
>  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -					      region->mreg_pages + page_offset,
> -					      page_count, 0,
> +					      region->mreg_pfns + pfn_offset,
> +					      pfn_count, 0,
>  					      flags, false);
>  }
> 
> @@ -239,30 +247,30 @@ int mshv_region_unshare(struct mshv_mem_region *region)
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
> 
>  	return mshv_region_process_range(region, flags,
> -					 0, region->nr_pages,
> +					 0, region->nr_pfns,
>  					 mshv_region_chunk_unshare);
>  }
> 
>  static int mshv_region_chunk_remap(struct mshv_mem_region *region,
>  				   u32 flags,
> -				   u64 page_offset, u64 page_count,
> +				   u64 pfn_offset, u64 pfn_count,
>  				   bool huge_page)
>  {
>  	if (huge_page)
>  		flags |= HV_MAP_GPA_LARGE_PAGE;
> 
> -	return hv_call_map_gpa_pages(region->partition->pt_id,
> -				     region->start_gfn + page_offset,
> -				     page_count, flags,
> -				     region->mreg_pages + page_offset);
> +	return hv_call_map_ram_pfns(region->partition->pt_id,
> +				    region->start_gfn + pfn_offset,
> +				    pfn_count, flags,
> +				    region->mreg_pfns + pfn_offset);
>  }
> 
> -static int mshv_region_remap_pages(struct mshv_mem_region *region,
> -				   u32 map_flags,
> -				   u64 page_offset, u64 page_count)
> +static int mshv_region_remap_pfns(struct mshv_mem_region *region,
> +				  u32 map_flags,
> +				  u64 pfn_offset, u64 pfn_count)
>  {
>  	return mshv_region_process_range(region, map_flags,
> -					 page_offset, page_count,
> +					 pfn_offset, pfn_count,
>  					 mshv_region_chunk_remap);
>  }
> 
> @@ -270,38 +278,50 @@ int mshv_region_map(struct mshv_mem_region *region)
>  {
>  	u32 map_flags = region->hv_map_flags;
> 
> -	return mshv_region_remap_pages(region, map_flags,
> -				       0, region->nr_pages);
> +	return mshv_region_remap_pfns(region, map_flags,
> +				      0, region->nr_pfns);
>  }
> 
> -static void mshv_region_invalidate_pages(struct mshv_mem_region *region,
> -					 u64 page_offset, u64 page_count)
> +static void mshv_region_invalidate_pfns(struct mshv_mem_region *region,
> +					u64 pfn_offset, u64 pfn_count)
>  {
> -	if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
> -		unpin_user_pages(region->mreg_pages + page_offset, page_count);
> +	u64 i;
> +
> +	for (i = pfn_offset; i < pfn_offset + pfn_count; i++) {
> +		if (!pfn_valid(region->mreg_pfns[i]))
> +			continue;
> +
> +		if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
> +			unpin_user_page(pfn_to_page(region->mreg_pfns[i]));
> 
> -	memset(region->mreg_pages + page_offset, 0,
> -	       page_count * sizeof(struct page *));
> +		region->mreg_pfns[i] = MSHV_INVALID_PFN;
> +	}
>  }
> 
>  void mshv_region_invalidate(struct mshv_mem_region *region)
>  {
> -	mshv_region_invalidate_pages(region, 0, region->nr_pages);
> +	mshv_region_invalidate_pfns(region, 0, region->nr_pfns);
>  }
> 
>  int mshv_region_pin(struct mshv_mem_region *region)
>  {
> -	u64 done_count, nr_pages;
> +	u64 done_count, nr_pfns, i;
> +	unsigned long *pfns;
>  	struct page **pages;
>  	__u64 userspace_addr;
>  	int ret;
> 
> -	for (done_count = 0; done_count < region->nr_pages; done_count += ret) {
> -		pages = region->mreg_pages + done_count;
> +	pages = kmalloc_array(MSHV_PIN_PAGES_BATCH_SIZE,
> +			      sizeof(struct page *), GFP_KERNEL);
> +	if (!pages)
> +		return -ENOMEM;
> +
> +	for (done_count = 0; done_count < region->nr_pfns; done_count += ret) {
> +		pfns = region->mreg_pfns + done_count;
>  		userspace_addr = region->start_uaddr +
>  				 done_count * HV_HYP_PAGE_SIZE;
> -		nr_pages = min(region->nr_pages - done_count,
> -			       MSHV_PIN_PAGES_BATCH_SIZE);
> +		nr_pfns = min(region->nr_pfns - done_count,
> +			      MSHV_PIN_PAGES_BATCH_SIZE);
> 
>  		/*
>  		 * Pinning assuming 4k pages works for large pages too.
> @@ -311,39 +331,44 @@ int mshv_region_pin(struct mshv_mem_region *region)
>  		 * with the FOLL_LONGTERM flag does a large temporary
>  		 * allocation of contiguous memory.
>  		 */
> -		ret = pin_user_pages_fast(userspace_addr, nr_pages,
> +		ret = pin_user_pages_fast(userspace_addr, nr_pfns,
>  					  FOLL_WRITE | FOLL_LONGTERM,
>  					  pages);
> -		if (ret != nr_pages)
> +		if (ret != nr_pfns)
>  			goto release_pages;
> +
> +		for (i = 0; i < ret; i++)
> +			pfns[i] = page_to_pfn(pages[i]);
>  	}
> 
> +	kfree(pages);
>  	return 0;
> 
>  release_pages:
>  	if (ret > 0)
>  		done_count += ret;
> -	mshv_region_invalidate_pages(region, 0, done_count);
> +	mshv_region_invalidate_pfns(region, 0, done_count);
> +	kfree(pages);
>  	return ret < 0 ? ret : -ENOMEM;
>  }
> 
>  static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
>  				   u32 flags,
> -				   u64 page_offset, u64 page_count,
> +				   u64 pfn_offset, u64 pfn_count,
>  				   bool huge_page)
>  {
>  	if (huge_page)
>  		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> 
> -	return hv_call_unmap_gpa_pages(region->partition->pt_id,
> -				       region->start_gfn + page_offset,
> -				       page_count, flags);
> +	return hv_call_unmap_pfns(region->partition->pt_id,
> +				  region->start_gfn + pfn_offset,
> +				  pfn_count, flags);
>  }
> 
>  static int mshv_region_unmap(struct mshv_mem_region *region)
>  {
>  	return mshv_region_process_range(region, 0,
> -					 0, region->nr_pages,
> +					 0, region->nr_pfns,
>  					 mshv_region_chunk_unmap);
>  }
> 
> @@ -427,8 +452,8 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
>  /**
>   * mshv_region_range_fault - Handle memory range faults for a given region.
>   * @region: Pointer to the memory region structure.
> - * @page_offset: Offset of the page within the region.
> - * @page_count: Number of pages to handle.
> + * @pfn_offset: Offset of the page within the region.
> + * @pfn_count: Number of pages to handle.
>   *
>   * This function resolves memory faults for a specified range of pages
>   * within a memory region. It uses HMM (Heterogeneous Memory Management)
> @@ -437,7 +462,7 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
>   * Return: 0 on success, negative error code on failure.
>   */
>  static int mshv_region_range_fault(struct mshv_mem_region *region,
> -				   u64 page_offset, u64 page_count)
> +				   u64 pfn_offset, u64 pfn_count)
>  {
>  	struct hmm_range range = {
>  		.notifier = &region->mreg_mni,
> @@ -447,13 +472,13 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
>  	int ret;
>  	u64 i;
> 
> -	pfns = kmalloc_array(page_count, sizeof(*pfns), GFP_KERNEL);
> +	pfns = kmalloc_array(pfn_count, sizeof(*pfns), GFP_KERNEL);
>  	if (!pfns)
>  		return -ENOMEM;
> 
>  	range.hmm_pfns = pfns;
> -	range.start = region->start_uaddr + page_offset * HV_HYP_PAGE_SIZE;
> -	range.end = range.start + page_count * HV_HYP_PAGE_SIZE;
> +	range.start = region->start_uaddr + pfn_offset * HV_HYP_PAGE_SIZE;
> +	range.end = range.start + pfn_count * HV_HYP_PAGE_SIZE;
> 
>  	do {
>  		ret = mshv_region_hmm_fault_and_lock(region, &range);
> @@ -462,11 +487,15 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
>  	if (ret)
>  		goto out;
> 
> -	for (i = 0; i < page_count; i++)
> -		region->mreg_pages[page_offset + i] = hmm_pfn_to_page(pfns[i]);
> +	for (i = 0; i < pfn_count; i++) {
> +		if (!(pfns[i] & HMM_PFN_VALID))
> +			continue;
> +		/* Drop HMM_PFN_* flags to ensure PFNs are valid. */
> +		region->mreg_pfns[pfn_offset + i] = pfns[i] & ~HMM_PFN_FLAGS;
> +	}
> 
> -	ret = mshv_region_remap_pages(region, region->hv_map_flags,
> -				      page_offset, page_count);
> +	ret = mshv_region_remap_pfns(region, region->hv_map_flags,
> +				     pfn_offset, pfn_count);
> 
>  	mutex_unlock(&region->mreg_mutex);
>  out:
> @@ -476,24 +505,24 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> 
>  bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
>  {
> -	u64 page_offset, page_count;
> +	u64 pfn_offset, pfn_count;
>  	int ret;
> 
>  	/* Align the page offset to the nearest MSHV_MAP_FAULT_IN_PAGES. */
> -	page_offset = ALIGN_DOWN(gfn - region->start_gfn,
> -				 MSHV_MAP_FAULT_IN_PAGES);
> +	pfn_offset = ALIGN_DOWN(gfn - region->start_gfn,
> +				MSHV_MAP_FAULT_IN_PAGES);
> 
>  	/* Map more pages than requested to reduce the number of faults. */
> -	page_count = min(region->nr_pages - page_offset,
> -			 MSHV_MAP_FAULT_IN_PAGES);
> +	pfn_count = min(region->nr_pfns - pfn_offset,
> +			MSHV_MAP_FAULT_IN_PAGES);
> 
> -	ret = mshv_region_range_fault(region, page_offset, page_count);
> +	ret = mshv_region_range_fault(region, pfn_offset, pfn_count);
> 
>  	WARN_ONCE(ret,
> -		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, page_offset %llu, page_count %llu\n",
> +		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, pfn_offset %llu, pfn_count %llu\n",
>  		  region->partition->pt_id, region->start_uaddr,
> -		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
> -		  gfn, page_offset, page_count);
> +		  region->start_uaddr + (region->nr_pfns << HV_HYP_PAGE_SHIFT),
> +		  gfn, pfn_offset, pfn_count);
> 
>  	return !ret;
>  }
> @@ -523,16 +552,16 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
>  	struct mshv_mem_region *region = container_of(mni,
>  						      struct mshv_mem_region,
>  						      mreg_mni);
> -	u64 page_offset, page_count;
> +	u64 pfn_offset, pfn_count;
>  	unsigned long mstart, mend;
>  	int ret = -EPERM;
> 
>  	mstart = max(range->start, region->start_uaddr);
>  	mend = min(range->end, region->start_uaddr +
> -		   (region->nr_pages << HV_HYP_PAGE_SHIFT));
> +		   (region->nr_pfns << HV_HYP_PAGE_SHIFT));
> 
> -	page_offset = HVPFN_DOWN(mstart - region->start_uaddr);
> -	page_count = HVPFN_DOWN(mend - mstart);
> +	pfn_offset = HVPFN_DOWN(mstart - region->start_uaddr);
> +	pfn_count = HVPFN_DOWN(mend - mstart);
> 
>  	if (mmu_notifier_range_blockable(range))
>  		mutex_lock(&region->mreg_mutex);
> @@ -541,12 +570,12 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
> 
>  	mmu_interval_set_seq(mni, cur_seq);
> 
> -	ret = mshv_region_remap_pages(region, HV_MAP_GPA_NO_ACCESS,
> -				      page_offset, page_count);
> +	ret = mshv_region_remap_pfns(region, HV_MAP_GPA_NO_ACCESS,
> +				     pfn_offset, pfn_count);
>  	if (ret)
>  		goto out_unlock;
> 
> -	mshv_region_invalidate_pages(region, page_offset, page_count);
> +	mshv_region_invalidate_pfns(region, pfn_offset, pfn_count);
> 
>  	mutex_unlock(&region->mreg_mutex);
> 
> @@ -558,9 +587,9 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
>  	WARN_ONCE(ret,
>  		  "Failed to invalidate region %#llx-%#llx (range %#lx-%#lx, event: %u, pages %#llx-%#llx, mm: %#llx): %d\n",
>  		  region->start_uaddr,
> -		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
> +		  region->start_uaddr + (region->nr_pfns << HV_HYP_PAGE_SHIFT),
>  		  range->start, range->end, range->event,
> -		  page_offset, page_offset + page_count - 1, (u64)range->mm, ret);
> +		  pfn_offset, pfn_offset + pfn_count - 1, (u64)range->mm, ret);
>  	return false;
>  }
> 
> @@ -579,7 +608,7 @@ bool mshv_region_movable_init(struct mshv_mem_region *region)
> 
>  	ret = mmu_interval_notifier_insert(&region->mreg_mni, current->mm,
>  					   region->start_uaddr,
> -					   region->nr_pages << HV_HYP_PAGE_SHIFT,
> +					   region->nr_pfns << HV_HYP_PAGE_SHIFT,
>  					   &mshv_region_mni_ops);
>  	if (ret)
>  		return false;
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 947dfb76bb19..f1d4bee97a3f 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -84,15 +84,15 @@ enum mshv_region_type {
>  struct mshv_mem_region {
>  	struct hlist_node hnode;
>  	struct kref mreg_refcount;
> -	u64 nr_pages;
> +	u64 nr_pfns;
>  	u64 start_gfn;
>  	u64 start_uaddr;
>  	u32 hv_map_flags;
>  	struct mshv_partition *partition;
>  	enum mshv_region_type mreg_type;
>  	struct mmu_interval_notifier mreg_mni;
> -	struct mutex mreg_mutex;	/* protects region pages remapping */
> -	struct page *mreg_pages[];
> +	struct mutex mreg_mutex;	/* protects region PFNs remapping */
> +	unsigned long mreg_pfns[];
>  };
> 
>  struct mshv_irq_ack_notifier {
> @@ -282,11 +282,11 @@ int hv_call_create_partition(u64 flags,
>  int hv_call_initialize_partition(u64 partition_id);
>  int hv_call_finalize_partition(u64 partition_id);
>  int hv_call_delete_partition(u64 partition_id);
> -int hv_call_map_mmio_pages(u64 partition_id, u64 gfn, u64 mmio_spa, u64
> numpgs);
> -int hv_call_map_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
> -			  u32 flags, struct page **pages);
> -int hv_call_unmap_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
> -			    u32 flags);
> +int hv_call_map_mmio_pfns(u64 partition_id, u64 gfn, u64 mmio_spa, u64 numpgs);
> +int hv_call_map_ram_pfns(u64 partition_id, u64 gpa_target, u64 pfn_count,
> +			 u32 flags, unsigned long *pfns);
> +int hv_call_unmap_pfns(u64 partition_id, u64 gpa_target, u64 pfn_count,
> +		       u32 flags);
>  int hv_call_delete_vp(u64 partition_id, u32 vp_index);
>  int hv_call_assert_virtual_interrupt(u64 partition_id, u32 vector,
>  				     u64 dest_addr,
> @@ -329,8 +329,8 @@ int hv_map_stats_page(enum hv_stats_object_type type,
>  int hv_unmap_stats_page(enum hv_stats_object_type type,
>  			struct hv_stats_page *page_addr,
>  			const union hv_stats_object_identity *identity);
> -int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> -				   u64 page_struct_count, u32 host_access,
> +int hv_call_modify_spa_host_access(u64 partition_id, unsigned long *pfns,
> +				   u64 pfns_count, u32 host_access,
>  				   u32 flags, u8 acquire);
>  int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code, u64 arg,
>  				      void *property_value, size_t property_value_sz);
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index cb55d4d4be2e..a95f2cfc5da5 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -188,17 +188,16 @@ int hv_call_delete_partition(u64 partition_id)
>  	return hv_result_to_errno(status);
>  }
> 
> -/* Ask the hypervisor to map guest ram pages or the guest mmio space */
> -static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> -			       u32 flags, struct page **pages, u64 mmio_spa)
> +static int hv_do_map_pfns(u64 partition_id, u64 gfn, u64 pfns_count,
> +			  u32 flags, unsigned long *pfns, u64 mmio_spa)
>  {
>  	struct hv_input_map_gpa_pages *input_page;
>  	u64 status, *pfnlist;
>  	unsigned long irq_flags, large_shift = 0;
>  	int ret = 0, done = 0;
> -	u64 page_count = page_struct_count;
> +	u64 page_count = pfns_count;
> 
> -	if (page_count == 0 || (pages && mmio_spa))
> +	if (page_count == 0 || (pfns && mmio_spa))
>  		return -EINVAL;
> 
>  	if (flags & HV_MAP_GPA_LARGE_PAGE) {
> @@ -227,14 +226,14 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
>  		for (i = 0; i < rep_count; i++)
>  			if (flags & HV_MAP_GPA_NO_ACCESS) {
>  				pfnlist[i] = 0;
> -			} else if (pages) {
> +			} else if (pfns) {
>  				u64 index = (done + i) << large_shift;
> 
> -				if (index >= page_struct_count) {
> +				if (index >= pfns_count) {
>  					ret = -EINVAL;
>  					break;
>  				}
> -				pfnlist[i] = page_to_pfn(pages[index]);
> +				pfnlist[i] = pfns[index];
>  			} else {
>  				pfnlist[i] = mmio_spa + done + i;
>  			}
> @@ -266,37 +265,37 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> 
>  		if (flags & HV_MAP_GPA_LARGE_PAGE)
>  			unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> -		hv_call_unmap_gpa_pages(partition_id, gfn, done, unmap_flags);
> +		hv_call_unmap_pfns(partition_id, gfn, done, unmap_flags);
>  	}
> 
>  	return ret;
>  }
> 
>  /* Ask the hypervisor to map guest ram pages */
> -int hv_call_map_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
> -			  u32 flags, struct page **pages)
> +int hv_call_map_ram_pfns(u64 partition_id, u64 gfn, u64 pfn_count,
> +			 u32 flags, unsigned long *pfns)
>  {
> -	return hv_do_map_gpa_hcall(partition_id, gpa_target, page_count,
> -				   flags, pages, 0);
> +	return hv_do_map_pfns(partition_id, gfn, pfn_count, flags,
> +			      pfns, 0);
>  }
> 
> -/* Ask the hypervisor to map guest mmio space */
> -int hv_call_map_mmio_pages(u64 partition_id, u64 gfn, u64 mmio_spa, u64 numpgs)
> +int hv_call_map_mmio_pfns(u64 partition_id, u64 gfn, u64 mmio_spa,
> +			  u64 pfn_count)
>  {
>  	int i;
>  	u32 flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE |
>  		    HV_MAP_GPA_NOT_CACHED;
> 
> -	for (i = 0; i < numpgs; i++)
> +	for (i = 0; i < pfn_count; i++)
>  		if (page_is_ram(mmio_spa + i))
>  			return -EINVAL;
> 
> -	return hv_do_map_gpa_hcall(partition_id, gfn, numpgs, flags, NULL,
> -				   mmio_spa);
> +	return hv_do_map_pfns(partition_id, gfn, pfn_count, flags,
> +			      NULL, mmio_spa);
>  }
> 
> -int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
> -			    u32 flags)
> +int hv_call_unmap_pfns(u64 partition_id, u64 gfn, u64 page_count_4k,
> +		       u32 flags)
>  {
>  	struct hv_input_unmap_gpa_pages *input_page;
>  	u64 status, page_count = page_count_4k;
> @@ -1009,15 +1008,15 @@ int hv_unmap_stats_page(enum hv_stats_object_type type,
>  	return ret;
>  }
> 
> -int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> -				   u64 page_struct_count, u32 host_access,
> +int hv_call_modify_spa_host_access(u64 partition_id, unsigned long *pfns,
> +				   u64 pfns_count, u32 host_access,
>  				   u32 flags, u8 acquire)
>  {
>  	struct hv_input_modify_sparse_spa_page_host_access *input_page;
>  	u64 status;
>  	int done = 0;
>  	unsigned long irq_flags, large_shift = 0;
> -	u64 page_count = page_struct_count;
> +	u64 page_count = pfns_count;
>  	u16 code = acquire ? HVCALL_ACQUIRE_SPARSE_SPA_PAGE_HOST_ACCESS :
>  			     HVCALL_RELEASE_SPARSE_SPA_PAGE_HOST_ACCESS;
> 
> @@ -1051,11 +1050,10 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
>  		for (i = 0; i < rep_count; i++) {
>  			u64 index = (done + i) << large_shift;
> 
> -			if (index >= page_struct_count)
> +			if (index >= pfns_count)
>  				return -EINVAL;
> 
> -			input_page->spa_page_list[i] =
> -						page_to_pfn(pages[index]);
> +			input_page->spa_page_list[i] = pfns[index];
>  		}
> 
>  		status = hv_do_rep_hypercall(code, rep_count, 0, input_page,
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index f2d83d6c8c4f..685e4b562186 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -619,7 +619,7 @@ mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
> 
>  	hlist_for_each_entry(region, &partition->pt_mem_regions, hnode) {
>  		if (gfn >= region->start_gfn &&
> -		    gfn < region->start_gfn + region->nr_pages)
> +		    gfn < region->start_gfn + region->nr_pfns)
>  			return region;
>  	}
> 
> @@ -1221,20 +1221,20 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
>  					bool is_mmio)
>  {
>  	struct mshv_mem_region *rg;
> -	u64 nr_pages = HVPFN_DOWN(mem->size);
> +	u64 nr_pfns = HVPFN_DOWN(mem->size);
> 
>  	/* Reject overlapping regions */
>  	spin_lock(&partition->pt_mem_regions_lock);
>  	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
> -		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
> -		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
> +		if (mem->guest_pfn + nr_pfns <= rg->start_gfn ||
> +		    rg->start_gfn + rg->nr_pfns <= mem->guest_pfn)
>  			continue;
>  		spin_unlock(&partition->pt_mem_regions_lock);
>  		return -EEXIST;
>  	}
>  	spin_unlock(&partition->pt_mem_regions_lock);
> 
> -	rg = mshv_region_create(mem->guest_pfn, nr_pages,
> +	rg = mshv_region_create(mem->guest_pfn, nr_pfns,
>  				mem->userspace_addr, mem->flags);
>  	if (IS_ERR(rg))
>  		return PTR_ERR(rg);
> @@ -1372,21 +1372,21 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  		 * the hypervisor track dirty pages, enabling pre-copy live
>  		 * migration.
>  		 */
> -		ret = hv_call_map_gpa_pages(partition->pt_id,
> -					    region->start_gfn,
> -					    region->nr_pages,
> -					    HV_MAP_GPA_NO_ACCESS, NULL);
> +		ret = hv_call_map_ram_pfns(partition->pt_id,
> +					   region->start_gfn,
> +					   region->nr_pfns,
> +					   HV_MAP_GPA_NO_ACCESS, NULL);
>  		break;
>  	case MSHV_REGION_TYPE_MMIO:
> -		ret = hv_call_map_mmio_pages(partition->pt_id,
> -					     region->start_gfn,
> -					     mmio_pfn,
> -					     region->nr_pages);
> +		ret = hv_call_map_mmio_pfns(partition->pt_id,
> +					    region->start_gfn,
> +					    mmio_pfn,
> +					    region->nr_pfns);
>  		break;
>  	}
> 
>  	trace_mshv_map_user_memory(partition->pt_id, region->start_uaddr,
> -				   region->start_gfn, region->nr_pages,
> +				   region->start_gfn, region->nr_pfns,
>  				   region->hv_map_flags, ret);
> 
>  	if (ret)
> @@ -1424,7 +1424,7 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
>  	/* Paranoia check */
>  	if (region->start_uaddr != mem.userspace_addr ||
>  	    region->start_gfn != mem.guest_pfn ||
> -	    region->nr_pages != HVPFN_DOWN(mem.size)) {
> +	    region->nr_pfns != HVPFN_DOWN(mem.size)) {
>  		spin_unlock(&partition->pt_mem_regions_lock);
>  		return -EINVAL;
>  	}
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 2/7] mshv: Add support to address range holes remapping
  2026-03-30 20:04 ` [PATCH 2/7] mshv: Add support to address range holes remapping Stanislav Kinsburskii
@ 2026-04-13 21:08   ` Michael Kelley
  2026-04-20 16:24     ` Stanislav Kinsburskii
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kelley @ 2026-04-13 21:08 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> 
> Consolidate memory region processing to handle both valid and invalid PFNs
> uniformly. This eliminates code duplication across remap, unmap, share, and
> unshare operations by using a common range processing interface.
> 
> Holes are now remapped with no-access permissions to enable
> hypervisor dirty page tracking for precopy live migration.
> 
> This refactoring is a precursor to an upcoming change that will map
> present pages in movable regions upon region creation, requiring
> consistent handling of both mapped and unmapped ranges.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c |  108
> ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 95 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index b1a707d16c07..ed9c55841140 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -119,6 +119,57 @@ static long mshv_region_process_pfns(struct mshv_mem_region *region,
>  	return count;
>  }
> 
> +/**
> + * mshv_region_process_hole - Handle a hole (invalid PFNs) in a memory
> + *                            region
> + * @region    : Memory region containing the hole
> + * @flags     : Flags to pass to the handler function
> + * @pfn_offset: Starting PFN offset within the region
> + * @pfn_count : Number of PFNs in the hole
> + * @handler   : Callback function to invoke for the hole
> + *
> + * Invokes the handler function for a contiguous hole with the specified
> + * parameters.
> + *
> + * Return: Number of PFNs handled, or negative error code.
> + */
> +static long mshv_region_process_hole(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 pfn_offset, u64 pfn_count,
> +				     int (*handler)(struct mshv_mem_region *region,
> +						    u32 flags,
> +						    u64 pfn_offset,
> +						    u64 pfn_count,
> +						    bool huge_page))
> +{
> +	long ret;
> +
> +	ret = handler(region, flags, pfn_offset, pfn_count, 0);
> +	if (ret)
> +		return ret;
> +
> +	return pfn_count;
> +}
> +
> +static long mshv_region_process_chunk(struct mshv_mem_region *region,
> +				      u32 flags,
> +				      u64 pfn_offset, u64 pfn_count,
> +				      int (*handler)(struct mshv_mem_region *region,
> +						     u32 flags,
> +						     u64 pfn_offset,
> +						     u64 pfn_count,
> +						     bool huge_page))
> +{
> +	if (pfn_valid(region->mreg_pfns[pfn_offset]))
> +		return mshv_region_process_pfns(region, flags,
> +				pfn_offset, pfn_count,
> +				handler);
> +	else
> +		return mshv_region_process_hole(region, flags,
> +				pfn_offset, pfn_count,
> +				handler);
> +}
> +
>  /**
>   * mshv_region_process_range - Processes a range of PFNs in a region.
>   * @region    : Pointer to the memory region structure.
> @@ -146,33 +197,47 @@ static int mshv_region_process_range(struct mshv_mem_region *region,
>  						    u64 pfn_count,
>  						    bool huge_page))
>  {
> -	u64 pfn_end;
> +	u64 start, end;
>  	long ret;
> 
> -	if (check_add_overflow(pfn_offset, pfn_count, &pfn_end))
> +	if (!pfn_count)
> +		return 0;
> +
> +	if (check_add_overflow(pfn_offset, pfn_count, &end))
>  		return -EOVERFLOW;
> 
> -	if (pfn_end > region->nr_pfns)
> +	if (end > region->nr_pfns)
>  		return -EINVAL;
> 
> -	while (pfn_count) {
> -		/* Skip non-present pages */
> -		if (!pfn_valid(region->mreg_pfns[pfn_offset])) {
> -			pfn_offset++;
> -			pfn_count--;
> +	start = pfn_offset;
> +	end = pfn_offset + 1;
> +
> +	while (end < pfn_offset + pfn_count) {
> +		/*
> +		 * Accumulate contiguous pfns with the same validity
> +		 * (valid or not).
> +		 */
> +		if (pfn_valid(region->mreg_pfns[start]) ==
> +		    pfn_valid(region->mreg_pfns[end])) {
> +			end++;
>  			continue;
>  		}
> 
> -		ret = mshv_region_process_pfns(region, flags,
> -					       pfn_offset, pfn_count,
> -					       handler);
> +		ret = mshv_region_process_chunk(region, flags,
> +						start, end - start,
> +						handler);
>  		if (ret < 0)
>  			return ret;
> 
> -		pfn_offset += ret;
> -		pfn_count -= ret;
> +		start += ret;
>  	}
> 
> +	ret = mshv_region_process_chunk(region, flags,
> +					start, end - start,
> +					handler);
> +	if (ret < 0)
> +		return ret;
> +
>  	return 0;
>  }
> 
> @@ -208,6 +273,9 @@ static int mshv_region_chunk_share(struct mshv_mem_region *region,
>  				   u64 pfn_offset, u64 pfn_count,
>  				   bool huge_page)
>  {
> +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> +		return -EINVAL;
> +
>  	if (huge_page)
>  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> 
> @@ -233,6 +301,9 @@ static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
>  				     u64 pfn_offset, u64 pfn_count,
>  				     bool huge_page)
>  {
> +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> +		return -EINVAL;
> +
>  	if (huge_page)
>  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> 
> @@ -256,6 +327,14 @@ static int mshv_region_chunk_remap(struct mshv_mem_region *region,
>  				   u64 pfn_offset, u64 pfn_count,
>  				   bool huge_page)
>  {
> +	/*
> +	 * Remap missing pages with no access to let the
> +	 * hypervisor track dirty pages, enabling precopy live
> +	 * migration.
> +	 */
> +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> +		flags = HV_MAP_GPA_NO_ACCESS;

Is it OK to wipe out any other flags that might be set? Certainly, any previous
flags in PERMISSIONS_MASK should be removed, but what about ADJUSTABLE
and NOT_CACHED?

> +
>  	if (huge_page)
>  		flags |= HV_MAP_GPA_LARGE_PAGE;
> 
> @@ -357,6 +436,9 @@ static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
>  				   u64 pfn_offset, u64 pfn_count,
>  				   bool huge_page)
>  {
> +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> +		return 0;
> +
>  	if (huge_page)
>  		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> 
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 3/7] mshv: Support regions with different VMAs
  2026-03-30 20:04 ` [PATCH 3/7] mshv: Support regions with different VMAs Stanislav Kinsburskii
@ 2026-04-13 21:08   ` Michael Kelley
  2026-04-20 16:29     ` Stanislav Kinsburskii
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kelley @ 2026-04-13 21:08 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> 
> Allow HMM fault handling across memory regions that span multiple VMAs
> with different protection flags. The previous implementation assumed a
> single VMA per region, which would fail when guest memory crosses VMA
> boundaries.
> 
> Iterate through VMAs within the range and handle each separately with
> appropriate protection flags, enabling more flexible memory region
> configurations for partitions.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c |   72 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 52 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index ed9c55841140..1bb1bfe177e2 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -492,37 +492,72 @@ int mshv_region_get(struct mshv_mem_region *region)
>  }
> 
>  /**
> - * mshv_region_hmm_fault_and_lock - Handle HMM faults and lock the memory region
> + * mshv_region_hmm_fault_and_lock - Handle HMM faults across VMAs and lock
> + *                                  the memory region
>   * @region: Pointer to the memory region structure
> - * @range: Pointer to the HMM range structure
> + * @start : Starting virtual address of the range to fault
> + * @end   : Ending virtual address of the range to fault (exclusive)
> + * @pfns  : Output array for page frame numbers with HMM flags
>   *
>   * This function performs the following steps:
>   * 1. Reads the notifier sequence for the HMM range.
>   * 2. Acquires a read lock on the memory map.
> - * 3. Handles HMM faults for the specified range.
> - * 4. Releases the read lock on the memory map.
> - * 5. If successful, locks the memory region mutex.
> - * 6. Verifies if the notifier sequence has changed during the operation.
> - *    If it has, releases the mutex and returns -EBUSY to match with
> - *    hmm_range_fault() return code for repeating.
> + * 3. Iterates through VMAs in the specified range, handling each
> + *    separately with appropriate protection flags (HMM_PFN_REQ_WRITE set
> + *    based on VMA flags).
> + * 4. Handles HMM faults for each VMA segment.
> + * 5. Releases the read lock on the memory map.
> + * 6. If successful, locks the memory region mutex.
> + * 7. Verifies if the notifier sequence has changed during the operation.
> + *    If it has, releases the mutex and returns -EBUSY to signal retry.
> + *
> + * The function expects the range [start, end] is backed by valid VMAs.

Use "[start, end)" to describe the range since end is exclusive.

> + * Returns -EFAULT if any address in the range is not covered by a VMA.
>   *
>   * Return: 0 on success, a negative error code otherwise.
>   */
>  static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> -					  struct hmm_range *range)
> +					  unsigned long start,
> +					  unsigned long end,
> +					  unsigned long *pfns)
>  {
> +	struct hmm_range range = {
> +		.notifier = &region->mreg_mni,
> +	};
>  	int ret;
> 
> -	range->notifier_seq = mmu_interval_read_begin(range->notifier);
> +	range.notifier_seq = mmu_interval_read_begin(range.notifier);
>  	mmap_read_lock(region->mreg_mni.mm);
> -	ret = hmm_range_fault(range);
> +	while (start < end) {
> +		struct vm_area_struct *vma;
> +
> +		vma = vma_lookup(current->mm, start);

The mmap_read_lock() was obtained on region->mreg_mni.mm, but the
lookup is done against current->mm. Maybe these are the same, but
it looks wrong.  (Pointed out by a Co-Pilot AI review.)

> +		if (!vma) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		range.hmm_pfns = pfns;
> +		range.start = start;
> +		range.end = min(vma->vm_end, end);
> +		range.default_flags = HMM_PFN_REQ_FAULT;
> +		if (vma->vm_flags & VM_WRITE)
> +			range.default_flags |= HMM_PFN_REQ_WRITE;
> +
> +		ret = hmm_range_fault(&range);
> +		if (ret)
> +			break;
> +
> +		start = range.end + 1;

Since range.end is exclusive, the +1 should not be done.

> +		pfns += DIV_ROUND_UP(range.end - range.start, PAGE_SIZE);

Just to confirm, range.end and range.start should always be page aligned,
right? So the ROUND_UP should never kick in.

> +	}
>  	mmap_read_unlock(region->mreg_mni.mm);
>  	if (ret)
>  		return ret;
> 
>  	mutex_lock(&region->mreg_mutex);
> 
> -	if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
> +	if (mmu_interval_read_retry(range.notifier, range.notifier_seq)) {
>  		mutex_unlock(&region->mreg_mutex);
>  		cond_resched();
>  		return -EBUSY;
> @@ -546,10 +581,7 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
>  static int mshv_region_range_fault(struct mshv_mem_region *region,
>  				   u64 pfn_offset, u64 pfn_count)
>  {
> -	struct hmm_range range = {
> -		.notifier = &region->mreg_mni,
> -		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
> -	};
> +	unsigned long start, end;
>  	unsigned long *pfns;
>  	int ret;
>  	u64 i;
> @@ -558,12 +590,12 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
>  	if (!pfns)
>  		return -ENOMEM;
> 
> -	range.hmm_pfns = pfns;
> -	range.start = region->start_uaddr + pfn_offset * HV_HYP_PAGE_SIZE;
> -	range.end = range.start + pfn_count * HV_HYP_PAGE_SIZE;
> +	start = region->start_uaddr + pfn_offset * PAGE_SIZE;
> +	end = start + pfn_count * PAGE_SIZE;
> 
>  	do {
> -		ret = mshv_region_hmm_fault_and_lock(region, &range);
> +		ret = mshv_region_hmm_fault_and_lock(region, start, end,
> +						     pfns);
>  	} while (ret == -EBUSY);
> 
>  	if (ret)
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 5/7] mshv: Map populated pages on movable region creation
  2026-03-30 20:04 ` [PATCH 5/7] mshv: Map populated pages on movable region creation Stanislav Kinsburskii
@ 2026-04-13 21:09   ` Michael Kelley
  2026-04-20 16:35     ` Stanislav Kinsburskii
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kelley @ 2026-04-13 21:09 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:05 PM
> 
> Map any populated pages into the hypervisor upfront when creating a
> movable region, rather than waiting for faults. Previously, movable
> regions were created with all pages marked as HV_MAP_GPA_NO_ACCESS
> regardless of whether the userspace mapping contained populated pages.
> 
> This guarantees that if the caller passes a populated mapping, those
> present pages will be mapped into the hypervisor immediately during
> region creation instead of being faulted in later.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c   |   65 ++++++++++++++++++++++++++++++++-----------
>  drivers/hv/mshv_root.h      |    1 +
>  drivers/hv/mshv_root_main.c |   10 +------
>  3 files changed, 50 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index 133ec7771812..28d3f488d89f 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -519,7 +519,8 @@ int mshv_region_get(struct mshv_mem_region *region)
>  static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
>  					  unsigned long start,
>  					  unsigned long end,
> -					  unsigned long *pfns)
> +					  unsigned long *pfns,
> +					  bool do_fault)
>  {
>  	struct hmm_range range = {
>  		.notifier = &region->mreg_mni,
> @@ -540,9 +541,12 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
>  		range.hmm_pfns = pfns;
>  		range.start = start;
>  		range.end = min(vma->vm_end, end);
> -		range.default_flags = HMM_PFN_REQ_FAULT;
> -		if (vma->vm_flags & VM_WRITE)
> -			range.default_flags |= HMM_PFN_REQ_WRITE;
> +		range.default_flags = 0;
> +		if (do_fault) {
> +			range.default_flags = HMM_PFN_REQ_FAULT;
> +			if (vma->vm_flags & VM_WRITE)
> +				range.default_flags |= HMM_PFN_REQ_WRITE;
> +		}
> 
>  		ret = hmm_range_fault(&range);
>  		if (ret)
> @@ -567,26 +571,40 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
>  }
> 
>  /**
> - * mshv_region_range_fault - Handle memory range faults for a given region.
> - * @region: Pointer to the memory region structure.
> - * @pfn_offset: Offset of the page within the region.
> - * @pfn_count: Number of pages to handle.
> + * mshv_region_collect_and_map - Collect PFNs for a user range and map them
> + * @region    : memory region being processed
> + * @pfn_offset: PFNs offset within the region
> + * @pfn_count : number of PFNs to process
> + * @do_fault  : if true, fault in missing pages;
> + *              if false, collect only present pages
>   *
> - * This function resolves memory faults for a specified range of pages
> - * within a memory region. It uses HMM (Heterogeneous Memory Management)
> - * to fault in the required pages and updates the region's page array.
> + * Collects PFNs for the specified portion of @region from the
> + * corresponding userspace VMA and maps them into the hypervisor. The

Actually, this should be "userspace VMAs" (i.e., plural)

> + * behavior depends on @do_fault:
>   *
> - * Return: 0 on success, negative error code on failure.
> + * - true: Fault in missing pages from userspace, ensuring all pages in the
> + *   range are present. Used for on-demand page population.
> + * - false: Collect PFNs only for pages already present in userspace,
> + *   leaving missing pages as invalid PFN markers.
> + *   Used for initial region setup.
> + *
> + * Collected PFNs are stored in region->mreg_pfns[] with HMM bookkeeping
> + * flags cleared, then the range is mapped into the hypervisor. Present
> + * PFNs get mapped with region access permissions; missing PFNs (zero
> + * entries) get mapped with no-access permissions.

Hmmm. The missing PFNs are just skipped and the mreg_pfns[] array
is not updated. Is the corresponding entry in mreg_pfns[] known to
already be set to MSHV_INVALID_PFN? When mapping a new movable
region, that appears to be so. I'm less sure about the 
mshv_region_range_fault() case, though mshv_region_invalidate_pfns()
does such initialization of any entries that are invalidated. At that point
in the code, I'd add a comment about that assumption, as it took me a
bit to figure it out.

So does the comment about "zero entries" refer to what is returned
by hmm_range_fault() via mshv_region_hmm_fault_and_lock()?
The mention of "zero entries" here is a bit confusing.

> + *
> + * Return: 0 on success, negative errno on failure.
>   */
> -static int mshv_region_range_fault(struct mshv_mem_region *region,
> -				   u64 pfn_offset, u64 pfn_count)
> +static int mshv_region_collect_and_map(struct mshv_mem_region *region,
> +				       u64 pfn_offset, u64 pfn_count,
> +				       bool do_fault)
>  {
>  	unsigned long start, end;
>  	unsigned long *pfns;
>  	int ret;
>  	u64 i;
> 
> -	pfns = kmalloc_array(pfn_count, sizeof(*pfns), GFP_KERNEL);
> +	pfns = vmalloc_array(pfn_count, sizeof(unsigned long));
>  	if (!pfns)
>  		return -ENOMEM;
> 
> @@ -595,7 +613,7 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> 
>  	do {
>  		ret = mshv_region_hmm_fault_and_lock(region, start, end,
> -						     pfns);
> +						     pfns, do_fault);
>  	} while (ret == -EBUSY);
> 
>  	if (ret)
> @@ -613,10 +631,17 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> 
>  	mutex_unlock(&region->mreg_mutex);
>  out:
> -	kfree(pfns);
> +	vfree(pfns);
>  	return ret;
>  }
> 
> +static int mshv_region_range_fault(struct mshv_mem_region *region,
> +				   u64 pfn_offset, u64 pfn_count)
> +{
> +	return mshv_region_collect_and_map(region, pfn_offset, pfn_count,
> +					   true);
> +}
> +
>  bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
>  {
>  	u64 pfn_offset, pfn_count;
> @@ -800,3 +825,9 @@ int mshv_map_pinned_region(struct mshv_mem_region
> *region)
>  err_out:
>  	return ret;
>  }
> +
> +int mshv_map_movable_region(struct mshv_mem_region *region)
> +{
> +	return mshv_region_collect_and_map(region, 0, region->nr_pfns,
> +					   false);
> +}
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index d2e65a137bf4..02c1c11f701c 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -374,5 +374,6 @@ bool mshv_region_handle_gfn_fault(struct mshv_mem_region
> *region, u64 gfn);
>  void mshv_region_movable_fini(struct mshv_mem_region *region);
>  bool mshv_region_movable_init(struct mshv_mem_region *region);
>  int mshv_map_pinned_region(struct mshv_mem_region *region);
> +int mshv_map_movable_region(struct mshv_mem_region *region);
> 
>  #endif /* _MSHV_ROOT_H_ */
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index c393b5144e0b..91dab2a3bc92 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1299,15 +1299,7 @@ mshv_map_user_memory(struct mshv_partition
> *partition,
>  		ret = mshv_map_pinned_region(region);
>  		break;
>  	case MSHV_REGION_TYPE_MEM_MOVABLE:
> -		/*
> -		 * For movable memory regions, remap with no access to let
> -		 * the hypervisor track dirty pages, enabling pre-copy live
> -		 * migration.
> -		 */
> -		ret = hv_call_map_ram_pfns(partition->pt_id,
> -					   region->start_gfn,
> -					   region->nr_pfns,
> -					   HV_MAP_GPA_NO_ACCESS, NULL);
> +		ret = mshv_map_movable_region(region);
>  		break;
>  	case MSHV_REGION_TYPE_MMIO:
>  		ret = hv_call_map_mmio_pfns(partition->pt_id,
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/7] mshv: Convert from page pointers to PFNs
  2026-04-13 21:08   ` Michael Kelley
@ 2026-04-20 16:21     ` Stanislav Kinsburskii
  2026-04-20 17:18       ` Michael Kelley
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-20 16:21 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Apr 13, 2026 at 09:08:16PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> > 
> > The HMM interface returns PFNs from hmm_range_fault(), and the
> > hypervisor hypercalls operate on PFNs. Storing page pointers in
> > between these interfaces requires unnecessary conversions and
> > temporary allocations.
> > 
> > Store PFNs directly in memory regions to match the natural data flow.
> > This eliminates the temporary PFN array allocation in the HMM fault
> > path and reduces page_to_pfn() conversions throughout the driver.
> > Convert to page structs via pfn_to_page() only when operations like
> > unpin_user_page() require them.
> 
> General comment for this series:  PFN fields are typed as "unsigned long".
> But pfn_offset and pfn_count are "u64".  GFNs are also "u64".  Any
> reason not to make PFNs also "u64"? I know that pfn_valid() takes
> an "unsigned long" input, but see comment below about pfn_valid().
> 

The only reason is to keep the type consistent with the standard Linux
kernel definition of PFN as unsigned long.

> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_regions.c      |  297 ++++++++++++++++++++++------------------
> >  drivers/hv/mshv_root.h         |   20 +--
> >  drivers/hv/mshv_root_hv_call.c |   50 +++----
> >  drivers/hv/mshv_root_main.c    |   30 ++--
> >  4 files changed, 212 insertions(+), 185 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > index fdffd4f002f6..b1a707d16c07 100644
> > --- a/drivers/hv/mshv_regions.c
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -18,12 +18,13 @@
> >  #include "mshv_root.h"
> > 
> >  #define MSHV_MAP_FAULT_IN_PAGES				PTRS_PER_PMD
> > +#define MSHV_INVALID_PFN				ULONG_MAX
> > 
> >  /**
> >   * mshv_chunk_stride - Compute stride for mapping guest memory
> >   * @page      : The page to check for huge page backing
> >   * @gfn       : Guest frame number for the mapping
> > - * @page_count: Total number of pages in the mapping
> > + * @pfn_count: Total number of pages in the mapping
> 
> Nit: The colons are misaligned after this change.
> 
> >   *
> >   * Determines the appropriate stride (in pages) for mapping guest memory.
> >   * Uses huge page stride if the backing page is huge and the guest mapping
> > @@ -32,18 +33,18 @@
> >   * Return: Stride in pages, or -EINVAL if page order is unsupported.
> >   */
> >  static int mshv_chunk_stride(struct page *page,
> > -			     u64 gfn, u64 page_count)
> > +			     u64 gfn, u64 pfn_count)
> >  {
> >  	unsigned int page_order;
> > 
> >  	/*
> >  	 * Use single page stride by default. For huge page stride, the
> >  	 * page must be compound and point to the head of the compound
> > -	 * page, and both gfn and page_count must be huge-page aligned.
> > +	 * page, and both gfn and pfn_count must be huge-page aligned.
> >  	 */
> >  	if (!PageCompound(page) || !PageHead(page) ||
> >  	    !IS_ALIGNED(gfn, PTRS_PER_PMD) ||
> > -	    !IS_ALIGNED(page_count, PTRS_PER_PMD))
> > +	    !IS_ALIGNED(pfn_count, PTRS_PER_PMD))
> >  		return 1;
> > 
> >  	page_order = folio_order(page_folio(page));
> > @@ -57,60 +58,61 @@ static int mshv_chunk_stride(struct page *page,
> >  /**
> >   * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
> >   *                             in a region.
> > - * @region     : Pointer to the memory region structure.
> > - * @flags      : Flags to pass to the handler.
> > - * @page_offset: Offset into the region's pages array to start processing.
> > - * @page_count : Number of pages to process.
> > - * @handler    : Callback function to handle the chunk.
> > + * @region    : Pointer to the memory region structure.
> > + * @flags     : Flags to pass to the handler.
> > + * @pfn_offset: Offset into the region's PFNs array to start processing.
> > + * @pfn_count : Number of PFNs to process.
> > + * @handler   : Callback function to handle the chunk.
> >   *
> > - * This function scans the region's pages starting from @page_offset,
> > - * checking for contiguous present pages of the same size (normal or huge).
> > - * It invokes @handler for the chunk of contiguous pages found. Returns the
> > - * number of pages handled, or a negative error code if the first page is
> > - * not present or the handler fails.
> > + * This function scans the region's PFNs starting from @pfn_offset,
> > + * checking for contiguous valid PFNs backed by pages of the same size
> > + * (normal or huge). It invokes @handler for the chunk of contiguous valid
> > + * PFNs found. Returns the number of PFNs handled, or a negative error code
> > + * if the first PFN is invalid or the handler fails.
> >   *
> > - * Note: The @handler callback must be able to handle both normal and huge
> > - * pages.
> > + * Note: The @handler callback must be able to handle valid PFNs backed by
> > + * both normal and huge pages.
> >   *
> >   * Return: Number of pages handled, or negative error code.
> >   */
> > -static long mshv_region_process_chunk(struct mshv_mem_region *region,
> > -				      u32 flags,
> > -				      u64 page_offset, u64 page_count,
> > -				      int (*handler)(struct mshv_mem_region *region,
> > -						     u32 flags,
> > -						     u64 page_offset,
> > -						     u64 page_count,
> > -						     bool huge_page))
> > +static long mshv_region_process_pfns(struct mshv_mem_region *region,
> > +				     u32 flags,
> > +				     u64 pfn_offset, u64 pfn_count,
> > +				     int (*handler)(struct mshv_mem_region *region,
> > +						    u32 flags,
> > +						    u64 pfn_offset,
> > +						    u64 pfn_count,
> > +						    bool huge_page))
> >  {
> > -	u64 gfn = region->start_gfn + page_offset;
> > +	u64 gfn = region->start_gfn + pfn_offset;
> >  	u64 count;
> > -	struct page *page;
> > +	unsigned long pfn;
> >  	int stride, ret;
> > 
> > -	page = region->mreg_pages[page_offset];
> > -	if (!page)
> > +	pfn = region->mreg_pfns[pfn_offset];
> > +	if (!pfn_valid(pfn))
> >  		return -EINVAL;
> > 
> > -	stride = mshv_chunk_stride(page, gfn, page_count);
> > +	stride = mshv_chunk_stride(pfn_to_page(pfn), gfn, pfn_count);
> >  	if (stride < 0)
> >  		return stride;
> > 
> >  	/* Start at stride since the first stride is validated */
> > -	for (count = stride; count < page_count; count += stride) {
> > -		page = region->mreg_pages[page_offset + count];
> > +	for (count = stride; count < pfn_count ; count += stride) {
> > +		pfn = region->mreg_pfns[pfn_offset + count];
> > 
> > -		/* Break if current page is not present */
> > -		if (!page)
> > +		/* Break if current pfn is invalid */
> > +		if (!pfn_valid(pfn))
> 
> pfn_valid() is a relatively expensive test to be doing in a loop
> on what may be every single page. It does an RCU lock/unlock
> and make other checks that aren't necessary here. Since
> mreg_pfns[] is populated from mm calls, the only invalid PFNs
> would be MSHV_INVALID_PFN that code in this module has
> explicitly put there. Just testing against MSHV_INVALID_PFN
> would be a lot faster here and elsewhere in this module. It's
> really a "pfn set/not set" test. Defining a pfn_set() macro
> here in this module that tests against MSHV_INVALID_PFN
> would accomplish the same thing more efficiently.
> 

Yes, we could do it the way you suggest. For completeness, I should add
that pfn_valid() is expensive only on 32-bit ARM and ARC, which we
don’t care about.

> >  			break;
> > 
> >  		/* Break if stride size changes */
> > -		if (stride != mshv_chunk_stride(page, gfn + count,
> > -						page_count - count))
> > +		if (stride != mshv_chunk_stride(pfn_to_page(pfn),
> > +						gfn + count,
> > +						pfn_count - count))
> >  			break;
> >  	}
> > 
> > -	ret = handler(region, flags, page_offset, count, stride > 1);
> > +	ret = handler(region, flags, pfn_offset, count, stride > 1);
> >  	if (ret)
> >  		return ret;
> > 
> > @@ -118,70 +120,73 @@ static long mshv_region_process_chunk(struct mshv_mem_region *region,
> >  }
> > 
> >  /**
> > - * mshv_region_process_range - Processes a range of memory pages in a
> > - *                             region.
> > - * @region     : Pointer to the memory region structure.
> > - * @flags      : Flags to pass to the handler.
> > - * @page_offset: Offset into the region's pages array to start processing.
> > - * @page_count : Number of pages to process.
> > - * @handler    : Callback function to handle each chunk of contiguous
> > - *               pages.
> > + * mshv_region_process_range - Processes a range of PFNs in a region.
> > + * @region    : Pointer to the memory region structure.
> > + * @flags     : Flags to pass to the handler.
> > + * @pfn_offset: Offset into the region's PFNs array to start processing.
> > + * @pfn_count : Number of PFNs to process.
> > + * @handler   : Callback function to handle each chunk of contiguous
> > + *              valid PFNs.
> >   *
> > - * Iterates over the specified range of pages in @region, skipping
> > - * non-present pages. For each contiguous chunk of present pages, invokes
> > - * @handler via mshv_region_process_chunk.
> > + * Iterates over the specified range of PFNs in @region, skipping
> > + * invalid PFNs. For each contiguous chunk of valid PFNS, invokes
> > + * @handler via mshv_region_process_pfns.
> >   *
> > - * Note: The @handler callback must be able to handle both normal and huge
> > - * pages.
> > + * Note: The @handler callback must be able to handle PFNs backed by both
> > + * normal and huge pages.
> >   *
> >   * Returns 0 on success, or a negative error code on failure.
> >   */
> >  static int mshv_region_process_range(struct mshv_mem_region *region,
> >  				     u32 flags,
> > -				     u64 page_offset, u64 page_count,
> > +				     u64 pfn_offset, u64 pfn_count,
> >  				     int (*handler)(struct mshv_mem_region *region,
> >  						    u32 flags,
> > -						    u64 page_offset,
> > -						    u64 page_count,
> > +						    u64 pfn_offset,
> > +						    u64 pfn_count,
> >  						    bool huge_page))
> >  {
> > +	u64 pfn_end;
> 
> In Patch 2 of this series, "pfn_end" is changed to just "end", and
> the references are adjusted. Patch 2 could be a few lines smaller if it
> was named "end" here and Patch 2 didn't have to change it.
> 

Sure, can do.

> >  	long ret;
> > 
> > -	if (page_offset + page_count > region->nr_pages)
> > +	if (check_add_overflow(pfn_offset, pfn_count, &pfn_end))
> > +		return -EOVERFLOW;
> > +
> > +	if (pfn_end > region->nr_pfns)
> >  		return -EINVAL;
> > 
> > -	while (page_count) {
> > +	while (pfn_count) {
> >  		/* Skip non-present pages */
> > -		if (!region->mreg_pages[page_offset]) {
> > -			page_offset++;
> > -			page_count--;
> > +		if (!pfn_valid(region->mreg_pfns[pfn_offset])) {
> > +			pfn_offset++;
> > +			pfn_count--;
> >  			continue;
> >  		}
> > 
> > -		ret = mshv_region_process_chunk(region, flags,
> > -						page_offset,
> > -						page_count,
> > -						handler);
> > +		ret = mshv_region_process_pfns(region, flags,
> > +					       pfn_offset, pfn_count,
> > +					       handler);
> >  		if (ret < 0)
> >  			return ret;
> > 
> > -		page_offset += ret;
> > -		page_count -= ret;
> > +		pfn_offset += ret;
> > +		pfn_count -= ret;
> >  	}
> > 
> >  	return 0;
> >  }
> > 
> > -struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> > +struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pfns,
> >  					   u64 uaddr, u32 flags)
> >  {
> >  	struct mshv_mem_region *region;
> > +	u64 i;
> > 
> > -	region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
> > +	region = vzalloc(sizeof(*region) + sizeof(unsigned long) * nr_pfns);
> 
> Use struct_size(region, mreg_pfns, nr_pfns) instead of open coding the arithmetic?
> 

This is new to me. Sure, will do.

Thanks,
Stanislav

> >  	if (!region)
> >  		return ERR_PTR(-ENOMEM);
> > 
> > -	region->nr_pages = nr_pages;
> > +	region->nr_pfns = nr_pfns;
> >  	region->start_gfn = guest_pfn;
> >  	region->start_uaddr = uaddr;
> >  	region->hv_map_flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_ADJUSTABLE;
> > @@ -190,6 +195,9 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> >  	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
> >  		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
> > 
> > +	for (i = 0; i < nr_pfns; i++)
> > +		region->mreg_pfns[i] = MSHV_INVALID_PFN;
> > +
> >  	kref_init(&region->mreg_refcount);
> > 
> >  	return region;
> > @@ -197,15 +205,15 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> > 
> >  static int mshv_region_chunk_share(struct mshv_mem_region *region,
> >  				   u32 flags,
> > -				   u64 page_offset, u64 page_count,
> > +				   u64 pfn_offset, u64 pfn_count,
> >  				   bool huge_page)
> >  {
> >  	if (huge_page)
> >  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > 
> >  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > -					      region->mreg_pages + page_offset,
> > -					      page_count,
> > +					      region->mreg_pfns + pfn_offset,
> > +					      pfn_count,
> >  					      HV_MAP_GPA_READABLE |
> >  					      HV_MAP_GPA_WRITABLE,
> >  					      flags, true);
> > @@ -216,21 +224,21 @@ int mshv_region_share(struct mshv_mem_region *region)
> >  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
> > 
> >  	return mshv_region_process_range(region, flags,
> > -					 0, region->nr_pages,
> > +					 0, region->nr_pfns,
> >  					 mshv_region_chunk_share);
> >  }
> > 
> >  static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
> >  				     u32 flags,
> > -				     u64 page_offset, u64 page_count,
> > +				     u64 pfn_offset, u64 pfn_count,
> >  				     bool huge_page)
> >  {
> >  	if (huge_page)
> >  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > 
> >  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > -					      region->mreg_pages + page_offset,
> > -					      page_count, 0,
> > +					      region->mreg_pfns + pfn_offset,
> > +					      pfn_count, 0,
> >  					      flags, false);
> >  }
> > 
> > @@ -239,30 +247,30 @@ int mshv_region_unshare(struct mshv_mem_region *region)
> >  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
> > 
> >  	return mshv_region_process_range(region, flags,
> > -					 0, region->nr_pages,
> > +					 0, region->nr_pfns,
> >  					 mshv_region_chunk_unshare);
> >  }
> > 
> >  static int mshv_region_chunk_remap(struct mshv_mem_region *region,
> >  				   u32 flags,
> > -				   u64 page_offset, u64 page_count,
> > +				   u64 pfn_offset, u64 pfn_count,
> >  				   bool huge_page)
> >  {
> >  	if (huge_page)
> >  		flags |= HV_MAP_GPA_LARGE_PAGE;
> > 
> > -	return hv_call_map_gpa_pages(region->partition->pt_id,
> > -				     region->start_gfn + page_offset,
> > -				     page_count, flags,
> > -				     region->mreg_pages + page_offset);
> > +	return hv_call_map_ram_pfns(region->partition->pt_id,
> > +				    region->start_gfn + pfn_offset,
> > +				    pfn_count, flags,
> > +				    region->mreg_pfns + pfn_offset);
> >  }
> > 
> > -static int mshv_region_remap_pages(struct mshv_mem_region *region,
> > -				   u32 map_flags,
> > -				   u64 page_offset, u64 page_count)
> > +static int mshv_region_remap_pfns(struct mshv_mem_region *region,
> > +				  u32 map_flags,
> > +				  u64 pfn_offset, u64 pfn_count)
> >  {
> >  	return mshv_region_process_range(region, map_flags,
> > -					 page_offset, page_count,
> > +					 pfn_offset, pfn_count,
> >  					 mshv_region_chunk_remap);
> >  }
> > 
> > @@ -270,38 +278,50 @@ int mshv_region_map(struct mshv_mem_region *region)
> >  {
> >  	u32 map_flags = region->hv_map_flags;
> > 
> > -	return mshv_region_remap_pages(region, map_flags,
> > -				       0, region->nr_pages);
> > +	return mshv_region_remap_pfns(region, map_flags,
> > +				      0, region->nr_pfns);
> >  }
> > 
> > -static void mshv_region_invalidate_pages(struct mshv_mem_region *region,
> > -					 u64 page_offset, u64 page_count)
> > +static void mshv_region_invalidate_pfns(struct mshv_mem_region *region,
> > +					u64 pfn_offset, u64 pfn_count)
> >  {
> > -	if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
> > -		unpin_user_pages(region->mreg_pages + page_offset, page_count);
> > +	u64 i;
> > +
> > +	for (i = pfn_offset; i < pfn_offset + pfn_count; i++) {
> > +		if (!pfn_valid(region->mreg_pfns[i]))
> > +			continue;
> > +
> > +		if (region->mreg_type == MSHV_REGION_TYPE_MEM_PINNED)
> > +			unpin_user_page(pfn_to_page(region->mreg_pfns[i]));
> > 
> > -	memset(region->mreg_pages + page_offset, 0,
> > -	       page_count * sizeof(struct page *));
> > +		region->mreg_pfns[i] = MSHV_INVALID_PFN;
> > +	}
> >  }
> > 
> >  void mshv_region_invalidate(struct mshv_mem_region *region)
> >  {
> > -	mshv_region_invalidate_pages(region, 0, region->nr_pages);
> > +	mshv_region_invalidate_pfns(region, 0, region->nr_pfns);
> >  }
> > 
> >  int mshv_region_pin(struct mshv_mem_region *region)
> >  {
> > -	u64 done_count, nr_pages;
> > +	u64 done_count, nr_pfns, i;
> > +	unsigned long *pfns;
> >  	struct page **pages;
> >  	__u64 userspace_addr;
> >  	int ret;
> > 
> > -	for (done_count = 0; done_count < region->nr_pages; done_count += ret) {
> > -		pages = region->mreg_pages + done_count;
> > +	pages = kmalloc_array(MSHV_PIN_PAGES_BATCH_SIZE,
> > +			      sizeof(struct page *), GFP_KERNEL);
> > +	if (!pages)
> > +		return -ENOMEM;
> > +
> > +	for (done_count = 0; done_count < region->nr_pfns; done_count += ret) {
> > +		pfns = region->mreg_pfns + done_count;
> >  		userspace_addr = region->start_uaddr +
> >  				 done_count * HV_HYP_PAGE_SIZE;
> > -		nr_pages = min(region->nr_pages - done_count,
> > -			       MSHV_PIN_PAGES_BATCH_SIZE);
> > +		nr_pfns = min(region->nr_pfns - done_count,
> > +			      MSHV_PIN_PAGES_BATCH_SIZE);
> > 
> >  		/*
> >  		 * Pinning assuming 4k pages works for large pages too.
> > @@ -311,39 +331,44 @@ int mshv_region_pin(struct mshv_mem_region *region)
> >  		 * with the FOLL_LONGTERM flag does a large temporary
> >  		 * allocation of contiguous memory.
> >  		 */
> > -		ret = pin_user_pages_fast(userspace_addr, nr_pages,
> > +		ret = pin_user_pages_fast(userspace_addr, nr_pfns,
> >  					  FOLL_WRITE | FOLL_LONGTERM,
> >  					  pages);
> > -		if (ret != nr_pages)
> > +		if (ret != nr_pfns)
> >  			goto release_pages;
> > +
> > +		for (i = 0; i < ret; i++)
> > +			pfns[i] = page_to_pfn(pages[i]);
> >  	}
> > 
> > +	kfree(pages);
> >  	return 0;
> > 
> >  release_pages:
> >  	if (ret > 0)
> >  		done_count += ret;
> > -	mshv_region_invalidate_pages(region, 0, done_count);
> > +	mshv_region_invalidate_pfns(region, 0, done_count);
> > +	kfree(pages);
> >  	return ret < 0 ? ret : -ENOMEM;
> >  }
> > 
> >  static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
> >  				   u32 flags,
> > -				   u64 page_offset, u64 page_count,
> > +				   u64 pfn_offset, u64 pfn_count,
> >  				   bool huge_page)
> >  {
> >  	if (huge_page)
> >  		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > 
> > -	return hv_call_unmap_gpa_pages(region->partition->pt_id,
> > -				       region->start_gfn + page_offset,
> > -				       page_count, flags);
> > +	return hv_call_unmap_pfns(region->partition->pt_id,
> > +				  region->start_gfn + pfn_offset,
> > +				  pfn_count, flags);
> >  }
> > 
> >  static int mshv_region_unmap(struct mshv_mem_region *region)
> >  {
> >  	return mshv_region_process_range(region, 0,
> > -					 0, region->nr_pages,
> > +					 0, region->nr_pfns,
> >  					 mshv_region_chunk_unmap);
> >  }
> > 
> > @@ -427,8 +452,8 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> >  /**
> >   * mshv_region_range_fault - Handle memory range faults for a given region.
> >   * @region: Pointer to the memory region structure.
> > - * @page_offset: Offset of the page within the region.
> > - * @page_count: Number of pages to handle.
> > + * @pfn_offset: Offset of the page within the region.
> > + * @pfn_count: Number of pages to handle.
> >   *
> >   * This function resolves memory faults for a specified range of pages
> >   * within a memory region. It uses HMM (Heterogeneous Memory Management)
> > @@ -437,7 +462,7 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> >   * Return: 0 on success, negative error code on failure.
> >   */
> >  static int mshv_region_range_fault(struct mshv_mem_region *region,
> > -				   u64 page_offset, u64 page_count)
> > +				   u64 pfn_offset, u64 pfn_count)
> >  {
> >  	struct hmm_range range = {
> >  		.notifier = &region->mreg_mni,
> > @@ -447,13 +472,13 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> >  	int ret;
> >  	u64 i;
> > 
> > -	pfns = kmalloc_array(page_count, sizeof(*pfns), GFP_KERNEL);
> > +	pfns = kmalloc_array(pfn_count, sizeof(*pfns), GFP_KERNEL);
> >  	if (!pfns)
> >  		return -ENOMEM;
> > 
> >  	range.hmm_pfns = pfns;
> > -	range.start = region->start_uaddr + page_offset * HV_HYP_PAGE_SIZE;
> > -	range.end = range.start + page_count * HV_HYP_PAGE_SIZE;
> > +	range.start = region->start_uaddr + pfn_offset * HV_HYP_PAGE_SIZE;
> > +	range.end = range.start + pfn_count * HV_HYP_PAGE_SIZE;
> > 
> >  	do {
> >  		ret = mshv_region_hmm_fault_and_lock(region, &range);
> > @@ -462,11 +487,15 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> >  	if (ret)
> >  		goto out;
> > 
> > -	for (i = 0; i < page_count; i++)
> > -		region->mreg_pages[page_offset + i] = hmm_pfn_to_page(pfns[i]);
> > +	for (i = 0; i < pfn_count; i++) {
> > +		if (!(pfns[i] & HMM_PFN_VALID))
> > +			continue;
> > +		/* Drop HMM_PFN_* flags to ensure PFNs are valid. */
> > +		region->mreg_pfns[pfn_offset + i] = pfns[i] & ~HMM_PFN_FLAGS;
> > +	}
> > 
> > -	ret = mshv_region_remap_pages(region, region->hv_map_flags,
> > -				      page_offset, page_count);
> > +	ret = mshv_region_remap_pfns(region, region->hv_map_flags,
> > +				     pfn_offset, pfn_count);
> > 
> >  	mutex_unlock(&region->mreg_mutex);
> >  out:
> > @@ -476,24 +505,24 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> > 
> >  bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
> >  {
> > -	u64 page_offset, page_count;
> > +	u64 pfn_offset, pfn_count;
> >  	int ret;
> > 
> >  	/* Align the page offset to the nearest MSHV_MAP_FAULT_IN_PAGES. */
> > -	page_offset = ALIGN_DOWN(gfn - region->start_gfn,
> > -				 MSHV_MAP_FAULT_IN_PAGES);
> > +	pfn_offset = ALIGN_DOWN(gfn - region->start_gfn,
> > +				MSHV_MAP_FAULT_IN_PAGES);
> > 
> >  	/* Map more pages than requested to reduce the number of faults. */
> > -	page_count = min(region->nr_pages - page_offset,
> > -			 MSHV_MAP_FAULT_IN_PAGES);
> > +	pfn_count = min(region->nr_pfns - pfn_offset,
> > +			MSHV_MAP_FAULT_IN_PAGES);
> > 
> > -	ret = mshv_region_range_fault(region, page_offset, page_count);
> > +	ret = mshv_region_range_fault(region, pfn_offset, pfn_count);
> > 
> >  	WARN_ONCE(ret,
> > -		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, page_offset %llu, page_count %llu\n",
> > +		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, pfn_offset %llu, pfn_count %llu\n",
> >  		  region->partition->pt_id, region->start_uaddr,
> > -		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
> > -		  gfn, page_offset, page_count);
> > +		  region->start_uaddr + (region->nr_pfns << HV_HYP_PAGE_SHIFT),
> > +		  gfn, pfn_offset, pfn_count);
> > 
> >  	return !ret;
> >  }
> > @@ -523,16 +552,16 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
> >  	struct mshv_mem_region *region = container_of(mni,
> >  						      struct mshv_mem_region,
> >  						      mreg_mni);
> > -	u64 page_offset, page_count;
> > +	u64 pfn_offset, pfn_count;
> >  	unsigned long mstart, mend;
> >  	int ret = -EPERM;
> > 
> >  	mstart = max(range->start, region->start_uaddr);
> >  	mend = min(range->end, region->start_uaddr +
> > -		   (region->nr_pages << HV_HYP_PAGE_SHIFT));
> > +		   (region->nr_pfns << HV_HYP_PAGE_SHIFT));
> > 
> > -	page_offset = HVPFN_DOWN(mstart - region->start_uaddr);
> > -	page_count = HVPFN_DOWN(mend - mstart);
> > +	pfn_offset = HVPFN_DOWN(mstart - region->start_uaddr);
> > +	pfn_count = HVPFN_DOWN(mend - mstart);
> > 
> >  	if (mmu_notifier_range_blockable(range))
> >  		mutex_lock(&region->mreg_mutex);
> > @@ -541,12 +570,12 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
> > 
> >  	mmu_interval_set_seq(mni, cur_seq);
> > 
> > -	ret = mshv_region_remap_pages(region, HV_MAP_GPA_NO_ACCESS,
> > -				      page_offset, page_count);
> > +	ret = mshv_region_remap_pfns(region, HV_MAP_GPA_NO_ACCESS,
> > +				     pfn_offset, pfn_count);
> >  	if (ret)
> >  		goto out_unlock;
> > 
> > -	mshv_region_invalidate_pages(region, page_offset, page_count);
> > +	mshv_region_invalidate_pfns(region, pfn_offset, pfn_count);
> > 
> >  	mutex_unlock(&region->mreg_mutex);
> > 
> > @@ -558,9 +587,9 @@ static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
> >  	WARN_ONCE(ret,
> >  		  "Failed to invalidate region %#llx-%#llx (range %#lx-%#lx, event: %u, pages %#llx-%#llx, mm: %#llx): %d\n",
> >  		  region->start_uaddr,
> > -		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
> > +		  region->start_uaddr + (region->nr_pfns << HV_HYP_PAGE_SHIFT),
> >  		  range->start, range->end, range->event,
> > -		  page_offset, page_offset + page_count - 1, (u64)range->mm, ret);
> > +		  pfn_offset, pfn_offset + pfn_count - 1, (u64)range->mm, ret);
> >  	return false;
> >  }
> > 
> > @@ -579,7 +608,7 @@ bool mshv_region_movable_init(struct mshv_mem_region *region)
> > 
> >  	ret = mmu_interval_notifier_insert(&region->mreg_mni, current->mm,
> >  					   region->start_uaddr,
> > -					   region->nr_pages << HV_HYP_PAGE_SHIFT,
> > +					   region->nr_pfns << HV_HYP_PAGE_SHIFT,
> >  					   &mshv_region_mni_ops);
> >  	if (ret)
> >  		return false;
> > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > index 947dfb76bb19..f1d4bee97a3f 100644
> > --- a/drivers/hv/mshv_root.h
> > +++ b/drivers/hv/mshv_root.h
> > @@ -84,15 +84,15 @@ enum mshv_region_type {
> >  struct mshv_mem_region {
> >  	struct hlist_node hnode;
> >  	struct kref mreg_refcount;
> > -	u64 nr_pages;
> > +	u64 nr_pfns;
> >  	u64 start_gfn;
> >  	u64 start_uaddr;
> >  	u32 hv_map_flags;
> >  	struct mshv_partition *partition;
> >  	enum mshv_region_type mreg_type;
> >  	struct mmu_interval_notifier mreg_mni;
> > -	struct mutex mreg_mutex;	/* protects region pages remapping */
> > -	struct page *mreg_pages[];
> > +	struct mutex mreg_mutex;	/* protects region PFNs remapping */
> > +	unsigned long mreg_pfns[];
> >  };
> > 
> >  struct mshv_irq_ack_notifier {
> > @@ -282,11 +282,11 @@ int hv_call_create_partition(u64 flags,
> >  int hv_call_initialize_partition(u64 partition_id);
> >  int hv_call_finalize_partition(u64 partition_id);
> >  int hv_call_delete_partition(u64 partition_id);
> > -int hv_call_map_mmio_pages(u64 partition_id, u64 gfn, u64 mmio_spa, u64
> > numpgs);
> > -int hv_call_map_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
> > -			  u32 flags, struct page **pages);
> > -int hv_call_unmap_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
> > -			    u32 flags);
> > +int hv_call_map_mmio_pfns(u64 partition_id, u64 gfn, u64 mmio_spa, u64 numpgs);
> > +int hv_call_map_ram_pfns(u64 partition_id, u64 gpa_target, u64 pfn_count,
> > +			 u32 flags, unsigned long *pfns);
> > +int hv_call_unmap_pfns(u64 partition_id, u64 gpa_target, u64 pfn_count,
> > +		       u32 flags);
> >  int hv_call_delete_vp(u64 partition_id, u32 vp_index);
> >  int hv_call_assert_virtual_interrupt(u64 partition_id, u32 vector,
> >  				     u64 dest_addr,
> > @@ -329,8 +329,8 @@ int hv_map_stats_page(enum hv_stats_object_type type,
> >  int hv_unmap_stats_page(enum hv_stats_object_type type,
> >  			struct hv_stats_page *page_addr,
> >  			const union hv_stats_object_identity *identity);
> > -int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> > -				   u64 page_struct_count, u32 host_access,
> > +int hv_call_modify_spa_host_access(u64 partition_id, unsigned long *pfns,
> > +				   u64 pfns_count, u32 host_access,
> >  				   u32 flags, u8 acquire);
> >  int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code, u64 arg,
> >  				      void *property_value, size_t property_value_sz);
> > diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > index cb55d4d4be2e..a95f2cfc5da5 100644
> > --- a/drivers/hv/mshv_root_hv_call.c
> > +++ b/drivers/hv/mshv_root_hv_call.c
> > @@ -188,17 +188,16 @@ int hv_call_delete_partition(u64 partition_id)
> >  	return hv_result_to_errno(status);
> >  }
> > 
> > -/* Ask the hypervisor to map guest ram pages or the guest mmio space */
> > -static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> > -			       u32 flags, struct page **pages, u64 mmio_spa)
> > +static int hv_do_map_pfns(u64 partition_id, u64 gfn, u64 pfns_count,
> > +			  u32 flags, unsigned long *pfns, u64 mmio_spa)
> >  {
> >  	struct hv_input_map_gpa_pages *input_page;
> >  	u64 status, *pfnlist;
> >  	unsigned long irq_flags, large_shift = 0;
> >  	int ret = 0, done = 0;
> > -	u64 page_count = page_struct_count;
> > +	u64 page_count = pfns_count;
> > 
> > -	if (page_count == 0 || (pages && mmio_spa))
> > +	if (page_count == 0 || (pfns && mmio_spa))
> >  		return -EINVAL;
> > 
> >  	if (flags & HV_MAP_GPA_LARGE_PAGE) {
> > @@ -227,14 +226,14 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> >  		for (i = 0; i < rep_count; i++)
> >  			if (flags & HV_MAP_GPA_NO_ACCESS) {
> >  				pfnlist[i] = 0;
> > -			} else if (pages) {
> > +			} else if (pfns) {
> >  				u64 index = (done + i) << large_shift;
> > 
> > -				if (index >= page_struct_count) {
> > +				if (index >= pfns_count) {
> >  					ret = -EINVAL;
> >  					break;
> >  				}
> > -				pfnlist[i] = page_to_pfn(pages[index]);
> > +				pfnlist[i] = pfns[index];
> >  			} else {
> >  				pfnlist[i] = mmio_spa + done + i;
> >  			}
> > @@ -266,37 +265,37 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
> > 
> >  		if (flags & HV_MAP_GPA_LARGE_PAGE)
> >  			unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > -		hv_call_unmap_gpa_pages(partition_id, gfn, done, unmap_flags);
> > +		hv_call_unmap_pfns(partition_id, gfn, done, unmap_flags);
> >  	}
> > 
> >  	return ret;
> >  }
> > 
> >  /* Ask the hypervisor to map guest ram pages */
> > -int hv_call_map_gpa_pages(u64 partition_id, u64 gpa_target, u64 page_count,
> > -			  u32 flags, struct page **pages)
> > +int hv_call_map_ram_pfns(u64 partition_id, u64 gfn, u64 pfn_count,
> > +			 u32 flags, unsigned long *pfns)
> >  {
> > -	return hv_do_map_gpa_hcall(partition_id, gpa_target, page_count,
> > -				   flags, pages, 0);
> > +	return hv_do_map_pfns(partition_id, gfn, pfn_count, flags,
> > +			      pfns, 0);
> >  }
> > 
> > -/* Ask the hypervisor to map guest mmio space */
> > -int hv_call_map_mmio_pages(u64 partition_id, u64 gfn, u64 mmio_spa, u64 numpgs)
> > +int hv_call_map_mmio_pfns(u64 partition_id, u64 gfn, u64 mmio_spa,
> > +			  u64 pfn_count)
> >  {
> >  	int i;
> >  	u32 flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE |
> >  		    HV_MAP_GPA_NOT_CACHED;
> > 
> > -	for (i = 0; i < numpgs; i++)
> > +	for (i = 0; i < pfn_count; i++)
> >  		if (page_is_ram(mmio_spa + i))
> >  			return -EINVAL;
> > 
> > -	return hv_do_map_gpa_hcall(partition_id, gfn, numpgs, flags, NULL,
> > -				   mmio_spa);
> > +	return hv_do_map_pfns(partition_id, gfn, pfn_count, flags,
> > +			      NULL, mmio_spa);
> >  }
> > 
> > -int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
> > -			    u32 flags)
> > +int hv_call_unmap_pfns(u64 partition_id, u64 gfn, u64 page_count_4k,
> > +		       u32 flags)
> >  {
> >  	struct hv_input_unmap_gpa_pages *input_page;
> >  	u64 status, page_count = page_count_4k;
> > @@ -1009,15 +1008,15 @@ int hv_unmap_stats_page(enum hv_stats_object_type type,
> >  	return ret;
> >  }
> > 
> > -int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> > -				   u64 page_struct_count, u32 host_access,
> > +int hv_call_modify_spa_host_access(u64 partition_id, unsigned long *pfns,
> > +				   u64 pfns_count, u32 host_access,
> >  				   u32 flags, u8 acquire)
> >  {
> >  	struct hv_input_modify_sparse_spa_page_host_access *input_page;
> >  	u64 status;
> >  	int done = 0;
> >  	unsigned long irq_flags, large_shift = 0;
> > -	u64 page_count = page_struct_count;
> > +	u64 page_count = pfns_count;
> >  	u16 code = acquire ? HVCALL_ACQUIRE_SPARSE_SPA_PAGE_HOST_ACCESS :
> >  			     HVCALL_RELEASE_SPARSE_SPA_PAGE_HOST_ACCESS;
> > 
> > @@ -1051,11 +1050,10 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> >  		for (i = 0; i < rep_count; i++) {
> >  			u64 index = (done + i) << large_shift;
> > 
> > -			if (index >= page_struct_count)
> > +			if (index >= pfns_count)
> >  				return -EINVAL;
> > 
> > -			input_page->spa_page_list[i] =
> > -						page_to_pfn(pages[index]);
> > +			input_page->spa_page_list[i] = pfns[index];
> >  		}
> > 
> >  		status = hv_do_rep_hypercall(code, rep_count, 0, input_page,
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index f2d83d6c8c4f..685e4b562186 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -619,7 +619,7 @@ mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
> > 
> >  	hlist_for_each_entry(region, &partition->pt_mem_regions, hnode) {
> >  		if (gfn >= region->start_gfn &&
> > -		    gfn < region->start_gfn + region->nr_pages)
> > +		    gfn < region->start_gfn + region->nr_pfns)
> >  			return region;
> >  	}
> > 
> > @@ -1221,20 +1221,20 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
> >  					bool is_mmio)
> >  {
> >  	struct mshv_mem_region *rg;
> > -	u64 nr_pages = HVPFN_DOWN(mem->size);
> > +	u64 nr_pfns = HVPFN_DOWN(mem->size);
> > 
> >  	/* Reject overlapping regions */
> >  	spin_lock(&partition->pt_mem_regions_lock);
> >  	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
> > -		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
> > -		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
> > +		if (mem->guest_pfn + nr_pfns <= rg->start_gfn ||
> > +		    rg->start_gfn + rg->nr_pfns <= mem->guest_pfn)
> >  			continue;
> >  		spin_unlock(&partition->pt_mem_regions_lock);
> >  		return -EEXIST;
> >  	}
> >  	spin_unlock(&partition->pt_mem_regions_lock);
> > 
> > -	rg = mshv_region_create(mem->guest_pfn, nr_pages,
> > +	rg = mshv_region_create(mem->guest_pfn, nr_pfns,
> >  				mem->userspace_addr, mem->flags);
> >  	if (IS_ERR(rg))
> >  		return PTR_ERR(rg);
> > @@ -1372,21 +1372,21 @@ mshv_map_user_memory(struct mshv_partition *partition,
> >  		 * the hypervisor track dirty pages, enabling pre-copy live
> >  		 * migration.
> >  		 */
> > -		ret = hv_call_map_gpa_pages(partition->pt_id,
> > -					    region->start_gfn,
> > -					    region->nr_pages,
> > -					    HV_MAP_GPA_NO_ACCESS, NULL);
> > +		ret = hv_call_map_ram_pfns(partition->pt_id,
> > +					   region->start_gfn,
> > +					   region->nr_pfns,
> > +					   HV_MAP_GPA_NO_ACCESS, NULL);
> >  		break;
> >  	case MSHV_REGION_TYPE_MMIO:
> > -		ret = hv_call_map_mmio_pages(partition->pt_id,
> > -					     region->start_gfn,
> > -					     mmio_pfn,
> > -					     region->nr_pages);
> > +		ret = hv_call_map_mmio_pfns(partition->pt_id,
> > +					    region->start_gfn,
> > +					    mmio_pfn,
> > +					    region->nr_pfns);
> >  		break;
> >  	}
> > 
> >  	trace_mshv_map_user_memory(partition->pt_id, region->start_uaddr,
> > -				   region->start_gfn, region->nr_pages,
> > +				   region->start_gfn, region->nr_pfns,
> >  				   region->hv_map_flags, ret);
> > 
> >  	if (ret)
> > @@ -1424,7 +1424,7 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
> >  	/* Paranoia check */
> >  	if (region->start_uaddr != mem.userspace_addr ||
> >  	    region->start_gfn != mem.guest_pfn ||
> > -	    region->nr_pages != HVPFN_DOWN(mem.size)) {
> > +	    region->nr_pfns != HVPFN_DOWN(mem.size)) {
> >  		spin_unlock(&partition->pt_mem_regions_lock);
> >  		return -EINVAL;
> >  	}
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] mshv: Add support to address range holes remapping
  2026-04-13 21:08   ` Michael Kelley
@ 2026-04-20 16:24     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-20 16:24 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Apr 13, 2026 at 09:08:31PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> > 
> > Consolidate memory region processing to handle both valid and invalid PFNs
> > uniformly. This eliminates code duplication across remap, unmap, share, and
> > unshare operations by using a common range processing interface.
> > 
> > Holes are now remapped with no-access permissions to enable
> > hypervisor dirty page tracking for precopy live migration.
> > 
> > This refactoring is a precursor to an upcoming change that will map
> > present pages in movable regions upon region creation, requiring
> > consistent handling of both mapped and unmapped ranges.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_regions.c |  108
> > ++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 95 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > index b1a707d16c07..ed9c55841140 100644
> > --- a/drivers/hv/mshv_regions.c
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -119,6 +119,57 @@ static long mshv_region_process_pfns(struct mshv_mem_region *region,
> >  	return count;
> >  }
> > 
> > +/**
> > + * mshv_region_process_hole - Handle a hole (invalid PFNs) in a memory
> > + *                            region
> > + * @region    : Memory region containing the hole
> > + * @flags     : Flags to pass to the handler function
> > + * @pfn_offset: Starting PFN offset within the region
> > + * @pfn_count : Number of PFNs in the hole
> > + * @handler   : Callback function to invoke for the hole
> > + *
> > + * Invokes the handler function for a contiguous hole with the specified
> > + * parameters.
> > + *
> > + * Return: Number of PFNs handled, or negative error code.
> > + */
> > +static long mshv_region_process_hole(struct mshv_mem_region *region,
> > +				     u32 flags,
> > +				     u64 pfn_offset, u64 pfn_count,
> > +				     int (*handler)(struct mshv_mem_region *region,
> > +						    u32 flags,
> > +						    u64 pfn_offset,
> > +						    u64 pfn_count,
> > +						    bool huge_page))
> > +{
> > +	long ret;
> > +
> > +	ret = handler(region, flags, pfn_offset, pfn_count, 0);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return pfn_count;
> > +}
> > +
> > +static long mshv_region_process_chunk(struct mshv_mem_region *region,
> > +				      u32 flags,
> > +				      u64 pfn_offset, u64 pfn_count,
> > +				      int (*handler)(struct mshv_mem_region *region,
> > +						     u32 flags,
> > +						     u64 pfn_offset,
> > +						     u64 pfn_count,
> > +						     bool huge_page))
> > +{
> > +	if (pfn_valid(region->mreg_pfns[pfn_offset]))
> > +		return mshv_region_process_pfns(region, flags,
> > +				pfn_offset, pfn_count,
> > +				handler);
> > +	else
> > +		return mshv_region_process_hole(region, flags,
> > +				pfn_offset, pfn_count,
> > +				handler);
> > +}
> > +
> >  /**
> >   * mshv_region_process_range - Processes a range of PFNs in a region.
> >   * @region    : Pointer to the memory region structure.
> > @@ -146,33 +197,47 @@ static int mshv_region_process_range(struct mshv_mem_region *region,
> >  						    u64 pfn_count,
> >  						    bool huge_page))
> >  {
> > -	u64 pfn_end;
> > +	u64 start, end;
> >  	long ret;
> > 
> > -	if (check_add_overflow(pfn_offset, pfn_count, &pfn_end))
> > +	if (!pfn_count)
> > +		return 0;
> > +
> > +	if (check_add_overflow(pfn_offset, pfn_count, &end))
> >  		return -EOVERFLOW;
> > 
> > -	if (pfn_end > region->nr_pfns)
> > +	if (end > region->nr_pfns)
> >  		return -EINVAL;
> > 
> > -	while (pfn_count) {
> > -		/* Skip non-present pages */
> > -		if (!pfn_valid(region->mreg_pfns[pfn_offset])) {
> > -			pfn_offset++;
> > -			pfn_count--;
> > +	start = pfn_offset;
> > +	end = pfn_offset + 1;
> > +
> > +	while (end < pfn_offset + pfn_count) {
> > +		/*
> > +		 * Accumulate contiguous pfns with the same validity
> > +		 * (valid or not).
> > +		 */
> > +		if (pfn_valid(region->mreg_pfns[start]) ==
> > +		    pfn_valid(region->mreg_pfns[end])) {
> > +			end++;
> >  			continue;
> >  		}
> > 
> > -		ret = mshv_region_process_pfns(region, flags,
> > -					       pfn_offset, pfn_count,
> > -					       handler);
> > +		ret = mshv_region_process_chunk(region, flags,
> > +						start, end - start,
> > +						handler);
> >  		if (ret < 0)
> >  			return ret;
> > 
> > -		pfn_offset += ret;
> > -		pfn_count -= ret;
> > +		start += ret;
> >  	}
> > 
> > +	ret = mshv_region_process_chunk(region, flags,
> > +					start, end - start,
> > +					handler);
> > +	if (ret < 0)
> > +		return ret;
> > +
> >  	return 0;
> >  }
> > 
> > @@ -208,6 +273,9 @@ static int mshv_region_chunk_share(struct mshv_mem_region *region,
> >  				   u64 pfn_offset, u64 pfn_count,
> >  				   bool huge_page)
> >  {
> > +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> > +		return -EINVAL;
> > +
> >  	if (huge_page)
> >  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > 
> > @@ -233,6 +301,9 @@ static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
> >  				     u64 pfn_offset, u64 pfn_count,
> >  				     bool huge_page)
> >  {
> > +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> > +		return -EINVAL;
> > +
> >  	if (huge_page)
> >  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > 
> > @@ -256,6 +327,14 @@ static int mshv_region_chunk_remap(struct mshv_mem_region *region,
> >  				   u64 pfn_offset, u64 pfn_count,
> >  				   bool huge_page)
> >  {
> > +	/*
> > +	 * Remap missing pages with no access to let the
> > +	 * hypervisor track dirty pages, enabling precopy live
> > +	 * migration.
> > +	 */
> > +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> > +		flags = HV_MAP_GPA_NO_ACCESS;
> 
> Is it OK to wipe out any other flags that might be set? Certainly, any previous
> flags in PERMISSIONS_MASK should be removed, but what about ADJUSTABLE
> and NOT_CACHED?
> 

Yes, this is the right approach. The HV_MAP_GPA_NO_ACCESS flag will
immediately cause a hypervisor fault on any access to the page. So
caching and adjustability no longer matter.

Thanks,
Stanislav

> > +
> >  	if (huge_page)
> >  		flags |= HV_MAP_GPA_LARGE_PAGE;
> > 
> > @@ -357,6 +436,9 @@ static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
> >  				   u64 pfn_offset, u64 pfn_count,
> >  				   bool huge_page)
> >  {
> > +	if (!pfn_valid(region->mreg_pfns[pfn_offset]))
> > +		return 0;
> > +
> >  	if (huge_page)
> >  		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/7] mshv: Support regions with different VMAs
  2026-04-13 21:08   ` Michael Kelley
@ 2026-04-20 16:29     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-20 16:29 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Apr 13, 2026 at 09:08:52PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> > 
> > Allow HMM fault handling across memory regions that span multiple VMAs
> > with different protection flags. The previous implementation assumed a
> > single VMA per region, which would fail when guest memory crosses VMA
> > boundaries.
> > 
> > Iterate through VMAs within the range and handle each separately with
> > appropriate protection flags, enabling more flexible memory region
> > configurations for partitions.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_regions.c |   72 +++++++++++++++++++++++++++++++++------------
> >  1 file changed, 52 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > index ed9c55841140..1bb1bfe177e2 100644
> > --- a/drivers/hv/mshv_regions.c
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -492,37 +492,72 @@ int mshv_region_get(struct mshv_mem_region *region)
> >  }
> > 
> >  /**
> > - * mshv_region_hmm_fault_and_lock - Handle HMM faults and lock the memory region
> > + * mshv_region_hmm_fault_and_lock - Handle HMM faults across VMAs and lock
> > + *                                  the memory region
> >   * @region: Pointer to the memory region structure
> > - * @range: Pointer to the HMM range structure
> > + * @start : Starting virtual address of the range to fault
> > + * @end   : Ending virtual address of the range to fault (exclusive)
> > + * @pfns  : Output array for page frame numbers with HMM flags
> >   *
> >   * This function performs the following steps:
> >   * 1. Reads the notifier sequence for the HMM range.
> >   * 2. Acquires a read lock on the memory map.
> > - * 3. Handles HMM faults for the specified range.
> > - * 4. Releases the read lock on the memory map.
> > - * 5. If successful, locks the memory region mutex.
> > - * 6. Verifies if the notifier sequence has changed during the operation.
> > - *    If it has, releases the mutex and returns -EBUSY to match with
> > - *    hmm_range_fault() return code for repeating.
> > + * 3. Iterates through VMAs in the specified range, handling each
> > + *    separately with appropriate protection flags (HMM_PFN_REQ_WRITE set
> > + *    based on VMA flags).
> > + * 4. Handles HMM faults for each VMA segment.
> > + * 5. Releases the read lock on the memory map.
> > + * 6. If successful, locks the memory region mutex.
> > + * 7. Verifies if the notifier sequence has changed during the operation.
> > + *    If it has, releases the mutex and returns -EBUSY to signal retry.
> > + *
> > + * The function expects the range [start, end] is backed by valid VMAs.
> 
> Use "[start, end)" to describe the range since end is exclusive.
> 

Will do

> > + * Returns -EFAULT if any address in the range is not covered by a VMA.
> >   *
> >   * Return: 0 on success, a negative error code otherwise.
> >   */
> >  static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> > -					  struct hmm_range *range)
> > +					  unsigned long start,
> > +					  unsigned long end,
> > +					  unsigned long *pfns)
> >  {
> > +	struct hmm_range range = {
> > +		.notifier = &region->mreg_mni,
> > +	};
> >  	int ret;
> > 
> > -	range->notifier_seq = mmu_interval_read_begin(range->notifier);
> > +	range.notifier_seq = mmu_interval_read_begin(range.notifier);
> >  	mmap_read_lock(region->mreg_mni.mm);
> > -	ret = hmm_range_fault(range);
> > +	while (start < end) {
> > +		struct vm_area_struct *vma;
> > +
> > +		vma = vma_lookup(current->mm, start);
> 
> The mmap_read_lock() was obtained on region->mreg_mni.mm, but the
> lookup is done against current->mm. Maybe these are the same, but
> it looks wrong.  (Pointed out by a Co-Pilot AI review.)
> 

Yes, they arethe same, but I'll update to use the same mm for clarity.

> > +		if (!vma) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		range.hmm_pfns = pfns;
> > +		range.start = start;
> > +		range.end = min(vma->vm_end, end);
> > +		range.default_flags = HMM_PFN_REQ_FAULT;
> > +		if (vma->vm_flags & VM_WRITE)
> > +			range.default_flags |= HMM_PFN_REQ_WRITE;
> > +
> > +		ret = hmm_range_fault(&range);
> > +		if (ret)
> > +			break;
> > +
> > +		start = range.end + 1;
> 
> Since range.end is exclusive, the +1 should not be done.
> 

Is it always? I'll need to check to make sure the end passed to this
function is page aligned. If it is, then I'll remove the +1.

> > +		pfns += DIV_ROUND_UP(range.end - range.start, PAGE_SIZE);
> 
> Just to confirm, range.end and range.start should always be page aligned,
> right? So the ROUND_UP should never kick in.
> 

Same as above: if the end passed to this function is page aligned, then
I'll remove the DIV_ROUND_UP and just do a simple division.

Thanks,
Stanislav

> > +	}
> >  	mmap_read_unlock(region->mreg_mni.mm);
> >  	if (ret)
> >  		return ret;
> > 
> >  	mutex_lock(&region->mreg_mutex);
> > 
> > -	if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
> > +	if (mmu_interval_read_retry(range.notifier, range.notifier_seq)) {
> >  		mutex_unlock(&region->mreg_mutex);
> >  		cond_resched();
> >  		return -EBUSY;
> > @@ -546,10 +581,7 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> >  static int mshv_region_range_fault(struct mshv_mem_region *region,
> >  				   u64 pfn_offset, u64 pfn_count)
> >  {
> > -	struct hmm_range range = {
> > -		.notifier = &region->mreg_mni,
> > -		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
> > -	};
> > +	unsigned long start, end;
> >  	unsigned long *pfns;
> >  	int ret;
> >  	u64 i;
> > @@ -558,12 +590,12 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> >  	if (!pfns)
> >  		return -ENOMEM;
> > 
> > -	range.hmm_pfns = pfns;
> > -	range.start = region->start_uaddr + pfn_offset * HV_HYP_PAGE_SIZE;
> > -	range.end = range.start + pfn_count * HV_HYP_PAGE_SIZE;
> > +	start = region->start_uaddr + pfn_offset * PAGE_SIZE;
> > +	end = start + pfn_count * PAGE_SIZE;
> > 
> >  	do {
> > -		ret = mshv_region_hmm_fault_and_lock(region, &range);
> > +		ret = mshv_region_hmm_fault_and_lock(region, start, end,
> > +						     pfns);
> >  	} while (ret == -EBUSY);
> > 
> >  	if (ret)
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/7] mshv: Map populated pages on movable region creation
  2026-04-13 21:09   ` Michael Kelley
@ 2026-04-20 16:35     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-20 16:35 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Apr 13, 2026 at 09:09:08PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:05 PM
> > 
> > Map any populated pages into the hypervisor upfront when creating a
> > movable region, rather than waiting for faults. Previously, movable
> > regions were created with all pages marked as HV_MAP_GPA_NO_ACCESS
> > regardless of whether the userspace mapping contained populated pages.
> > 
> > This guarantees that if the caller passes a populated mapping, those
> > present pages will be mapped into the hypervisor immediately during
> > region creation instead of being faulted in later.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_regions.c   |   65 ++++++++++++++++++++++++++++++++-----------
> >  drivers/hv/mshv_root.h      |    1 +
> >  drivers/hv/mshv_root_main.c |   10 +------
> >  3 files changed, 50 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > index 133ec7771812..28d3f488d89f 100644
> > --- a/drivers/hv/mshv_regions.c
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -519,7 +519,8 @@ int mshv_region_get(struct mshv_mem_region *region)
> >  static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> >  					  unsigned long start,
> >  					  unsigned long end,
> > -					  unsigned long *pfns)
> > +					  unsigned long *pfns,
> > +					  bool do_fault)
> >  {
> >  	struct hmm_range range = {
> >  		.notifier = &region->mreg_mni,
> > @@ -540,9 +541,12 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> >  		range.hmm_pfns = pfns;
> >  		range.start = start;
> >  		range.end = min(vma->vm_end, end);
> > -		range.default_flags = HMM_PFN_REQ_FAULT;
> > -		if (vma->vm_flags & VM_WRITE)
> > -			range.default_flags |= HMM_PFN_REQ_WRITE;
> > +		range.default_flags = 0;
> > +		if (do_fault) {
> > +			range.default_flags = HMM_PFN_REQ_FAULT;
> > +			if (vma->vm_flags & VM_WRITE)
> > +				range.default_flags |= HMM_PFN_REQ_WRITE;
> > +		}
> > 
> >  		ret = hmm_range_fault(&range);
> >  		if (ret)
> > @@ -567,26 +571,40 @@ static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
> >  }
> > 
> >  /**
> > - * mshv_region_range_fault - Handle memory range faults for a given region.
> > - * @region: Pointer to the memory region structure.
> > - * @pfn_offset: Offset of the page within the region.
> > - * @pfn_count: Number of pages to handle.
> > + * mshv_region_collect_and_map - Collect PFNs for a user range and map them
> > + * @region    : memory region being processed
> > + * @pfn_offset: PFNs offset within the region
> > + * @pfn_count : number of PFNs to process
> > + * @do_fault  : if true, fault in missing pages;
> > + *              if false, collect only present pages
> >   *
> > - * This function resolves memory faults for a specified range of pages
> > - * within a memory region. It uses HMM (Heterogeneous Memory Management)
> > - * to fault in the required pages and updates the region's page array.
> > + * Collects PFNs for the specified portion of @region from the
> > + * corresponding userspace VMA and maps them into the hypervisor. The
> 
> Actually, this should be "userspace VMAs" (i.e., plural)
> 

Will change.

> > + * behavior depends on @do_fault:
> >   *
> > - * Return: 0 on success, negative error code on failure.
> > + * - true: Fault in missing pages from userspace, ensuring all pages in the
> > + *   range are present. Used for on-demand page population.
> > + * - false: Collect PFNs only for pages already present in userspace,
> > + *   leaving missing pages as invalid PFN markers.
> > + *   Used for initial region setup.
> > + *
> > + * Collected PFNs are stored in region->mreg_pfns[] with HMM bookkeeping
> > + * flags cleared, then the range is mapped into the hypervisor. Present
> > + * PFNs get mapped with region access permissions; missing PFNs (zero
> > + * entries) get mapped with no-access permissions.
> 
> Hmmm. The missing PFNs are just skipped and the mreg_pfns[] array
> is not updated. Is the corresponding entry in mreg_pfns[] known to
> already be set to MSHV_INVALID_PFN? When mapping a new movable
> region, that appears to be so. I'm less sure about the 
> mshv_region_range_fault() case, though mshv_region_invalidate_pfns()
> does such initialization of any entries that are invalidated. At that point
> in the code, I'd add a comment about that assumption, as it took me a
> bit to figure it out.
> 

This logic is called for movable regions only.
Should this be mentioned in the comment from your POV?

> So does the comment about "zero entries" refer to what is returned
> by hmm_range_fault() via mshv_region_hmm_fault_and_lock()?
> The mention of "zero entries" here is a bit confusing.
> 

"Zero entries" should be changed to invalid PFN markers, which are
defined as MSHV_INVALID_PFN. I'll update the comment to clarify that.

Thanks,
Stanislav

> > + *
> > + * Return: 0 on success, negative errno on failure.
> >   */
> > -static int mshv_region_range_fault(struct mshv_mem_region *region,
> > -				   u64 pfn_offset, u64 pfn_count)
> > +static int mshv_region_collect_and_map(struct mshv_mem_region *region,
> > +				       u64 pfn_offset, u64 pfn_count,
> > +				       bool do_fault)
> >  {
> >  	unsigned long start, end;
> >  	unsigned long *pfns;
> >  	int ret;
> >  	u64 i;
> > 
> > -	pfns = kmalloc_array(pfn_count, sizeof(*pfns), GFP_KERNEL);
> > +	pfns = vmalloc_array(pfn_count, sizeof(unsigned long));
> >  	if (!pfns)
> >  		return -ENOMEM;
> > 
> > @@ -595,7 +613,7 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> > 
> >  	do {
> >  		ret = mshv_region_hmm_fault_and_lock(region, start, end,
> > -						     pfns);
> > +						     pfns, do_fault);
> >  	} while (ret == -EBUSY);
> > 
> >  	if (ret)
> > @@ -613,10 +631,17 @@ static int mshv_region_range_fault(struct mshv_mem_region *region,
> > 
> >  	mutex_unlock(&region->mreg_mutex);
> >  out:
> > -	kfree(pfns);
> > +	vfree(pfns);
> >  	return ret;
> >  }
> > 
> > +static int mshv_region_range_fault(struct mshv_mem_region *region,
> > +				   u64 pfn_offset, u64 pfn_count)
> > +{
> > +	return mshv_region_collect_and_map(region, pfn_offset, pfn_count,
> > +					   true);
> > +}
> > +
> >  bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
> >  {
> >  	u64 pfn_offset, pfn_count;
> > @@ -800,3 +825,9 @@ int mshv_map_pinned_region(struct mshv_mem_region
> > *region)
> >  err_out:
> >  	return ret;
> >  }
> > +
> > +int mshv_map_movable_region(struct mshv_mem_region *region)
> > +{
> > +	return mshv_region_collect_and_map(region, 0, region->nr_pfns,
> > +					   false);
> > +}
> > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > index d2e65a137bf4..02c1c11f701c 100644
> > --- a/drivers/hv/mshv_root.h
> > +++ b/drivers/hv/mshv_root.h
> > @@ -374,5 +374,6 @@ bool mshv_region_handle_gfn_fault(struct mshv_mem_region
> > *region, u64 gfn);
> >  void mshv_region_movable_fini(struct mshv_mem_region *region);
> >  bool mshv_region_movable_init(struct mshv_mem_region *region);
> >  int mshv_map_pinned_region(struct mshv_mem_region *region);
> > +int mshv_map_movable_region(struct mshv_mem_region *region);
> > 
> >  #endif /* _MSHV_ROOT_H_ */
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index c393b5144e0b..91dab2a3bc92 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -1299,15 +1299,7 @@ mshv_map_user_memory(struct mshv_partition
> > *partition,
> >  		ret = mshv_map_pinned_region(region);
> >  		break;
> >  	case MSHV_REGION_TYPE_MEM_MOVABLE:
> > -		/*
> > -		 * For movable memory regions, remap with no access to let
> > -		 * the hypervisor track dirty pages, enabling pre-copy live
> > -		 * migration.
> > -		 */
> > -		ret = hv_call_map_ram_pfns(partition->pt_id,
> > -					   region->start_gfn,
> > -					   region->nr_pfns,
> > -					   HV_MAP_GPA_NO_ACCESS, NULL);
> > +		ret = mshv_map_movable_region(region);
> >  		break;
> >  	case MSHV_REGION_TYPE_MMIO:
> >  		ret = hv_call_map_mmio_pfns(partition->pt_id,
> > 
> > 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/7] mshv: Refactor memory region management and map pages at creation
  2026-04-13 21:07 ` [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Michael Kelley
@ 2026-04-20 16:40   ` Stanislav Kinsburskii
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-20 16:40 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Apr 13, 2026 at 09:07:59PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> > 
> > This series refactors the mshv memory region subsystem in preparation
> > for mapping populated pages into the hypervisor at movable region
> > creation time, rather than relying solely on demand faulting.
> > 
> > The primary motivation is to ensure that when userspace passes a
> > pre-populated mapping for a movable memory region, those pages are
> > immediately visible to the hypervisor. Previously, all movable regions
> > were created with HV_MAP_GPA_NO_ACCESS on every page regardless of
> > whether the backing pages were already present, deferring all mapping
> > to the fault handler. This added unnecessary fault overhead and
> > complicated the initial setup of child partitions with pre-populated
> > memory.
> > 
> 
> This is a nice set of changes. Independent of the new functionality
> for pre-populating, it improves the code organization and makes
> it more regular.
> 
> See a few comments on individual patches. I noticed that Sashiko
> wasn't able to review the series because it wouldn't apply. Hopefully
> your v2 will apply. From what I've seen so far of Sashiko, it finds some
> good issues. I did run the patch set through Co-Pilot, but that didn't
> have the benefit of the AI prompts that Sashiko provides.
> 

Thank you for your time.
Indeed, hopefully sashiko will be able to review the v2.

Thanks,
Stanislav

> Michael

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 1/7] mshv: Convert from page pointers to PFNs
  2026-04-20 16:21     ` Stanislav Kinsburskii
@ 2026-04-20 17:18       ` Michael Kelley
  2026-04-20 23:45         ` Stanislav Kinsburskii
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Kelley @ 2026-04-20 17:18 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, April 20, 2026 9:22 AM
> 
> On Mon, Apr 13, 2026 at 09:08:16PM +0000, Michael Kelley wrote:
> > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> > >

[snip]

> > > @@ -57,60 +58,61 @@ static int mshv_chunk_stride(struct page *page,
> > >  /**
> > >   * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
> > >   *                             in a region.
> > > - * @region     : Pointer to the memory region structure.
> > > - * @flags      : Flags to pass to the handler.
> > > - * @page_offset: Offset into the region's pages array to start processing.
> > > - * @page_count : Number of pages to process.
> > > - * @handler    : Callback function to handle the chunk.
> > > + * @region    : Pointer to the memory region structure.
> > > + * @flags     : Flags to pass to the handler.
> > > + * @pfn_offset: Offset into the region's PFNs array to start processing.
> > > + * @pfn_count : Number of PFNs to process.
> > > + * @handler   : Callback function to handle the chunk.
> > >   *
> > > - * This function scans the region's pages starting from @page_offset,
> > > - * checking for contiguous present pages of the same size (normal or huge).
> > > - * It invokes @handler for the chunk of contiguous pages found. Returns the
> > > - * number of pages handled, or a negative error code if the first page is
> > > - * not present or the handler fails.
> > > + * This function scans the region's PFNs starting from @pfn_offset,
> > > + * checking for contiguous valid PFNs backed by pages of the same size
> > > + * (normal or huge). It invokes @handler for the chunk of contiguous valid
> > > + * PFNs found. Returns the number of PFNs handled, or a negative error code
> > > + * if the first PFN is invalid or the handler fails.
> > >   *
> > > - * Note: The @handler callback must be able to handle both normal and huge
> > > - * pages.
> > > + * Note: The @handler callback must be able to handle valid PFNs backed by
> > > + * both normal and huge pages.
> > >   *
> > >   * Return: Number of pages handled, or negative error code.
> > >   */
> > > -static long mshv_region_process_chunk(struct mshv_mem_region *region,
> > > -				      u32 flags,
> > > -				      u64 page_offset, u64 page_count,
> > > -				      int (*handler)(struct mshv_mem_region *region,
> > > -						     u32 flags,
> > > -						     u64 page_offset,
> > > -						     u64 page_count,
> > > -						     bool huge_page))
> > > +static long mshv_region_process_pfns(struct mshv_mem_region *region,
> > > +				     u32 flags,
> > > +				     u64 pfn_offset, u64 pfn_count,
> > > +				     int (*handler)(struct mshv_mem_region *region,
> > > +						    u32 flags,
> > > +						    u64 pfn_offset,
> > > +						    u64 pfn_count,
> > > +						    bool huge_page))
> > >  {
> > > -	u64 gfn = region->start_gfn + page_offset;
> > > +	u64 gfn = region->start_gfn + pfn_offset;
> > >  	u64 count;
> > > -	struct page *page;
> > > +	unsigned long pfn;
> > >  	int stride, ret;
> > >
> > > -	page = region->mreg_pages[page_offset];
> > > -	if (!page)
> > > +	pfn = region->mreg_pfns[pfn_offset];
> > > +	if (!pfn_valid(pfn))
> > >  		return -EINVAL;
> > >
> > > -	stride = mshv_chunk_stride(page, gfn, page_count);
> > > +	stride = mshv_chunk_stride(pfn_to_page(pfn), gfn, pfn_count);
> > >  	if (stride < 0)
> > >  		return stride;
> > >
> > >  	/* Start at stride since the first stride is validated */
> > > -	for (count = stride; count < page_count; count += stride) {
> > > -		page = region->mreg_pages[page_offset + count];
> > > +	for (count = stride; count < pfn_count ; count += stride) {
> > > +		pfn = region->mreg_pfns[pfn_offset + count];
> > >
> > > -		/* Break if current page is not present */
> > > -		if (!page)
> > > +		/* Break if current pfn is invalid */
> > > +		if (!pfn_valid(pfn))
> >
> > pfn_valid() is a relatively expensive test to be doing in a loop
> > on what may be every single page. It does an RCU lock/unlock
> > and make other checks that aren't necessary here. Since
> > mreg_pfns[] is populated from mm calls, the only invalid PFNs
> > would be MSHV_INVALID_PFN that code in this module has
> > explicitly put there. Just testing against MSHV_INVALID_PFN
> > would be a lot faster here and elsewhere in this module. It's
> > really a "pfn set/not set" test. Defining a pfn_set() macro
> > here in this module that tests against MSHV_INVALID_PFN
> > would accomplish the same thing more efficiently.
> >
> 
> Yes, we could do it the way you suggest. For completeness, I should add
> that pfn_valid() is expensive only on 32-bit ARM and ARC, which we
> don’t care about.
> 

Could you elaborate? On x86, I'm seeing that pfn_valid() is about
220 bytes of code. It's the version in include/linux/mmzone.h, not
the simple version in include/asm-generic/memory_model.h. The
latter is used only for CONFIG_FLATMEM=y. Or is the root partition
kernel build setting CONFIG_FLATMEM_MANUAL and hence getting
the simple version?

Michael

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/7] mshv: Convert from page pointers to PFNs
  2026-04-20 17:18       ` Michael Kelley
@ 2026-04-20 23:45         ` Stanislav Kinsburskii
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-20 23:45 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Apr 20, 2026 at 05:18:10PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, April 20, 2026 9:22 AM
> > 
> > On Mon, Apr 13, 2026 at 09:08:16PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, March 30, 2026 1:04 PM
> > > >
> 
> [snip]
> 
> > > > @@ -57,60 +58,61 @@ static int mshv_chunk_stride(struct page *page,
> > > >  /**
> > > >   * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
> > > >   *                             in a region.
> > > > - * @region     : Pointer to the memory region structure.
> > > > - * @flags      : Flags to pass to the handler.
> > > > - * @page_offset: Offset into the region's pages array to start processing.
> > > > - * @page_count : Number of pages to process.
> > > > - * @handler    : Callback function to handle the chunk.
> > > > + * @region    : Pointer to the memory region structure.
> > > > + * @flags     : Flags to pass to the handler.
> > > > + * @pfn_offset: Offset into the region's PFNs array to start processing.
> > > > + * @pfn_count : Number of PFNs to process.
> > > > + * @handler   : Callback function to handle the chunk.
> > > >   *
> > > > - * This function scans the region's pages starting from @page_offset,
> > > > - * checking for contiguous present pages of the same size (normal or huge).
> > > > - * It invokes @handler for the chunk of contiguous pages found. Returns the
> > > > - * number of pages handled, or a negative error code if the first page is
> > > > - * not present or the handler fails.
> > > > + * This function scans the region's PFNs starting from @pfn_offset,
> > > > + * checking for contiguous valid PFNs backed by pages of the same size
> > > > + * (normal or huge). It invokes @handler for the chunk of contiguous valid
> > > > + * PFNs found. Returns the number of PFNs handled, or a negative error code
> > > > + * if the first PFN is invalid or the handler fails.
> > > >   *
> > > > - * Note: The @handler callback must be able to handle both normal and huge
> > > > - * pages.
> > > > + * Note: The @handler callback must be able to handle valid PFNs backed by
> > > > + * both normal and huge pages.
> > > >   *
> > > >   * Return: Number of pages handled, or negative error code.
> > > >   */
> > > > -static long mshv_region_process_chunk(struct mshv_mem_region *region,
> > > > -				      u32 flags,
> > > > -				      u64 page_offset, u64 page_count,
> > > > -				      int (*handler)(struct mshv_mem_region *region,
> > > > -						     u32 flags,
> > > > -						     u64 page_offset,
> > > > -						     u64 page_count,
> > > > -						     bool huge_page))
> > > > +static long mshv_region_process_pfns(struct mshv_mem_region *region,
> > > > +				     u32 flags,
> > > > +				     u64 pfn_offset, u64 pfn_count,
> > > > +				     int (*handler)(struct mshv_mem_region *region,
> > > > +						    u32 flags,
> > > > +						    u64 pfn_offset,
> > > > +						    u64 pfn_count,
> > > > +						    bool huge_page))
> > > >  {
> > > > -	u64 gfn = region->start_gfn + page_offset;
> > > > +	u64 gfn = region->start_gfn + pfn_offset;
> > > >  	u64 count;
> > > > -	struct page *page;
> > > > +	unsigned long pfn;
> > > >  	int stride, ret;
> > > >
> > > > -	page = region->mreg_pages[page_offset];
> > > > -	if (!page)
> > > > +	pfn = region->mreg_pfns[pfn_offset];
> > > > +	if (!pfn_valid(pfn))
> > > >  		return -EINVAL;
> > > >
> > > > -	stride = mshv_chunk_stride(page, gfn, page_count);
> > > > +	stride = mshv_chunk_stride(pfn_to_page(pfn), gfn, pfn_count);
> > > >  	if (stride < 0)
> > > >  		return stride;
> > > >
> > > >  	/* Start at stride since the first stride is validated */
> > > > -	for (count = stride; count < page_count; count += stride) {
> > > > -		page = region->mreg_pages[page_offset + count];
> > > > +	for (count = stride; count < pfn_count ; count += stride) {
> > > > +		pfn = region->mreg_pfns[pfn_offset + count];
> > > >
> > > > -		/* Break if current page is not present */
> > > > -		if (!page)
> > > > +		/* Break if current pfn is invalid */
> > > > +		if (!pfn_valid(pfn))
> > >
> > > pfn_valid() is a relatively expensive test to be doing in a loop
> > > on what may be every single page. It does an RCU lock/unlock
> > > and make other checks that aren't necessary here. Since
> > > mreg_pfns[] is populated from mm calls, the only invalid PFNs
> > > would be MSHV_INVALID_PFN that code in this module has
> > > explicitly put there. Just testing against MSHV_INVALID_PFN
> > > would be a lot faster here and elsewhere in this module. It's
> > > really a "pfn set/not set" test. Defining a pfn_set() macro
> > > here in this module that tests against MSHV_INVALID_PFN
> > > would accomplish the same thing more efficiently.
> > >
> > 
> > Yes, we could do it the way you suggest. For completeness, I should add
> > that pfn_valid() is expensive only on 32-bit ARM and ARC, which we
> > don’t care about.
> > 
> 
> Could you elaborate? On x86, I'm seeing that pfn_valid() is about
> 220 bytes of code. It's the version in include/linux/mmzone.h, not
> the simple version in include/asm-generic/memory_model.h. The
> latter is used only for CONFIG_FLATMEM=y. Or is the root partition
> kernel build setting CONFIG_FLATMEM_MANUAL and hence getting
> the simple version?
> 

I was wrong: this long function is indeed compiled for x86.
Still, it's not big of a runtime impact as taking the rcu lock is cheap,
but I'll simplify as proposed.

Thanks,
Stanislav

> Michael

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-04-20 23:45 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-30 20:04 [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Stanislav Kinsburskii
2026-03-30 20:04 ` [PATCH 1/7] mshv: Convert from page pointers to PFNs Stanislav Kinsburskii
2026-04-13 21:08   ` Michael Kelley
2026-04-20 16:21     ` Stanislav Kinsburskii
2026-04-20 17:18       ` Michael Kelley
2026-04-20 23:45         ` Stanislav Kinsburskii
2026-03-30 20:04 ` [PATCH 2/7] mshv: Add support to address range holes remapping Stanislav Kinsburskii
2026-04-13 21:08   ` Michael Kelley
2026-04-20 16:24     ` Stanislav Kinsburskii
2026-03-30 20:04 ` [PATCH 3/7] mshv: Support regions with different VMAs Stanislav Kinsburskii
2026-04-13 21:08   ` Michael Kelley
2026-04-20 16:29     ` Stanislav Kinsburskii
2026-03-30 20:04 ` [PATCH 4/7] mshv: Move pinned region setup to mshv_regions.c Stanislav Kinsburskii
2026-03-30 20:04 ` [PATCH 5/7] mshv: Map populated pages on movable region creation Stanislav Kinsburskii
2026-04-13 21:09   ` Michael Kelley
2026-04-20 16:35     ` Stanislav Kinsburskii
2026-03-30 20:04 ` [PATCH 6/7] mshv: Extract MMIO region mapping into separate function Stanislav Kinsburskii
2026-03-30 20:04 ` [PATCH 7/7] mshv: Add tracepoint for map GPA hypercall Stanislav Kinsburskii
2026-04-13 21:07 ` [PATCH 0/7] mshv: Refactor memory region management and map pages at creation Michael Kelley
2026-04-20 16:40   ` Stanislav Kinsburskii

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox