linux-hyperv.vger.kernel.org archive mirror
* [PATCH v4 0/5] Introduce movable pages for Hyper-V guests
@ 2025-10-06 15:06 Stanislav Kinsburskii
  2025-10-06 15:06 ` [PATCH v4 1/5] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-06 15:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Today the root-partition driver allocates, pins, and maps all guest memory
into the hypervisor at guest creation. This keeps things simple: Linux
cannot move the pinned pages, so the guest's memory layout never diverges
between Linux and the Microsoft Hypervisor.

However, this approach has major drawbacks:
 - NUMA: affinity cannot be changed at runtime, so guest memory cannot be
   migrated closer to the CPUs running the guest, hurting performance.
 - Memory management: unused guest memory cannot be swapped out, compacted,
   or merged.
 - Provisioning time: upfront allocation and pinning slow down guest
   creation and destruction.
 - Overcommit: memory cannot be overcommitted on hosts running guests with
   pinned memory.

This series adds movable memory pages for Hyper-V child partitions. Guest
pages are no longer allocated upfront; they are allocated and mapped into
the hypervisor on demand, i.e. when the guest touches a GFN that is not yet
backed by a host PFN. When Linux moves a page, it is first unmapped from
the hypervisor, so the two views never diverge. As a result, Hyper-V guests
behave like regular Linux processes, and standard Linux memory-management
features apply to them.
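
Roughly, the VP run loop then looks like this (a simplified sketch with
placeholder names; the actual code is in patch 5):

	do {
		rc = run_vp(vp);			/* run until an intercept or error */
	} while (rc == 0 && handle_gpa_intercept(vp));	/* fault the touched GFNs in, rerun */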

Exceptions (memory remains pinned):
 1. Encrypted guests (pinned explicitly).
 2. Guests with passthrough devices (pinned implicitly by the VFIO framework).

v4:
 - Fix a bug where batch unmapping could skip mapped pages when selecting a
   new batch, due to an incorrect offset calculation.
 - Fix the error message printed when memory region pinning fails.

v3:
 - Region is invalidated even if the mm has no users.
 - Page remapping logic is updated to support 2M-unaligned remappings for
   regions that are PMD-aligned, which can occur during both faults and
   invalidations.

v2:
 - Split unmap batching into a separate patch.
 - Fixed commit messages from v1 review.
 - Renamed a few functions for clarity.

---

Stanislav Kinsburskii (5):
      Drivers: hv: Refactor and rename memory region handling functions
      Drivers: hv: Centralize guest memory region destruction
      Drivers: hv: Batch GPA unmap operations to improve large region performance
      Drivers: hv: Ensure large page GPA mapping is PMD-aligned
      Drivers: hv: Add support for movable memory regions


 drivers/hv/Kconfig             |    1 
 drivers/hv/mshv_root.h         |   10 +
 drivers/hv/mshv_root_hv_call.c |    2 
 drivers/hv/mshv_root_main.c    |  495 +++++++++++++++++++++++++++++++++-------
 4 files changed, 424 insertions(+), 84 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 1/5] Drivers: hv: Refactor and rename memory region handling functions
  2025-10-06 15:06 [PATCH v4 0/5] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
@ 2025-10-06 15:06 ` Stanislav Kinsburskii
  2025-10-10 22:49   ` Nuno Das Neves
  2025-10-06 15:06 ` [PATCH v4 2/5] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-06 15:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Simplify and unify memory region management to improve code clarity and
reliability. Consolidate pinning and invalidation logic, adopt consistent
naming, and remove redundant checks to reduce complexity.

Enhance documentation and update call sites for maintainability.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c |   80 +++++++++++++++++++------------------------
 1 file changed, 36 insertions(+), 44 deletions(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index fa42c40e1e02f..e923947d3c54d 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1120,8 +1120,8 @@ mshv_region_map(struct mshv_mem_region *region)
 }
 
 static void
-mshv_region_evict_pages(struct mshv_mem_region *region,
-			u64 page_offset, u64 page_count)
+mshv_region_invalidate_pages(struct mshv_mem_region *region,
+			     u64 page_offset, u64 page_count)
 {
 	if (region->flags.range_pinned)
 		unpin_user_pages(region->pages + page_offset, page_count);
@@ -1131,29 +1131,24 @@ mshv_region_evict_pages(struct mshv_mem_region *region,
 }
 
 static void
-mshv_region_evict(struct mshv_mem_region *region)
+mshv_region_invalidate(struct mshv_mem_region *region)
 {
-	mshv_region_evict_pages(region, 0, region->nr_pages);
+	mshv_region_invalidate_pages(region, 0, region->nr_pages);
 }
 
 static int
-mshv_region_populate_pages(struct mshv_mem_region *region,
-			   u64 page_offset, u64 page_count)
+mshv_region_pin(struct mshv_mem_region *region)
 {
 	u64 done_count, nr_pages;
 	struct page **pages;
 	__u64 userspace_addr;
 	int ret;
 
-	if (page_offset + page_count > region->nr_pages)
-		return -EINVAL;
-
-	for (done_count = 0; done_count < page_count; done_count += ret) {
-		pages = region->pages + page_offset + done_count;
+	for (done_count = 0; done_count < region->nr_pages; done_count += ret) {
+		pages = region->pages + done_count;
 		userspace_addr = region->start_uaddr +
-				(page_offset + done_count) *
-				HV_HYP_PAGE_SIZE;
-		nr_pages = min(page_count - done_count,
+				 done_count * HV_HYP_PAGE_SIZE;
+		nr_pages = min(region->nr_pages - done_count,
 			       MSHV_PIN_PAGES_BATCH_SIZE);
 
 		/*
@@ -1164,34 +1159,23 @@ mshv_region_populate_pages(struct mshv_mem_region *region,
 		 * with the FOLL_LONGTERM flag does a large temporary
 		 * allocation of contiguous memory.
 		 */
-		if (region->flags.range_pinned)
-			ret = pin_user_pages_fast(userspace_addr,
-						  nr_pages,
-						  FOLL_WRITE | FOLL_LONGTERM,
-						  pages);
-		else
-			ret = -EOPNOTSUPP;
-
+		ret = pin_user_pages_fast(userspace_addr, nr_pages,
+					  FOLL_WRITE | FOLL_LONGTERM,
+					  pages);
 		if (ret < 0)
 			goto release_pages;
 	}
 
-	if (PageHuge(region->pages[page_offset]))
+	if (PageHuge(region->pages[0]))
 		region->flags.large_pages = true;
 
 	return 0;
 
 release_pages:
-	mshv_region_evict_pages(region, page_offset, done_count);
+	mshv_region_invalidate_pages(region, 0, done_count);
 	return ret;
 }
 
-static int
-mshv_region_populate(struct mshv_mem_region *region)
-{
-	return mshv_region_populate_pages(region, 0, region->nr_pages);
-}
-
 static struct mshv_mem_region *
 mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
 {
@@ -1264,19 +1248,27 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	return 0;
 }
 
-/*
- * Map guest ram. if snp, make sure to release that from the host first
- * Side Effects: In case of failure, pages are unpinned when feasible.
+/**
+ * mshv_prepare_pinned_region - Pin and map memory regions
+ * @region: Pointer to the memory region structure
+ *
+ * This function processes memory regions that are explicitly marked as pinned.
+ * Pinned regions are preallocated, mapped upfront, and do not rely on fault-based
+ * population. The function ensures the region is properly populated, handles
+ * encryption requirements for SNP partitions if applicable, maps the region,
+ * and performs necessary sharing or eviction operations based on the mapping
+ * result.
+ *
+ * Return: 0 on success, negative error code on failure.
  */
-static int
-mshv_partition_mem_region_map(struct mshv_mem_region *region)
+static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
 {
 	struct mshv_partition *partition = region->partition;
 	int ret;
 
-	ret = mshv_region_populate(region);
+	ret = mshv_region_pin(region);
 	if (ret) {
-		pt_err(partition, "Failed to populate memory region: %d\n",
+		pt_err(partition, "Failed to pin memory region: %d\n",
 		       ret);
 		goto err_out;
 	}
@@ -1294,7 +1286,7 @@ mshv_partition_mem_region_map(struct mshv_mem_region *region)
 			pt_err(partition,
 			       "Failed to unshare memory region (guest_pfn: %llu): %d\n",
 			       region->start_gfn, ret);
-			goto evict_region;
+			goto invalidate_region;
 		}
 	}
 
@@ -1304,7 +1296,7 @@ mshv_partition_mem_region_map(struct mshv_mem_region *region)
 
 		shrc = mshv_partition_region_share(region);
 		if (!shrc)
-			goto evict_region;
+			goto invalidate_region;
 
 		pt_err(partition,
 		       "Failed to share memory region (guest_pfn: %llu): %d\n",
@@ -1318,8 +1310,8 @@ mshv_partition_mem_region_map(struct mshv_mem_region *region)
 
 	return 0;
 
-evict_region:
-	mshv_region_evict(region);
+invalidate_region:
+	mshv_region_invalidate(region);
 err_out:
 	return ret;
 }
@@ -1368,7 +1360,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
 		ret = hv_call_map_mmio_pages(partition->pt_id, mem.guest_pfn,
 					     mmio_pfn, HVPFN_DOWN(mem.size));
 	else
-		ret = mshv_partition_mem_region_map(region);
+		ret = mshv_prepare_pinned_region(region);
 
 	if (ret)
 		goto errout;
@@ -1413,7 +1405,7 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
 	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
 				region->nr_pages, unmap_flags);
 
-	mshv_region_evict(region);
+	mshv_region_invalidate(region);
 
 	vfree(region);
 	return 0;
@@ -1827,7 +1819,7 @@ static void destroy_partition(struct mshv_partition *partition)
 			}
 		}
 
-		mshv_region_evict(region);
+		mshv_region_invalidate(region);
 
 		vfree(region);
 	}



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 2/5] Drivers: hv: Centralize guest memory region destruction
  2025-10-06 15:06 [PATCH v4 0/5] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
  2025-10-06 15:06 ` [PATCH v4 1/5] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
@ 2025-10-06 15:06 ` Stanislav Kinsburskii
  2025-10-10 22:54   ` Nuno Das Neves
  2025-10-06 15:06 ` [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance Stanislav Kinsburskii
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-06 15:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Centralize guest memory region destruction to prevent resource leaks and
inconsistent cleanup across unmap and partition destruction paths.

Unify region removal, encrypted partition access recovery, and region
invalidation to improve maintainability and reliability. Reduce code
duplication and make future updates less error-prone by encapsulating
cleanup logic in a single helper.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c |   65 ++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index e923947d3c54d..97e322f3c6b5e 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1375,13 +1375,42 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	return ret;
 }
 
+static void mshv_partition_destroy_region(struct mshv_mem_region *region)
+{
+	struct mshv_partition *partition = region->partition;
+	u32 unmap_flags = 0;
+	int ret;
+
+	hlist_del(&region->hnode);
+
+	if (mshv_partition_encrypted(partition)) {
+		ret = mshv_partition_region_share(region);
+		if (ret) {
+			pt_err(partition,
+			       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
+			       ret);
+			return;
+		}
+	}
+
+	if (region->flags.large_pages)
+		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
+
+	/* ignore unmap failures and continue as process may be exiting */
+	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
+				region->nr_pages, unmap_flags);
+
+	mshv_region_invalidate(region);
+
+	vfree(region);
+}
+
 /* Called for unmapping both the guest ram and the mmio space */
 static long
 mshv_unmap_user_memory(struct mshv_partition *partition,
 		       struct mshv_user_mem_region mem)
 {
 	struct mshv_mem_region *region;
-	u32 unmap_flags = 0;
 
 	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
 		return -EINVAL;
@@ -1396,18 +1425,8 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
 	    region->nr_pages != HVPFN_DOWN(mem.size))
 		return -EINVAL;
 
-	hlist_del(&region->hnode);
+	mshv_partition_destroy_region(region);
 
-	if (region->flags.large_pages)
-		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
-
-	/* ignore unmap failures and continue as process may be exiting */
-	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
-				region->nr_pages, unmap_flags);
-
-	mshv_region_invalidate(region);
-
-	vfree(region);
 	return 0;
 }
 
@@ -1743,8 +1762,8 @@ static void destroy_partition(struct mshv_partition *partition)
 {
 	struct mshv_vp *vp;
 	struct mshv_mem_region *region;
-	int i, ret;
 	struct hlist_node *n;
+	int i;
 
 	if (refcount_read(&partition->pt_ref_count)) {
 		pt_err(partition,
@@ -1804,25 +1823,9 @@ static void destroy_partition(struct mshv_partition *partition)
 
 	remove_partition(partition);
 
-	/* Remove regions, regain access to the memory and unpin the pages */
 	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
-				  hnode) {
-		hlist_del(&region->hnode);
-
-		if (mshv_partition_encrypted(partition)) {
-			ret = mshv_partition_region_share(region);
-			if (ret) {
-				pt_err(partition,
-				       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
-				      ret);
-				return;
-			}
-		}
-
-		mshv_region_invalidate(region);
-
-		vfree(region);
-	}
+				  hnode)
+		mshv_partition_destroy_region(region);
 
 	/* Withdraw and free all pages we deposited */
 	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance
  2025-10-06 15:06 [PATCH v4 0/5] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
  2025-10-06 15:06 ` [PATCH v4 1/5] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
  2025-10-06 15:06 ` [PATCH v4 2/5] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
@ 2025-10-06 15:06 ` Stanislav Kinsburskii
  2025-10-06 17:09   ` Michael Kelley
  2025-10-10 23:06   ` Nuno Das Neves
  2025-10-06 15:06 ` [PATCH v4 4/5] Drivers: hv: Ensure large page GPA mapping is PMD-aligned Stanislav Kinsburskii
  2025-10-06 15:06 ` [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii
  4 siblings, 2 replies; 15+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-06 15:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Reduce overhead when unmapping large memory regions by batching GPA unmap
operations in 2MB-aligned chunks.

Use a dedicated constant for batch size to improve code clarity and
maintainability.
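
For illustration (the start GFN and region size here are invented, not taken
from the patch), the unmap loop added below carves an unaligned region into
2MB-aligned chunks like this:

#include <stdio.h>
#include <stdint.h>

#define MAX_UNMAP	512ULL	/* stands in for MSHV_MAX_UNMAP_GPA_PAGES */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	uint64_t start_gfn = 0x1f0, nr_pages = 1200;	/* invented example region */
	uint64_t end_gfn = start_gfn + nr_pages, gfn, gfn_count;

	for (gfn = start_gfn; gfn < end_gfn; gfn += gfn_count) {
		if (gfn % MAX_UNMAP)	/* partial chunk up to the next boundary */
			gfn_count = ALIGN_UP(gfn, MAX_UNMAP) - gfn;
		else
			gfn_count = MAX_UNMAP;
		if (gfn + gfn_count > end_gfn)	/* clamp the final chunk */
			gfn_count = end_gfn - gfn;
		printf("unmap gfn %#llx, %llu pages\n",
		       (unsigned long long)gfn, (unsigned long long)gfn_count);
	}
	return 0;	/* prints chunks of 16, 512, 512 and 160 pages */
}

Chunks whose page-pointer slots are all NULL are additionally skipped in the
patch, so wholly unmapped chunks cost no hypercall.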

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_root.h         |    2 ++
 drivers/hv/mshv_root_hv_call.c |    2 +-
 drivers/hv/mshv_root_main.c    |   28 +++++++++++++++++++++++++---
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index e3931b0f12693..97e64d5341b6e 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -32,6 +32,8 @@ static_assert(HV_HYP_PAGE_SIZE == MSHV_HV_PAGE_SIZE);
 
 #define MSHV_PIN_PAGES_BATCH_SIZE	(0x10000000ULL / HV_HYP_PAGE_SIZE)
 
+#define MSHV_MAX_UNMAP_GPA_PAGES	512
+
 struct mshv_vp {
 	u32 vp_index;
 	struct mshv_partition *vp_partition;
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index c9c274f29c3c6..0696024ccfe31 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -17,7 +17,7 @@
 /* Determined empirically */
 #define HV_INIT_PARTITION_DEPOSIT_PAGES 208
 #define HV_MAP_GPA_DEPOSIT_PAGES	256
-#define HV_UMAP_GPA_PAGES		512
+#define HV_UMAP_GPA_PAGES		MSHV_MAX_UNMAP_GPA_PAGES
 
 #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
 
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 97e322f3c6b5e..b61bef6b9c132 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1378,6 +1378,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
 static void mshv_partition_destroy_region(struct mshv_mem_region *region)
 {
 	struct mshv_partition *partition = region->partition;
+	u64 gfn, gfn_count, start_gfn, end_gfn;
 	u32 unmap_flags = 0;
 	int ret;
 
@@ -1396,9 +1397,30 @@ static void mshv_partition_destroy_region(struct mshv_mem_region *region)
 	if (region->flags.large_pages)
 		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
 
-	/* ignore unmap failures and continue as process may be exiting */
-	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
-				region->nr_pages, unmap_flags);
+	start_gfn = region->start_gfn;
+	end_gfn = region->start_gfn + region->nr_pages;
+
+	for (gfn = start_gfn; gfn < end_gfn; gfn += gfn_count) {
+		if (gfn % MSHV_MAX_UNMAP_GPA_PAGES)
+			gfn_count = ALIGN(gfn, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
+		else
+			gfn_count = MSHV_MAX_UNMAP_GPA_PAGES;
+
+		if (gfn + gfn_count > end_gfn)
+			gfn_count = end_gfn - gfn;
+
+		/* Skip this range if none of its pages is mapped */
+		if (!memchr_inv(region->pages + (gfn - start_gfn), 0,
+				gfn_count * sizeof(struct page *)))
+			continue;
+
+		ret = hv_call_unmap_gpa_pages(partition->pt_id, gfn,
+					      gfn_count, unmap_flags);
+		if (ret)
+			pt_err(partition,
+			       "Failed to unmap GPA pages %#llx-%#llx: %d\n",
+			       gfn, gfn + gfn_count - 1, ret);
+	}
 
 	mshv_region_invalidate(region);
 



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 4/5] Drivers: hv: Ensure large page GPA mapping is PMD-aligned
  2025-10-06 15:06 [PATCH v4 0/5] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
                   ` (2 preceding siblings ...)
  2025-10-06 15:06 ` [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance Stanislav Kinsburskii
@ 2025-10-06 15:06 ` Stanislav Kinsburskii
  2025-10-10 23:07   ` Nuno Das Neves
  2025-10-06 15:06 ` [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii
  4 siblings, 1 reply; 15+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-06 15:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

With the upcoming introduction of movable pages, a region doesn't guarantee
always having large pages mapped. Both mapping on fault and unmapping
during PTE invalidation may not be 2M-aligned, while the hypervisor
requires both the GFN and page count to be 2M-aligned to use the large page
flag.

Update the logic for large page mapping in mshv_region_remap_pages() to
require both page_offset and page_count to be PMD-aligned.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index b61bef6b9c132..cb462495f34b5 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -34,6 +34,8 @@
 #include "mshv.h"
 #include "mshv_root.h"
 
+#define VALUE_PMD_ALIGNED(c)			(!((c) & (PTRS_PER_PMD - 1)))
+
 MODULE_AUTHOR("Microsoft");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Microsoft Hyper-V root partition VMM interface /dev/mshv");
@@ -1100,7 +1102,9 @@ mshv_region_remap_pages(struct mshv_mem_region *region, u32 map_flags,
 	if (page_offset + page_count > region->nr_pages)
 		return -EINVAL;
 
-	if (region->flags.large_pages)
+	if (region->flags.large_pages &&
+	    VALUE_PMD_ALIGNED(page_offset) &&
+	    VALUE_PMD_ALIGNED(page_count))
 		map_flags |= HV_MAP_GPA_LARGE_PAGE;
 
 	/* ask the hypervisor to map guest ram */



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions
  2025-10-06 15:06 [PATCH v4 0/5] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
                   ` (3 preceding siblings ...)
  2025-10-06 15:06 ` [PATCH v4 4/5] Drivers: hv: Ensure large page GPA mapping is PMD-aligned Stanislav Kinsburskii
@ 2025-10-06 15:06 ` Stanislav Kinsburskii
  2025-10-10 17:48   ` kernel test robot
  2025-10-11  1:12   ` kernel test robot
  4 siblings, 2 replies; 15+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-06 15:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Introduce support for movable memory regions in the Hyper-V root partition
driver, thus improving memory management flexibility and preparing the
driver for advanced use cases such as dynamic memory remapping.

Integrate mmu_interval_notifier for movable regions, implement functions to
handle HMM faults and memory invalidation, and update memory region mapping
logic to support movable regions.

While MMU notifiers are commonly used in virtualization drivers, this
implementation leverages HMM (Heterogeneous Memory Management) for its
tailored functionality. HMM provides a ready-made framework for mirroring,
invalidation, and fault handling, avoiding the need to reimplement these
mechanisms for a single callback. Although MMU notifiers are more generic,
using HMM reduces boilerplate and ensures maintainability by utilizing a
mechanism specifically designed for such use cases.

Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/Kconfig          |    1 
 drivers/hv/mshv_root.h      |    8 +
 drivers/hv/mshv_root_main.c |  328 ++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 327 insertions(+), 10 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index e24f6299c3760..9d24a8c8c52e3 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -68,6 +68,7 @@ config MSHV_ROOT
 	depends on PAGE_SIZE_4KB
 	select EVENTFD
 	select VIRT_XFER_TO_GUEST_WORK
+	select HMM_MIRROR
 	default n
 	help
 	  Select this option to enable support for booting and running as root
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 97e64d5341b6e..13367c84497c0 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -15,6 +15,7 @@
 #include <linux/hashtable.h>
 #include <linux/dev_printk.h>
 #include <linux/build_bug.h>
+#include <linux/mmu_notifier.h>
 #include <uapi/linux/mshv.h>
 
 /*
@@ -81,9 +82,14 @@ struct mshv_mem_region {
 	struct {
 		u64 large_pages:  1; /* 2MiB */
 		u64 range_pinned: 1;
-		u64 reserved:	 62;
+		u64 is_ram	: 1; /* mem region can be ram or mmio */
+		u64 reserved:	 61;
 	} flags;
 	struct mshv_partition *partition;
+#if defined(CONFIG_MMU_NOTIFIER)
+	struct mmu_interval_notifier mni;
+	struct mutex mutex;	/* protects region pages remapping */
+#endif
 	struct page *pages[];
 };
 
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index cb462495f34b5..bc0ea39bcd255 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -29,6 +29,7 @@
 #include <linux/crash_dump.h>
 #include <linux/panic_notifier.h>
 #include <linux/vmalloc.h>
+#include <linux/hmm.h>
 
 #include "mshv_eventfd.h"
 #include "mshv.h"
@@ -36,6 +37,8 @@
 
 #define VALUE_PMD_ALIGNED(c)			(!((c) & (PTRS_PER_PMD - 1)))
 
+#define MSHV_MAP_FAULT_IN_PAGES			HPAGE_PMD_NR
+
 MODULE_AUTHOR("Microsoft");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Microsoft Hyper-V root partition VMM interface /dev/mshv");
@@ -76,6 +79,11 @@ static int mshv_vp_mmap(struct file *file, struct vm_area_struct *vma);
 static vm_fault_t mshv_vp_fault(struct vm_fault *vmf);
 static int mshv_init_async_handler(struct mshv_partition *partition);
 static void mshv_async_hvcall_handler(void *data, u64 *status);
+static struct mshv_mem_region
+	*mshv_partition_region_by_gfn(struct mshv_partition *pt, u64 gfn);
+static int mshv_region_remap_pages(struct mshv_mem_region *region,
+				   u32 map_flags, u64 page_offset,
+				   u64 page_count);
 
 static const union hv_input_vtl input_vtl_zero;
 static const union hv_input_vtl input_vtl_normal = {
@@ -602,14 +610,197 @@ static long mshv_run_vp_with_root_scheduler(struct mshv_vp *vp)
 static_assert(sizeof(struct hv_message) <= MSHV_RUN_VP_BUF_SZ,
 	      "sizeof(struct hv_message) must not exceed MSHV_RUN_VP_BUF_SZ");
 
+#ifdef CONFIG_X86_64
+
+#if defined(CONFIG_MMU_NOTIFIER)
+/**
+ * mshv_region_hmm_fault_and_lock - Handle HMM faults and lock the memory region
+ * @region: Pointer to the memory region structure
+ * @range: Pointer to the HMM range structure
+ *
+ * This function performs the following steps:
+ * 1. Reads the notifier sequence for the HMM range.
+ * 2. Acquires a read lock on the memory map.
+ * 3. Handles HMM faults for the specified range.
+ * 4. Releases the read lock on the memory map.
+ * 5. If successful, locks the memory region mutex.
+ * 6. Verifies if the notifier sequence has changed during the operation.
+ *    If it has, releases the mutex and returns -EBUSY to match with
+ *    hmm_range_fault() return code for repeating.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
+					  struct hmm_range *range)
+{
+	int ret;
+
+	range->notifier_seq = mmu_interval_read_begin(range->notifier);
+	mmap_read_lock(region->mni.mm);
+	ret = hmm_range_fault(range);
+	mmap_read_unlock(region->mni.mm);
+	if (ret)
+		return ret;
+
+	mutex_lock(&region->mutex);
+
+	if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
+		mutex_unlock(&region->mutex);
+		cond_resched();
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/**
+ * mshv_region_range_fault - Handle memory range faults for a given region.
+ * @region: Pointer to the memory region structure.
+ * @page_offset: Offset of the page within the region.
+ * @page_count: Number of pages to handle.
+ *
+ * This function resolves memory faults for a specified range of pages
+ * within a memory region. It uses HMM (Heterogeneous Memory Management)
+ * to fault in the required pages and updates the region's page array.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int mshv_region_range_fault(struct mshv_mem_region *region,
+				   u64 page_offset, u64 page_count)
+{
+	struct hmm_range range = {
+		.notifier = &region->mni,
+		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
+	};
+	unsigned long *pfns;
+	int ret;
+	u64 i;
+
+	pfns = kmalloc_array(page_count, sizeof(unsigned long), GFP_KERNEL);
+	if (!pfns)
+		return -ENOMEM;
+
+	range.hmm_pfns = pfns;
+	range.start = region->start_uaddr + page_offset * HV_HYP_PAGE_SIZE;
+	range.end = range.start + page_count * HV_HYP_PAGE_SIZE;
+
+	do {
+		ret = mshv_region_hmm_fault_and_lock(region, &range);
+	} while (ret == -EBUSY);
+
+	if (ret)
+		goto out;
+
+	for (i = 0; i < page_count; i++)
+		region->pages[page_offset + i] = hmm_pfn_to_page(pfns[i]);
+
+	if (PageHuge(region->pages[page_offset]))
+		region->flags.large_pages = true;
+
+	ret = mshv_region_remap_pages(region, region->hv_map_flags,
+				      page_offset, page_count);
+
+	mutex_unlock(&region->mutex);
+out:
+	kfree(pfns);
+	return ret;
+}
+#else /* CONFIG_MMU_NOTIFIER */
+static int mshv_region_range_fault(struct mshv_mem_region *region,
+				   u64 page_offset, u64 page_count)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_MMU_NOTIFIER */
+
+static bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
+{
+	u64 page_offset, page_count;
+	int ret;
+
+	if (WARN_ON_ONCE(region->flags.range_pinned))
+		return false;
+
+	/* Align the page offset to the nearest MSHV_MAP_FAULT_IN_PAGES. */
+	page_offset = ALIGN_DOWN(gfn - region->start_gfn,
+				 MSHV_MAP_FAULT_IN_PAGES);
+
+	/* Map more pages than requested to reduce the number of faults. */
+	page_count = min(region->nr_pages - page_offset,
+			 MSHV_MAP_FAULT_IN_PAGES);
+
+	ret = mshv_region_range_fault(region, page_offset, page_count);
+
+	WARN_ONCE(ret,
+		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, page_offset %llu, page_count %llu\n",
+		  region->partition->pt_id, region->start_uaddr,
+		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
+		  gfn, page_offset, page_count);
+
+	return !ret;
+}
+
+/**
+ * mshv_handle_gpa_intercept - Handle GPA (Guest Physical Address) intercepts.
+ * @vp: Pointer to the virtual processor structure.
+ *
+ * This function processes GPA intercepts by identifying the memory region
+ * corresponding to the intercepted GPA, aligning the page offset, and
+ * mapping the required pages. It ensures that the region is valid and
+ * handles faults efficiently by mapping multiple pages at once.
+ *
+ * Return: true if the intercept was handled successfully, false otherwise.
+ */
+static bool mshv_handle_gpa_intercept(struct mshv_vp *vp)
+{
+	struct mshv_partition *p = vp->vp_partition;
+	struct mshv_mem_region *region;
+	struct hv_x64_memory_intercept_message *msg;
+	u64 gfn;
+
+	msg = (struct hv_x64_memory_intercept_message *)
+		vp->vp_intercept_msg_page->u.payload;
+
+	gfn = HVPFN_DOWN(msg->guest_physical_address);
+
+	region = mshv_partition_region_by_gfn(p, gfn);
+	if (!region)
+		return false;
+
+	if (WARN_ON_ONCE(!region->flags.is_ram))
+		return false;
+
+	if (WARN_ON_ONCE(region->flags.range_pinned))
+		return false;
+
+	return mshv_region_handle_gfn_fault(region, gfn);
+}
+
+#else	/* CONFIG_X86_64 */
+
+static bool mshv_handle_gpa_intercept(struct mshv_vp *vp) { return false; }
+
+#endif	/* CONFIG_X86_64 */
+
+static bool mshv_vp_handle_intercept(struct mshv_vp *vp)
+{
+	switch (vp->vp_intercept_msg_page->header.message_type) {
+	case HVMSG_GPA_INTERCEPT:
+		return mshv_handle_gpa_intercept(vp);
+	}
+	return false;
+}
+
 static long mshv_vp_ioctl_run_vp(struct mshv_vp *vp, void __user *ret_msg)
 {
 	long rc;
 
-	if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
-		rc = mshv_run_vp_with_root_scheduler(vp);
-	else
-		rc = mshv_run_vp_with_hyp_scheduler(vp);
+	do {
+		if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
+			rc = mshv_run_vp_with_root_scheduler(vp);
+		else
+			rc = mshv_run_vp_with_hyp_scheduler(vp);
+	} while (rc == 0 && mshv_vp_handle_intercept(vp));
 
 	if (rc)
 		return rc;
@@ -1209,6 +1400,110 @@ mshv_partition_region_by_uaddr(struct mshv_partition *partition, u64 uaddr)
 	return NULL;
 }
 
+#if defined(CONFIG_MMU_NOTIFIER)
+static void mshv_region_movable_fini(struct mshv_mem_region *region)
+{
+	if (region->flags.range_pinned)
+		return;
+
+	mmu_interval_notifier_remove(&region->mni);
+}
+
+/**
+ * mshv_region_interval_invalidate - Invalidate a range of memory region
+ * @mni: Pointer to the mmu_interval_notifier structure
+ * @range: Pointer to the mmu_notifier_range structure
+ * @cur_seq: Current sequence number for the interval notifier
+ *
+ * This function invalidates a memory region by remapping its pages with
+ * no access permissions. It locks the region's mutex to ensure thread safety
+ * and updates the sequence number for the interval notifier. If the range
+ * is blockable, it uses a blocking lock; otherwise, it attempts a non-blocking
+ * lock and returns false if unsuccessful.
+ *
+ * NOTE: Failure to invalidate a region is a serious error, as the pages will
+ * be considered freed while they are still mapped by the hypervisor.
+ * Any attempt to access such pages will likely crash the system.
+ *
+ * Return: true if the region was successfully invalidated, false otherwise.
+ */
+static bool
+mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
+				const struct mmu_notifier_range *range,
+				unsigned long cur_seq)
+{
+	struct mshv_mem_region *region = container_of(mni,
+						struct mshv_mem_region,
+						mni);
+	u64 page_offset, page_count;
+	unsigned long mstart, mend;
+	int ret;
+
+	if (mmu_notifier_range_blockable(range))
+		mutex_lock(&region->mutex);
+	else if (!mutex_trylock(&region->mutex))
+		goto out_fail;
+
+	mmu_interval_set_seq(mni, cur_seq);
+
+	mstart = max(range->start, region->start_uaddr);
+	mend = min(range->end, region->start_uaddr +
+		   (region->nr_pages << HV_HYP_PAGE_SHIFT));
+
+	page_offset = HVPFN_DOWN(mstart - region->start_uaddr);
+	page_count = HVPFN_DOWN(mend - mstart);
+
+	ret = mshv_region_remap_pages(region, HV_MAP_GPA_NO_ACCESS,
+				      page_offset, page_count);
+	if (ret)
+		goto out_fail;
+
+	mshv_region_invalidate_pages(region, page_offset, page_count);
+
+	mutex_unlock(&region->mutex);
+
+	return true;
+
+out_fail:
+	WARN_ONCE(ret,
+		  "Failed to invalidate region %#llx-%#llx (range %#lx-%#lx, event: %u, pages %#llx-%#llx, mm: %#llx): %d\n",
+		  region->start_uaddr,
+		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
+		  range->start, range->end, range->event,
+		  page_offset, page_offset + page_count - 1, (u64)range->mm, ret);
+	return false;
+}
+
+static const struct mmu_interval_notifier_ops mshv_region_mni_ops = {
+	.invalidate = mshv_region_interval_invalidate,
+};
+
+static bool mshv_region_movable_init(struct mshv_mem_region *region)
+{
+	int ret;
+
+	ret = mmu_interval_notifier_insert(&region->mni, current->mm,
+					   region->start_uaddr,
+					   region->nr_pages << HV_HYP_PAGE_SHIFT,
+					   &mshv_region_mni_ops);
+	if (ret)
+		return false;
+
+	mutex_init(&region->mutex);
+
+	return true;
+}
+#else
+static inline void mshv_region_movable_fini(struct mshv_mem_region *region)
+{
+}
+
+static inline bool mshv_region_movable_init(struct mshv_mem_region *region)
+{
+	return false;
+}
+#endif
+
 /*
  * NB: caller checks and makes sure mem->size is page aligned
  * Returns: 0 with regionpp updated on success, or -errno
@@ -1241,9 +1536,14 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	if (mem->flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
 		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
 
-	/* Note: large_pages flag populated when we pin the pages */
-	if (!is_mmio)
-		region->flags.range_pinned = true;
+	/* Note: large_pages flag populated when pages are allocated. */
+	if (!is_mmio) {
+		region->flags.is_ram = true;
+
+		if (mshv_partition_encrypted(partition) ||
+		    !mshv_region_movable_init(region))
+			region->flags.range_pinned = true;
+	}
 
 	region->partition = partition;
 
@@ -1363,9 +1663,16 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	if (is_mmio)
 		ret = hv_call_map_mmio_pages(partition->pt_id, mem.guest_pfn,
 					     mmio_pfn, HVPFN_DOWN(mem.size));
-	else
+	else if (region->flags.range_pinned)
 		ret = mshv_prepare_pinned_region(region);
-
+	else
+		/*
+		 * For non-pinned regions, remap with no access to let the
+		 * hypervisor track dirty pages, enabling pre-copy live
+		 * migration.
+		 */
+		ret = mshv_region_remap_pages(region, HV_MAP_GPA_NO_ACCESS,
+					      0, region->nr_pages);
 	if (ret)
 		goto errout;
 
@@ -1388,6 +1695,9 @@ static void mshv_partition_destroy_region(struct mshv_mem_region *region)
 
 	hlist_del(&region->hnode);
 
+	if (region->flags.is_ram)
+		mshv_region_movable_fini(region);
+
 	if (mshv_partition_encrypted(partition)) {
 		ret = mshv_partition_region_share(region);
 		if (ret) {



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance
  2025-10-06 15:06 ` [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance Stanislav Kinsburskii
@ 2025-10-06 17:09   ` Michael Kelley
  2025-10-06 23:32     ` Stanislav Kinsburskii
  2025-10-10 23:06   ` Nuno Das Neves
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Kelley @ 2025-10-06 17:09 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, October 6, 2025 8:06 AM
> 
> Reduce overhead when unmapping large memory regions by batching GPA unmap
> operations in 2MB-aligned chunks.
> 
> Use a dedicated constant for batch size to improve code clarity and
> maintainability.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root.h         |    2 ++
>  drivers/hv/mshv_root_hv_call.c |    2 +-
>  drivers/hv/mshv_root_main.c    |   28 +++++++++++++++++++++++++---
>  3 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index e3931b0f12693..97e64d5341b6e 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -32,6 +32,8 @@ static_assert(HV_HYP_PAGE_SIZE == MSHV_HV_PAGE_SIZE);
> 
>  #define MSHV_PIN_PAGES_BATCH_SIZE	(0x10000000ULL / HV_HYP_PAGE_SIZE)
> 
> +#define MSHV_MAX_UNMAP_GPA_PAGES	512
> +
>  struct mshv_vp {
>  	u32 vp_index;
>  	struct mshv_partition *vp_partition;
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index c9c274f29c3c6..0696024ccfe31 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -17,7 +17,7 @@
>  /* Determined empirically */
>  #define HV_INIT_PARTITION_DEPOSIT_PAGES 208
>  #define HV_MAP_GPA_DEPOSIT_PAGES	256
> -#define HV_UMAP_GPA_PAGES		512
> +#define HV_UMAP_GPA_PAGES		MSHV_MAX_UNMAP_GPA_PAGES
> 
>  #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 97e322f3c6b5e..b61bef6b9c132 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1378,6 +1378,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  static void mshv_partition_destroy_region(struct mshv_mem_region *region)
>  {
>  	struct mshv_partition *partition = region->partition;
> +	u64 gfn, gfn_count, start_gfn, end_gfn;
>  	u32 unmap_flags = 0;
>  	int ret;
> 
> @@ -1396,9 +1397,30 @@ static void mshv_partition_destroy_region(struct mshv_mem_region *region)
>  	if (region->flags.large_pages)
>  		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> 
> -	/* ignore unmap failures and continue as process may be exiting */
> -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> -				region->nr_pages, unmap_flags);
> +	start_gfn = region->start_gfn;
> +	end_gfn = region->start_gfn + region->nr_pages;
> +
> +	for (gfn = start_gfn; gfn < end_gfn; gfn += gfn_count) {
> +		if (gfn % MSHV_MAX_UNMAP_GPA_PAGES)
> +			gfn_count = ALIGN(gfn, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
> +		else
> +			gfn_count = MSHV_MAX_UNMAP_GPA_PAGES;

You could do the entire if/else as:

		gfn_count = ALIGN(gfn + 1, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;

Using "gfn + 1" handles the case where gfn is already aligned. Arguably, this is a bit
more obscure, so it's just a suggestion.
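
For reference, a quick standalone check of that equivalence (illustrative
only; ALIGN_UP mimics the kernel's ALIGN() for power-of-two alignments):

#include <stdio.h>
#include <stdint.h>

#define N		512ULL
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	uint64_t gfn;

	for (gfn = 0; gfn < 4 * N; gfn++) {
		uint64_t a = gfn % N ? ALIGN_UP(gfn, N) - gfn : N;	/* patch's if/else */
		uint64_t b = ALIGN_UP(gfn + 1, N) - gfn;		/* one-liner above */

		if (a != b)
			printf("mismatch at %llu\n", (unsigned long long)gfn);
	}
	return 0;	/* prints nothing: both forms agree for every gfn */
}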

> +
> +		if (gfn + gfn_count > end_gfn)
> +			gfn_count = end_gfn - gfn;

Or
		gfn_count = min(gfn_count, end_gfn - gfn);

I usually prefer the "min" function instead of an "if" statement if logically
the intent is to compute the minimum. But again, just a suggestion.

> +
> +		/* Skip if all pages in this range if none is mapped */
> +		if (!memchr_inv(region->pages + (gfn - start_gfn), 0,
> +				gfn_count * sizeof(struct page *)))
> +			continue;
> +
> +		ret = hv_call_unmap_gpa_pages(partition->pt_id, gfn,
> +					      gfn_count, unmap_flags);
> +		if (ret)
> +			pt_err(partition,
> +			       "Failed to unmap GPA pages %#llx-%#llx: %d\n",
> +			       gfn, gfn + gfn_count - 1, ret);
> +	}

Overall, I think this algorithm looks good and handles all the edge cases.

Michael

> 
>  	mshv_region_invalidate(region);
> 
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance
  2025-10-06 17:09   ` Michael Kelley
@ 2025-10-06 23:32     ` Stanislav Kinsburskii
  2025-10-07  0:02       ` Michael Kelley
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-06 23:32 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Oct 06, 2025 at 05:09:07PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, October 6, 2025 8:06 AM
> > 
> > Reduce overhead when unmapping large memory regions by batching GPA unmap
> > operations in 2MB-aligned chunks.
> > 
> > Use a dedicated constant for batch size to improve code clarity and
> > maintainability.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_root.h         |    2 ++
> >  drivers/hv/mshv_root_hv_call.c |    2 +-
> >  drivers/hv/mshv_root_main.c    |   28 +++++++++++++++++++++++++---
> >  3 files changed, 28 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > index e3931b0f12693..97e64d5341b6e 100644
> > --- a/drivers/hv/mshv_root.h
> > +++ b/drivers/hv/mshv_root.h
> > @@ -32,6 +32,8 @@ static_assert(HV_HYP_PAGE_SIZE == MSHV_HV_PAGE_SIZE);
> > 
> >  #define MSHV_PIN_PAGES_BATCH_SIZE	(0x10000000ULL / HV_HYP_PAGE_SIZE)
> > 
> > +#define MSHV_MAX_UNMAP_GPA_PAGES	512
> > +
> >  struct mshv_vp {
> >  	u32 vp_index;
> >  	struct mshv_partition *vp_partition;
> > diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > index c9c274f29c3c6..0696024ccfe31 100644
> > --- a/drivers/hv/mshv_root_hv_call.c
> > +++ b/drivers/hv/mshv_root_hv_call.c
> > @@ -17,7 +17,7 @@
> >  /* Determined empirically */
> >  #define HV_INIT_PARTITION_DEPOSIT_PAGES 208
> >  #define HV_MAP_GPA_DEPOSIT_PAGES	256
> > -#define HV_UMAP_GPA_PAGES		512
> > +#define HV_UMAP_GPA_PAGES		MSHV_MAX_UNMAP_GPA_PAGES
> > 
> >  #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
> > 
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index 97e322f3c6b5e..b61bef6b9c132 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -1378,6 +1378,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
> >  static void mshv_partition_destroy_region(struct mshv_mem_region *region)
> >  {
> >  	struct mshv_partition *partition = region->partition;
> > +	u64 gfn, gfn_count, start_gfn, end_gfn;
> >  	u32 unmap_flags = 0;
> >  	int ret;
> > 
> > @@ -1396,9 +1397,30 @@ static void mshv_partition_destroy_region(struct mshv_mem_region *region)
> >  	if (region->flags.large_pages)
> >  		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > 
> > -	/* ignore unmap failures and continue as process may be exiting */
> > -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> > -				region->nr_pages, unmap_flags);
> > +	start_gfn = region->start_gfn;
> > +	end_gfn = region->start_gfn + region->nr_pages;
> > +
> > +	for (gfn = start_gfn; gfn < end_gfn; gfn += gfn_count) {
> > +		if (gfn % MSHV_MAX_UNMAP_GPA_PAGES)
> > +			gfn_count = ALIGN(gfn, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
> > +		else
> > +			gfn_count = MSHV_MAX_UNMAP_GPA_PAGES;
> 
> You could do the entire if/else as:
> 
> 		gfn_count = ALIGN(gfn + 1, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
> 
> Using "gfn + 1" handles the case where gfn is already aligned. Arguably, this is a bit
> more obscure, so it's just a suggestion.
> 
> > +
> > +		if (gfn + gfn_count > end_gfn)
> > +			gfn_count = end_gfn - gfn;
> 
> Or
> 		gfn_count = min(gfn_count, end_gfn - gfn);
> 
> I usually prefer the "min" function instead of an "if" statement if logically
> the intent is to compute the minimum. But again, just a suggestion.
> 
> > +
> > +		/* Skip if all pages in this range if none is mapped */
> > +		if (!memchr_inv(region->pages + (gfn - start_gfn), 0,
> > +				gfn_count * sizeof(struct page *)))
> > +			continue;
> > +
> > +		ret = hv_call_unmap_gpa_pages(partition->pt_id, gfn,
> > +					      gfn_count, unmap_flags);
> > +		if (ret)
> > +			pt_err(partition,
> > +			       "Failed to unmap GPA pages %#llx-%#llx: %d\n",
> > +			       gfn, gfn + gfn_count - 1, ret);
> > +	}
> 
> Overall, I think this algorithm looks good and handles all the edge cases.
> 

Thank you for your suggestions. I also generally prefer reducing the
code in a similar way, but in this case, I deliberately chose a more
elaborate approach to improve clarity.

So, if you don’t mind, I’d rather keep it as is, since this version is
easy to understand and self-documenting.

Thanks,
Stanislav

> Michael
> 
> > 
> >  	mshv_region_invalidate(region);
> > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance
  2025-10-06 23:32     ` Stanislav Kinsburskii
@ 2025-10-07  0:02       ` Michael Kelley
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Kelley @ 2025-10-07  0:02 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, October 6, 2025 4:33 PM
> 
> On Mon, Oct 06, 2025 at 05:09:07PM +0000, Michael Kelley wrote:
> > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, October 6, 2025 8:06 AM
> > >
> > > Reduce overhead when unmapping large memory regions by batching GPA unmap
> > > operations in 2MB-aligned chunks.
> > >
> > > Use a dedicated constant for batch size to improve code clarity and
> > > maintainability.
> > >
> > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > ---
> > >  drivers/hv/mshv_root.h         |    2 ++
> > >  drivers/hv/mshv_root_hv_call.c |    2 +-
> > >  drivers/hv/mshv_root_main.c    |   28 +++++++++++++++++++++++++---
> > >  3 files changed, 28 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > > index e3931b0f12693..97e64d5341b6e 100644
> > > --- a/drivers/hv/mshv_root.h
> > > +++ b/drivers/hv/mshv_root.h
> > > @@ -32,6 +32,8 @@ static_assert(HV_HYP_PAGE_SIZE == MSHV_HV_PAGE_SIZE);
> > >
> > >  #define MSHV_PIN_PAGES_BATCH_SIZE	(0x10000000ULL / HV_HYP_PAGE_SIZE)
> > >
> > > +#define MSHV_MAX_UNMAP_GPA_PAGES	512
> > > +
> > >  struct mshv_vp {
> > >  	u32 vp_index;
> > >  	struct mshv_partition *vp_partition;
> > > diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > > index c9c274f29c3c6..0696024ccfe31 100644
> > > --- a/drivers/hv/mshv_root_hv_call.c
> > > +++ b/drivers/hv/mshv_root_hv_call.c
> > > @@ -17,7 +17,7 @@
> > >  /* Determined empirically */
> > >  #define HV_INIT_PARTITION_DEPOSIT_PAGES 208
> > >  #define HV_MAP_GPA_DEPOSIT_PAGES	256
> > > -#define HV_UMAP_GPA_PAGES		512
> > > +#define HV_UMAP_GPA_PAGES		MSHV_MAX_UNMAP_GPA_PAGES
> > >
> > >  #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
> > >
> > > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > > index 97e322f3c6b5e..b61bef6b9c132 100644
> > > --- a/drivers/hv/mshv_root_main.c
> > > +++ b/drivers/hv/mshv_root_main.c
> > > @@ -1378,6 +1378,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
> > >  static void mshv_partition_destroy_region(struct mshv_mem_region *region)
> > >  {
> > >  	struct mshv_partition *partition = region->partition;
> > > +	u64 gfn, gfn_count, start_gfn, end_gfn;
> > >  	u32 unmap_flags = 0;
> > >  	int ret;
> > >
> > > @@ -1396,9 +1397,30 @@ static void mshv_partition_destroy_region(struct mshv_mem_region *region)
> > >  	if (region->flags.large_pages)
> > >  		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > >
> > > -	/* ignore unmap failures and continue as process may be exiting */
> > > -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> > > -				region->nr_pages, unmap_flags);
> > > +	start_gfn = region->start_gfn;
> > > +	end_gfn = region->start_gfn + region->nr_pages;
> > > +
> > > +	for (gfn = start_gfn; gfn < end_gfn; gfn += gfn_count) {
> > > +		if (gfn % MSHV_MAX_UNMAP_GPA_PAGES)
> > > +			gfn_count = ALIGN(gfn, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
> > > +		else
> > > +			gfn_count = MSHV_MAX_UNMAP_GPA_PAGES;
> >
> > You could do the entire if/else as:
> >
> > 		gfn_count = ALIGN(gfn + 1, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
> >
> > Using "gfn + 1" handles the case where gfn is already aligned. Arguably, this is a bit
> > more obscure, so it's just a suggestion.
> >
> > > +
> > > +		if (gfn + gfn_count > end_gfn)
> > > +			gfn_count = end_gfn - gfn;
> >
> > Or
> > 		gfn_count = min(gfn_count, end_gfn - gfn);
> >
> > I usually prefer the "min" function instead of an "if" statement if logically
> > the intent is to compute the minimum. But again, just a suggestion.
> >
> > > +
> > > +		/* Skip if all pages in this range if none is mapped */
> > > +		if (!memchr_inv(region->pages + (gfn - start_gfn), 0,
> > > +				gfn_count * sizeof(struct page *)))
> > > +			continue;
> > > +
> > > +		ret = hv_call_unmap_gpa_pages(partition->pt_id, gfn,
> > > +					      gfn_count, unmap_flags);
> > > +		if (ret)
> > > +			pt_err(partition,
> > > +			       "Failed to unmap GPA pages %#llx-%#llx: %d\n",
> > > +			       gfn, gfn + gfn_count - 1, ret);
> > > +	}
> >
> > Overall, I think this algorithm looks good and handles all the edge cases.
> >
> 
> Thank you for your suggestions. I also generally prefer reducing the
> code in a similar way, but in this case, I deliberately chose a more
> elaborate approach to improve clarity.
> 
> So, if you don’t mind, I’d rather keep it as is, since this version is
> easy to understand and self-documenting.
> 

Yes, that's fine with me.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions
  2025-10-06 15:06 ` [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii
@ 2025-10-10 17:48   ` kernel test robot
  2025-10-11  1:12   ` kernel test robot
  1 sibling, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-10-10 17:48 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: llvm, oe-kbuild-all, linux-hyperv, linux-kernel

Hi Stanislav,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on next-20251010]
[cannot apply to v6.17]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Kinsburskii/Drivers-hv-Refactor-and-rename-memory-region-handling-functions/20251010-111917
base:   linus/master
patch link:    https://lore.kernel.org/r/175976319844.16834.4747024333732752980.stgit%40skinsburskii-cloud-desktop.internal.cloudapp.net
patch subject: [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions
config: x86_64-randconfig-006-20251010 (https://download.01.org/0day-ci/archive/20251011/202510110134.RmOV83Fz-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510110134.RmOV83Fz-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510110134.RmOV83Fz-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/hv/mshv_root_main.c:1410:11: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    1410 |         else if (!mutex_trylock(&region->mutex))
         |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/hv/mshv_root_main.c:1434:12: note: uninitialized use occurs here
    1434 |         WARN_ONCE(ret,
         |                   ^~~
   include/asm-generic/bug.h:152:18: note: expanded from macro 'WARN_ONCE'
     152 |         DO_ONCE_LITE_IF(condition, WARN, 1, format)
         |                         ^~~~~~~~~
   include/linux/once_lite.h:28:27: note: expanded from macro 'DO_ONCE_LITE_IF'
      28 |                 bool __ret_do_once = !!(condition);                     \
         |                                         ^~~~~~~~~
   drivers/hv/mshv_root_main.c:1410:7: note: remove the 'if' if its condition is always false
    1410 |         else if (!mutex_trylock(&region->mutex))
         |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1411 |                 goto out_fail;
         |                 ~~~~~~~~~~~~~
   drivers/hv/mshv_root_main.c:1406:9: note: initialize the variable 'ret' to silence this warning
    1406 |         int ret;
         |                ^
         |                 = 0
   1 warning generated.


vim +1410 drivers/hv/mshv_root_main.c

  1377	
  1378	/**
  1379	 * mshv_region_interval_invalidate - Invalidate a range of memory region
  1380	 * @mni: Pointer to the mmu_interval_notifier structure
  1381	 * @range: Pointer to the mmu_notifier_range structure
  1382	 * @cur_seq: Current sequence number for the interval notifier
  1383	 *
  1384	 * This function invalidates a memory region by remapping its pages with
  1385	 * no access permissions. It locks the region's mutex to ensure thread safety
  1386	 * and updates the sequence number for the interval notifier. If the range
  1387	 * is blockable, it uses a blocking lock; otherwise, it attempts a non-blocking
  1388	 * lock and returns false if unsuccessful.
  1389	 *
  1390	 * NOTE: Failure to invalidate a region is a serious error, as the pages will
  1391	 * be considered freed while they are still mapped by the hypervisor.
  1392	 * Any attempt to access such pages will likely crash the system.
  1393	 *
  1394	 * Return: true if the region was successfully invalidated, false otherwise.
  1395	 */
  1396	static bool
  1397	mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
  1398					const struct mmu_notifier_range *range,
  1399					unsigned long cur_seq)
  1400	{
  1401		struct mshv_mem_region *region = container_of(mni,
  1402							struct mshv_mem_region,
  1403							mni);
  1404		u64 page_offset, page_count;
  1405		unsigned long mstart, mend;
  1406		int ret;
  1407	
  1408		if (mmu_notifier_range_blockable(range))
  1409			mutex_lock(&region->mutex);
> 1410		else if (!mutex_trylock(&region->mutex))
  1411			goto out_fail;
  1412	
  1413		mmu_interval_set_seq(mni, cur_seq);
  1414	
  1415		mstart = max(range->start, region->start_uaddr);
  1416		mend = min(range->end, region->start_uaddr +
  1417			   (region->nr_pages << HV_HYP_PAGE_SHIFT));
  1418	
  1419		page_offset = HVPFN_DOWN(mstart - region->start_uaddr);
  1420		page_count = HVPFN_DOWN(mend - mstart);
  1421	
  1422		ret = mshv_region_remap_pages(region, HV_MAP_GPA_NO_ACCESS,
  1423					      page_offset, page_count);
  1424		if (ret)
  1425			goto out_fail;
  1426	
  1427		mshv_region_invalidate_pages(region, page_offset, page_count);
  1428	
  1429		mutex_unlock(&region->mutex);
  1430	
  1431		return true;
  1432	
  1433	out_fail:
  1434		WARN_ONCE(ret,
  1435			  "Failed to invalidate region %#llx-%#llx (range %#lx-%#lx, event: %u, pages %#llx-%#llx, mm: %#llx): %d\n",
  1436			  region->start_uaddr,
  1437			  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
  1438			  range->start, range->end, range->event,
  1439			  page_offset, page_offset + page_count - 1, (u64)range->mm, ret);
  1440		return false;
  1441	}
  1442	
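
One possible way to give 'ret' a defined value on the trylock failure path
(a sketch only; the actual fix in a later revision may differ) is:

	if (mmu_notifier_range_blockable(range)) {
		mutex_lock(&region->mutex);
	} else if (!mutex_trylock(&region->mutex)) {
		ret = -EAGAIN;	/* keeps the WARN_ONCE() in out_fail meaningful */
		goto out_fail;
	}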

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 1/5] Drivers: hv: Refactor and rename memory region handling functions
  2025-10-06 15:06 ` [PATCH v4 1/5] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
@ 2025-10-10 22:49   ` Nuno Das Neves
  0 siblings, 0 replies; 15+ messages in thread
From: Nuno Das Neves @ 2025-10-10 22:49 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 10/6/2025 8:06 AM, Stanislav Kinsburskii wrote:
> Simplify and unify memory region management to improve code clarity and
> reliability. Consolidate pinning and invalidation logic, adopt consistent
> naming, and remove redundant checks to reduce complexity.
> 
> Enhance documentation and update call sites for maintainability.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |   80 +++++++++++++++++++------------------------
>  1 file changed, 36 insertions(+), 44 deletions(-)
> 

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 2/5] Drivers: hv: Centralize guest memory region destruction
  2025-10-06 15:06 ` [PATCH v4 2/5] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
@ 2025-10-10 22:54   ` Nuno Das Neves
  0 siblings, 0 replies; 15+ messages in thread
From: Nuno Das Neves @ 2025-10-10 22:54 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 10/6/2025 8:06 AM, Stanislav Kinsburskii wrote:
> Centralize guest memory region destruction to prevent resource leaks and
> inconsistent cleanup across unmap and partition destruction paths.
> 
> Unify region removal, encrypted partition access recovery, and region
> invalidation to improve maintainability and reliability. Reduce code
> duplication and make future updates less error-prone by encapsulating
> cleanup logic in a single helper.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |   65 ++++++++++++++++++++++---------------------
>  1 file changed, 34 insertions(+), 31 deletions(-)
> 

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance
  2025-10-06 15:06 ` [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance Stanislav Kinsburskii
  2025-10-06 17:09   ` Michael Kelley
@ 2025-10-10 23:06   ` Nuno Das Neves
  1 sibling, 0 replies; 15+ messages in thread
From: Nuno Das Neves @ 2025-10-10 23:06 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 10/6/2025 8:06 AM, Stanislav Kinsburskii wrote:
> Reduce overhead when unmapping large memory regions by batching GPA unmap
> operations in 2MB-aligned chunks.
> 
> Use a dedicated constant for batch size to improve code clarity and
> maintainability.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root.h         |    2 ++
>  drivers/hv/mshv_root_hv_call.c |    2 +-
>  drivers/hv/mshv_root_main.c    |   28 +++++++++++++++++++++++++---
>  3 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index e3931b0f12693..97e64d5341b6e 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -32,6 +32,8 @@ static_assert(HV_HYP_PAGE_SIZE == MSHV_HV_PAGE_SIZE);
>  
>  #define MSHV_PIN_PAGES_BATCH_SIZE	(0x10000000ULL / HV_HYP_PAGE_SIZE)
>  
> +#define MSHV_MAX_UNMAP_GPA_PAGES	512
> +
>  struct mshv_vp {
>  	u32 vp_index;
>  	struct mshv_partition *vp_partition;
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index c9c274f29c3c6..0696024ccfe31 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -17,7 +17,7 @@
>  /* Determined empirically */
>  #define HV_INIT_PARTITION_DEPOSIT_PAGES 208
>  #define HV_MAP_GPA_DEPOSIT_PAGES	256
> -#define HV_UMAP_GPA_PAGES		512
> +#define HV_UMAP_GPA_PAGES		MSHV_MAX_UNMAP_GPA_PAGES
>  
>  #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
>  
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 97e322f3c6b5e..b61bef6b9c132 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1378,6 +1378,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  static void mshv_partition_destroy_region(struct mshv_mem_region *region)
>  {
>  	struct mshv_partition *partition = region->partition;
> +	u64 gfn, gfn_count, start_gfn, end_gfn;
>  	u32 unmap_flags = 0;
>  	int ret;
>  
> @@ -1396,9 +1397,30 @@ static void mshv_partition_destroy_region(struct mshv_mem_region *region)
>  	if (region->flags.large_pages)
>  		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
>  
> -	/* ignore unmap failures and continue as process may be exiting */
> -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> -				region->nr_pages, unmap_flags);
> +	start_gfn = region->start_gfn;
> +	end_gfn = region->start_gfn + region->nr_pages;
> +
> +	for (gfn = start_gfn; gfn < end_gfn; gfn += gfn_count) {
> +		if (gfn % MSHV_MAX_UNMAP_GPA_PAGES)
> +			gfn_count = ALIGN(gfn, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
> +		else
> +			gfn_count = MSHV_MAX_UNMAP_GPA_PAGES;
> +
> +		if (gfn + gfn_count > end_gfn)
> +			gfn_count = end_gfn - gfn;
> +
> +		/* Skip if all pages in this range if none is mapped */

nit: comment grammar. Suggest "Skip all pages in this range if none are mapped"

Looks good otherwise.

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

> +		if (!memchr_inv(region->pages + (gfn - start_gfn), 0,
> +				gfn_count * sizeof(struct page *)))
> +			continue;
> +
> +		ret = hv_call_unmap_gpa_pages(partition->pt_id, gfn,
> +					      gfn_count, unmap_flags);
> +		if (ret)
> +			pt_err(partition,
> +			       "Failed to unmap GPA pages %#llx-%#llx: %d\n",
> +			       gfn, gfn + gfn_count - 1, ret);
> +	}
>  
>  	mshv_region_invalidate(region);
>  
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread
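
For readers following the batching logic quoted above, a small standalone illustration (plain userspace C with hypothetical region numbers, and without the memchr_inv() skip for fully-unmapped batches) shows how the loop splits an arbitrary GFN range into 2M-aligned chunks: an unaligned start first gets a short head batch up to the next 512-page boundary, then full 512-page batches, then a short tail.

	#include <stdio.h>
	#include <stdint.h>

	#define MSHV_MAX_UNMAP_GPA_PAGES 512ULL
	#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

	int main(void)
	{
		/* hypothetical region: starts 0x30 pages past a 2M boundary */
		uint64_t start_gfn = 0x100030, nr_pages = 0x500;
		uint64_t end_gfn = start_gfn + nr_pages;
		uint64_t gfn, gfn_count;

		for (gfn = start_gfn; gfn < end_gfn; gfn += gfn_count) {
			if (gfn % MSHV_MAX_UNMAP_GPA_PAGES)
				gfn_count = ALIGN_UP(gfn, MSHV_MAX_UNMAP_GPA_PAGES) - gfn;
			else
				gfn_count = MSHV_MAX_UNMAP_GPA_PAGES;

			if (gfn + gfn_count > end_gfn)
				gfn_count = end_gfn - gfn;

			printf("unmap gfn %#llx, %#llx pages\n",
			       (unsigned long long)gfn,
			       (unsigned long long)gfn_count);
		}
		return 0;
	}

With these sample numbers the range is covered by three batches: 0x1d0 pages up to the first 2M boundary, one full 0x200-page batch, and a 0x130-page tail, totalling 0x500 pages.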

* Re: [PATCH v4 4/5] Drivers: hv: Ensure large page GPA mapping is PMD-aligned
  2025-10-06 15:06 ` [PATCH v4 4/5] Drivers: hv: Ensure large page GPA mapping is PMD-aligned Stanislav Kinsburskii
@ 2025-10-10 23:07   ` Nuno Das Neves
  0 siblings, 0 replies; 15+ messages in thread
From: Nuno Das Neves @ 2025-10-10 23:07 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 10/6/2025 8:06 AM, Stanislav Kinsburskii wrote:
> With the upcoming introduction of movable pages, a region doesn't guarantee
> always having large pages mapped. Both mapping on fault and unmapping
> during PTE invalidation may not be 2M-aligned, while the hypervisor
> requires both the GFN and page count to be 2M-aligned to use the large page
> flag.
> 
> Update the logic for large page mapping in mshv_region_remap_pages() to
> require both page_offset and page_count to be PMD-aligned.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index b61bef6b9c132..cb462495f34b5 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -34,6 +34,8 @@
>  #include "mshv.h"
>  #include "mshv_root.h"
>  
> +#define VALUE_PMD_ALIGNED(c)			(!((c) & (PTRS_PER_PMD - 1)))
> +
>  MODULE_AUTHOR("Microsoft");
>  MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("Microsoft Hyper-V root partition VMM interface /dev/mshv");
> @@ -1100,7 +1102,9 @@ mshv_region_remap_pages(struct mshv_mem_region *region, u32 map_flags,
>  	if (page_offset + page_count > region->nr_pages)
>  		return -EINVAL;
>  
> -	if (region->flags.large_pages)
> +	if (region->flags.large_pages &&
> +	    VALUE_PMD_ALIGNED(page_offset) &&
> +	    VALUE_PMD_ALIGNED(page_count))
>  		map_flags |= HV_MAP_GPA_LARGE_PAGE;
>  
>  	/* ask the hypervisor to map guest ram */
> 
> 

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread
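
To make the alignment requirement concrete: on x86-64, PTRS_PER_PMD is 512, so a PMD maps 512 small (4 KiB) pages, i.e. 2 MiB, and the predicate added by the patch simply tests for a multiple of 512. A standalone sketch (userspace C, hypothetical sample values):

	#include <stdio.h>
	#include <stdint.h>

	#define PTRS_PER_PMD		512
	#define VALUE_PMD_ALIGNED(c)	(!((c) & (PTRS_PER_PMD - 1)))

	int main(void)
	{
		/* page_offset/page_count samples, in 4 KiB pages */
		uint64_t samples[] = { 0x0, 0x1ff, 0x200, 0x230, 0x400 };
		unsigned int i;

		for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
			printf("%#llx pages: %s\n",
			       (unsigned long long)samples[i],
			       VALUE_PMD_ALIGNED(samples[i]) ?
			       "PMD aligned" : "not PMD aligned");
		return 0;
	}

In the quoted hunk, HV_MAP_GPA_LARGE_PAGE is only added when both page_offset and page_count pass this test; otherwise the remap is issued without the large-page flag.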

* Re: [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions
  2025-10-06 15:06 ` [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii
  2025-10-10 17:48   ` kernel test robot
@ 2025-10-11  1:12   ` kernel test robot
  1 sibling, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-10-11  1:12 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: llvm, oe-kbuild-all, linux-hyperv, linux-kernel

Hi Stanislav,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on next-20251010]
[cannot apply to v6.17]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Kinsburskii/Drivers-hv-Refactor-and-rename-memory-region-handling-functions/20251010-111917
base:   linus/master
patch link:    https://lore.kernel.org/r/175976319844.16834.4747024333732752980.stgit%40skinsburskii-cloud-desktop.internal.cloudapp.net
patch subject: [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions
config: x86_64-buildonly-randconfig-001-20251011 (https://download.01.org/0day-ci/archive/20251011/202510110819.km2ehSaB-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510110819.km2ehSaB-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510110819.km2ehSaB-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/hmm.c:667:7: error: call to undeclared function 'mmu_interval_check_retry'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     667 |                 if (mmu_interval_check_retry(range->notifier,
         |                     ^
   1 error generated.


vim +/mmu_interval_check_retry +667 mm/hmm.c

7b86ac3371b70c3 Christoph Hellwig 2019-08-28  634  
9a4903e49e495bf Christoph Hellwig 2019-07-25  635  /**
9a4903e49e495bf Christoph Hellwig 2019-07-25  636   * hmm_range_fault - try to fault some address in a virtual address range
f970b977e068aa5 Jason Gunthorpe   2020-03-27  637   * @range:	argument structure
9a4903e49e495bf Christoph Hellwig 2019-07-25  638   *
be957c886d92aa9 Jason Gunthorpe   2020-05-01  639   * Returns 0 on success or one of the following error codes:
73231612dc7c907 Jérôme Glisse     2019-05-13  640   *
9a4903e49e495bf Christoph Hellwig 2019-07-25  641   * -EINVAL:	Invalid arguments or mm or virtual address is in an invalid vma
9a4903e49e495bf Christoph Hellwig 2019-07-25  642   *		(e.g., device file vma).
73231612dc7c907 Jérôme Glisse     2019-05-13  643   * -ENOMEM:	Out of memory.
9a4903e49e495bf Christoph Hellwig 2019-07-25  644   * -EPERM:	Invalid permission (e.g., asking for write and range is read
9a4903e49e495bf Christoph Hellwig 2019-07-25  645   *		only).
9a4903e49e495bf Christoph Hellwig 2019-07-25  646   * -EBUSY:	The range has been invalidated and the caller needs to wait for
9a4903e49e495bf Christoph Hellwig 2019-07-25  647   *		the invalidation to finish.
f970b977e068aa5 Jason Gunthorpe   2020-03-27  648   * -EFAULT:     A page was requested to be valid and could not be made valid
f970b977e068aa5 Jason Gunthorpe   2020-03-27  649   *              ie it has no backing VMA or it is illegal to access
74eee180b935fcb Jérôme Glisse     2017-09-08  650   *
f970b977e068aa5 Jason Gunthorpe   2020-03-27  651   * This is similar to get_user_pages(), except that it can read the page tables
f970b977e068aa5 Jason Gunthorpe   2020-03-27  652   * without mutating them (ie causing faults).
74eee180b935fcb Jérôme Glisse     2017-09-08  653   */
be957c886d92aa9 Jason Gunthorpe   2020-05-01  654  int hmm_range_fault(struct hmm_range *range)
74eee180b935fcb Jérôme Glisse     2017-09-08  655  {
d28c2c9a487708b Ralph Campbell    2019-11-04  656  	struct hmm_vma_walk hmm_vma_walk = {
d28c2c9a487708b Ralph Campbell    2019-11-04  657  		.range = range,
d28c2c9a487708b Ralph Campbell    2019-11-04  658  		.last = range->start,
d28c2c9a487708b Ralph Campbell    2019-11-04  659  	};
a22dd506400d0f4 Jason Gunthorpe   2019-11-12  660  	struct mm_struct *mm = range->notifier->mm;
74eee180b935fcb Jérôme Glisse     2017-09-08  661  	int ret;
74eee180b935fcb Jérôme Glisse     2017-09-08  662  
42fc541404f2497 Michel Lespinasse 2020-06-08  663  	mmap_assert_locked(mm);
74eee180b935fcb Jérôme Glisse     2017-09-08  664  
a3e0d41c2b1f86b Jérôme Glisse     2019-05-13  665  	do {
a3e0d41c2b1f86b Jérôme Glisse     2019-05-13  666  		/* If range is no longer valid force retry. */
a22dd506400d0f4 Jason Gunthorpe   2019-11-12 @667  		if (mmu_interval_check_retry(range->notifier,
a22dd506400d0f4 Jason Gunthorpe   2019-11-12  668  					     range->notifier_seq))
2bcbeaefde2f038 Christoph Hellwig 2019-07-24  669  			return -EBUSY;
d28c2c9a487708b Ralph Campbell    2019-11-04  670  		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
7b86ac3371b70c3 Christoph Hellwig 2019-08-28  671  				      &hmm_walk_ops, &hmm_vma_walk);
be957c886d92aa9 Jason Gunthorpe   2020-05-01  672  		/*
be957c886d92aa9 Jason Gunthorpe   2020-05-01  673  		 * When -EBUSY is returned the loop restarts with
be957c886d92aa9 Jason Gunthorpe   2020-05-01  674  		 * hmm_vma_walk.last set to an address that has not been stored
be957c886d92aa9 Jason Gunthorpe   2020-05-01  675  		 * in pfns. All entries < last in the pfn array are set to their
be957c886d92aa9 Jason Gunthorpe   2020-05-01  676  		 * output, and all >= are still at their input values.
be957c886d92aa9 Jason Gunthorpe   2020-05-01  677  		 */
d28c2c9a487708b Ralph Campbell    2019-11-04  678  	} while (ret == -EBUSY);
73231612dc7c907 Jérôme Glisse     2019-05-13  679  	return ret;
74eee180b935fcb Jérôme Glisse     2017-09-08  680  }
73231612dc7c907 Jérôme Glisse     2019-05-13  681  EXPORT_SYMBOL(hmm_range_fault);
8cad47130566123 Leon Romanovsky   2025-04-28  682  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread
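
mmu_interval_check_retry() is one of the interval-notifier helpers that <linux/mmu_notifier.h> provides only when CONFIG_MMU_NOTIFIER is enabled, so the implicit-declaration error points at a randconfig that builds mm/hmm.c without that option; the actual Kconfig/code fix is not shown in this part of the thread. For context, the caller-side pattern these helpers exist for (adapted from Documentation/mm/hmm.rst; 'driver_lock', 'interval_sub', 'range' and 'mm' are placeholder names, not identifiers from this series) looks roughly like:

	int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(&interval_sub);

	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)	/* range invalidated while faulting */
			goto again;
		return ret;
	}

	mutex_lock(&driver_lock);
	if (mmu_interval_read_retry(&interval_sub, range.notifier_seq)) {
		/* an invalidation raced with the fault; start over */
		mutex_unlock(&driver_lock);
		goto again;
	}
	/* consume range.hmm_pfns[] under driver_lock, then: */
	mutex_unlock(&driver_lock);

The check at mm/hmm.c:667 quoted above is the in-kernel counterpart of this retry loop: hmm_range_fault() returns -EBUSY to its caller as soon as the notifier sequence shows the range has been invalidated.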

end of thread, other threads:[~2025-10-11  1:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-06 15:06 [PATCH v4 0/5] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
2025-10-06 15:06 ` [PATCH v4 1/5] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
2025-10-10 22:49   ` Nuno Das Neves
2025-10-06 15:06 ` [PATCH v4 2/5] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
2025-10-10 22:54   ` Nuno Das Neves
2025-10-06 15:06 ` [PATCH v4 3/5] Drivers: hv: Batch GPA unmap operations to improve large region performance Stanislav Kinsburskii
2025-10-06 17:09   ` Michael Kelley
2025-10-06 23:32     ` Stanislav Kinsburskii
2025-10-07  0:02       ` Michael Kelley
2025-10-10 23:06   ` Nuno Das Neves
2025-10-06 15:06 ` [PATCH v4 4/5] Drivers: hv: Ensure large page GPA mapping is PMD-aligned Stanislav Kinsburskii
2025-10-10 23:07   ` Nuno Das Neves
2025-10-06 15:06 ` [PATCH v4 5/5] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii
2025-10-10 17:48   ` kernel test robot
2025-10-11  1:12   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).