linux-hyperv.vger.kernel.org archive mirror
* [PATCH v7 0/7] Introduce movable pages for Hyper-V guests
@ 2025-11-26  2:08 Stanislav Kinsburskii
  2025-11-26  2:08 ` [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Today, the root-partition driver allocates, pins, and maps all guest
memory into the hypervisor at guest creation. This keeps things simple:
Linux cannot move the pages, so the guest's view of memory in Linux and
in the Microsoft Hypervisor never diverges.

However, this approach has major drawbacks:
 - NUMA: affinity can't be changed at runtime, so guest memory can't be
   migrated closer to the CPUs running the guest, which hurts performance.
 - Memory management: unused guest memory can't be swapped out, compacted,
   or merged.
 - Provisioning time: upfront allocation and pinning slow down guest
   creation and destruction.
 - Overcommit: memory can't be overcommitted on hosts running guests with
   pinned memory.

This series adds movable memory pages for Hyper-V child partitions. Guest
pages are no longer allocated upfront; they are allocated and mapped into
the hypervisor on demand, i.e. when the guest touches a GFN that isn't yet
backed by a host PFN. When Linux wants to move or reclaim a page, the page
is unmapped from the hypervisor so that Linux no longer has to hold it.
As a result, Hyper-V guests behave like regular Linux processes, and
standard Linux memory-management features apply to guest memory.
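
Roughly, the two halves of the new lifecycle look like this (an
illustrative summary only, not a literal excerpt; the helper names are
the ones introduced by the patches in this series):

  guest touches an unbacked GFN
    -> VP intercept in the root partition
    -> mshv_region_handle_gfn_fault(): hmm_range_fault() the backing
       pages, then map them into the hypervisor (hv_call_map_gpa_pages())

  Linux moves or reclaims the pages (migration, compaction, swap, KSM)
    -> the region's mmu_interval_notifier fires
       mshv_region_interval_invalidate()
    -> the range is remapped with HV_MAP_GPA_NO_ACCESS and the stale
       struct page pointers are dropped; the guest refaults on next access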

Exceptions (still pinned):
 1. Encrypted guests (explicit).
 2. Guests with passthrough devices (implicitly pinned by the VFIO framework).

v7:
 - Only the first two patches remain unchanged from v6.
 - Introduced reference counting for memory regions to resolve a race
   condition between region servicing (faulting and invalidation) and region
   destruction.
 - Corrected the assumption that regions starting with a huge page contain
   only huge pages; the code now properly handles regions with mixed page
   size segments.
 - Consolidated region management logic into a dedicated file.
 - Updated the driver to select MMU_NOTIFIER, removing support for
   configurations without this option.
 - Cleaned up and refactored the region management code.
 - Fixed a build issue reported by the kernel test robot for configurations
   where HPAGE_PMD_NR is defined to trigger a build bug.
 - Replaced VALUE_PMD_ALIGNED with the generic IS_ALIGNED macro.
 - Simplified region flags by introducing a region type for clarity.
 - Improved commit messages.

v6:
 - Fix a bug in large page remapping where setting the large map flag based
   on the PFN offset's large page alignment within the region implicitly
   assumed that the region's start offset was also large page aligned,
   which could cause map hypercall failures.
 - Fix a bug in large page unmapping where setting the large unmap flag for
   an unaligned guest PFN range could result in unmap hypercall failures.

v5:
 - Fix a bug in MMU notifier handling where an uninitialized 'ret' variable
   could cause the warning about failed page invalidation to be skipped.
 - Improve comment grammar regarding skipping the unmapping of non-mapped pages.

v4:
 - Fix a bug where batch unmapping could skip mapped pages when selecting a
   new batch, due to an incorrect offset calculation.
 - Fix an error message in case of failed memory region pinning.

v3:
 - Region is invalidated even if the mm has no users.
 - Page remapping logic is updated to support 2M-unaligned remappings for
   regions that are PMD-aligned, which can occur during both faults and
   invalidations.

v2:
 - Split unmap batching into a separate patch.
 - Fixed commit messages from v1 review.
 - Renamed a few functions for clarity.

---

Stanislav Kinsburskii (7):
      Drivers: hv: Refactor and rename memory region handling functions
      Drivers: hv: Centralize guest memory region destruction
      Drivers: hv: Move region management to mshv_regions.c
      Drivers: hv: Fix huge page handling in memory region traversal
      Drivers: hv: Improve region overlap detection in partition create
      Drivers: hv: Add refcount and locking to mem regions
      Drivers: hv: Add support for movable memory regions


 drivers/hv/Kconfig          |    2 
 drivers/hv/Makefile         |    2 
 drivers/hv/mshv_regions.c   |  548 +++++++++++++++++++++++++++++++++++++++++++
 drivers/hv/mshv_root.h      |   32 ++-
 drivers/hv/mshv_root_main.c |  382 +++++++++++++-----------------
 5 files changed, 745 insertions(+), 221 deletions(-)
 create mode 100644 drivers/hv/mshv_regions.c



* [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions
  2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
@ 2025-11-26  2:08 ` Stanislav Kinsburskii
  2025-12-01 11:20   ` Anirudh Rayabharam
  2025-11-26  2:08 ` [PATCH v7 2/7] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Simplify and unify memory region management to improve code clarity and
reliability. Consolidate pinning and invalidation logic, adopt consistent
naming, and remove redundant checks to reduce complexity.

Enhance documentation and update call sites for maintainability.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c |   80 +++++++++++++++++++------------------------
 1 file changed, 36 insertions(+), 44 deletions(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index bc15d6f6922f..fec82619684a 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1114,8 +1114,8 @@ mshv_region_map(struct mshv_mem_region *region)
 }
 
 static void
-mshv_region_evict_pages(struct mshv_mem_region *region,
-			u64 page_offset, u64 page_count)
+mshv_region_invalidate_pages(struct mshv_mem_region *region,
+			     u64 page_offset, u64 page_count)
 {
 	if (region->flags.range_pinned)
 		unpin_user_pages(region->pages + page_offset, page_count);
@@ -1125,29 +1125,24 @@ mshv_region_evict_pages(struct mshv_mem_region *region,
 }
 
 static void
-mshv_region_evict(struct mshv_mem_region *region)
+mshv_region_invalidate(struct mshv_mem_region *region)
 {
-	mshv_region_evict_pages(region, 0, region->nr_pages);
+	mshv_region_invalidate_pages(region, 0, region->nr_pages);
 }
 
 static int
-mshv_region_populate_pages(struct mshv_mem_region *region,
-			   u64 page_offset, u64 page_count)
+mshv_region_pin(struct mshv_mem_region *region)
 {
 	u64 done_count, nr_pages;
 	struct page **pages;
 	__u64 userspace_addr;
 	int ret;
 
-	if (page_offset + page_count > region->nr_pages)
-		return -EINVAL;
-
-	for (done_count = 0; done_count < page_count; done_count += ret) {
-		pages = region->pages + page_offset + done_count;
+	for (done_count = 0; done_count < region->nr_pages; done_count += ret) {
+		pages = region->pages + done_count;
 		userspace_addr = region->start_uaddr +
-				(page_offset + done_count) *
-				HV_HYP_PAGE_SIZE;
-		nr_pages = min(page_count - done_count,
+				 done_count * HV_HYP_PAGE_SIZE;
+		nr_pages = min(region->nr_pages - done_count,
 			       MSHV_PIN_PAGES_BATCH_SIZE);
 
 		/*
@@ -1158,34 +1153,23 @@ mshv_region_populate_pages(struct mshv_mem_region *region,
 		 * with the FOLL_LONGTERM flag does a large temporary
 		 * allocation of contiguous memory.
 		 */
-		if (region->flags.range_pinned)
-			ret = pin_user_pages_fast(userspace_addr,
-						  nr_pages,
-						  FOLL_WRITE | FOLL_LONGTERM,
-						  pages);
-		else
-			ret = -EOPNOTSUPP;
-
+		ret = pin_user_pages_fast(userspace_addr, nr_pages,
+					  FOLL_WRITE | FOLL_LONGTERM,
+					  pages);
 		if (ret < 0)
 			goto release_pages;
 	}
 
-	if (PageHuge(region->pages[page_offset]))
+	if (PageHuge(region->pages[0]))
 		region->flags.large_pages = true;
 
 	return 0;
 
 release_pages:
-	mshv_region_evict_pages(region, page_offset, done_count);
+	mshv_region_invalidate_pages(region, 0, done_count);
 	return ret;
 }
 
-static int
-mshv_region_populate(struct mshv_mem_region *region)
-{
-	return mshv_region_populate_pages(region, 0, region->nr_pages);
-}
-
 static struct mshv_mem_region *
 mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
 {
@@ -1245,19 +1229,27 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	return 0;
 }
 
-/*
- * Map guest ram. if snp, make sure to release that from the host first
- * Side Effects: In case of failure, pages are unpinned when feasible.
+/**
+ * mshv_prepare_pinned_region - Pin and map memory regions
+ * @region: Pointer to the memory region structure
+ *
+ * This function processes memory regions that are explicitly marked as pinned.
+ * Pinned regions are preallocated, mapped upfront, and do not rely on fault-based
+ * population. The function ensures the region is properly populated, handles
+ * encryption requirements for SNP partitions if applicable, maps the region,
+ * and performs necessary sharing or eviction operations based on the mapping
+ * result.
+ *
+ * Return: 0 on success, negative error code on failure.
  */
-static int
-mshv_partition_mem_region_map(struct mshv_mem_region *region)
+static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
 {
 	struct mshv_partition *partition = region->partition;
 	int ret;
 
-	ret = mshv_region_populate(region);
+	ret = mshv_region_pin(region);
 	if (ret) {
-		pt_err(partition, "Failed to populate memory region: %d\n",
+		pt_err(partition, "Failed to pin memory region: %d\n",
 		       ret);
 		goto err_out;
 	}
@@ -1275,7 +1267,7 @@ mshv_partition_mem_region_map(struct mshv_mem_region *region)
 			pt_err(partition,
 			       "Failed to unshare memory region (guest_pfn: %llu): %d\n",
 			       region->start_gfn, ret);
-			goto evict_region;
+			goto invalidate_region;
 		}
 	}
 
@@ -1285,7 +1277,7 @@ mshv_partition_mem_region_map(struct mshv_mem_region *region)
 
 		shrc = mshv_partition_region_share(region);
 		if (!shrc)
-			goto evict_region;
+			goto invalidate_region;
 
 		pt_err(partition,
 		       "Failed to share memory region (guest_pfn: %llu): %d\n",
@@ -1299,8 +1291,8 @@ mshv_partition_mem_region_map(struct mshv_mem_region *region)
 
 	return 0;
 
-evict_region:
-	mshv_region_evict(region);
+invalidate_region:
+	mshv_region_invalidate(region);
 err_out:
 	return ret;
 }
@@ -1349,7 +1341,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
 		ret = hv_call_map_mmio_pages(partition->pt_id, mem.guest_pfn,
 					     mmio_pfn, HVPFN_DOWN(mem.size));
 	else
-		ret = mshv_partition_mem_region_map(region);
+		ret = mshv_prepare_pinned_region(region);
 
 	if (ret)
 		goto errout;
@@ -1394,7 +1386,7 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
 	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
 				region->nr_pages, unmap_flags);
 
-	mshv_region_evict(region);
+	mshv_region_invalidate(region);
 
 	vfree(region);
 	return 0;
@@ -1812,7 +1804,7 @@ static void destroy_partition(struct mshv_partition *partition)
 			}
 		}
 
-		mshv_region_evict(region);
+		mshv_region_invalidate(region);
 
 		vfree(region);
 	}




* [PATCH v7 2/7] Drivers: hv: Centralize guest memory region destruction
  2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
  2025-11-26  2:08 ` [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
@ 2025-11-26  2:08 ` Stanislav Kinsburskii
  2025-12-01 11:12   ` Anirudh Rayabharam
  2025-11-26  2:09 ` [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c Stanislav Kinsburskii
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:08 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Centralize guest memory region destruction to prevent resource leaks and
inconsistent cleanup across unmap and partition destruction paths.

Unify region removal, encrypted partition access recovery, and region
invalidation to improve maintainability and reliability. Reduce code
duplication and make future updates less error-prone by encapsulating
cleanup logic in a single helper.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c |   65 ++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index fec82619684a..ec18984c3f2d 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1356,13 +1356,42 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	return ret;
 }
 
+static void mshv_partition_destroy_region(struct mshv_mem_region *region)
+{
+	struct mshv_partition *partition = region->partition;
+	u32 unmap_flags = 0;
+	int ret;
+
+	hlist_del(&region->hnode);
+
+	if (mshv_partition_encrypted(partition)) {
+		ret = mshv_partition_region_share(region);
+		if (ret) {
+			pt_err(partition,
+			       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
+			       ret);
+			return;
+		}
+	}
+
+	if (region->flags.large_pages)
+		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
+
+	/* ignore unmap failures and continue as process may be exiting */
+	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
+				region->nr_pages, unmap_flags);
+
+	mshv_region_invalidate(region);
+
+	vfree(region);
+}
+
 /* Called for unmapping both the guest ram and the mmio space */
 static long
 mshv_unmap_user_memory(struct mshv_partition *partition,
 		       struct mshv_user_mem_region mem)
 {
 	struct mshv_mem_region *region;
-	u32 unmap_flags = 0;
 
 	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
 		return -EINVAL;
@@ -1377,18 +1406,8 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
 	    region->nr_pages != HVPFN_DOWN(mem.size))
 		return -EINVAL;
 
-	hlist_del(&region->hnode);
+	mshv_partition_destroy_region(region);
 
-	if (region->flags.large_pages)
-		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
-
-	/* ignore unmap failures and continue as process may be exiting */
-	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
-				region->nr_pages, unmap_flags);
-
-	mshv_region_invalidate(region);
-
-	vfree(region);
 	return 0;
 }
 
@@ -1724,8 +1743,8 @@ static void destroy_partition(struct mshv_partition *partition)
 {
 	struct mshv_vp *vp;
 	struct mshv_mem_region *region;
-	int i, ret;
 	struct hlist_node *n;
+	int i;
 
 	if (refcount_read(&partition->pt_ref_count)) {
 		pt_err(partition,
@@ -1789,25 +1808,9 @@ static void destroy_partition(struct mshv_partition *partition)
 
 	remove_partition(partition);
 
-	/* Remove regions, regain access to the memory and unpin the pages */
 	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
-				  hnode) {
-		hlist_del(&region->hnode);
-
-		if (mshv_partition_encrypted(partition)) {
-			ret = mshv_partition_region_share(region);
-			if (ret) {
-				pt_err(partition,
-				       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
-				      ret);
-				return;
-			}
-		}
-
-		mshv_region_invalidate(region);
-
-		vfree(region);
-	}
+				  hnode)
+		mshv_partition_destroy_region(region);
 
 	/* Withdraw and free all pages we deposited */
 	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);




* [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c
  2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
  2025-11-26  2:08 ` [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
  2025-11-26  2:08 ` [PATCH v7 2/7] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
@ 2025-11-26  2:09 ` Stanislav Kinsburskii
  2025-12-01 11:06   ` Anirudh Rayabharam
  2025-12-03 18:13   ` Nuno Das Neves
  2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:09 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Refactor memory region management functions from mshv_root_main.c into
mshv_regions.c for better modularity and code organization.

Adjust function calls and headers to use the new implementation. Improve
maintainability and separation of concerns in the mshv_root module.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/Makefile         |    2 
 drivers/hv/mshv_regions.c   |  175 +++++++++++++++++++++++++++++++++++++++++++
 drivers/hv/mshv_root.h      |   10 ++
 drivers/hv/mshv_root_main.c |  176 +++----------------------------------------
 4 files changed, 198 insertions(+), 165 deletions(-)
 create mode 100644 drivers/hv/mshv_regions.c

diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index 58b8d07639f3..46d4f4f1b252 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -14,7 +14,7 @@ hv_vmbus-y := vmbus_drv.o \
 hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
 hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
 mshv_root-y := mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \
-	       mshv_root_hv_call.o mshv_portid_table.o
+	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o
 mshv_vtl-y := mshv_vtl_main.o
 
 # Code that must be built-in
diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
new file mode 100644
index 000000000000..35b866670840
--- /dev/null
+++ b/drivers/hv/mshv_regions.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2025, Microsoft Corporation.
+ *
+ * Memory region management for mshv_root module.
+ *
+ * Authors: Microsoft Linux virtualization team
+ */
+
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+
+#include <asm/mshyperv.h>
+
+#include "mshv_root.h"
+
+struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
+					   u64 uaddr, u32 flags,
+					   bool is_mmio)
+{
+	struct mshv_mem_region *region;
+
+	region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
+	if (!region)
+		return ERR_PTR(-ENOMEM);
+
+	region->nr_pages = nr_pages;
+	region->start_gfn = guest_pfn;
+	region->start_uaddr = uaddr;
+	region->hv_map_flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_ADJUSTABLE;
+	if (flags & BIT(MSHV_SET_MEM_BIT_WRITABLE))
+		region->hv_map_flags |= HV_MAP_GPA_WRITABLE;
+	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
+		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
+
+	/* Note: large_pages flag populated when we pin the pages */
+	if (!is_mmio)
+		region->flags.range_pinned = true;
+
+	return region;
+}
+
+int mshv_region_share(struct mshv_mem_region *region)
+{
+	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
+
+	if (region->flags.large_pages)
+		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
+
+	return hv_call_modify_spa_host_access(region->partition->pt_id,
+			region->pages, region->nr_pages,
+			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
+			flags, true);
+}
+
+int mshv_region_unshare(struct mshv_mem_region *region)
+{
+	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
+
+	if (region->flags.large_pages)
+		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
+
+	return hv_call_modify_spa_host_access(region->partition->pt_id,
+			region->pages, region->nr_pages,
+			0,
+			flags, false);
+}
+
+static int mshv_region_remap_pages(struct mshv_mem_region *region,
+				   u32 map_flags,
+				   u64 page_offset, u64 page_count)
+{
+	if (page_offset + page_count > region->nr_pages)
+		return -EINVAL;
+
+	if (region->flags.large_pages)
+		map_flags |= HV_MAP_GPA_LARGE_PAGE;
+
+	return hv_call_map_gpa_pages(region->partition->pt_id,
+				     region->start_gfn + page_offset,
+				     page_count, map_flags,
+				     region->pages + page_offset);
+}
+
+int mshv_region_map(struct mshv_mem_region *region)
+{
+	u32 map_flags = region->hv_map_flags;
+
+	return mshv_region_remap_pages(region, map_flags,
+				       0, region->nr_pages);
+}
+
+static void mshv_region_invalidate_pages(struct mshv_mem_region *region,
+					 u64 page_offset, u64 page_count)
+{
+	if (region->flags.range_pinned)
+		unpin_user_pages(region->pages + page_offset, page_count);
+
+	memset(region->pages + page_offset, 0,
+	       page_count * sizeof(struct page *));
+}
+
+void mshv_region_invalidate(struct mshv_mem_region *region)
+{
+	mshv_region_invalidate_pages(region, 0, region->nr_pages);
+}
+
+int mshv_region_pin(struct mshv_mem_region *region)
+{
+	u64 done_count, nr_pages;
+	struct page **pages;
+	__u64 userspace_addr;
+	int ret;
+
+	for (done_count = 0; done_count < region->nr_pages; done_count += ret) {
+		pages = region->pages + done_count;
+		userspace_addr = region->start_uaddr +
+				 done_count * HV_HYP_PAGE_SIZE;
+		nr_pages = min(region->nr_pages - done_count,
+			       MSHV_PIN_PAGES_BATCH_SIZE);
+
+		/*
+		 * Pinning assuming 4k pages works for large pages too.
+		 * All page structs within the large page are returned.
+		 *
+		 * Pin requests are batched because pin_user_pages_fast
+		 * with the FOLL_LONGTERM flag does a large temporary
+		 * allocation of contiguous memory.
+		 */
+		ret = pin_user_pages_fast(userspace_addr, nr_pages,
+					  FOLL_WRITE | FOLL_LONGTERM,
+					  pages);
+		if (ret < 0)
+			goto release_pages;
+	}
+
+	if (PageHuge(region->pages[0]))
+		region->flags.large_pages = true;
+
+	return 0;
+
+release_pages:
+	mshv_region_invalidate_pages(region, 0, done_count);
+	return ret;
+}
+
+void mshv_region_destroy(struct mshv_mem_region *region)
+{
+	struct mshv_partition *partition = region->partition;
+	u32 unmap_flags = 0;
+	int ret;
+
+	hlist_del(&region->hnode);
+
+	if (mshv_partition_encrypted(partition)) {
+		ret = mshv_region_share(region);
+		if (ret) {
+			pt_err(partition,
+			       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
+			       ret);
+			return;
+		}
+	}
+
+	if (region->flags.large_pages)
+		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
+
+	/* ignore unmap failures and continue as process may be exiting */
+	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
+				region->nr_pages, unmap_flags);
+
+	mshv_region_invalidate(region);
+
+	vfree(region);
+}
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 3eb815011b46..0366f416c2f0 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -312,4 +312,14 @@ extern struct mshv_root mshv_root;
 extern enum hv_scheduler_type hv_scheduler_type;
 extern u8 * __percpu *hv_synic_eventring_tail;
 
+struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
+					   u64 uaddr, u32 flags,
+					   bool is_mmio);
+int mshv_region_share(struct mshv_mem_region *region);
+int mshv_region_unshare(struct mshv_mem_region *region);
+int mshv_region_map(struct mshv_mem_region *region);
+void mshv_region_invalidate(struct mshv_mem_region *region);
+int mshv_region_pin(struct mshv_mem_region *region);
+void mshv_region_destroy(struct mshv_mem_region *region);
+
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index ec18984c3f2d..5dfb933da981 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1059,117 +1059,6 @@ static void mshv_async_hvcall_handler(void *data, u64 *status)
 	*status = partition->async_hypercall_status;
 }
 
-static int
-mshv_partition_region_share(struct mshv_mem_region *region)
-{
-	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
-
-	if (region->flags.large_pages)
-		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
-
-	return hv_call_modify_spa_host_access(region->partition->pt_id,
-			region->pages, region->nr_pages,
-			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
-			flags, true);
-}
-
-static int
-mshv_partition_region_unshare(struct mshv_mem_region *region)
-{
-	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
-
-	if (region->flags.large_pages)
-		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
-
-	return hv_call_modify_spa_host_access(region->partition->pt_id,
-			region->pages, region->nr_pages,
-			0,
-			flags, false);
-}
-
-static int
-mshv_region_remap_pages(struct mshv_mem_region *region, u32 map_flags,
-			u64 page_offset, u64 page_count)
-{
-	if (page_offset + page_count > region->nr_pages)
-		return -EINVAL;
-
-	if (region->flags.large_pages)
-		map_flags |= HV_MAP_GPA_LARGE_PAGE;
-
-	/* ask the hypervisor to map guest ram */
-	return hv_call_map_gpa_pages(region->partition->pt_id,
-				     region->start_gfn + page_offset,
-				     page_count, map_flags,
-				     region->pages + page_offset);
-}
-
-static int
-mshv_region_map(struct mshv_mem_region *region)
-{
-	u32 map_flags = region->hv_map_flags;
-
-	return mshv_region_remap_pages(region, map_flags,
-				       0, region->nr_pages);
-}
-
-static void
-mshv_region_invalidate_pages(struct mshv_mem_region *region,
-			     u64 page_offset, u64 page_count)
-{
-	if (region->flags.range_pinned)
-		unpin_user_pages(region->pages + page_offset, page_count);
-
-	memset(region->pages + page_offset, 0,
-	       page_count * sizeof(struct page *));
-}
-
-static void
-mshv_region_invalidate(struct mshv_mem_region *region)
-{
-	mshv_region_invalidate_pages(region, 0, region->nr_pages);
-}
-
-static int
-mshv_region_pin(struct mshv_mem_region *region)
-{
-	u64 done_count, nr_pages;
-	struct page **pages;
-	__u64 userspace_addr;
-	int ret;
-
-	for (done_count = 0; done_count < region->nr_pages; done_count += ret) {
-		pages = region->pages + done_count;
-		userspace_addr = region->start_uaddr +
-				 done_count * HV_HYP_PAGE_SIZE;
-		nr_pages = min(region->nr_pages - done_count,
-			       MSHV_PIN_PAGES_BATCH_SIZE);
-
-		/*
-		 * Pinning assuming 4k pages works for large pages too.
-		 * All page structs within the large page are returned.
-		 *
-		 * Pin requests are batched because pin_user_pages_fast
-		 * with the FOLL_LONGTERM flag does a large temporary
-		 * allocation of contiguous memory.
-		 */
-		ret = pin_user_pages_fast(userspace_addr, nr_pages,
-					  FOLL_WRITE | FOLL_LONGTERM,
-					  pages);
-		if (ret < 0)
-			goto release_pages;
-	}
-
-	if (PageHuge(region->pages[0]))
-		region->flags.large_pages = true;
-
-	return 0;
-
-release_pages:
-	mshv_region_invalidate_pages(region, 0, done_count);
-	return ret;
-}
-
 static struct mshv_mem_region *
 mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
 {
@@ -1193,7 +1082,7 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 					struct mshv_mem_region **regionpp,
 					bool is_mmio)
 {
-	struct mshv_mem_region *region, *rg;
+	struct mshv_mem_region *rg;
 	u64 nr_pages = HVPFN_DOWN(mem->size);
 
 	/* Reject overlapping regions */
@@ -1205,26 +1094,15 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 		return -EEXIST;
 	}
 
-	region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
-	if (!region)
-		return -ENOMEM;
-
-	region->nr_pages = nr_pages;
-	region->start_gfn = mem->guest_pfn;
-	region->start_uaddr = mem->userspace_addr;
-	region->hv_map_flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_ADJUSTABLE;
-	if (mem->flags & BIT(MSHV_SET_MEM_BIT_WRITABLE))
-		region->hv_map_flags |= HV_MAP_GPA_WRITABLE;
-	if (mem->flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
-		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
-
-	/* Note: large_pages flag populated when we pin the pages */
-	if (!is_mmio)
-		region->flags.range_pinned = true;
+	rg = mshv_region_create(mem->guest_pfn, nr_pages,
+				mem->userspace_addr, mem->flags,
+				is_mmio);
+	if (IS_ERR(rg))
+		return PTR_ERR(rg);
 
-	region->partition = partition;
+	rg->partition = partition;
 
-	*regionpp = region;
+	*regionpp = rg;
 
 	return 0;
 }
@@ -1262,7 +1140,7 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
 	 * access to guest memory regions.
 	 */
 	if (mshv_partition_encrypted(partition)) {
-		ret = mshv_partition_region_unshare(region);
+		ret = mshv_region_unshare(region);
 		if (ret) {
 			pt_err(partition,
 			       "Failed to unshare memory region (guest_pfn: %llu): %d\n",
@@ -1275,7 +1153,7 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
 	if (ret && mshv_partition_encrypted(partition)) {
 		int shrc;
 
-		shrc = mshv_partition_region_share(region);
+		shrc = mshv_region_share(region);
 		if (!shrc)
 			goto invalidate_region;
 
@@ -1356,36 +1234,6 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	return ret;
 }
 
-static void mshv_partition_destroy_region(struct mshv_mem_region *region)
-{
-	struct mshv_partition *partition = region->partition;
-	u32 unmap_flags = 0;
-	int ret;
-
-	hlist_del(&region->hnode);
-
-	if (mshv_partition_encrypted(partition)) {
-		ret = mshv_partition_region_share(region);
-		if (ret) {
-			pt_err(partition,
-			       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
-			       ret);
-			return;
-		}
-	}
-
-	if (region->flags.large_pages)
-		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
-
-	/* ignore unmap failures and continue as process may be exiting */
-	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
-				region->nr_pages, unmap_flags);
-
-	mshv_region_invalidate(region);
-
-	vfree(region);
-}
-
 /* Called for unmapping both the guest ram and the mmio space */
 static long
 mshv_unmap_user_memory(struct mshv_partition *partition,
@@ -1406,7 +1254,7 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
 	    region->nr_pages != HVPFN_DOWN(mem.size))
 		return -EINVAL;
 
-	mshv_partition_destroy_region(region);
+	mshv_region_destroy(region);
 
 	return 0;
 }
@@ -1810,7 +1658,7 @@ static void destroy_partition(struct mshv_partition *partition)
 
 	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
 				  hnode)
-		mshv_partition_destroy_region(region);
+		mshv_region_destroy(region);
 
 	/* Withdraw and free all pages we deposited */
 	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);




* [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
                   ` (2 preceding siblings ...)
  2025-11-26  2:09 ` [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c Stanislav Kinsburskii
@ 2025-11-26  2:09 ` Stanislav Kinsburskii
  2025-11-27 10:59   ` kernel test robot
                     ` (3 more replies)
  2025-11-26  2:09 ` [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create Stanislav Kinsburskii
                   ` (2 subsequent siblings)
  6 siblings, 4 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:09 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

The previous code assumed that if a region's first page was huge, the
entire region consisted of huge pages and stored this in a large_pages
flag. This premise is incorrect not only for movable regions (where
pages can be split and merged on invalidate callbacks or page faults),
but even for pinned regions: THPs can be split and merged during
allocation, so a large, pinned region may contain a mix of huge and
regular pages.

This change removes the large_pages flag and replaces region-wide
assumptions with per-chunk inspection of the actual page size when
mapping, unmapping, sharing, and unsharing. This makes huge page
handling correct for mixed-page regions and avoids relying on stale
metadata that can easily become invalid as memory is remapped.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
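A worked example of the per-chunk traversal (illustration only, not part
of the patch): for a region whose pages array starts with one 2M THP and
continues with regular 4K pages, mshv_region_process_range() ends up
invoking the handler twice:

	/*
	 * region->pages: [ 512 pages of one 2M THP ][ plain 4K pages ... ]
	 *
	 * chunk 1: page_offset = 0,   page_count = 512
	 *          handler sees PageTransCompound() == true and sets the
	 *          LARGE_PAGE flag for the hypercall
	 * chunk 2: page_offset = 512, page_count = the following present
	 *          pages (up to the next hole or change of folio order),
	 *          no LARGE_PAGE flag
	 *
	 * The chunk stride is taken from the folio order of the first page:
	 *   stride = 1 << folio_order(page_folio(region->pages[page_offset]));
	 */
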
 drivers/hv/mshv_regions.c |  213 +++++++++++++++++++++++++++++++++++++++------
 drivers/hv/mshv_root.h    |    3 -
 2 files changed, 184 insertions(+), 32 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 35b866670840..d535d2e3e811 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -14,6 +14,124 @@
 
 #include "mshv_root.h"
 
+/**
+ * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
+ *                             in a region.
+ * @region     : Pointer to the memory region structure.
+ * @flags      : Flags to pass to the handler.
+ * @page_offset: Offset into the region's pages array to start processing.
+ * @page_count : Number of pages to process.
+ * @handler    : Callback function to handle the chunk.
+ *
+ * This function scans the region's pages starting from @page_offset,
+ * checking for contiguous present pages of the same size (normal or huge).
+ * It invokes @handler for the chunk of contiguous pages found. Returns the
+ * number of pages handled, or a negative error code if the first page is
+ * not present or the handler fails.
+ *
+ * Note: The @handler callback must be able to handle both normal and huge
+ * pages.
+ *
+ * Return: Number of pages handled, or negative error code.
+ */
+static long mshv_region_process_chunk(struct mshv_mem_region *region,
+				      u32 flags,
+				      u64 page_offset, u64 page_count,
+				      int (*handler)(struct mshv_mem_region *region,
+						     u32 flags,
+						     u64 page_offset,
+						     u64 page_count))
+{
+	u64 count, stride;
+	unsigned int page_order;
+	struct page *page;
+	int ret;
+
+	page = region->pages[page_offset];
+	if (!page)
+		return -EINVAL;
+
+	page_order = folio_order(page_folio(page));
+	/* 1G huge pages aren't supported by the hypercalls */
+	if (page_order == PUD_ORDER)
+		return -EINVAL;
+
+	stride = 1 << page_order;
+
+	/* Start at stride since the first page is validated */
+	for (count = stride; count < page_count; count += stride) {
+		page = region->pages[page_offset + count];
+
+		/* Break if current page is not present */
+		if (!page)
+			break;
+
+		/* Break if page size changes */
+		if (page_order != folio_order(page_folio(page)))
+			break;
+	}
+
+	ret = handler(region, flags, page_offset, count);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+/**
+ * mshv_region_process_range - Processes a range of memory pages in a
+ *                             region.
+ * @region     : Pointer to the memory region structure.
+ * @flags      : Flags to pass to the handler.
+ * @page_offset: Offset into the region's pages array to start processing.
+ * @page_count : Number of pages to process.
+ * @handler    : Callback function to handle each chunk of contiguous
+ *               pages.
+ *
+ * Iterates over the specified range of pages in @region, skipping
+ * non-present pages. For each contiguous chunk of present pages, invokes
+ * @handler via mshv_region_process_chunk.
+ *
+ * Note: The @handler callback must be able to handle both normal and huge
+ * pages.
+ *
+ * Returns 0 on success, or a negative error code on failure.
+ */
+static int mshv_region_process_range(struct mshv_mem_region *region,
+				     u32 flags,
+				     u64 page_offset, u64 page_count,
+				     int (*handler)(struct mshv_mem_region *region,
+						    u32 flags,
+						    u64 page_offset,
+						    u64 page_count))
+{
+	long ret;
+
+	if (page_offset + page_count > region->nr_pages)
+		return -EINVAL;
+
+	while (page_count) {
+		/* Skip non-present pages */
+		if (!region->pages[page_offset]) {
+			page_offset++;
+			page_count--;
+			continue;
+		}
+
+		ret = mshv_region_process_chunk(region, flags,
+						page_offset,
+						page_count,
+						handler);
+		if (ret < 0)
+			return ret;
+
+		page_offset += ret;
+		page_count -= ret;
+	}
+
+	return 0;
+}
+
 struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
 					   u64 uaddr, u32 flags,
 					   bool is_mmio)
@@ -33,55 +151,80 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
 	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
 		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
 
-	/* Note: large_pages flag populated when we pin the pages */
 	if (!is_mmio)
 		region->flags.range_pinned = true;
 
 	return region;
 }
 
+static int mshv_region_chunk_share(struct mshv_mem_region *region,
+				   u32 flags,
+				   u64 page_offset, u64 page_count)
+{
+	if (PageTransCompound(region->pages[page_offset]))
+		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
+
+	return hv_call_modify_spa_host_access(region->partition->pt_id,
+					      region->pages + page_offset,
+					      page_count,
+					      HV_MAP_GPA_READABLE |
+					      HV_MAP_GPA_WRITABLE,
+					      flags, true);
+}
+
 int mshv_region_share(struct mshv_mem_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
 
-	if (region->flags.large_pages)
+	return mshv_region_process_range(region, flags,
+					 0, region->nr_pages,
+					 mshv_region_chunk_share);
+}
+
+static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
+				     u32 flags,
+				     u64 page_offset, u64 page_count)
+{
+	if (PageTransCompound(region->pages[page_offset]))
 		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
 
 	return hv_call_modify_spa_host_access(region->partition->pt_id,
-			region->pages, region->nr_pages,
-			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
-			flags, true);
+					      region->pages + page_offset,
+					      page_count, 0,
+					      flags, false);
 }
 
 int mshv_region_unshare(struct mshv_mem_region *region)
 {
 	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
 
-	if (region->flags.large_pages)
-		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
-
-	return hv_call_modify_spa_host_access(region->partition->pt_id,
-			region->pages, region->nr_pages,
-			0,
-			flags, false);
+	return mshv_region_process_range(region, flags,
+					 0, region->nr_pages,
+					 mshv_region_chunk_unshare);
 }
 
-static int mshv_region_remap_pages(struct mshv_mem_region *region,
-				   u32 map_flags,
+static int mshv_region_chunk_remap(struct mshv_mem_region *region,
+				   u32 flags,
 				   u64 page_offset, u64 page_count)
 {
-	if (page_offset + page_count > region->nr_pages)
-		return -EINVAL;
-
-	if (region->flags.large_pages)
-		map_flags |= HV_MAP_GPA_LARGE_PAGE;
+	if (PageTransCompound(region->pages[page_offset]))
+		flags |= HV_MAP_GPA_LARGE_PAGE;
 
 	return hv_call_map_gpa_pages(region->partition->pt_id,
 				     region->start_gfn + page_offset,
-				     page_count, map_flags,
+				     page_count, flags,
 				     region->pages + page_offset);
 }
 
+static int mshv_region_remap_pages(struct mshv_mem_region *region,
+				   u32 map_flags,
+				   u64 page_offset, u64 page_count)
+{
+	return mshv_region_process_range(region, map_flags,
+					 page_offset, page_count,
+					 mshv_region_chunk_remap);
+}
+
 int mshv_region_map(struct mshv_mem_region *region)
 {
 	u32 map_flags = region->hv_map_flags;
@@ -134,9 +277,6 @@ int mshv_region_pin(struct mshv_mem_region *region)
 			goto release_pages;
 	}
 
-	if (PageHuge(region->pages[0]))
-		region->flags.large_pages = true;
-
 	return 0;
 
 release_pages:
@@ -144,10 +284,28 @@ int mshv_region_pin(struct mshv_mem_region *region)
 	return ret;
 }
 
+static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
+				   u32 flags,
+				   u64 page_offset, u64 page_count)
+{
+	if (PageTransCompound(region->pages[page_offset]))
+		flags |= HV_UNMAP_GPA_LARGE_PAGE;
+
+	return hv_call_unmap_gpa_pages(region->partition->pt_id,
+				       region->start_gfn + page_offset,
+				       page_count, 0);
+}
+
+static int mshv_region_unmap(struct mshv_mem_region *region)
+{
+	return mshv_region_process_range(region, 0,
+					 0, region->nr_pages,
+					 mshv_region_chunk_unmap);
+}
+
 void mshv_region_destroy(struct mshv_mem_region *region)
 {
 	struct mshv_partition *partition = region->partition;
-	u32 unmap_flags = 0;
 	int ret;
 
 	hlist_del(&region->hnode);
@@ -162,12 +320,7 @@ void mshv_region_destroy(struct mshv_mem_region *region)
 		}
 	}
 
-	if (region->flags.large_pages)
-		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
-
-	/* ignore unmap failures and continue as process may be exiting */
-	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
-				region->nr_pages, unmap_flags);
+	mshv_region_unmap(region);
 
 	mshv_region_invalidate(region);
 
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 0366f416c2f0..ff3374f13691 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -77,9 +77,8 @@ struct mshv_mem_region {
 	u64 start_uaddr;
 	u32 hv_map_flags;
 	struct {
-		u64 large_pages:  1; /* 2MiB */
 		u64 range_pinned: 1;
-		u64 reserved:	 62;
+		u64 reserved:	 63;
 	} flags;
 	struct mshv_partition *partition;
 	struct page *pages[];




* [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create
  2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
                   ` (3 preceding siblings ...)
  2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
@ 2025-11-26  2:09 ` Stanislav Kinsburskii
  2025-12-01 15:06   ` Anirudh Rayabharam
                     ` (2 more replies)
  2025-11-26  2:09 ` [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions Stanislav Kinsburskii
  2025-11-26  2:09 ` [PATCH v7 7/7] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii
  6 siblings, 3 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:09 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Refactor region overlap check in mshv_partition_create_region to use
mshv_partition_region_by_gfn for both start and end guest PFNs, replacing
manual iteration.

This is a cleaner approach that leverages existing functionality to
accurately detect overlapping memory regions.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 5dfb933da981..ae600b927f49 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1086,13 +1086,9 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	u64 nr_pages = HVPFN_DOWN(mem->size);
 
 	/* Reject overlapping regions */
-	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
-		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
-		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
-			continue;
-
+	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
+	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
 		return -EEXIST;
-	}
 
 	rg = mshv_region_create(mem->guest_pfn, nr_pages,
 				mem->userspace_addr, mem->flags,




* [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions
  2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
                   ` (4 preceding siblings ...)
  2025-11-26  2:09 ` [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create Stanislav Kinsburskii
@ 2025-11-26  2:09 ` Stanislav Kinsburskii
  2025-12-04 16:48   ` Michael Kelley
  2025-11-26  2:09 ` [PATCH v7 7/7] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii
  6 siblings, 1 reply; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:09 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Introduce kref-based reference counting and spinlock protection for
memory regions in Hyper-V partition management. This change improves
memory region lifecycle management and ensures thread-safe access to the
region list.

Also improve the check for overlapping memory regions during region
creation, preventing duplicate or conflicting mappings.

Previously, the regions list was protected by the partition mutex.
However, that lock is too heavy for frequent fault and invalidation
operations, so finer-grained locking is now used to improve efficiency
and concurrency.

This is a precursor to supporting movable memory regions. Fault and
invalidation handling for movable regions will require safe traversal of
the region list and holding a region reference while performing
invalidation or fault operations.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
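For illustration, the lookup pattern this enables in the fault and
invalidation paths looks roughly like the sketch below.
handle_gpa_intercept() is a made-up name for this note only, and
mshv_region_handle_gfn_fault() is added by the next patch:

	static bool handle_gpa_intercept(struct mshv_partition *partition, u64 gfn)
	{
		struct mshv_mem_region *region;
		bool handled;

		/* Find the region under the new spinlock and take a reference */
		spin_lock(&partition->pt_mem_regions_lock);
		region = mshv_partition_region_by_gfn(partition, gfn);
		if (region && !mshv_region_get(region))
			region = NULL;	/* lost the race with the final put */
		spin_unlock(&partition->pt_mem_regions_lock);

		if (!region)
			return false;

		/* Safe to work outside the lock while the reference is held */
		handled = mshv_region_handle_gfn_fault(region, gfn);
		mshv_region_put(region);

		return handled;
	}
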
 drivers/hv/mshv_regions.c   |   19 ++++++++++++++++---
 drivers/hv/mshv_root.h      |    6 +++++-
 drivers/hv/mshv_root_main.c |   34 ++++++++++++++++++++++++++--------
 3 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index d535d2e3e811..6450a7ed8493 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -7,6 +7,7 @@
  * Authors: Microsoft Linux virtualization team
  */
 
+#include <linux/kref.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
 
@@ -154,6 +155,8 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
 	if (!is_mmio)
 		region->flags.range_pinned = true;
 
+	kref_init(&region->refcount);
+
 	return region;
 }
 
@@ -303,13 +306,13 @@ static int mshv_region_unmap(struct mshv_mem_region *region)
 					 mshv_region_chunk_unmap);
 }
 
-void mshv_region_destroy(struct mshv_mem_region *region)
+static void mshv_region_destroy(struct kref *ref)
 {
+	struct mshv_mem_region *region =
+		container_of(ref, struct mshv_mem_region, refcount);
 	struct mshv_partition *partition = region->partition;
 	int ret;
 
-	hlist_del(&region->hnode);
-
 	if (mshv_partition_encrypted(partition)) {
 		ret = mshv_region_share(region);
 		if (ret) {
@@ -326,3 +329,13 @@ void mshv_region_destroy(struct mshv_mem_region *region)
 
 	vfree(region);
 }
+
+void mshv_region_put(struct mshv_mem_region *region)
+{
+	kref_put(&region->refcount, mshv_region_destroy);
+}
+
+int mshv_region_get(struct mshv_mem_region *region)
+{
+	return kref_get_unless_zero(&region->refcount);
+}
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index ff3374f13691..4249534ba900 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -72,6 +72,7 @@ do { \
 
 struct mshv_mem_region {
 	struct hlist_node hnode;
+	struct kref refcount;
 	u64 nr_pages;
 	u64 start_gfn;
 	u64 start_uaddr;
@@ -97,6 +98,8 @@ struct mshv_partition {
 	u64 pt_id;
 	refcount_t pt_ref_count;
 	struct mutex pt_mutex;
+
+	spinlock_t pt_mem_regions_lock;
 	struct hlist_head pt_mem_regions; // not ordered
 
 	u32 pt_vp_count;
@@ -319,6 +322,7 @@ int mshv_region_unshare(struct mshv_mem_region *region);
 int mshv_region_map(struct mshv_mem_region *region);
 void mshv_region_invalidate(struct mshv_mem_region *region);
 int mshv_region_pin(struct mshv_mem_region *region);
-void mshv_region_destroy(struct mshv_mem_region *region);
+void mshv_region_put(struct mshv_mem_region *region);
+int mshv_region_get(struct mshv_mem_region *region);
 
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index ae600b927f49..1ef2a28beb17 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1086,9 +1086,13 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	u64 nr_pages = HVPFN_DOWN(mem->size);
 
 	/* Reject overlapping regions */
+	spin_lock(&partition->pt_mem_regions_lock);
 	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
-	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
+	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1)) {
+		spin_unlock(&partition->pt_mem_regions_lock);
 		return -EEXIST;
+	}
+	spin_unlock(&partition->pt_mem_regions_lock);
 
 	rg = mshv_region_create(mem->guest_pfn, nr_pages,
 				mem->userspace_addr, mem->flags,
@@ -1220,8 +1224,9 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	if (ret)
 		goto errout;
 
-	/* Install the new region */
+	spin_lock(&partition->pt_mem_regions_lock);
 	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
+	spin_unlock(&partition->pt_mem_regions_lock);
 
 	return 0;
 
@@ -1240,17 +1245,27 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
 	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
 		return -EINVAL;
 
+	spin_lock(&partition->pt_mem_regions_lock);
+
 	region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
-	if (!region)
-		return -EINVAL;
+	if (!region) {
+		spin_unlock(&partition->pt_mem_regions_lock);
+		return -ENOENT;
+	}
 
 	/* Paranoia check */
 	if (region->start_uaddr != mem.userspace_addr ||
 	    region->start_gfn != mem.guest_pfn ||
-	    region->nr_pages != HVPFN_DOWN(mem.size))
+	    region->nr_pages != HVPFN_DOWN(mem.size)) {
+		spin_unlock(&partition->pt_mem_regions_lock);
 		return -EINVAL;
+	}
+
+	hlist_del(&region->hnode);
 
-	mshv_region_destroy(region);
+	spin_unlock(&partition->pt_mem_regions_lock);
+
+	mshv_region_put(region);
 
 	return 0;
 }
@@ -1653,8 +1668,10 @@ static void destroy_partition(struct mshv_partition *partition)
 	remove_partition(partition);
 
 	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
-				  hnode)
-		mshv_region_destroy(region);
+				  hnode) {
+		hlist_del(&region->hnode);
+		mshv_region_put(region);
+	}
 
 	/* Withdraw and free all pages we deposited */
 	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
@@ -1852,6 +1869,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
 
 	INIT_HLIST_HEAD(&partition->pt_devices);
 
+	spin_lock_init(&partition->pt_mem_regions_lock);
 	INIT_HLIST_HEAD(&partition->pt_mem_regions);
 
 	mshv_eventfd_init(partition);




* [PATCH v7 7/7] Drivers: hv: Add support for movable memory regions
  2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
                   ` (5 preceding siblings ...)
  2025-11-26  2:09 ` [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions Stanislav Kinsburskii
@ 2025-11-26  2:09 ` Stanislav Kinsburskii
  6 siblings, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-11-26  2:09 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui; +Cc: linux-hyperv, linux-kernel

Introduce support for movable memory regions in the Hyper-V root partition
driver to improve memory management flexibility and enable advanced use
cases such as dynamic memory remapping.

Mirror the address space between the Linux root partition and guest VMs
using HMM. The root partition owns the memory, while guest VMs act as
devices with page tables managed via hypercalls. MSHV handles VP intercepts
by invoking hmm_range_fault() and updating SLAT entries. When memory is
reclaimed, HMM invalidates the relevant regions, prompting MSHV to clear
SLAT entries; guest VMs will fault again on access.

Integrate mmu_interval_notifier for movable regions, implement handlers for
HMM faults and memory invalidation, and update memory region mapping logic
to support movable regions.

While MMU notifiers are commonly used in virtualization drivers, this
implementation leverages HMM (Heterogeneous Memory Management) for its
specialized functionality. HMM provides a framework for mirroring,
invalidation, and fault handling, reducing boilerplate and improving
maintainability compared to generic MMU notifiers.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
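For orientation: a movable region is tied to the VMM's address space via
an mmu_interval_notifier. A minimal sketch of what that registration
typically looks like is below; mshv_region_movable_init() is an assumed
name (only mshv_region_movable_fini() is visible in the quoted hunks) and
the use of current->mm is an assumption, so treat this as a sketch rather
than the patch's literal code:

	static const struct mmu_interval_notifier_ops mshv_region_mni_ops = {
		.invalidate	= mshv_region_interval_invalidate,
	};

	/* Sketch: register a notifier covering the region's user address range */
	int mshv_region_movable_init(struct mshv_mem_region *region)
	{
		mutex_init(&region->mutex);
		region->type = MSHV_REGION_TYPE_MEM_MOVABLE;

		/* covers [start_uaddr, start_uaddr + nr_pages * HV_HYP_PAGE_SIZE) */
		return mmu_interval_notifier_insert(&region->mni, current->mm,
						    region->start_uaddr,
						    region->nr_pages << HV_HYP_PAGE_SHIFT,
						    &mshv_region_mni_ops);
	}
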
 drivers/hv/Kconfig          |    2 
 drivers/hv/mshv_regions.c   |  215 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/hv/mshv_root.h      |   17 +++
 drivers/hv/mshv_root_main.c |  139 +++++++++++++++++++++++-----
 4 files changed, 343 insertions(+), 30 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index d4a8d349200c..7937ac0cbd0f 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -76,6 +76,8 @@ config MSHV_ROOT
 	depends on PAGE_SIZE_4KB
 	select EVENTFD
 	select VIRT_XFER_TO_GUEST_WORK
+	select HMM_MIRROR
+	select MMU_NOTIFIER
 	default n
 	help
 	  Select this option to enable support for booting and running as root
diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
index 6450a7ed8493..d7b0f012c3be 100644
--- a/drivers/hv/mshv_regions.c
+++ b/drivers/hv/mshv_regions.c
@@ -7,6 +7,8 @@
  * Authors: Microsoft Linux virtualization team
  */
 
+#include <linux/hmm.h>
+#include <linux/hyperv.h>
 #include <linux/kref.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
@@ -15,6 +17,8 @@
 
 #include "mshv_root.h"
 
+#define MSHV_MAP_FAULT_IN_PAGES				PTRS_PER_PMD
+
 /**
  * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
  *                             in a region.
@@ -152,9 +156,6 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
 	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
 		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
 
-	if (!is_mmio)
-		region->flags.range_pinned = true;
-
 	kref_init(&region->refcount);
 
 	return region;
@@ -239,7 +240,7 @@ int mshv_region_map(struct mshv_mem_region *region)
 static void mshv_region_invalidate_pages(struct mshv_mem_region *region,
 					 u64 page_offset, u64 page_count)
 {
-	if (region->flags.range_pinned)
+	if (region->type == MSHV_REGION_TYPE_MEM_PINNED)
 		unpin_user_pages(region->pages + page_offset, page_count);
 
 	memset(region->pages + page_offset, 0,
@@ -313,6 +314,9 @@ static void mshv_region_destroy(struct kref *ref)
 	struct mshv_partition *partition = region->partition;
 	int ret;
 
+	if (region->type == MSHV_REGION_TYPE_MEM_MOVABLE)
+		mshv_region_movable_fini(region);
+
 	if (mshv_partition_encrypted(partition)) {
 		ret = mshv_region_share(region);
 		if (ret) {
@@ -339,3 +343,206 @@ int mshv_region_get(struct mshv_mem_region *region)
 {
 	return kref_get_unless_zero(&region->refcount);
 }
+
+/**
+ * mshv_region_hmm_fault_and_lock - Handle HMM faults and lock the memory region
+ * @region: Pointer to the memory region structure
+ * @range: Pointer to the HMM range structure
+ *
+ * This function performs the following steps:
+ * 1. Reads the notifier sequence for the HMM range.
+ * 2. Acquires a read lock on the memory map.
+ * 3. Handles HMM faults for the specified range.
+ * 4. Releases the read lock on the memory map.
+ * 5. If successful, locks the memory region mutex.
+ * 6. Verifies if the notifier sequence has changed during the operation.
+ *    If it has, releases the mutex and returns -EBUSY to match with
+ *    hmm_range_fault() return code for repeating.
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+static int mshv_region_hmm_fault_and_lock(struct mshv_mem_region *region,
+					  struct hmm_range *range)
+{
+	int ret;
+
+	range->notifier_seq = mmu_interval_read_begin(range->notifier);
+	mmap_read_lock(region->mni.mm);
+	ret = hmm_range_fault(range);
+	mmap_read_unlock(region->mni.mm);
+	if (ret)
+		return ret;
+
+	mutex_lock(&region->mutex);
+
+	if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
+		mutex_unlock(&region->mutex);
+		cond_resched();
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/**
+ * mshv_region_range_fault - Handle memory range faults for a given region.
+ * @region: Pointer to the memory region structure.
+ * @page_offset: Offset of the page within the region.
+ * @page_count: Number of pages to handle.
+ *
+ * This function resolves memory faults for a specified range of pages
+ * within a memory region. It uses HMM (Heterogeneous Memory Management)
+ * to fault in the required pages and updates the region's page array.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int mshv_region_range_fault(struct mshv_mem_region *region,
+				   u64 page_offset, u64 page_count)
+{
+	struct hmm_range range = {
+		.notifier = &region->mni,
+		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
+	};
+	unsigned long *pfns;
+	int ret;
+	u64 i;
+
+	pfns = kmalloc_array(page_count, sizeof(unsigned long), GFP_KERNEL);
+	if (!pfns)
+		return -ENOMEM;
+
+	range.hmm_pfns = pfns;
+	range.start = region->start_uaddr + page_offset * HV_HYP_PAGE_SIZE;
+	range.end = range.start + page_count * HV_HYP_PAGE_SIZE;
+
+	do {
+		ret = mshv_region_hmm_fault_and_lock(region, &range);
+	} while (ret == -EBUSY);
+
+	if (ret)
+		goto out;
+
+	for (i = 0; i < page_count; i++)
+		region->pages[page_offset + i] = hmm_pfn_to_page(pfns[i]);
+
+	ret = mshv_region_remap_pages(region, region->hv_map_flags,
+				      page_offset, page_count);
+
+	mutex_unlock(&region->mutex);
+out:
+	kfree(pfns);
+	return ret;
+}
+
+bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn)
+{
+	u64 page_offset, page_count;
+	int ret;
+
+	/* Align the page offset down to a MSHV_MAP_FAULT_IN_PAGES boundary. */
+	page_offset = ALIGN_DOWN(gfn - region->start_gfn,
+				 MSHV_MAP_FAULT_IN_PAGES);
+
+	/* Map more pages than requested to reduce the number of faults. */
+	page_count = min(region->nr_pages - page_offset,
+			 MSHV_MAP_FAULT_IN_PAGES);
+
+	ret = mshv_region_range_fault(region, page_offset, page_count);
+
+	WARN_ONCE(ret,
+		  "p%llu: GPA intercept failed: region %#llx-%#llx, gfn %#llx, page_offset %llu, page_count %llu\n",
+		  region->partition->pt_id, region->start_uaddr,
+		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
+		  gfn, page_offset, page_count);
+
+	return !ret;
+}
+
+/**
+ * mshv_region_interval_invalidate - Invalidate a range of memory region
+ * @mni: Pointer to the mmu_interval_notifier structure
+ * @range: Pointer to the mmu_notifier_range structure
+ * @cur_seq: Current sequence number for the interval notifier
+ *
+ * This function invalidates a memory region by remapping its pages with
+ * no access permissions. It locks the region's mutex to ensure thread safety
+ * and updates the sequence number for the interval notifier. If the range
+ * is blockable, it uses a blocking lock; otherwise, it attempts a non-blocking
+ * lock and returns false if unsuccessful.
+ *
+ * NOTE: Failure to invalidate a region is a serious error, as the pages will
+ * be considered freed while they are still mapped by the hypervisor.
+ * Any attempt to access such pages will likely crash the system.
+ *
+ * Return: true if the region was successfully invalidated, false otherwise.
+ */
+static bool mshv_region_interval_invalidate(struct mmu_interval_notifier *mni,
+					    const struct mmu_notifier_range *range,
+					    unsigned long cur_seq)
+{
+	struct mshv_mem_region *region = container_of(mni,
+						      struct mshv_mem_region,
+						      mni);
+	u64 page_offset, page_count;
+	unsigned long mstart, mend;
+	int ret = -EPERM;
+
+	if (mmu_notifier_range_blockable(range))
+		mutex_lock(&region->mutex);
+	else if (!mutex_trylock(&region->mutex))
+		goto out_fail;
+
+	mmu_interval_set_seq(mni, cur_seq);
+
+	mstart = max(range->start, region->start_uaddr);
+	mend = min(range->end, region->start_uaddr +
+		   (region->nr_pages << HV_HYP_PAGE_SHIFT));
+
+	page_offset = HVPFN_DOWN(mstart - region->start_uaddr);
+	page_count = HVPFN_DOWN(mend - mstart);
+
+	ret = mshv_region_remap_pages(region, HV_MAP_GPA_NO_ACCESS,
+				      page_offset, page_count);
+	if (ret)
+		goto out_fail;
+
+	mshv_region_invalidate_pages(region, page_offset, page_count);
+
+	mutex_unlock(&region->mutex);
+
+	return true;
+
+out_fail:
+	WARN_ONCE(ret,
+		  "Failed to invalidate region %#llx-%#llx (range %#lx-%#lx, event: %u, pages %#llx-%#llx, mm: %#llx): %d\n",
+		  region->start_uaddr,
+		  region->start_uaddr + (region->nr_pages << HV_HYP_PAGE_SHIFT),
+		  range->start, range->end, range->event,
+		  page_offset, page_offset + page_count - 1, (u64)range->mm, ret);
+	return false;
+}
+
+static const struct mmu_interval_notifier_ops mshv_region_mni_ops = {
+	.invalidate = mshv_region_interval_invalidate,
+};
+
+void mshv_region_movable_fini(struct mshv_mem_region *region)
+{
+	mmu_interval_notifier_remove(&region->mni);
+}
+
+bool mshv_region_movable_init(struct mshv_mem_region *region)
+{
+	int ret;
+
+	ret = mmu_interval_notifier_insert(&region->mni, current->mm,
+					   region->start_uaddr,
+					   region->nr_pages << HV_HYP_PAGE_SHIFT,
+					   &mshv_region_mni_ops);
+	if (ret)
+		return false;
+
+	mutex_init(&region->mutex);
+
+	return true;
+}
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 4249534ba900..9cd76076d490 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -15,6 +15,7 @@
 #include <linux/hashtable.h>
 #include <linux/dev_printk.h>
 #include <linux/build_bug.h>
+#include <linux/mmu_notifier.h>
 #include <uapi/linux/mshv.h>
 
 /*
@@ -70,6 +71,12 @@ do { \
 #define vp_info(v, fmt, ...)	vp_devprintk(info, v, fmt, ##__VA_ARGS__)
 #define vp_dbg(v, fmt, ...)	vp_devprintk(dbg, v, fmt, ##__VA_ARGS__)
 
+enum mshv_region_type {
+	MSHV_REGION_TYPE_MEM_PINNED,
+	MSHV_REGION_TYPE_MEM_MOVABLE,
+	MSHV_REGION_TYPE_MMIO
+};
+
 struct mshv_mem_region {
 	struct hlist_node hnode;
 	struct kref refcount;
@@ -77,11 +84,10 @@ struct mshv_mem_region {
 	u64 start_gfn;
 	u64 start_uaddr;
 	u32 hv_map_flags;
-	struct {
-		u64 range_pinned: 1;
-		u64 reserved:	 63;
-	} flags;
 	struct mshv_partition *partition;
+	enum mshv_region_type type;
+	struct mmu_interval_notifier mni;
+	struct mutex mutex;	/* protects region pages remapping */
 	struct page *pages[];
 };
 
@@ -324,5 +330,8 @@ void mshv_region_invalidate(struct mshv_mem_region *region);
 int mshv_region_pin(struct mshv_mem_region *region);
 void mshv_region_put(struct mshv_mem_region *region);
 int mshv_region_get(struct mshv_mem_region *region);
+bool mshv_region_handle_gfn_fault(struct mshv_mem_region *region, u64 gfn);
+void mshv_region_movable_fini(struct mshv_mem_region *region);
+bool mshv_region_movable_init(struct mshv_mem_region *region);
 
 #endif /* _MSHV_ROOT_H_ */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 1ef2a28beb17..6003fb4477bc 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -594,14 +594,98 @@ static long mshv_run_vp_with_root_scheduler(struct mshv_vp *vp)
 static_assert(sizeof(struct hv_message) <= MSHV_RUN_VP_BUF_SZ,
 	      "sizeof(struct hv_message) must not exceed MSHV_RUN_VP_BUF_SZ");
 
+static struct mshv_mem_region *
+mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
+{
+	struct mshv_mem_region *region;
+
+	hlist_for_each_entry(region, &partition->pt_mem_regions, hnode) {
+		if (gfn >= region->start_gfn &&
+		    gfn < region->start_gfn + region->nr_pages)
+			return region;
+	}
+
+	return NULL;
+}
+
+#ifdef CONFIG_X86_64
+static struct mshv_mem_region *
+mshv_partition_region_by_gfn_get(struct mshv_partition *p, u64 gfn)
+{
+	struct mshv_mem_region *region;
+
+	spin_lock(&p->pt_mem_regions_lock);
+	region = mshv_partition_region_by_gfn(p, gfn);
+	if (!region || !mshv_region_get(region)) {
+		spin_unlock(&p->pt_mem_regions_lock);
+		return NULL;
+	}
+	spin_unlock(&p->pt_mem_regions_lock);
+
+	return region;
+}
+
+/**
+ * mshv_handle_gpa_intercept - Handle GPA (Guest Physical Address) intercepts.
+ * @vp: Pointer to the virtual processor structure.
+ *
+ * This function processes GPA intercepts by identifying the memory region
+ * corresponding to the intercepted GPA, aligning the page offset, and
+ * mapping the required pages. It ensures that the region is valid and
+ * handles faults efficiently by mapping multiple pages at once.
+ *
+ * Return: true if the intercept was handled successfully, false otherwise.
+ */
+static bool mshv_handle_gpa_intercept(struct mshv_vp *vp)
+{
+	struct mshv_partition *p = vp->vp_partition;
+	struct mshv_mem_region *region;
+	struct hv_x64_memory_intercept_message *msg;
+	bool ret;
+	u64 gfn;
+
+	msg = (struct hv_x64_memory_intercept_message *)
+		vp->vp_intercept_msg_page->u.payload;
+
+	gfn = HVPFN_DOWN(msg->guest_physical_address);
+
+	region = mshv_partition_region_by_gfn_get(p, gfn);
+	if (!region)
+		return false;
+
+	/* Only movable memory ranges are supported for GPA intercepts */
+	if (region->type == MSHV_REGION_TYPE_MEM_MOVABLE)
+		ret = mshv_region_handle_gfn_fault(region, gfn);
+	else
+		ret = false;
+
+	mshv_region_put(region);
+
+	return ret;
+}
+#else  /* CONFIG_X86_64 */
+static bool mshv_handle_gpa_intercept(struct mshv_vp *vp) { return false; }
+#endif /* CONFIG_X86_64 */
+
+static bool mshv_vp_handle_intercept(struct mshv_vp *vp)
+{
+	switch (vp->vp_intercept_msg_page->header.message_type) {
+	case HVMSG_GPA_INTERCEPT:
+		return mshv_handle_gpa_intercept(vp);
+	}
+	return false;
+}
+
 static long mshv_vp_ioctl_run_vp(struct mshv_vp *vp, void __user *ret_msg)
 {
 	long rc;
 
-	if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
-		rc = mshv_run_vp_with_root_scheduler(vp);
-	else
-		rc = mshv_run_vp_with_hyp_scheduler(vp);
+	do {
+		if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
+			rc = mshv_run_vp_with_root_scheduler(vp);
+		else
+			rc = mshv_run_vp_with_hyp_scheduler(vp);
+	} while (rc == 0 && mshv_vp_handle_intercept(vp));
 
 	if (rc)
 		return rc;
@@ -1059,20 +1143,6 @@ static void mshv_async_hvcall_handler(void *data, u64 *status)
 	*status = partition->async_hypercall_status;
 }
 
-static struct mshv_mem_region *
-mshv_partition_region_by_gfn(struct mshv_partition *partition, u64 gfn)
-{
-	struct mshv_mem_region *region;
-
-	hlist_for_each_entry(region, &partition->pt_mem_regions, hnode) {
-		if (gfn >= region->start_gfn &&
-		    gfn < region->start_gfn + region->nr_pages)
-			return region;
-	}
-
-	return NULL;
-}
-
 /*
  * NB: caller checks and makes sure mem->size is page aligned
  * Returns: 0 with regionpp updated on success, or -errno
@@ -1100,6 +1170,14 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 	if (IS_ERR(rg))
 		return PTR_ERR(rg);
 
+	if (is_mmio)
+		rg->type = MSHV_REGION_TYPE_MMIO;
+	else if (mshv_partition_encrypted(partition) ||
+		 !mshv_region_movable_init(rg))
+		rg->type = MSHV_REGION_TYPE_MEM_PINNED;
+	else
+		rg->type = MSHV_REGION_TYPE_MEM_MOVABLE;
+
 	rg->partition = partition;
 
 	*regionpp = rg;
@@ -1215,11 +1293,28 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	if (ret)
 		return ret;
 
-	if (is_mmio)
-		ret = hv_call_map_mmio_pages(partition->pt_id, mem.guest_pfn,
-					     mmio_pfn, HVPFN_DOWN(mem.size));
-	else
+	switch (region->type) {
+	case MSHV_REGION_TYPE_MEM_PINNED:
 		ret = mshv_prepare_pinned_region(region);
+		break;
+	case MSHV_REGION_TYPE_MEM_MOVABLE:
+		/*
+		 * For movable memory regions, remap with no access to let
+		 * the hypervisor track dirty pages, enabling pre-copy live
+		 * migration.
+		 */
+		ret = hv_call_map_gpa_pages(partition->pt_id,
+					    region->start_gfn,
+					    region->nr_pages,
+					    HV_MAP_GPA_NO_ACCESS, NULL);
+		break;
+	case MSHV_REGION_TYPE_MMIO:
+		ret = hv_call_map_mmio_pages(partition->pt_id,
+					     region->start_gfn,
+					     mmio_pfn,
+					     region->nr_pages);
+		break;
+	}
 
 	if (ret)
 		goto errout;



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
@ 2025-11-27 10:59   ` kernel test robot
  2025-12-01 15:09   ` Anirudh Rayabharam
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2025-11-27 10:59 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: llvm, oe-kbuild-all, linux-hyperv, linux-kernel

Hi Stanislav,

kernel test robot noticed the following build warnings:

[auto build test WARNING on next-20251125]
[cannot apply to linus/master v6.18-rc7 v6.18-rc6 v6.18-rc5 v6.18-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Stanislav-Kinsburskii/Drivers-hv-Refactor-and-rename-memory-region-handling-functions/20251126-101138
base:   next-20251125
patch link:    https://lore.kernel.org/r/176412295155.447063.16512843211428609586.stgit%40skinsburskii-cloud-desktop.internal.cloudapp.net
patch subject: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
config: x86_64-randconfig-076-20251127 (https://download.01.org/0day-ci/archive/20251127/202511271830.nH1cbyQI-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251127/202511271830.nH1cbyQI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511271830.nH1cbyQI-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/hv/mshv_regions.c:288:12: warning: parameter 'flags' set but not used [-Wunused-but-set-parameter]
     288 |                                    u32 flags,
         |                                        ^
   1 warning generated.


vim +/flags +288 drivers/hv/mshv_regions.c

   286	
   287	static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
 > 288					   u32 flags,
   289					   u64 page_offset, u64 page_count)
   290	{
   291		if (PageTransCompound(region->pages[page_offset]))
   292			flags |= HV_UNMAP_GPA_LARGE_PAGE;
   293	
   294		return hv_call_unmap_gpa_pages(region->partition->pt_id,
   295					       region->start_gfn + page_offset,
   296					       page_count, 0);
   297	}
   298	
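
The warning points at the hard-coded 0 in the last argument: the computed
flags value is never passed to the hypercall. A minimal sketch of the likely
fix (the author confirms a fix for the next revision further down the thread)
is to pass the flags through:

	return hv_call_unmap_gpa_pages(region->partition->pt_id,
				       region->start_gfn + page_offset,
				       page_count, flags);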

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c
  2025-11-26  2:09 ` [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c Stanislav Kinsburskii
@ 2025-12-01 11:06   ` Anirudh Rayabharam
  2025-12-01 16:46     ` Stanislav Kinsburskii
  2025-12-03 18:13   ` Nuno Das Neves
  1 sibling, 1 reply; 30+ messages in thread
From: Anirudh Rayabharam @ 2025-12-01 11:06 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Wed, Nov 26, 2025 at 02:09:05AM +0000, Stanislav Kinsburskii wrote:
> Refactor memory region management functions from mshv_root_main.c into
> mshv_regions.c for better modularity and code organization.
> 
> Adjust function calls and headers to use the new implementation. Improve
> maintainability and separation of concerns in the mshv_root module.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/Makefile         |    2 
>  drivers/hv/mshv_regions.c   |  175 +++++++++++++++++++++++++++++++++++++++++++
>  drivers/hv/mshv_root.h      |   10 ++
>  drivers/hv/mshv_root_main.c |  176 +++----------------------------------------
>  4 files changed, 198 insertions(+), 165 deletions(-)
>  create mode 100644 drivers/hv/mshv_regions.c
> 
> diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> index 58b8d07639f3..46d4f4f1b252 100644
> --- a/drivers/hv/Makefile
> +++ b/drivers/hv/Makefile
> @@ -14,7 +14,7 @@ hv_vmbus-y := vmbus_drv.o \
>  hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
>  hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
>  mshv_root-y := mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \
> -	       mshv_root_hv_call.o mshv_portid_table.o
> +	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o
>  mshv_vtl-y := mshv_vtl_main.o
>  
>  # Code that must be built-in
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> new file mode 100644
> index 000000000000..35b866670840
> --- /dev/null
> +++ b/drivers/hv/mshv_regions.c

How about mshv_mem_regions.c?

Nevertheless:

Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 2/7] Drivers: hv: Centralize guest memory region destruction
  2025-11-26  2:08 ` [PATCH v7 2/7] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
@ 2025-12-01 11:12   ` Anirudh Rayabharam
  0 siblings, 0 replies; 30+ messages in thread
From: Anirudh Rayabharam @ 2025-12-01 11:12 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Wed, Nov 26, 2025 at 02:08:57AM +0000, Stanislav Kinsburskii wrote:
> Centralize guest memory region destruction to prevent resource leaks and
> inconsistent cleanup across unmap and partition destruction paths.
> 
> Unify region removal, encrypted partition access recovery, and region
> invalidation to improve maintainability and reliability. Reduce code
> duplication and make future updates less error-prone by encapsulating
> cleanup logic in a single helper.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |   65 ++++++++++++++++++++++---------------------
>  1 file changed, 34 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index fec82619684a..ec18984c3f2d 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1356,13 +1356,42 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  	return ret;
>  }
>  
> +static void mshv_partition_destroy_region(struct mshv_mem_region *region)
> +{
> +	struct mshv_partition *partition = region->partition;
> +	u32 unmap_flags = 0;
> +	int ret;
> +
> +	hlist_del(&region->hnode);
> +
> +	if (mshv_partition_encrypted(partition)) {
> +		ret = mshv_partition_region_share(region);
> +		if (ret) {
> +			pt_err(partition,
> +			       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
> +			       ret);
> +			return;
> +		}
> +	}
> +
> +	if (region->flags.large_pages)
> +		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> +
> +	/* ignore unmap failures and continue as process may be exiting */
> +	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> +				region->nr_pages, unmap_flags);
> +
> +	mshv_region_invalidate(region);
> +
> +	vfree(region);
> +}
> +
>  /* Called for unmapping both the guest ram and the mmio space */
>  static long
>  mshv_unmap_user_memory(struct mshv_partition *partition,
>  		       struct mshv_user_mem_region mem)
>  {
>  	struct mshv_mem_region *region;
> -	u32 unmap_flags = 0;
>  
>  	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
>  		return -EINVAL;
> @@ -1377,18 +1406,8 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
>  	    region->nr_pages != HVPFN_DOWN(mem.size))
>  		return -EINVAL;
>  
> -	hlist_del(&region->hnode);
> +	mshv_partition_destroy_region(region);
>  
> -	if (region->flags.large_pages)
> -		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> -
> -	/* ignore unmap failures and continue as process may be exiting */
> -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> -				region->nr_pages, unmap_flags);
> -
> -	mshv_region_invalidate(region);
> -
> -	vfree(region);
>  	return 0;
>  }
>  
> @@ -1724,8 +1743,8 @@ static void destroy_partition(struct mshv_partition *partition)
>  {
>  	struct mshv_vp *vp;
>  	struct mshv_mem_region *region;
> -	int i, ret;
>  	struct hlist_node *n;
> +	int i;
>  
>  	if (refcount_read(&partition->pt_ref_count)) {
>  		pt_err(partition,
> @@ -1789,25 +1808,9 @@ static void destroy_partition(struct mshv_partition *partition)
>  
>  	remove_partition(partition);
>  
> -	/* Remove regions, regain access to the memory and unpin the pages */
>  	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
> -				  hnode) {
> -		hlist_del(&region->hnode);
> -
> -		if (mshv_partition_encrypted(partition)) {
> -			ret = mshv_partition_region_share(region);
> -			if (ret) {
> -				pt_err(partition,
> -				       "Failed to regain access to memory, unpinning user pages will fail and crash the host error: %d\n",
> -				      ret);
> -				return;
> -			}
> -		}
> -
> -		mshv_region_invalidate(region);
> -
> -		vfree(region);
> -	}
> +				  hnode)
> +		mshv_partition_destroy_region(region);
>  
>  	/* Withdraw and free all pages we deposited */
>  	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
> 
> 

Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions
  2025-11-26  2:08 ` [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
@ 2025-12-01 11:20   ` Anirudh Rayabharam
  0 siblings, 0 replies; 30+ messages in thread
From: Anirudh Rayabharam @ 2025-12-01 11:20 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Wed, Nov 26, 2025 at 02:08:52AM +0000, Stanislav Kinsburskii wrote:
> Simplify and unify memory region management to improve code clarity and
> reliability. Consolidate pinning and invalidation logic, adopt consistent
> naming, and remove redundant checks to reduce complexity.
> 
> Enhance documentation and update call sites for maintainability.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |   80 +++++++++++++++++++------------------------
>  1 file changed, 36 insertions(+), 44 deletions(-)

Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create
  2025-11-26  2:09 ` [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create Stanislav Kinsburskii
@ 2025-12-01 15:06   ` Anirudh Rayabharam
  2025-12-02 18:39   ` Michael Kelley
  2025-12-03 18:58   ` Nuno Das Neves
  2 siblings, 0 replies; 30+ messages in thread
From: Anirudh Rayabharam @ 2025-12-01 15:06 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Wed, Nov 26, 2025 at 02:09:17AM +0000, Stanislav Kinsburskii wrote:
> Refactor region overlap check in mshv_partition_create_region to use
> mshv_partition_region_by_gfn for both start and end guest PFNs, replacing
> manual iteration.
> 
> This is a cleaner approach that leverages existing functionality to
> accurately detect overlapping memory regions.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 5dfb933da981..ae600b927f49 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1086,13 +1086,9 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
>  	u64 nr_pages = HVPFN_DOWN(mem->size);
>  
>  	/* Reject overlapping regions */
> -	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
> -		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
> -		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
> -			continue;
> -
> +	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
>  		return -EEXIST;
> -	}
>  
>  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
>  				mem->userspace_addr, mem->flags,
> 
> 

Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
  2025-11-27 10:59   ` kernel test robot
@ 2025-12-01 15:09   ` Anirudh Rayabharam
  2025-12-01 18:26     ` Stanislav Kinsburskii
  2025-12-03 18:50   ` Nuno Das Neves
  2025-12-04 16:03   ` Michael Kelley
  3 siblings, 1 reply; 30+ messages in thread
From: Anirudh Rayabharam @ 2025-12-01 15:09 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Wed, Nov 26, 2025 at 02:09:11AM +0000, Stanislav Kinsburskii wrote:
> The previous code assumed that if a region's first page was huge, the
> entire region consisted of huge pages and stored this in a large_pages
> flag. This premise is incorrect not only for movable regions (where
> pages can be split and merged on invalidate callbacks or page faults),
> but even for pinned regions: THPs can be split and merged during
> allocation, so a large, pinned region may contain a mix of huge and
> regular pages.
> 
> This change removes the large_pages flag and replaces region-wide
> assumptions with per-chunk inspection of the actual page size when
> mapping, unmapping, sharing, and unsharing. This makes huge page
> handling correct for mixed-page regions and avoids relying on stale
> metadata that can easily become invalid as memory is remapped.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c |  213 +++++++++++++++++++++++++++++++++++++++------
>  drivers/hv/mshv_root.h    |    3 -
>  2 files changed, 184 insertions(+), 32 deletions(-)

Except the warning reported by kernel test robot:

Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c
  2025-12-01 11:06   ` Anirudh Rayabharam
@ 2025-12-01 16:46     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-01 16:46 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Mon, Dec 01, 2025 at 11:06:07AM +0000, Anirudh Rayabharam wrote:
> On Wed, Nov 26, 2025 at 02:09:05AM +0000, Stanislav Kinsburskii wrote:
> > Refactor memory region management functions from mshv_root_main.c into
> > mshv_regions.c for better modularity and code organization.
> > 
> > Adjust function calls and headers to use the new implementation. Improve
> > maintainability and separation of concerns in the mshv_root module.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/Makefile         |    2 
> >  drivers/hv/mshv_regions.c   |  175 +++++++++++++++++++++++++++++++++++++++++++
> >  drivers/hv/mshv_root.h      |   10 ++
> >  drivers/hv/mshv_root_main.c |  176 +++----------------------------------------
> >  4 files changed, 198 insertions(+), 165 deletions(-)
> >  create mode 100644 drivers/hv/mshv_regions.c
> > 
> > diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> > index 58b8d07639f3..46d4f4f1b252 100644
> > --- a/drivers/hv/Makefile
> > +++ b/drivers/hv/Makefile
> > @@ -14,7 +14,7 @@ hv_vmbus-y := vmbus_drv.o \
> >  hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
> >  hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
> >  mshv_root-y := mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \
> > -	       mshv_root_hv_call.o mshv_portid_table.o
> > +	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o
> >  mshv_vtl-y := mshv_vtl_main.o
> >  
> >  # Code that must be built-in
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > new file mode 100644
> > index 000000000000..35b866670840
> > --- /dev/null
> > +++ b/drivers/hv/mshv_regions.c
> 
> How about mshv_mem_regions.c?
> 

I'd rather rename mshv_mem_region into mshv_region instead, as MMIO
regions aren't memory regions.

Thanks,
Stanislav

> Nevertheless:
> 
> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-12-01 15:09   ` Anirudh Rayabharam
@ 2025-12-01 18:26     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-01 18:26 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Mon, Dec 01, 2025 at 03:09:41PM +0000, Anirudh Rayabharam wrote:
> On Wed, Nov 26, 2025 at 02:09:11AM +0000, Stanislav Kinsburskii wrote:
> > The previous code assumed that if a region's first page was huge, the
> > entire region consisted of huge pages and stored this in a large_pages
> > flag. This premise is incorrect not only for movable regions (where
> > pages can be split and merged on invalidate callbacks or page faults),
> > but even for pinned regions: THPs can be split and merged during
> > allocation, so a large, pinned region may contain a mix of huge and
> > regular pages.
> > 
> > This change removes the large_pages flag and replaces region-wide
> > assumptions with per-chunk inspection of the actual page size when
> > mapping, unmapping, sharing, and unsharing. This makes huge page
> > handling correct for mixed-page regions and avoids relying on stale
> > metadata that can easily become invalid as memory is remapped.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_regions.c |  213 +++++++++++++++++++++++++++++++++++++++------
> >  drivers/hv/mshv_root.h    |    3 -
> >  2 files changed, 184 insertions(+), 32 deletions(-)
> 
> Except the warning reported by kernel test robot:
> 

This one is a good catch.
I'll fix it in the next revision.

Thanks,
Stanislav

> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create
  2025-11-26  2:09 ` [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create Stanislav Kinsburskii
  2025-12-01 15:06   ` Anirudh Rayabharam
@ 2025-12-02 18:39   ` Michael Kelley
  2025-12-03 17:46     ` Stanislav Kinsburskii
  2025-12-03 18:58   ` Nuno Das Neves
  2 siblings, 1 reply; 30+ messages in thread
From: Michael Kelley @ 2025-12-02 18:39 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, Nuno Das Neves
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> 
> Refactor region overlap check in mshv_partition_create_region to use
> mshv_partition_region_by_gfn for both start and end guest PFNs, replacing
> manual iteration.
> 
> This is a cleaner approach that leverages existing functionality to
> accurately detect overlapping memory regions.

Unfortunately, the cleaner approach doesn't work. :-( It doesn't detect a
new region request that completely overlaps an existing region.

See https://lore.kernel.org/linux-hyperv/6a5f4ed5-63ae-4760-84c9-7290aaff8bd1@linux.microsoft.com/T/#ma91254da1900de61da520acb96c0de38c43562f6.
I couldn't see anything that prevents the scenario. Nuno created this 
patch less than a month ago: https://lore.kernel.org/linux-hyperv/1762467211-8213-2-git-send-email-nunodasneves@linux.microsoft.com/.

Michael

> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 5dfb933da981..ae600b927f49 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1086,13 +1086,9 @@ static int mshv_partition_create_region(struct
> mshv_partition *partition,
>  	u64 nr_pages = HVPFN_DOWN(mem->size);
> 
>  	/* Reject overlapping regions */
> -	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
> -		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
> -		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
> -			continue;
> -
> +	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
>  		return -EEXIST;
> -	}
> 
>  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
>  				mem->userspace_addr, mem->flags,
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create
  2025-12-02 18:39   ` Michael Kelley
@ 2025-12-03 17:46     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-03 17:46 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, Nuno Das Neves, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Tue, Dec 02, 2025 at 06:39:51PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> > 
> > Refactor region overlap check in mshv_partition_create_region to use
> > mshv_partition_region_by_gfn for both start and end guest PFNs, replacing
> > manual iteration.
> > 
> > This is a cleaner approach that leverages existing functionality to
> > accurately detect overlapping memory regions.
> 
> Unfortunately, the cleaner approach doesn't work. :-( It doesn't detect a
> new region request that completely overlaps an existing region.
> 
> See https://lore.kernel.org/linux-hyperv/6a5f4ed5-63ae-4760-84c9-7290aaff8bd1@linux.microsoft.com/T/#ma91254da1900de61da520acb96c0de38c43562f6.
> I couldn't see anything that prevents the scenario. Nuno created this 
> patch less than a month ago: https://lore.kernel.org/linux-hyperv/1762467211-8213-2-git-send-email-nunodasneves@linux.microsoft.com/.
> 
> Michael
> 

I see. 
I'll drop it then.

Thanks,
Stanislav

> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_root_main.c |    8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index 5dfb933da981..ae600b927f49 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -1086,13 +1086,9 @@ static int mshv_partition_create_region(struct
> > mshv_partition *partition,
> >  	u64 nr_pages = HVPFN_DOWN(mem->size);
> > 
> >  	/* Reject overlapping regions */
> > -	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
> > -		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
> > -		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
> > -			continue;
> > -
> > +	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> > +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
> >  		return -EEXIST;
> > -	}
> > 
> >  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
> >  				mem->userspace_addr, mem->flags,
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c
  2025-11-26  2:09 ` [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c Stanislav Kinsburskii
  2025-12-01 11:06   ` Anirudh Rayabharam
@ 2025-12-03 18:13   ` Nuno Das Neves
  2025-12-03 18:20     ` Stanislav Kinsburskii
  1 sibling, 1 reply; 30+ messages in thread
From: Nuno Das Neves @ 2025-12-03 18:13 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 11/25/2025 6:09 PM, Stanislav Kinsburskii wrote:
> Refactor memory region management functions from mshv_root_main.c into
> mshv_regions.c for better modularity and code organization.
> 
> Adjust function calls and headers to use the new implementation. Improve
> maintainability and separation of concerns in the mshv_root module.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/Makefile         |    2 
>  drivers/hv/mshv_regions.c   |  175 +++++++++++++++++++++++++++++++++++++++++++
>  drivers/hv/mshv_root.h      |   10 ++
>  drivers/hv/mshv_root_main.c |  176 +++----------------------------------------
>  4 files changed, 198 insertions(+), 165 deletions(-)
>  create mode 100644 drivers/hv/mshv_regions.c
> 
> diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> index 58b8d07639f3..46d4f4f1b252 100644
> --- a/drivers/hv/Makefile
> +++ b/drivers/hv/Makefile
> @@ -14,7 +14,7 @@ hv_vmbus-y := vmbus_drv.o \
>  hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
>  hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
>  mshv_root-y := mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \
> -	       mshv_root_hv_call.o mshv_portid_table.o
> +	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o
>  mshv_vtl-y := mshv_vtl_main.o
>  
>  # Code that must be built-in
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> new file mode 100644
> index 000000000000..35b866670840
> --- /dev/null
> +++ b/drivers/hv/mshv_regions.c
> @@ -0,0 +1,175 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2025, Microsoft Corporation.
> + *
> + * Memory region management for mshv_root module.
> + *
> + * Authors: Microsoft Linux virtualization team
> + */
> +
> +#include <linux/mm.h>
> +#include <linux/vmalloc.h>
> +
> +#include <asm/mshyperv.h>
> +
> +#include "mshv_root.h"
> +
> +struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> +					   u64 uaddr, u32 flags,

nit: we use 'flags' here to mean MSHV_SET_MEM flags, but below in
mshv_region_share/unshare() we use it to mean HV_MAP_GPA flags.

Renaming 'flags' to 'mshv_flags' here could improve the clarity.

> +					   bool is_mmio)
> +{
> +	struct mshv_mem_region *region;
> +
> +	region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
> +	if (!region)
> +		return ERR_PTR(-ENOMEM);
> +
> +	region->nr_pages = nr_pages;
> +	region->start_gfn = guest_pfn;
> +	region->start_uaddr = uaddr;
> +	region->hv_map_flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_ADJUSTABLE;
> +	if (flags & BIT(MSHV_SET_MEM_BIT_WRITABLE))
> +		region->hv_map_flags |= HV_MAP_GPA_WRITABLE;
> +	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
> +		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
> +
> +	/* Note: large_pages flag populated when we pin the pages */
> +	if (!is_mmio)
> +		region->flags.range_pinned = true;
> +
> +	return region;
> +}
> +
> +int mshv_region_share(struct mshv_mem_region *region)
> +{
> +	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
> +
> +	if (region->flags.large_pages)
> +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> +
> +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> +			region->pages, region->nr_pages,
> +			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
> +			flags, true);
> +}
> +
> +int mshv_region_unshare(struct mshv_mem_region *region)
> +{
> +	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
> +
> +	if (region->flags.large_pages)
> +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> +
> +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> +			region->pages, region->nr_pages,
> +			0,
> +			flags, false);
> +}<snip>

Looks fine to me. Fixing the nit is optional.

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c
  2025-12-03 18:13   ` Nuno Das Neves
@ 2025-12-03 18:20     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-03 18:20 UTC (permalink / raw)
  To: Nuno Das Neves; +Cc: kys, haiyangz, wei.liu, decui, linux-hyperv, linux-kernel

On Wed, Dec 03, 2025 at 10:13:52AM -0800, Nuno Das Neves wrote:
> On 11/25/2025 6:09 PM, Stanislav Kinsburskii wrote:
> > Refactor memory region management functions from mshv_root_main.c into
> > mshv_regions.c for better modularity and code organization.
> > 
> > Adjust function calls and headers to use the new implementation. Improve
> > maintainability and separation of concerns in the mshv_root module.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/Makefile         |    2 
> >  drivers/hv/mshv_regions.c   |  175 +++++++++++++++++++++++++++++++++++++++++++
> >  drivers/hv/mshv_root.h      |   10 ++
> >  drivers/hv/mshv_root_main.c |  176 +++----------------------------------------
> >  4 files changed, 198 insertions(+), 165 deletions(-)
> >  create mode 100644 drivers/hv/mshv_regions.c
> > 
> > diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> > index 58b8d07639f3..46d4f4f1b252 100644
> > --- a/drivers/hv/Makefile
> > +++ b/drivers/hv/Makefile
> > @@ -14,7 +14,7 @@ hv_vmbus-y := vmbus_drv.o \
> >  hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
> >  hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
> >  mshv_root-y := mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \
> > -	       mshv_root_hv_call.o mshv_portid_table.o
> > +	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o
> >  mshv_vtl-y := mshv_vtl_main.o
> >  
> >  # Code that must be built-in
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > new file mode 100644
> > index 000000000000..35b866670840
> > --- /dev/null
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -0,0 +1,175 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (c) 2025, Microsoft Corporation.
> > + *
> > + * Memory region management for mshv_root module.
> > + *
> > + * Authors: Microsoft Linux virtualization team
> > + */
> > +
> > +#include <linux/mm.h>
> > +#include <linux/vmalloc.h>
> > +
> > +#include <asm/mshyperv.h>
> > +
> > +#include "mshv_root.h"
> > +
> > +struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> > +					   u64 uaddr, u32 flags,
> 
> nit: we use 'flags' here to mean MSHV_SET_MEM flags, but below in
> mshv_region_share/unshare() we use it to mean HV_MAP_GPA flags.
> 
> Renaming 'flags' to 'mshv_flags' here could improve the clarity.
> 

I'd rather change it in a follow-up to reduce rebase churn for the
subsequent changes.

Thanks,
Stanislav

> > +					   bool is_mmio)
> > +{
> > +	struct mshv_mem_region *region;
> > +
> > +	region = vzalloc(sizeof(*region) + sizeof(struct page *) * nr_pages);
> > +	if (!region)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	region->nr_pages = nr_pages;
> > +	region->start_gfn = guest_pfn;
> > +	region->start_uaddr = uaddr;
> > +	region->hv_map_flags = HV_MAP_GPA_READABLE | HV_MAP_GPA_ADJUSTABLE;
> > +	if (flags & BIT(MSHV_SET_MEM_BIT_WRITABLE))
> > +		region->hv_map_flags |= HV_MAP_GPA_WRITABLE;
> > +	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
> > +		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
> > +
> > +	/* Note: large_pages flag populated when we pin the pages */
> > +	if (!is_mmio)
> > +		region->flags.range_pinned = true;
> > +
> > +	return region;
> > +}
> > +
> > +int mshv_region_share(struct mshv_mem_region *region)
> > +{
> > +	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
> > +
> > +	if (region->flags.large_pages)
> > +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > +
> > +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > +			region->pages, region->nr_pages,
> > +			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
> > +			flags, true);
> > +}
> > +
> > +int mshv_region_unshare(struct mshv_mem_region *region)
> > +{
> > +	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
> > +
> > +	if (region->flags.large_pages)
> > +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > +
> > +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > +			region->pages, region->nr_pages,
> > +			0,
> > +			flags, false);
> > +}<snip>
> 
> Looks fine to me. Fixing the nit is optional.
> 
> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
  2025-11-27 10:59   ` kernel test robot
  2025-12-01 15:09   ` Anirudh Rayabharam
@ 2025-12-03 18:50   ` Nuno Das Neves
  2025-12-04 16:03   ` Michael Kelley
  3 siblings, 0 replies; 30+ messages in thread
From: Nuno Das Neves @ 2025-12-03 18:50 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 11/25/2025 6:09 PM, Stanislav Kinsburskii wrote:
> The previous code assumed that if a region's first page was huge, the
> entire region consisted of huge pages and stored this in a large_pages
> flag. This premise is incorrect not only for movable regions (where
> pages can be split and merged on invalidate callbacks or page faults),
> but even for pinned regions: THPs can be split and merged during
> allocation, so a large, pinned region may contain a mix of huge and
> regular pages.
> 
> This change removes the large_pages flag and replaces region-wide
> assumptions with per-chunk inspection of the actual page size when
> mapping, unmapping, sharing, and unsharing. This makes huge page
> handling correct for mixed-page regions and avoids relying on stale
> metadata that can easily become invalid as memory is remapped.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c |  213 +++++++++++++++++++++++++++++++++++++++------
>  drivers/hv/mshv_root.h    |    3 -
>  2 files changed, 184 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index 35b866670840..d535d2e3e811 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -14,6 +14,124 @@
>  
>  #include "mshv_root.h"
>  
> +/**
> + * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
> + *                             in a region.
> + * @region     : Pointer to the memory region structure.
> + * @flags      : Flags to pass to the handler.
> + * @page_offset: Offset into the region's pages array to start processing.
> + * @page_count : Number of pages to process.
> + * @handler    : Callback function to handle the chunk.
> + *
> + * This function scans the region's pages starting from @page_offset,
> + * checking for contiguous present pages of the same size (normal or huge).
> + * It invokes @handler for the chunk of contiguous pages found. Returns the
> + * number of pages handled, or a negative error code if the first page is
> + * not present or the handler fails.
> + *
> + * Note: The @handler callback must be able to handle both normal and huge
> + * pages.
> + *
> + * Return: Number of pages handled, or negative error code.
> + */
> +static long mshv_region_process_chunk(struct mshv_mem_region *region,
> +				      u32 flags,
> +				      u64 page_offset, u64 page_count,
> +				      int (*handler)(struct mshv_mem_region *region,
> +						     u32 flags,
> +						     u64 page_offset,
> +						     u64 page_count))
> +{
> +	u64 count, stride;
> +	unsigned int page_order;
> +	struct page *page;
> +	int ret;
> +
> +	page = region->pages[page_offset];
> +	if (!page)
> +		return -EINVAL;
> +
> +	page_order = folio_order(page_folio(page));
> +	/* 1G huge pages aren't supported by the hypercalls */
> +	if (page_order == PUD_ORDER)
> +		return -EINVAL;

I'd prefer to be explicit about exactly which page_orders we *do*
support instead of just disallowing PUD_ORDER.

Without looking up folio_order(), there's an implication here that
page_order can be anything except PUD_ORDER, but that's not the case;
there are only two valid values for page_order.

The comment can instead read something like:
"The hypervisor only supports 4K and 2M page sizes"

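A minimal sketch of such an explicit check (illustration only, not code from
the patch; it assumes the generic PMD_ORDER macro for the 2M case):

	/* The hypervisor only supports 4K and 2M page sizes. */
	if (page_order != 0 && page_order != PMD_ORDER)
		return -EINVAL;
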
> +
> +	stride = 1 << page_order;
> +
> +	/* Start at stride since the first page is validated */
> +	for (count = stride; count < page_count; count += stride) {
> +		page = region->pages[page_offset + count];
> +
> +		/* Break if current page is not present */
> +		if (!page)
> +			break;
> +
> +		/* Break if page size changes */
> +		if (page_order != folio_order(page_folio(page)))
> +			break;
> +	}
> +
> +	ret = handler(region, flags, page_offset, count);
> +	if (ret)
> +		return ret;
> +
> +	return count;
> +}
> +
> +/**
> + * mshv_region_process_range - Processes a range of memory pages in a
> + *                             region.
> + * @region     : Pointer to the memory region structure.
> + * @flags      : Flags to pass to the handler.
> + * @page_offset: Offset into the region's pages array to start processing.
> + * @page_count : Number of pages to process.
> + * @handler    : Callback function to handle each chunk of contiguous
> + *               pages.
> + *
> + * Iterates over the specified range of pages in @region, skipping
> + * non-present pages. For each contiguous chunk of present pages, invokes
> + * @handler via mshv_region_process_chunk.
> + *
> + * Note: The @handler callback must be able to handle both normal and huge
> + * pages.
> + *
> + * Returns 0 on success, or a negative error code on failure.
> + */
> +static int mshv_region_process_range(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 page_offset, u64 page_count,
> +				     int (*handler)(struct mshv_mem_region *region,
> +						    u32 flags,
> +						    u64 page_offset,
> +						    u64 page_count))
> +{
> +	long ret;
> +
> +	if (page_offset + page_count > region->nr_pages)
> +		return -EINVAL;
> +
> +	while (page_count) {
> +		/* Skip non-present pages */
> +		if (!region->pages[page_offset]) {
> +			page_offset++;
> +			page_count--;
> +			continue;
> +		}
> +
> +		ret = mshv_region_process_chunk(region, flags,
> +						page_offset,
> +						page_count,
> +						handler);
> +		if (ret < 0)
> +			return ret;
> +
> +		page_offset += ret;
> +		page_count -= ret;
> +	}
> +
> +	return 0;
> +}
> +
>  struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  					   u64 uaddr, u32 flags,
>  					   bool is_mmio)
> @@ -33,55 +151,80 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
>  		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
>  
> -	/* Note: large_pages flag populated when we pin the pages */
>  	if (!is_mmio)
>  		region->flags.range_pinned = true;
>  
>  	return region;
>  }
>  
> +static int mshv_region_chunk_share(struct mshv_mem_region *region,
> +				   u32 flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))

PageTransCompound() returns false if CONFIG_TRANSPARENT_HUGEPAGE is not
enabled. This won't work for hugetlb pages, will it?

Do we need to check if (PageHuge(page) || PageTransCompound(page)) ?
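
A sketch of how the suggested check could be factored out (the helper name is
hypothetical, not part of the patch):

	/* Covers hugetlb pages as well as THP/compound pages. */
	static bool mshv_page_is_large(struct page *page)
	{
		return PageHuge(page) || PageTransCompound(page);
	}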

> +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> +
> +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> +					      region->pages + page_offset,
> +					      page_count,
> +					      HV_MAP_GPA_READABLE |
> +					      HV_MAP_GPA_WRITABLE,
> +					      flags, true);
> +}
> +
>  int mshv_region_share(struct mshv_mem_region *region)
>  {
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
>  
> -	if (region->flags.large_pages)
> +	return mshv_region_process_range(region, flags,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_share);
> +}
> +
> +static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))
>  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
>  
>  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -			region->pages, region->nr_pages,
> -			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
> -			flags, true);
> +					      region->pages + page_offset,
> +					      page_count, 0,
> +					      flags, false);
>  }
>  
>  int mshv_region_unshare(struct mshv_mem_region *region)
>  {
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
>  
> -	if (region->flags.large_pages)
> -		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> -
> -	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -			region->pages, region->nr_pages,
> -			0,
> -			flags, false);
> +	return mshv_region_process_range(region, flags,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_unshare);
>  }
>  
> -static int mshv_region_remap_pages(struct mshv_mem_region *region,
> -				   u32 map_flags,
> +static int mshv_region_chunk_remap(struct mshv_mem_region *region,
> +				   u32 flags,
>  				   u64 page_offset, u64 page_count)

nit: Why the name change from map_flags to flags? It seems to create
some noise here.

>  {
> -	if (page_offset + page_count > region->nr_pages)
> -		return -EINVAL;
> -
> -	if (region->flags.large_pages)
> -		map_flags |= HV_MAP_GPA_LARGE_PAGE;
> +	if (PageTransCompound(region->pages[page_offset]))
> +		flags |= HV_MAP_GPA_LARGE_PAGE;
>  
>  	return hv_call_map_gpa_pages(region->partition->pt_id,
>  				     region->start_gfn + page_offset,
> -				     page_count, map_flags,
> +				     page_count, flags,
>  				     region->pages + page_offset);
>  }
>  
> +static int mshv_region_remap_pages(struct mshv_mem_region *region,
> +				   u32 map_flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	return mshv_region_process_range(region, map_flags,
> +					 page_offset, page_count,
> +					 mshv_region_chunk_remap);
> +}
> +
>  int mshv_region_map(struct mshv_mem_region *region)
>  {
>  	u32 map_flags = region->hv_map_flags;
> @@ -134,9 +277,6 @@ int mshv_region_pin(struct mshv_mem_region *region)
>  			goto release_pages;
>  	}
>  
> -	if (PageHuge(region->pages[0]))
> -		region->flags.large_pages = true;
> -
>  	return 0;
>  
>  release_pages:
> @@ -144,10 +284,28 @@ int mshv_region_pin(struct mshv_mem_region *region)
>  	return ret;
>  }
>  
> +static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
> +				   u32 flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))
> +		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> +
> +	return hv_call_unmap_gpa_pages(region->partition->pt_id,
> +				       region->start_gfn + page_offset,
> +				       page_count, 0);
> +}
> +
> +static int mshv_region_unmap(struct mshv_mem_region *region)
> +{
> +	return mshv_region_process_range(region, 0,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_unmap);
> +}
> +
>  void mshv_region_destroy(struct mshv_mem_region *region)
>  {
>  	struct mshv_partition *partition = region->partition;
> -	u32 unmap_flags = 0;
>  	int ret;
>  
>  	hlist_del(&region->hnode);
> @@ -162,12 +320,7 @@ void mshv_region_destroy(struct mshv_mem_region *region)
>  		}
>  	}
>  
> -	if (region->flags.large_pages)
> -		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> -
> -	/* ignore unmap failures and continue as process may be exiting */
> -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> -				region->nr_pages, unmap_flags);
> +	mshv_region_unmap(region);
>  
>  	mshv_region_invalidate(region);
>  
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 0366f416c2f0..ff3374f13691 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -77,9 +77,8 @@ struct mshv_mem_region {
>  	u64 start_uaddr;
>  	u32 hv_map_flags;
>  	struct {
> -		u64 large_pages:  1; /* 2MiB */
>  		u64 range_pinned: 1;
> -		u64 reserved:	 62;
> +		u64 reserved:	 63;
>  	} flags;
>  	struct mshv_partition *partition;
>  	struct page *pages[];
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create
  2025-11-26  2:09 ` [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create Stanislav Kinsburskii
  2025-12-01 15:06   ` Anirudh Rayabharam
  2025-12-02 18:39   ` Michael Kelley
@ 2025-12-03 18:58   ` Nuno Das Neves
  2025-12-03 19:36     ` Nuno Das Neves
  2 siblings, 1 reply; 30+ messages in thread
From: Nuno Das Neves @ 2025-12-03 18:58 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 11/25/2025 6:09 PM, Stanislav Kinsburskii wrote:
> Refactor region overlap check in mshv_partition_create_region to use
> mshv_partition_region_by_gfn for both start and end guest PFNs, replacing
> manual iteration.
> 
> This is a cleaner approach that leverages existing functionality to
> accurately detect overlapping memory regions.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_main.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 5dfb933da981..ae600b927f49 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1086,13 +1086,9 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
>  	u64 nr_pages = HVPFN_DOWN(mem->size);
>  
>  	/* Reject overlapping regions */
> -	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
> -		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
> -		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
> -			continue;
> -
> +	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
>  		return -EEXIST;

This logic does not work. I fixed this check in
ba9eb9b86d23 mshv: Fix create memory region overlap check

This change would just be reverting that fix.

Consider an existing region at 0x2000 of size 0x1000. The user
tries to map a new region at 0x1000 of size 0x3000. Since the new region
starts before and ends after the existing region, the overlap would not
be detected by this logic. It just checks if an existing region contains
0x1000 or 0x4000 - 1 which it does not. This is why a manual iteration
here is needed.
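
For reference, the check the loop above encodes is just the standard
interval-intersection test (sketch only; gfn_ranges_overlap() is not a
real helper in the driver):

/*
 * Two GFN ranges [a, a + a_len) and [b, b + b_len) overlap unless one
 * of them ends at or before the other begins.
 */
static bool gfn_ranges_overlap(u64 a, u64 a_len, u64 b, u64 b_len)
{
	return a < b + b_len && b < a + a_len;
}

With the example above (existing 0x2000/0x1000, new 0x1000/0x3000) this
returns true, while the two point lookups at 0x1000 and 0x4000 - 1 both
miss the overlap.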

> -	}
>  
>  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
>  				mem->userspace_addr, mem->flags,
> 
> 



* Re: [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create
  2025-12-03 18:58   ` Nuno Das Neves
@ 2025-12-03 19:36     ` Nuno Das Neves
  0 siblings, 0 replies; 30+ messages in thread
From: Nuno Das Neves @ 2025-12-03 19:36 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui
  Cc: linux-hyperv, linux-kernel

On 12/3/2025 10:58 AM, Nuno Das Neves wrote:
> On 11/25/2025 6:09 PM, Stanislav Kinsburskii wrote:
>> Refactor region overlap check in mshv_partition_create_region to use
>> mshv_partition_region_by_gfn for both start and end guest PFNs, replacing
>> manual iteration.
>>
>> This is a cleaner approach that leverages existing functionality to
>> accurately detect overlapping memory regions.
>>
>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>> ---
>>  drivers/hv/mshv_root_main.c |    8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
>> index 5dfb933da981..ae600b927f49 100644
>> --- a/drivers/hv/mshv_root_main.c
>> +++ b/drivers/hv/mshv_root_main.c
>> @@ -1086,13 +1086,9 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
>>  	u64 nr_pages = HVPFN_DOWN(mem->size);
>>  
>>  	/* Reject overlapping regions */
>> -	hlist_for_each_entry(rg, &partition->pt_mem_regions, hnode) {
>> -		if (mem->guest_pfn + nr_pages <= rg->start_gfn ||
>> -		    rg->start_gfn + rg->nr_pages <= mem->guest_pfn)
>> -			continue;
>> -
>> +	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
>> +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
>>  		return -EEXIST;
> 
> This logic does not work. I fixed this check in
> ba9eb9b86d23 mshv: Fix create memory region overlap check
> 
> This change would just be reverting that fix.
> 
> Consider an existing region at 0x2000 of size 0x1000. The user
> tries to map a new region at 0x1000 of size 0x3000. Since the new region
> starts before and ends after the existing region, the overlap would not
> be detected by this logic. It just checks if an existing region contains
> 0x1000 or 0x4000 - 1 which it does not. This is why a manual iteration
> here is needed.
> 

Apologies, after sending this I realized you already dropped the patch.

>> -	}
>>  
>>  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
>>  				mem->userspace_addr, mem->flags,
>>
>>
> 



* RE: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
                     ` (2 preceding siblings ...)
  2025-12-03 18:50   ` Nuno Das Neves
@ 2025-12-04 16:03   ` Michael Kelley
  2025-12-04 21:08     ` Stanislav Kinsburskii
  3 siblings, 1 reply; 30+ messages in thread
From: Michael Kelley @ 2025-12-04 16:03 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> 
> The previous code assumed that if a region's first page was huge, the
> entire region consisted of huge pages and stored this in a large_pages
> flag. This premise is incorrect not only for movable regions (where
> pages can be split and merged on invalidate callbacks or page faults),
> but even for pinned regions: THPs can be split and merged during
> allocation, so a large, pinned region may contain a mix of huge and
> regular pages.
> 
> This change removes the large_pages flag and replaces region-wide
> assumptions with per-chunk inspection of the actual page size when
> mapping, unmapping, sharing, and unsharing. This makes huge page
> handling correct for mixed-page regions and avoids relying on stale
> metadata that can easily become invalid as memory is remapped.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c |  213 +++++++++++++++++++++++++++++++++++++++-
> -----
>  drivers/hv/mshv_root.h    |    3 -
>  2 files changed, 184 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index 35b866670840..d535d2e3e811 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -14,6 +14,124 @@
> 
>  #include "mshv_root.h"
> 
> +/**
> + * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
> + *                             in a region.
> + * @region     : Pointer to the memory region structure.
> + * @flags      : Flags to pass to the handler.
> + * @page_offset: Offset into the region's pages array to start processing.
> + * @page_count : Number of pages to process.
> + * @handler    : Callback function to handle the chunk.
> + *
> + * This function scans the region's pages starting from @page_offset,
> + * checking for contiguous present pages of the same size (normal or huge).
> + * It invokes @handler for the chunk of contiguous pages found. Returns the
> + * number of pages handled, or a negative error code if the first page is
> + * not present or the handler fails.
> + *
> + * Note: The @handler callback must be able to handle both normal and huge
> + * pages.
> + *
> + * Return: Number of pages handled, or negative error code.
> + */
> +static long mshv_region_process_chunk(struct mshv_mem_region *region,
> +				      u32 flags,
> +				      u64 page_offset, u64 page_count,
> +				      int (*handler)(struct mshv_mem_region *region,
> +						     u32 flags,
> +						     u64 page_offset,
> +						     u64 page_count))
> +{
> +	u64 count, stride;
> +	unsigned int page_order;
> +	struct page *page;
> +	int ret;
> +
> +	page = region->pages[page_offset];
> +	if (!page)
> +		return -EINVAL;
> +
> +	page_order = folio_order(page_folio(page));
> +	/* 1G huge pages aren't supported by the hypercalls */
> +	if (page_order == PUD_ORDER)
> +		return -EINVAL;

In the general case, folio_order() could return a variety of values ranging from
0 up to at least PUD_ORDER.  For example, "2" would be a valid value in file
system code that uses folios to do I/O in 16K blocks instead of just 4K blocks.
Since this function is trying to find contiguous chunks of either single pages
or 2M huge pages, I think you are expecting only three possible values: 0,
PMD_ORDER, or PUD_ORDER. But do you know that "2" (for example)
would never be returned? The memory involved here is populated using
pin_user_pages_fast() for the pinned case, or using hmm_range_fault() for
the movable case. I don't know mm behavior well enough to know if those
functions could ever populate with a folio with an order other than 0 or
PMD_ORDER. If such a folio could ever be used, then the way you check
for a page size change won't be valid. For purposes of informing Hyper-V
about 2 Meg pages, folio orders 0 through 8 are all equivalent, with folio
order 9 (PMD_ORDER) being the marker for the start of 2 Meg large page.

Somebody who knows mm behavior better than I do should comment. Or
maybe you could just be more defensive and handle the case of folio orders
not equal to 0 or PMD_ORDER.
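
Purely as a sketch of the defensive option (same PMD_ORDER/PUD_ORDER
names as in the patch; exact placement up to you):

	page_order = folio_order(page_folio(page));
	/* 1G and larger folios can't be expressed in the hypercalls */
	if (page_order >= PUD_ORDER)
		return -EINVAL;
	/*
	 * For Hyper-V's purposes any folio smaller than PMD_ORDER is just
	 * a run of 4K pages, so clamp intermediate orders to 0.
	 */
	if (page_order != PMD_ORDER)
		page_order = 0;

	stride = 1 << page_order;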

> +
> +	stride = 1 << page_order;
> +
> +	/* Start at stride since the first page is validated */
> +	for (count = stride; count < page_count; count += stride) {

This striding doesn't work properly in the general case. Suppose the
page_offset value puts the start of the chunk in the middle of a 2 Meg
page, and that 2 Meg page is then followed by a bunch of single pages.
(Presumably the mmu notifier "invalidate" callback could do this.)
The use of the full stride here jumps over the remaining portion of the
2 Meg page plus some number of the single pages, which isn't what you
want. For the striding to work, it must figure out how much remains in the
initial large page, and then once the striding is aligned to the large page
boundaries, the full stride length works.

Also, what do the hypercalls in the handler functions do if a chunk starts
in the middle of a 2 Meg page? It looks like the handler functions will set
the *_LARGE_PAGE flag to the hypercall but then the hv_call_* function
will fail if the page_count isn't 2 Meg aligned.
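
If a mid-page start does turn out to be reachable, one way to handle it
(rough sketch, untested) is to make only the first step partial, using
the folio helpers:

	struct folio *folio = page_folio(page);
	/* pages remaining in the folio that contains the first page */
	u64 first_step = folio_nr_pages(folio) - folio_page_idx(folio, page);

and then start the loop at first_step instead of stride, so every later
iteration lands on a folio boundary.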

> +		page = region->pages[page_offset + count];
> +
> +		/* Break if current page is not present */
> +		if (!page)
> +			break;
> +
> +		/* Break if page size changes */
> +		if (page_order != folio_order(page_folio(page)))
> +			break;
> +	}
> +
> +	ret = handler(region, flags, page_offset, count);
> +	if (ret)
> +		return ret;
> +
> +	return count;
> +}
> +
> +/**
> + * mshv_region_process_range - Processes a range of memory pages in a
> + *                             region.
> + * @region     : Pointer to the memory region structure.
> + * @flags      : Flags to pass to the handler.
> + * @page_offset: Offset into the region's pages array to start processing.
> + * @page_count : Number of pages to process.
> + * @handler    : Callback function to handle each chunk of contiguous
> + *               pages.
> + *
> + * Iterates over the specified range of pages in @region, skipping
> + * non-present pages. For each contiguous chunk of present pages, invokes
> + * @handler via mshv_region_process_chunk.
> + *
> + * Note: The @handler callback must be able to handle both normal and huge
> + * pages.
> + *
> + * Returns 0 on success, or a negative error code on failure.
> + */
> +static int mshv_region_process_range(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 page_offset, u64 page_count,
> +				     int (*handler)(struct mshv_mem_region *region,
> +						    u32 flags,
> +						    u64 page_offset,
> +						    u64 page_count))
> +{
> +	long ret;
> +
> +	if (page_offset + page_count > region->nr_pages)
> +		return -EINVAL;
> +
> +	while (page_count) {
> +		/* Skip non-present pages */
> +		if (!region->pages[page_offset]) {
> +			page_offset++;
> +			page_count--;
> +			continue;
> +		}
> +
> +		ret = mshv_region_process_chunk(region, flags,
> +						page_offset,
> +						page_count,
> +						handler);
> +		if (ret < 0)
> +			return ret;
> +
> +		page_offset += ret;
> +		page_count -= ret;
> +	}
> +
> +	return 0;
> +}
> +
>  struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  					   u64 uaddr, u32 flags,
>  					   bool is_mmio)
> @@ -33,55 +151,80 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
>  		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
> 
> -	/* Note: large_pages flag populated when we pin the pages */
>  	if (!is_mmio)
>  		region->flags.range_pinned = true;
> 
>  	return region;
>  }
> 
> +static int mshv_region_chunk_share(struct mshv_mem_region *region,
> +				   u32 flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))
> +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;

mshv_region_process_chunk() uses folio_size() to detect single pages vs. 2 Meg
large pages. Here you are using PageTransCompound(). Any reason for the
difference? This may be perfectly OK, but my knowledge of mm is too limited to
know for sure. Looking at the implementations of folio_size() and
PageTransCompound(), they seem to be looking at different fields in the struct page,
and I don't know if the different fields are always in sync. Another case for someone
with mm expertise to review carefully ....

Michael

> +
> +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> +					      region->pages + page_offset,
> +					      page_count,
> +					      HV_MAP_GPA_READABLE |
> +					      HV_MAP_GPA_WRITABLE,
> +					      flags, true);
> +}
> +
>  int mshv_region_share(struct mshv_mem_region *region)
>  {
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
> 
> -	if (region->flags.large_pages)
> +	return mshv_region_process_range(region, flags,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_share);
> +}
> +
> +static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))
>  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> 
>  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -			region->pages, region->nr_pages,
> -			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
> -			flags, true);
> +					      region->pages + page_offset,
> +					      page_count, 0,
> +					      flags, false);
>  }
> 
>  int mshv_region_unshare(struct mshv_mem_region *region)
>  {
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
> 
> -	if (region->flags.large_pages)
> -		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> -
> -	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -			region->pages, region->nr_pages,
> -			0,
> -			flags, false);
> +	return mshv_region_process_range(region, flags,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_unshare);
>  }
> 
> -static int mshv_region_remap_pages(struct mshv_mem_region *region,
> -				   u32 map_flags,
> +static int mshv_region_chunk_remap(struct mshv_mem_region *region,
> +				   u32 flags,
>  				   u64 page_offset, u64 page_count)
>  {
> -	if (page_offset + page_count > region->nr_pages)
> -		return -EINVAL;
> -
> -	if (region->flags.large_pages)
> -		map_flags |= HV_MAP_GPA_LARGE_PAGE;
> +	if (PageTransCompound(region->pages[page_offset]))
> +		flags |= HV_MAP_GPA_LARGE_PAGE;
> 
>  	return hv_call_map_gpa_pages(region->partition->pt_id,
>  				     region->start_gfn + page_offset,
> -				     page_count, map_flags,
> +				     page_count, flags,
>  				     region->pages + page_offset);
>  }
> 
> +static int mshv_region_remap_pages(struct mshv_mem_region *region,
> +				   u32 map_flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	return mshv_region_process_range(region, map_flags,
> +					 page_offset, page_count,
> +					 mshv_region_chunk_remap);
> +}
> +
>  int mshv_region_map(struct mshv_mem_region *region)
>  {
>  	u32 map_flags = region->hv_map_flags;
> @@ -134,9 +277,6 @@ int mshv_region_pin(struct mshv_mem_region *region)
>  			goto release_pages;
>  	}
> 
> -	if (PageHuge(region->pages[0]))
> -		region->flags.large_pages = true;
> -
>  	return 0;
> 
>  release_pages:
> @@ -144,10 +284,28 @@ int mshv_region_pin(struct mshv_mem_region *region)
>  	return ret;
>  }
> 
> +static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
> +				   u32 flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))
> +		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> +
> +	return hv_call_unmap_gpa_pages(region->partition->pt_id,
> +				       region->start_gfn + page_offset,
> +				       page_count, 0);
> +}
> +
> +static int mshv_region_unmap(struct mshv_mem_region *region)
> +{
> +	return mshv_region_process_range(region, 0,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_unmap);
> +}
> +
>  void mshv_region_destroy(struct mshv_mem_region *region)
>  {
>  	struct mshv_partition *partition = region->partition;
> -	u32 unmap_flags = 0;
>  	int ret;
> 
>  	hlist_del(&region->hnode);
> @@ -162,12 +320,7 @@ void mshv_region_destroy(struct mshv_mem_region *region)
>  		}
>  	}
> 
> -	if (region->flags.large_pages)
> -		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> -
> -	/* ignore unmap failures and continue as process may be exiting */
> -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> -				region->nr_pages, unmap_flags);
> +	mshv_region_unmap(region);
> 
>  	mshv_region_invalidate(region);
> 
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 0366f416c2f0..ff3374f13691 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -77,9 +77,8 @@ struct mshv_mem_region {
>  	u64 start_uaddr;
>  	u32 hv_map_flags;
>  	struct {
> -		u64 large_pages:  1; /* 2MiB */
>  		u64 range_pinned: 1;
> -		u64 reserved:	 62;
> +		u64 reserved:	 63;
>  	} flags;
>  	struct mshv_partition *partition;
>  	struct page *pages[];
> 
> 



* RE: [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions
  2025-11-26  2:09 ` [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions Stanislav Kinsburskii
@ 2025-12-04 16:48   ` Michael Kelley
  2025-12-04 21:23     ` Stanislav Kinsburskii
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kelley @ 2025-12-04 16:48 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> 
> Introduce kref-based reference counting and spinlock protection for
> memory regions in Hyper-V partition management. This change improves
> memory region lifecycle management and ensures thread-safe access to the
> region list.
> 
> Also improves the check for overlapped memory regions during region
> creation, preventing duplicate or conflicting mappings.

This paragraph seems spurious. I think it applies to what's in Patch 5 of this
series.

> 
> Previously, the regions list was protected by the partition mutex.
> However, this approach is too heavy for frequent fault and invalidation
> operations. Finer grained locking is now used to improve efficiency and
> concurrency.
> 
> This is a precursor to supporting movable memory regions. Fault and
> invalidation handling for movable regions will require safe traversal of
> the region list and holding a region reference while performing
> invalidation or fault operations.

The commit message discussion about the need for the refcounting and
locking seemed a bit vague to me. It wasn't entirely clear whether these
changes are bug fixing existing race conditions, or whether they are new
functionality to support movable regions.

In looking at the existing code, it seems that the main serialization mechanisms
are that partition ioctls are serialized on pt_mutex, and VP ioctls are serialized
on vp_mutex (though multiple VP ioctls can be in progress simultaneously
against different VPs). The serialization of partition ioctls ensures that region
manipulation is serialized, and that, for example, two region creations can't
both verify that there's no overlap, but then overlap with each other. And
region creation and deletion are serialized. In current code, the VP ioctls don't
look at the region data structures, so there can't be any races between
partition and VP ioctls (which are not serialized with each other). The only
question I had about existing code is the mshv_partition_release() function,
which proceeds without serializing against any partition ioctls, but maybe
higher-level file system code ensures that no ioctls are in progress before
the .release callback is made.

The new requirement is movable regions, where the VP ioctl MSHV_RUN_VP
needs to look at region data structures. You've said that in the last paragraph
of your commit message. So I'm reading this as that the new locking is
needed specifically because multiple MSHV_RUN_VP ioctls will likely be
in flight simultaneously, and they are not currently serialized with the
region operations initiated by partition ioctls. And then there are the
"invalidate" callbacks that are running on some other kernel thread and
which also needs synchronization to do region manipulation.

Maybe I'm just looking for a little bit of a written "road map" somewhere
that describes the intended locking scheme at a high level. :-)

Michael

> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c   |   19 ++++++++++++++++---
>  drivers/hv/mshv_root.h      |    6 +++++-
>  drivers/hv/mshv_root_main.c |   34 ++++++++++++++++++++++++++--------
>  3 files changed, 47 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index d535d2e3e811..6450a7ed8493 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -7,6 +7,7 @@
>   * Authors: Microsoft Linux virtualization team
>   */
> 
> +#include <linux/kref.h>
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
> 
> @@ -154,6 +155,8 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  	if (!is_mmio)
>  		region->flags.range_pinned = true;
> 
> +	kref_init(&region->refcount);
> +
>  	return region;
>  }
> 
> @@ -303,13 +306,13 @@ static int mshv_region_unmap(struct mshv_mem_region *region)
>  					 mshv_region_chunk_unmap);
>  }
> 
> -void mshv_region_destroy(struct mshv_mem_region *region)
> +static void mshv_region_destroy(struct kref *ref)
>  {
> +	struct mshv_mem_region *region =
> +		container_of(ref, struct mshv_mem_region, refcount);
>  	struct mshv_partition *partition = region->partition;
>  	int ret;
> 
> -	hlist_del(&region->hnode);
> -
>  	if (mshv_partition_encrypted(partition)) {
>  		ret = mshv_region_share(region);
>  		if (ret) {
> @@ -326,3 +329,13 @@ void mshv_region_destroy(struct mshv_mem_region *region)
> 
>  	vfree(region);
>  }
> +
> +void mshv_region_put(struct mshv_mem_region *region)
> +{
> +	kref_put(&region->refcount, mshv_region_destroy);
> +}
> +
> +int mshv_region_get(struct mshv_mem_region *region)
> +{
> +	return kref_get_unless_zero(&region->refcount);
> +}
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index ff3374f13691..4249534ba900 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -72,6 +72,7 @@ do { \
> 
>  struct mshv_mem_region {
>  	struct hlist_node hnode;
> +	struct kref refcount;
>  	u64 nr_pages;
>  	u64 start_gfn;
>  	u64 start_uaddr;
> @@ -97,6 +98,8 @@ struct mshv_partition {
>  	u64 pt_id;
>  	refcount_t pt_ref_count;
>  	struct mutex pt_mutex;
> +
> +	spinlock_t pt_mem_regions_lock;
>  	struct hlist_head pt_mem_regions; // not ordered
> 
>  	u32 pt_vp_count;
> @@ -319,6 +322,7 @@ int mshv_region_unshare(struct mshv_mem_region *region);
>  int mshv_region_map(struct mshv_mem_region *region);
>  void mshv_region_invalidate(struct mshv_mem_region *region);
>  int mshv_region_pin(struct mshv_mem_region *region);
> -void mshv_region_destroy(struct mshv_mem_region *region);
> +void mshv_region_put(struct mshv_mem_region *region);
> +int mshv_region_get(struct mshv_mem_region *region);
> 
>  #endif /* _MSHV_ROOT_H_ */
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index ae600b927f49..1ef2a28beb17 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1086,9 +1086,13 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
>  	u64 nr_pages = HVPFN_DOWN(mem->size);
> 
>  	/* Reject overlapping regions */
> +	spin_lock(&partition->pt_mem_regions_lock);
>  	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> -	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
> +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1)) {
> +		spin_unlock(&partition->pt_mem_regions_lock);
>  		return -EEXIST;
> +	}
> +	spin_unlock(&partition->pt_mem_regions_lock);
> 
>  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
>  				mem->userspace_addr, mem->flags,
> @@ -1220,8 +1224,9 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  	if (ret)
>  		goto errout;
> 
> -	/* Install the new region */
> +	spin_lock(&partition->pt_mem_regions_lock);
>  	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> +	spin_unlock(&partition->pt_mem_regions_lock);
> 
>  	return 0;
> 
> @@ -1240,17 +1245,27 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
>  	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
>  		return -EINVAL;
> 
> +	spin_lock(&partition->pt_mem_regions_lock);
> +
>  	region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
> -	if (!region)
> -		return -EINVAL;
> +	if (!region) {
> +		spin_unlock(&partition->pt_mem_regions_lock);
> +		return -ENOENT;
> +	}
> 
>  	/* Paranoia check */
>  	if (region->start_uaddr != mem.userspace_addr ||
>  	    region->start_gfn != mem.guest_pfn ||
> -	    region->nr_pages != HVPFN_DOWN(mem.size))
> +	    region->nr_pages != HVPFN_DOWN(mem.size)) {
> +		spin_unlock(&partition->pt_mem_regions_lock);
>  		return -EINVAL;
> +	}
> +
> +	hlist_del(&region->hnode);
> 
> -	mshv_region_destroy(region);
> +	spin_unlock(&partition->pt_mem_regions_lock);
> +
> +	mshv_region_put(region);
> 
>  	return 0;
>  }
> @@ -1653,8 +1668,10 @@ static void destroy_partition(struct mshv_partition *partition)
>  	remove_partition(partition);
> 
>  	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
> -				  hnode)
> -		mshv_region_destroy(region);
> +				  hnode) {
> +		hlist_del(&region->hnode);
> +		mshv_region_put(region);
> +	}
> 
>  	/* Withdraw and free all pages we deposited */
>  	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
> @@ -1852,6 +1869,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
> 
>  	INIT_HLIST_HEAD(&partition->pt_devices);
> 
> +	spin_lock_init(&partition->pt_mem_regions_lock);
>  	INIT_HLIST_HEAD(&partition->pt_mem_regions);
> 
>  	mshv_eventfd_init(partition);
> 
> 



* Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-12-04 16:03   ` Michael Kelley
@ 2025-12-04 21:08     ` Stanislav Kinsburskii
  2025-12-11 17:37       ` Michael Kelley
  0 siblings, 1 reply; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-04 21:08 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, Dec 04, 2025 at 04:03:26PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> > 
> > The previous code assumed that if a region's first page was huge, the
> > entire region consisted of huge pages and stored this in a large_pages
> > flag. This premise is incorrect not only for movable regions (where
> > pages can be split and merged on invalidate callbacks or page faults),
> > but even for pinned regions: THPs can be split and merged during
> > allocation, so a large, pinned region may contain a mix of huge and
> > regular pages.
> > 
> > This change removes the large_pages flag and replaces region-wide
> > assumptions with per-chunk inspection of the actual page size when
> > mapping, unmapping, sharing, and unsharing. This makes huge page
> > handling correct for mixed-page regions and avoids relying on stale
> > metadata that can easily become invalid as memory is remapped.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_regions.c |  213 +++++++++++++++++++++++++++++++++++++++-
> > -----
> >  drivers/hv/mshv_root.h    |    3 -
> >  2 files changed, 184 insertions(+), 32 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > index 35b866670840..d535d2e3e811 100644
> > --- a/drivers/hv/mshv_regions.c
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -14,6 +14,124 @@
> > 
> >  #include "mshv_root.h"
> > 
> > +/**
> > + * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
> > + *                             in a region.
> > + * @region     : Pointer to the memory region structure.
> > + * @flags      : Flags to pass to the handler.
> > + * @page_offset: Offset into the region's pages array to start processing.
> > + * @page_count : Number of pages to process.
> > + * @handler    : Callback function to handle the chunk.
> > + *
> > + * This function scans the region's pages starting from @page_offset,
> > + * checking for contiguous present pages of the same size (normal or huge).
> > + * It invokes @handler for the chunk of contiguous pages found. Returns the
> > + * number of pages handled, or a negative error code if the first page is
> > + * not present or the handler fails.
> > + *
> > + * Note: The @handler callback must be able to handle both normal and huge
> > + * pages.
> > + *
> > + * Return: Number of pages handled, or negative error code.
> > + */
> > +static long mshv_region_process_chunk(struct mshv_mem_region *region,
> > +				      u32 flags,
> > +				      u64 page_offset, u64 page_count,
> > +				      int (*handler)(struct mshv_mem_region *region,
> > +						     u32 flags,
> > +						     u64 page_offset,
> > +						     u64 page_count))
> > +{
> > +	u64 count, stride;
> > +	unsigned int page_order;
> > +	struct page *page;
> > +	int ret;
> > +
> > +	page = region->pages[page_offset];
> > +	if (!page)
> > +		return -EINVAL;
> > +
> > +	page_order = folio_order(page_folio(page));
> > +	/* 1G huge pages aren't supported by the hypercalls */
> > +	if (page_order == PUD_ORDER)
> > +		return -EINVAL;
> 
> In the general case, folio_order() could return a variety of values ranging from
> 0 up to at least PUD_ORDER.  For example, "2" would be a valid value in file
> system code that uses folios to do I/O in 16K blocks instead of just 4K blocks.
> Since this function is trying to find contiguous chunks of either single pages
> or 2M huge pages, I think you are expecting only three possible values: 0,
> PMD_ORDER, or PUD_ORDER. But do you know that "2" (for example)
> would never be returned? The memory involved here is populated using
> pin_user_pages_fast() for the pinned case, or using hmm_range_fault() for
> the movable case. I don't know mm behavior well enough to know if those
> functions could ever populate with a folio with an order other than 0 or
> PMD_ORDER. If such a folio could ever be used, then the way you check
> for a page size change won't be valid. For purposes of informing Hyper-V
> about 2 Meg pages, folio orders 0 through 8 are all equivalent, with folio
> order 9 (PMD_ORDER) being the marker for the start of 2 Meg large page.
> 
> Somebody who knows mm behavior better than I do should comment. Or
> maybe you could just be more defensive and handle the case of folio orders
> not equal to 0 or PMD_ORDER.
> 

Thanks for the comment.
This is addressed in v9 by explicitly checking for 0 and HPAGE_PMD_ORDER.

> > +
> > +	stride = 1 << page_order;
> > +
> > +	/* Start at stride since the first page is validated */
> > +	for (count = stride; count < page_count; count += stride) {
> 
> This striding doesn't work properly in the general case. Suppose the
> page_offset value puts the start of the chunk in the middle of a 2 Meg
> page, and that 2 Meg page is then followed by a bunch of single pages.
> (Presumably the mmu notifier "invalidate" callback could do this.)
> The use of the full stride here jumps over the remaining portion of the
> 2 Meg page plus some number of the single pages, which isn't what you
> want. For the striding to work, it must figure out how much remains in the
> initial large page, and then once the striding is aligned to the large page
> boundaries, the full stride length works.
> 
> Also, what do the hypercalls in the handler functions do if a chunk starts
> in the middle of a 2 Meg page? It looks like the handler functions will set
> the *_LARGE_PAGE flag to the hypercall but then the hv_call_* function
> will fail if the page_count isn't 2 Meg aligned.
> 

The situation you describe is not possible: the invalidation callback
cannot invalidate part of a huge page, even in the THP case (leaving
hugetlb aside), without the page being split beforehand, and splitting
a huge page requires invalidating the whole huge page first.

> > +		page = region->pages[page_offset + count];
> > +
> > +		/* Break if current page is not present */
> > +		if (!page)
> > +			break;
> > +
> > +		/* Break if page size changes */
> > +		if (page_order != folio_order(page_folio(page)))
> > +			break;
> > +	}
> > +
> > +	ret = handler(region, flags, page_offset, count);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return count;
> > +}
> > +
> > +/**
> > + * mshv_region_process_range - Processes a range of memory pages in a
> > + *                             region.
> > + * @region     : Pointer to the memory region structure.
> > + * @flags      : Flags to pass to the handler.
> > + * @page_offset: Offset into the region's pages array to start processing.
> > + * @page_count : Number of pages to process.
> > + * @handler    : Callback function to handle each chunk of contiguous
> > + *               pages.
> > + *
> > + * Iterates over the specified range of pages in @region, skipping
> > + * non-present pages. For each contiguous chunk of present pages, invokes
> > + * @handler via mshv_region_process_chunk.
> > + *
> > + * Note: The @handler callback must be able to handle both normal and huge
> > + * pages.
> > + *
> > + * Returns 0 on success, or a negative error code on failure.
> > + */
> > +static int mshv_region_process_range(struct mshv_mem_region *region,
> > +				     u32 flags,
> > +				     u64 page_offset, u64 page_count,
> > +				     int (*handler)(struct mshv_mem_region *region,
> > +						    u32 flags,
> > +						    u64 page_offset,
> > +						    u64 page_count))
> > +{
> > +	long ret;
> > +
> > +	if (page_offset + page_count > region->nr_pages)
> > +		return -EINVAL;
> > +
> > +	while (page_count) {
> > +		/* Skip non-present pages */
> > +		if (!region->pages[page_offset]) {
> > +			page_offset++;
> > +			page_count--;
> > +			continue;
> > +		}
> > +
> > +		ret = mshv_region_process_chunk(region, flags,
> > +						page_offset,
> > +						page_count,
> > +						handler);
> > +		if (ret < 0)
> > +			return ret;
> > +
> > +		page_offset += ret;
> > +		page_count -= ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> >  					   u64 uaddr, u32 flags,
> >  					   bool is_mmio)
> > @@ -33,55 +151,80 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> >  	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
> >  		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
> > 
> > -	/* Note: large_pages flag populated when we pin the pages */
> >  	if (!is_mmio)
> >  		region->flags.range_pinned = true;
> > 
> >  	return region;
> >  }
> > 
> > +static int mshv_region_chunk_share(struct mshv_mem_region *region,
> > +				   u32 flags,
> > +				   u64 page_offset, u64 page_count)
> > +{
> > +	if (PageTransCompound(region->pages[page_offset]))
> > +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> 
> mshv_region_process_chunk() uses folio_size() to detect single pages vs. 2 Meg
> large pages. Here you are using PageTransCompound(). Any reason for the
> difference? This may be perfectly OK, but my knowledge of mm is too limited to
> know for sure. Looking at the implementations of folio_size() and
> PageTransCompound(), they seem to be looking at different fields in the struct page,
> and I don't know if the different fields are always in sync. Another case for someone
> with mm expertise to review carefully ....
> 

Indeed, folio_order could be used here as well, and PageTransCompound
could be used in the chunk processing function (but then the page size
would still need to be checked).
On the other hand, there is a subtle difference between the chunk
processing function and the callback it calls: the latter doesn't
validate its input, so the chunk processing function has to.
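
For example, the chunk callbacks could key off folio_order() as well,
something like this (sketch only, on top of the code in this patch):

static int mshv_region_chunk_remap(struct mshv_mem_region *region,
				   u32 flags,
				   u64 page_offset, u64 page_count)
{
	/* same "is this a PMD-sized folio" question the chunk scanner asks */
	if (folio_order(page_folio(region->pages[page_offset])) ==
	    HPAGE_PMD_ORDER)
		flags |= HV_MAP_GPA_LARGE_PAGE;

	return hv_call_map_gpa_pages(region->partition->pt_id,
				     region->start_gfn + page_offset,
				     page_count, flags,
				     region->pages + page_offset);
}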

Thanks,
Stanislav

> Michael
> 
> > +
> > +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > +					      region->pages + page_offset,
> > +					      page_count,
> > +					      HV_MAP_GPA_READABLE |
> > +					      HV_MAP_GPA_WRITABLE,
> > +					      flags, true);
> > +}
> > +
> >  int mshv_region_share(struct mshv_mem_region *region)
> >  {
> >  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
> > 
> > -	if (region->flags.large_pages)
> > +	return mshv_region_process_range(region, flags,
> > +					 0, region->nr_pages,
> > +					 mshv_region_chunk_share);
> > +}
> > +
> > +static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
> > +				     u32 flags,
> > +				     u64 page_offset, u64 page_count)
> > +{
> > +	if (PageTransCompound(region->pages[page_offset]))
> >  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > 
> >  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > -			region->pages, region->nr_pages,
> > -			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
> > -			flags, true);
> > +					      region->pages + page_offset,
> > +					      page_count, 0,
> > +					      flags, false);
> >  }
> > 
> >  int mshv_region_unshare(struct mshv_mem_region *region)
> >  {
> >  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
> > 
> > -	if (region->flags.large_pages)
> > -		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > -
> > -	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > -			region->pages, region->nr_pages,
> > -			0,
> > -			flags, false);
> > +	return mshv_region_process_range(region, flags,
> > +					 0, region->nr_pages,
> > +					 mshv_region_chunk_unshare);
> >  }
> > 
> > -static int mshv_region_remap_pages(struct mshv_mem_region *region,
> > -				   u32 map_flags,
> > +static int mshv_region_chunk_remap(struct mshv_mem_region *region,
> > +				   u32 flags,
> >  				   u64 page_offset, u64 page_count)
> >  {
> > -	if (page_offset + page_count > region->nr_pages)
> > -		return -EINVAL;
> > -
> > -	if (region->flags.large_pages)
> > -		map_flags |= HV_MAP_GPA_LARGE_PAGE;
> > +	if (PageTransCompound(region->pages[page_offset]))
> > +		flags |= HV_MAP_GPA_LARGE_PAGE;
> > 
> >  	return hv_call_map_gpa_pages(region->partition->pt_id,
> >  				     region->start_gfn + page_offset,
> > -				     page_count, map_flags,
> > +				     page_count, flags,
> >  				     region->pages + page_offset);
> >  }
> > 
> > +static int mshv_region_remap_pages(struct mshv_mem_region *region,
> > +				   u32 map_flags,
> > +				   u64 page_offset, u64 page_count)
> > +{
> > +	return mshv_region_process_range(region, map_flags,
> > +					 page_offset, page_count,
> > +					 mshv_region_chunk_remap);
> > +}
> > +
> >  int mshv_region_map(struct mshv_mem_region *region)
> >  {
> >  	u32 map_flags = region->hv_map_flags;
> > @@ -134,9 +277,6 @@ int mshv_region_pin(struct mshv_mem_region *region)
> >  			goto release_pages;
> >  	}
> > 
> > -	if (PageHuge(region->pages[0]))
> > -		region->flags.large_pages = true;
> > -
> >  	return 0;
> > 
> >  release_pages:
> > @@ -144,10 +284,28 @@ int mshv_region_pin(struct mshv_mem_region *region)
> >  	return ret;
> >  }
> > 
> > +static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
> > +				   u32 flags,
> > +				   u64 page_offset, u64 page_count)
> > +{
> > +	if (PageTransCompound(region->pages[page_offset]))
> > +		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > +
> > +	return hv_call_unmap_gpa_pages(region->partition->pt_id,
> > +				       region->start_gfn + page_offset,
> > +				       page_count, 0);
> > +}
> > +
> > +static int mshv_region_unmap(struct mshv_mem_region *region)
> > +{
> > +	return mshv_region_process_range(region, 0,
> > +					 0, region->nr_pages,
> > +					 mshv_region_chunk_unmap);
> > +}
> > +
> >  void mshv_region_destroy(struct mshv_mem_region *region)
> >  {
> >  	struct mshv_partition *partition = region->partition;
> > -	u32 unmap_flags = 0;
> >  	int ret;
> > 
> >  	hlist_del(&region->hnode);
> > @@ -162,12 +320,7 @@ void mshv_region_destroy(struct mshv_mem_region *region)
> >  		}
> >  	}
> > 
> > -	if (region->flags.large_pages)
> > -		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > -
> > -	/* ignore unmap failures and continue as process may be exiting */
> > -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> > -				region->nr_pages, unmap_flags);
> > +	mshv_region_unmap(region);
> > 
> >  	mshv_region_invalidate(region);
> > 
> > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > index 0366f416c2f0..ff3374f13691 100644
> > --- a/drivers/hv/mshv_root.h
> > +++ b/drivers/hv/mshv_root.h
> > @@ -77,9 +77,8 @@ struct mshv_mem_region {
> >  	u64 start_uaddr;
> >  	u32 hv_map_flags;
> >  	struct {
> > -		u64 large_pages:  1; /* 2MiB */
> >  		u64 range_pinned: 1;
> > -		u64 reserved:	 62;
> > +		u64 reserved:	 63;
> >  	} flags;
> >  	struct mshv_partition *partition;
> >  	struct page *pages[];
> > 
> > 
> 


* Re: [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions
  2025-12-04 16:48   ` Michael Kelley
@ 2025-12-04 21:23     ` Stanislav Kinsburskii
  0 siblings, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-04 21:23 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, Dec 04, 2025 at 04:48:01PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> > 
> > Introduce kref-based reference counting and spinlock protection for
> > memory regions in Hyper-V partition management. This change improves
> > memory region lifecycle management and ensures thread-safe access to the
> > region list.
> > 
> > Also improves the check for overlapped memory regions during region
> > creation, preventing duplicate or conflicting mappings.
> 
> This paragraph seems spurious. I think it applies to what's in Patch 5 of this
> series.
> 

Indeed, this chunk escaped cleanup after refactoring.

> > 
> > Previously, the regions list was protected by the partition mutex.
> > However, this approach is too heavy for frequent fault and invalidation
> > operations. Finer grained locking is now used to improve efficiency and
> > concurrency.
> > 
> > This is a precursor to supporting movable memory regions. Fault and
> > invalidation handling for movable regions will require safe traversal of
> > the region list and holding a region reference while performing
> > invalidation or fault operations.
> 
> The commit message discussion about the need for the refcounting and
> locking seemed a bit vague to me. It wasn't entirely clear whether these
> changes are bug fixing existing race conditions, or whether they are new
> functionality to support movable regions.
> 
> In looking at the existing code, it seems that the main serialization mechanisms
> are that partition ioctls are serialized on pt_mutex, and VP ioctls are serialized
> on vp_mutex (though multiple VP ioctls can be in progress simultaneously
> against different VPs). The serialization of partition ioctls ensures that region
> manipulation is serialized, and that, for example, two region creations can't
> both verify that there's no overlap, but then overlap with each other. And
> region creation and deletion are serialized. In current code, the VP ioctls don't
> look at the region data structures, so there can't be any races between
> partition and VP ioctls (which are not serialized with each other). The only
> question I had about existing code is the mshv_partition_release() function,
> which proceeds without serializing against any partition ioctls, but maybe
> higher-level file system code ensures that no ioctls are in progress before
> the .release callback is made.
> 
> The new requirement is movable regions, where the VP ioctl MSHV_RUN_VP
> needs to look at region data structures. You've said that in the last paragraph
> of your commit message. So I'm reading this as that the new locking is
> needed specifically because multiple MSHV_RUN_VP ioctls will likely be
> in flight simultaneously, and they are not currently serialized with the
> region operations initiated by partition ioctls. And then there are the
> "invalidate" callbacks that are running on some other kernel thread and
> which also needs synchronization to do region manipulation.
> 
> Maybe I'm just looking for a little bit of a written "road map" somewhere
> that describes the intended locking scheme at a high level. :-)
> 
> Michael
> 

You understand this correctly.

In short, there were only two concurrent operations on regions before
movable pages were introduced: addition and removal. Both could happen
only via the partition ioctl, which is serialized by the partition
mutex, so everything was simple.

With the introduction of movable pages, regions — both the list of
regions and the region contents themselves — are accessed by partition
VP threads, which do not hold the partition mutex. While access to
region contents is protected by a per-region mutex, nothing prevents the
VMM from removing and destroying a region from underneath a VP thread
that is currently servicing a page fault or invalidation. This, in turn,
leads to a general protection fault.

This commit solves the issue by making the region a reference-counted
object so it persists while being serviced, and by adding a spinlock to
protect list traversal.
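
So the intended pattern on the fault/invalidation side is roughly the
following (sketch; "gfn" here is just the guest PFN being serviced):

	spin_lock(&partition->pt_mem_regions_lock);
	region = mshv_partition_region_by_gfn(partition, gfn);
	/*
	 * Take a reference under the lock, so a concurrent unmap/destroy
	 * can only drop the list's reference, not free the region.
	 */
	if (region && !mshv_region_get(region))
		region = NULL;
	spin_unlock(&partition->pt_mem_regions_lock);

	if (!region)
		return -ENOENT;

	/* ... service the fault or invalidation on a stable region ... */

	mshv_region_put(region);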

Thanks, Stanislav

> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/mshv_regions.c   |   19 ++++++++++++++++---
> >  drivers/hv/mshv_root.h      |    6 +++++-
> >  drivers/hv/mshv_root_main.c |   34 ++++++++++++++++++++++++++--------
> >  3 files changed, 47 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > index d535d2e3e811..6450a7ed8493 100644
> > --- a/drivers/hv/mshv_regions.c
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -7,6 +7,7 @@
> >   * Authors: Microsoft Linux virtualization team
> >   */
> > 
> > +#include <linux/kref.h>
> >  #include <linux/mm.h>
> >  #include <linux/vmalloc.h>
> > 
> > @@ -154,6 +155,8 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> >  	if (!is_mmio)
> >  		region->flags.range_pinned = true;
> > 
> > +	kref_init(&region->refcount);
> > +
> >  	return region;
> >  }
> > 
> > @@ -303,13 +306,13 @@ static int mshv_region_unmap(struct mshv_mem_region *region)
> >  					 mshv_region_chunk_unmap);
> >  }
> > 
> > -void mshv_region_destroy(struct mshv_mem_region *region)
> > +static void mshv_region_destroy(struct kref *ref)
> >  {
> > +	struct mshv_mem_region *region =
> > +		container_of(ref, struct mshv_mem_region, refcount);
> >  	struct mshv_partition *partition = region->partition;
> >  	int ret;
> > 
> > -	hlist_del(&region->hnode);
> > -
> >  	if (mshv_partition_encrypted(partition)) {
> >  		ret = mshv_region_share(region);
> >  		if (ret) {
> > @@ -326,3 +329,13 @@ void mshv_region_destroy(struct mshv_mem_region *region)
> > 
> >  	vfree(region);
> >  }
> > +
> > +void mshv_region_put(struct mshv_mem_region *region)
> > +{
> > +	kref_put(&region->refcount, mshv_region_destroy);
> > +}
> > +
> > +int mshv_region_get(struct mshv_mem_region *region)
> > +{
> > +	return kref_get_unless_zero(&region->refcount);
> > +}
> > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > index ff3374f13691..4249534ba900 100644
> > --- a/drivers/hv/mshv_root.h
> > +++ b/drivers/hv/mshv_root.h
> > @@ -72,6 +72,7 @@ do { \
> > 
> >  struct mshv_mem_region {
> >  	struct hlist_node hnode;
> > +	struct kref refcount;
> >  	u64 nr_pages;
> >  	u64 start_gfn;
> >  	u64 start_uaddr;
> > @@ -97,6 +98,8 @@ struct mshv_partition {
> >  	u64 pt_id;
> >  	refcount_t pt_ref_count;
> >  	struct mutex pt_mutex;
> > +
> > +	spinlock_t pt_mem_regions_lock;
> >  	struct hlist_head pt_mem_regions; // not ordered
> > 
> >  	u32 pt_vp_count;
> > @@ -319,6 +322,7 @@ int mshv_region_unshare(struct mshv_mem_region *region);
> >  int mshv_region_map(struct mshv_mem_region *region);
> >  void mshv_region_invalidate(struct mshv_mem_region *region);
> >  int mshv_region_pin(struct mshv_mem_region *region);
> > -void mshv_region_destroy(struct mshv_mem_region *region);
> > +void mshv_region_put(struct mshv_mem_region *region);
> > +int mshv_region_get(struct mshv_mem_region *region);
> > 
> >  #endif /* _MSHV_ROOT_H_ */
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index ae600b927f49..1ef2a28beb17 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -1086,9 +1086,13 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
> >  	u64 nr_pages = HVPFN_DOWN(mem->size);
> > 
> >  	/* Reject overlapping regions */
> > +	spin_lock(&partition->pt_mem_regions_lock);
> >  	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> > -	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
> > +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1)) {
> > +		spin_unlock(&partition->pt_mem_regions_lock);
> >  		return -EEXIST;
> > +	}
> > +	spin_unlock(&partition->pt_mem_regions_lock);
> > 
> >  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
> >  				mem->userspace_addr, mem->flags,
> > @@ -1220,8 +1224,9 @@ mshv_map_user_memory(struct mshv_partition *partition,
> >  	if (ret)
> >  		goto errout;
> > 
> > -	/* Install the new region */
> > +	spin_lock(&partition->pt_mem_regions_lock);
> >  	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> > +	spin_unlock(&partition->pt_mem_regions_lock);
> > 
> >  	return 0;
> > 
> > @@ -1240,17 +1245,27 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
> >  	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
> >  		return -EINVAL;
> > 
> > +	spin_lock(&partition->pt_mem_regions_lock);
> > +
> >  	region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
> > -	if (!region)
> > -		return -EINVAL;
> > +	if (!region) {
> > +		spin_unlock(&partition->pt_mem_regions_lock);
> > +		return -ENOENT;
> > +	}
> > 
> >  	/* Paranoia check */
> >  	if (region->start_uaddr != mem.userspace_addr ||
> >  	    region->start_gfn != mem.guest_pfn ||
> > -	    region->nr_pages != HVPFN_DOWN(mem.size))
> > +	    region->nr_pages != HVPFN_DOWN(mem.size)) {
> > +		spin_unlock(&partition->pt_mem_regions_lock);
> >  		return -EINVAL;
> > +	}
> > +
> > +	hlist_del(&region->hnode);
> > 
> > -	mshv_region_destroy(region);
> > +	spin_unlock(&partition->pt_mem_regions_lock);
> > +
> > +	mshv_region_put(region);
> > 
> >  	return 0;
> >  }
> > @@ -1653,8 +1668,10 @@ static void destroy_partition(struct mshv_partition *partition)
> >  	remove_partition(partition);
> > 
> >  	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
> > -				  hnode)
> > -		mshv_region_destroy(region);
> > +				  hnode) {
> > +		hlist_del(&region->hnode);
> > +		mshv_region_put(region);
> > +	}
> > 
> >  	/* Withdraw and free all pages we deposited */
> >  	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
> > @@ -1852,6 +1869,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
> > 
> >  	INIT_HLIST_HEAD(&partition->pt_devices);
> > 
> > +	spin_lock_init(&partition->pt_mem_regions_lock);
> >  	INIT_HLIST_HEAD(&partition->pt_mem_regions);
> > 
> >  	mshv_eventfd_init(partition);
> > 
> > 
> 


* RE: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-12-04 21:08     ` Stanislav Kinsburskii
@ 2025-12-11 17:37       ` Michael Kelley
  2025-12-15 20:12         ` Stanislav Kinsburskii
  2025-12-17  0:54         ` Stanislav Kinsburskii
  0 siblings, 2 replies; 30+ messages in thread
From: Michael Kelley @ 2025-12-11 17:37 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, December 4, 2025 1:09 PM
> 
> On Thu, Dec 04, 2025 at 04:03:26PM +0000, Michael Kelley wrote:
> > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> > >

[snip]

> > > +static long mshv_region_process_chunk(struct mshv_mem_region *region,
> > > +				      u32 flags,
> > > +				      u64 page_offset, u64 page_count,
> > > +				      int (*handler)(struct mshv_mem_region *region,
> > > +						     u32 flags,
> > > +						     u64 page_offset,
> > > +						     u64 page_count))
> > > +{
> > > +	u64 count, stride;
> > > +	unsigned int page_order;
> > > +	struct page *page;
> > > +	int ret;
> > > +
> > > +	page = region->pages[page_offset];
> > > +	if (!page)
> > > +		return -EINVAL;
> > > +
> > > +	page_order = folio_order(page_folio(page));
> > > +	/* 1G huge pages aren't supported by the hypercalls */
> > > +	if (page_order == PUD_ORDER)
> > > +		return -EINVAL;
> >
> > In the general case, folio_order() could return a variety of values ranging from
> > 0 up to at least PUD_ORDER.  For example, "2" would be a valid value in file
> > system code that uses folios to do I/O in 16K blocks instead of just 4K blocks.
> > Since this function is trying to find contiguous chunks of either single pages
> > or 2M huge pages, I think you are expecting only three possible values: 0,
> > PMD_ORDER, or PUD_ORDER. But do you know that "2" (for example)
> > would never be returned? The memory involved here is populated using
> > pin_user_pages_fast() for the pinned case, or using hmm_range_fault() for
> > the movable case. I don't know mm behavior well enough to know if those
> > functions could ever populate with a folio with an order other than 0 or
> > PMD_ORDER. If such a folio could ever be used, then the way you check
> > for a page size change won't be valid. For purposes of informing Hyper-V
> > about 2 Meg pages, folio orders 0 through 8 are all equivalent, with folio
> > order 9 (PMD_ORDER) being the marker for the start of 2 Meg large page.
> >
> > Somebody who knows mm behavior better than I do should comment. Or
> > maybe you could just be more defensive and handle the case of folio orders
> > not equal to 0 or PMD_ORDER.
> >
> 
> Thanks for the comment.
> This is addressed in v9 by explicitly checking for 0 and HPAGE_PMD_ORDER.
> 
> > > +
> > > +	stride = 1 << page_order;
> > > +
> > > +	/* Start at stride since the first page is validated */
> > > +	for (count = stride; count < page_count; count += stride) {
> >
> > This striding doesn't work properly in the general case. Suppose the
> > page_offset value puts the start of the chunk in the middle of a 2 Meg
> > page, and that 2 Meg page is then followed by a bunch of single pages.
> > (Presumably the mmu notifier "invalidate" callback could do this.)
> > The use of the full stride here jumps over the remaining portion of the
> > 2 Meg page plus some number of the single pages, which isn't what you
> > want. For the striding to work, it must figure out how much remains in the
> > initial large page, and then once the striding is aligned to the large page
> > boundaries, the full stride length works.
> >
> > Also, what do the hypercalls in the handler functions do if a chunk starts
> > in the middle of a 2 Meg page? It looks like the handler functions will set
> > the *_LARGE_PAGE flag to the hypercall but then the hv_call_* function
> > will fail if the page_count isn't 2 Meg aligned.
> >
> 
> The situation you described is not possible, because the invalidation
> callback simply can't invalidate part of a huge page, even in the THP
> case (leaving aside the hugetlb case), without splitting it beforehand,
> and splitting a huge page requires invalidating the whole huge page
> first.

I've been playing around with mmu notifiers and 2 Meg pages. At least in my
experiment, there's a case where the .invalidate callback is invoked on a
range *before* the 2 Meg page is split. The kernel code that does this is
in zap_page_range_single_batched(). Early on this function calls
mmu_notifier_invalidate_range_start(), which invokes the .invalidate
callback on the initial range. Later on, unmap_single_vma() is called, which
does the split and eventually makes a second .invalidate callback for the
entire 2 Meg page.

Details:  My experiment is a user space program that does the following:

1. Allocates 16 Megs of memory on a 16 Meg boundary using
posix_memalign(). So this is private anonymous memory. Transparent
huge pages are enabled.

2. Writes to a byte in each 4K page so they are all populated. 
/proc/meminfo shows eight 2 Meg pages have been allocated.

3. Creates an mmu notifier for the allocated 16 Megs, using an ioctl
hacked into the kernel for experimentation purposes.

4. Uses madvise() with the DONTNEED option to free 32 Kbytes on a 4K
page boundary somewhere in the 16 Meg allocation. This results in an mmu
notifier invalidate callback for that 32 Kbytes. Then there's a second invalidate
callback covering the entire 2 Meg page that contains the 32 Kbyte range.
Kernel stack traces for the two invalidate callbacks show them originating
in zap_page_range_single_batched().

5. Sleeps for 60 seconds. During that time, khugepaged wakes up and does
hpage_collapse_scan_pmd() -> collapse_huge_page(), which generates a third
.invalidate callback for the 2 Meg page. I haven't investigated what this is
all about.

6. Interestingly, if Step 4 above does a slightly different operation using
mprotect() with PROT_READ instead of madvise(), the 2 Meg page is split first.
The .invalidate callback for the full 2 Meg happens before the .invalidate
callback for the specified range.
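
For reference, here is a rough user-space sketch of the experiment above.
The notifier registration in step 3 is a local hack, so the device node
and ioctl request shown are placeholders, not real interfaces:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define SZ_16M	(16UL * 1024 * 1024)

int main(void)
{
	unsigned char *buf;
	size_t off;
	int fd;

	/* Step 1: 16 Megs of private anonymous memory, 16 Meg aligned */
	if (posix_memalign((void **)&buf, SZ_16M, SZ_16M))
		return 1;

	/* Step 2: touch every 4K page so THP backs the range */
	for (off = 0; off < SZ_16M; off += 4096)
		buf[off] = 1;

	/* Step 3: placeholder for the hacked-in mmu notifier ioctl */
	fd = open("/dev/hack-notifier", O_RDWR);	/* hypothetical node */
	if (fd >= 0)
		ioctl(fd, 0x1234, buf);			/* hypothetical request */

	/* Step 4: drop 32K on a 4K boundary inside one 2 Meg page */
	madvise(buf + 0x5000, 8 * 4096, MADV_DONTNEED);

	/* Step 5: sleep so khugepaged gets a chance to run */
	sleep(60);

	return 0;
}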

The root partition probably isn't doing madvise() with DONTNEED for memory
allocated for guests. But regardless of what user space does or doesn't do, MSHV's
invalidate callback path should be made safe for this case. Maybe that's just
detecting it and returning an error (and maybe a WARN_ON) if user space
doesn't need it to work.
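
As a rough illustration of that defensive option (a sketch only, not a
proposal for the final code; the helper name is made up, and whether
guest PFN alignment is the right test would need checking):

/*
 * Sketch: refuse a chunk whose first page belongs to a huge folio but
 * whose guest PFN isn't aligned to that folio, instead of striding past
 * the folio boundary. The caller could WARN_ON_ONCE() and bail out.
 */
static bool mshv_chunk_start_aligned(struct mshv_mem_region *region,
				     u64 page_offset)
{
	struct page *page = region->pages[page_offset];
	unsigned int order = folio_order(page_folio(page));

	return !order ||
	       IS_ALIGNED(region->start_gfn + page_offset, 1UL << order);
}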

Michael

> 
> > > +		page = region->pages[page_offset + count];
> > > +
> > > +		/* Break if current page is not present */
> > > +		if (!page)
> > > +			break;
> > > +
> > > +		/* Break if page size changes */
> > > +		if (page_order != folio_order(page_folio(page)))
> > > +			break;
> > > +	}
> > > +
> > > +	ret = handler(region, flags, page_offset, count);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	return count;
> > > +}
> > > +
> > > +/**
> > > + * mshv_region_process_range - Processes a range of memory pages in a
> > > + *                             region.
> > > + * @region     : Pointer to the memory region structure.
> > > + * @flags      : Flags to pass to the handler.
> > > + * @page_offset: Offset into the region's pages array to start processing.
> > > + * @page_count : Number of pages to process.
> > > + * @handler    : Callback function to handle each chunk of contiguous
> > > + *               pages.
> > > + *
> > > + * Iterates over the specified range of pages in @region, skipping
> > > + * non-present pages. For each contiguous chunk of present pages, invokes
> > > + * @handler via mshv_region_process_chunk.
> > > + *
> > > + * Note: The @handler callback must be able to handle both normal and huge
> > > + * pages.
> > > + *
> > > + * Returns 0 on success, or a negative error code on failure.
> > > + */
> > > +static int mshv_region_process_range(struct mshv_mem_region *region,
> > > +				     u32 flags,
> > > +				     u64 page_offset, u64 page_count,
> > > +				     int (*handler)(struct mshv_mem_region *region,
> > > +						    u32 flags,
> > > +						    u64 page_offset,
> > > +						    u64 page_count))
> > > +{
> > > +	long ret;
> > > +
> > > +	if (page_offset + page_count > region->nr_pages)
> > > +		return -EINVAL;
> > > +
> > > +	while (page_count) {
> > > +		/* Skip non-present pages */
> > > +		if (!region->pages[page_offset]) {
> > > +			page_offset++;
> > > +			page_count--;
> > > +			continue;
> > > +		}
> > > +
> > > +		ret = mshv_region_process_chunk(region, flags,
> > > +						page_offset,
> > > +						page_count,
> > > +						handler);
> > > +		if (ret < 0)
> > > +			return ret;
> > > +
> > > +		page_offset += ret;
> > > +		page_count -= ret;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> > >  					   u64 uaddr, u32 flags,
> > >  					   bool is_mmio)
> > > @@ -33,55 +151,80 @@ struct mshv_mem_region *mshv_region_create(u64
> guest_pfn, u64 nr_pages,
> > >  	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
> > >  		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
> > >
> > > -	/* Note: large_pages flag populated when we pin the pages */
> > >  	if (!is_mmio)
> > >  		region->flags.range_pinned = true;
> > >
> > >  	return region;
> > >  }
> > >
> > > +static int mshv_region_chunk_share(struct mshv_mem_region *region,
> > > +				   u32 flags,
> > > +				   u64 page_offset, u64 page_count)
> > > +{
> > > +	if (PageTransCompound(region->pages[page_offset]))
> > > +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> >
> > mshv_region_process_chunk() uses folio_size() to detect single pages vs. 2 Meg
> > large pages. Here you are using PageTransCompound(). Any reason for the
> > difference? This may be perfectly OK, but my knowledge of mm is too limited to
> > know for sure. Looking at the implementations of folio_size() and
> > PageTransCompound(), they seem to be looking at different fields in the struct page,
> > and I don't know if the different fields are always in sync. Another case for someone
> > with mm expertise to review carefully ....
> >
> 
> Indeed, folio_order could be used here as well, and PageTransCompound
> could be used in the chunk processing function (but then the size of the
> page would still need to be checked).
> On the other hand, there is a subtle difference between the chunk
> processing function and the callbacks it calls: the latter don't
> validate their input, so the chunk processing function must.
> 
> Thanks,
> Stanislav
> 
> > Michael
> >
> > > +
> > > +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > > +					      region->pages + page_offset,
> > > +					      page_count,
> > > +					      HV_MAP_GPA_READABLE |
> > > +					      HV_MAP_GPA_WRITABLE,
> > > +					      flags, true);
> > > +}
> > > +
> > >  int mshv_region_share(struct mshv_mem_region *region)
> > >  {
> > >  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
> > >
> > > -	if (region->flags.large_pages)
> > > +	return mshv_region_process_range(region, flags,
> > > +					 0, region->nr_pages,
> > > +					 mshv_region_chunk_share);
> > > +}
> > > +
> > > +static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
> > > +				     u32 flags,
> > > +				     u64 page_offset, u64 page_count)
> > > +{
> > > +	if (PageTransCompound(region->pages[page_offset]))
> > >  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > >
> > >  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > > -			region->pages, region->nr_pages,
> > > -			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
> > > -			flags, true);
> > > +					      region->pages + page_offset,
> > > +					      page_count, 0,
> > > +					      flags, false);
> > >  }
> > >
> > >  int mshv_region_unshare(struct mshv_mem_region *region)
> > >  {
> > >  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
> > >
> > > -	if (region->flags.large_pages)
> > > -		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> > > -
> > > -	return hv_call_modify_spa_host_access(region->partition->pt_id,
> > > -			region->pages, region->nr_pages,
> > > -			0,
> > > -			flags, false);
> > > +	return mshv_region_process_range(region, flags,
> > > +					 0, region->nr_pages,
> > > +					 mshv_region_chunk_unshare);
> > >  }
> > >
> > > -static int mshv_region_remap_pages(struct mshv_mem_region *region,
> > > -				   u32 map_flags,
> > > +static int mshv_region_chunk_remap(struct mshv_mem_region *region,
> > > +				   u32 flags,
> > >  				   u64 page_offset, u64 page_count)
> > >  {
> > > -	if (page_offset + page_count > region->nr_pages)
> > > -		return -EINVAL;
> > > -
> > > -	if (region->flags.large_pages)
> > > -		map_flags |= HV_MAP_GPA_LARGE_PAGE;
> > > +	if (PageTransCompound(region->pages[page_offset]))
> > > +		flags |= HV_MAP_GPA_LARGE_PAGE;
> > >
> > >  	return hv_call_map_gpa_pages(region->partition->pt_id,
> > >  				     region->start_gfn + page_offset,
> > > -				     page_count, map_flags,
> > > +				     page_count, flags,
> > >  				     region->pages + page_offset);
> > >  }
> > >
> > > +static int mshv_region_remap_pages(struct mshv_mem_region *region,
> > > +				   u32 map_flags,
> > > +				   u64 page_offset, u64 page_count)
> > > +{
> > > +	return mshv_region_process_range(region, map_flags,
> > > +					 page_offset, page_count,
> > > +					 mshv_region_chunk_remap);
> > > +}
> > > +
> > >  int mshv_region_map(struct mshv_mem_region *region)
> > >  {
> > >  	u32 map_flags = region->hv_map_flags;
> > > @@ -134,9 +277,6 @@ int mshv_region_pin(struct mshv_mem_region *region)
> > >  			goto release_pages;
> > >  	}
> > >
> > > -	if (PageHuge(region->pages[0]))
> > > -		region->flags.large_pages = true;
> > > -
> > >  	return 0;
> > >
> > >  release_pages:
> > > @@ -144,10 +284,28 @@ int mshv_region_pin(struct mshv_mem_region *region)
> > >  	return ret;
> > >  }
> > >
> > > +static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
> > > +				   u32 flags,
> > > +				   u64 page_offset, u64 page_count)
> > > +{
> > > +	if (PageTransCompound(region->pages[page_offset]))
> > > +		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > > +
> > > +	return hv_call_unmap_gpa_pages(region->partition->pt_id,
> > > +				       region->start_gfn + page_offset,
> > > +				       page_count, 0);
> > > +}
> > > +
> > > +static int mshv_region_unmap(struct mshv_mem_region *region)
> > > +{
> > > +	return mshv_region_process_range(region, 0,
> > > +					 0, region->nr_pages,
> > > +					 mshv_region_chunk_unmap);
> > > +}
> > > +
> > >  void mshv_region_destroy(struct mshv_mem_region *region)
> > >  {
> > >  	struct mshv_partition *partition = region->partition;
> > > -	u32 unmap_flags = 0;
> > >  	int ret;
> > >
> > >  	hlist_del(&region->hnode);
> > > @@ -162,12 +320,7 @@ void mshv_region_destroy(struct mshv_mem_region
> *region)
> > >  		}
> > >  	}
> > >
> > > -	if (region->flags.large_pages)
> > > -		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > > -
> > > -	/* ignore unmap failures and continue as process may be exiting */
> > > -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> > > -				region->nr_pages, unmap_flags);
> > > +	mshv_region_unmap(region);
> > >
> > >  	mshv_region_invalidate(region);
> > >
> > > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > > index 0366f416c2f0..ff3374f13691 100644
> > > --- a/drivers/hv/mshv_root.h
> > > +++ b/drivers/hv/mshv_root.h
> > > @@ -77,9 +77,8 @@ struct mshv_mem_region {
> > >  	u64 start_uaddr;
> > >  	u32 hv_map_flags;
> > >  	struct {
> > > -		u64 large_pages:  1; /* 2MiB */
> > >  		u64 range_pinned: 1;
> > > -		u64 reserved:	 62;
> > > +		u64 reserved:	 63;
> > >  	} flags;
> > >  	struct mshv_partition *partition;
> > >  	struct page *pages[];
> > >
> > >
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-12-11 17:37       ` Michael Kelley
@ 2025-12-15 20:12         ` Stanislav Kinsburskii
  2025-12-17  0:54         ` Stanislav Kinsburskii
  1 sibling, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-15 20:12 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, Dec 11, 2025 at 05:37:26PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, December 4, 2025 1:09 PM
> > 
> > On Thu, Dec 04, 2025 at 04:03:26PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> > > >
> 
> [snip]
> 

<snip>

> > > > +
> > > > +	stride = 1 << page_order;
> > > > +
> > > > +	/* Start at stride since the first page is validated */
> > > > +	for (count = stride; count < page_count; count += stride) {
> > >
> > > This striding doesn't work properly in the general case. Suppose the
> > > page_offset value puts the start of the chunk in the middle of a 2 Meg
> > > page, and that 2 Meg page is then followed by a bunch of single pages.
> > > (Presumably the mmu notifier "invalidate" callback could do this.)
> > > The use of the full stride here jumps over the remaining portion of the
> > > 2 Meg page plus some number of the single pages, which isn't what you
> > > want. For the striding to work, it must figure out how much remains in the
> > > initial large page, and then once the striding is aligned to the large page
> > > boundaries, the full stride length works.
> > >
> > > Also, what do the hypercalls in the handler functions do if a chunk starts
> > > in the middle of a 2 Meg page? It looks like the handler functions will set
> > > the *_LARGE_PAGE flag to the hypercall but then the hv_call_* function
> > > will fail if the page_count isn't 2 Meg aligned.
> > >
> > 
> > The situation you described is not possible, because the invalidation
> > callback simply can't invalidate part of a huge page, even in the THP
> > case (leaving aside the hugetlb case), without splitting it beforehand,
> > and splitting a huge page requires invalidating the whole huge page
> > first.
> 
> I've been playing around with mmu notifiers and 2 Meg pages. At least in my
> experiment, there's a case where the .invalidate callback is invoked on a
> range *before* the 2 Meg page is split. The kernel code that does this is
> in zap_page_range_single_batched(). Early on this function calls
> mmu_notifier_invalidate_range_start(), which invokes the .invalidate
> callback on the initial range. Later on, unmap_single_vma() is called, which
> does the split and eventually makes a second .invalidate callback for the
> entire 2 Meg page.
> 
> Details:  My experiment is a user space program that does the following:
> 
> 1. Allocates 16 Megs of memory on a 16 Meg boundary using
> posix_memalign(). So this is private anonymous memory. Transparent
> huge pages are enabled.
> 
> 2. Writes to a byte in each 4K page so they are all populated. 
> /proc/meminfo shows eight 2 Meg pages have been allocated.
> 
> 3. Creates an mmu notifier for the allocated 16 Megs, using an ioctl
> hacked into the kernel for experimentation purposes.
> 
> 4. Uses madvise() with the DONTNEED option to free 32 Kbytes on a 4K
> page boundary somewhere in the 16 Meg allocation. This results in an mmu
> notifier invalidate callback for that 32 Kbytes. Then there's a second invalidate
> callback covering the entire 2 Meg page that contains the 32 Kbyte range.
> Kernel stack traces for the two invalidate callbacks show them originating
> in zap_page_range_single_batched().
> 
> 5. Sleeps for 60 seconds. During that time, khugepaged wakes up and does
> hpage_collapse_scan_pmd() -> collapse_huge_page(), which generates a third
> .invalidate callback for the 2 Meg page. I haven't investigated what this is
> all about.
> 
> 6. Interestingly, if Step 4 above does a slightly different operation using
> mprotect() with PROT_READ instead of madvise(), the 2 Meg page is split first.
> The .invalidate callback for the full 2 Meg happens before the .invalidate
> callback for the specified range.
> 
> The root partition probably isn't doing madvise() with DONTNEED for memory
> allocated for guests. But regardless of what user space does or doesn't do, MSHV's
> invalidate callback path should be made safe for this case. Maybe that's just
> detecting it and returning an error (and maybe a WARN_ON) if user space
> doesn't need it to work.
> 

This is deep research, Michael. Thanks a lot for your effort.
I'll think more about it and will likely follow up.

Thank you,
Stanislav

> Michael
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
  2025-12-11 17:37       ` Michael Kelley
  2025-12-15 20:12         ` Stanislav Kinsburskii
@ 2025-12-17  0:54         ` Stanislav Kinsburskii
  1 sibling, 0 replies; 30+ messages in thread
From: Stanislav Kinsburskii @ 2025-12-17  0:54 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, Dec 11, 2025 at 05:37:26PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, December 4, 2025 1:09 PM

<snip>


> I've been playing around with mmu notifiers and 2 Meg pages. At least in my
> experiment, there's a case where the .invalidate callback is invoked on a
> range *before* the 2 Meg page is split. The kernel code that does this is
> in zap_page_range_single_batched(). Early on this function calls
> mmu_notifier_invalidate_range_start(), which invokes the .invalidate
> callback on the initial range. Later on, unmap_single_vma() is called, which
> does the split and eventually makes a second .invalidate callback for the
> entire 2 Meg page.
> 
> Details:  My experiment is a user space program that does the following:
> 
> 1. Allocates 16 Megs of memory on a 16 Meg boundary using
> posix_memalign(). So this is private anonymous memory. Transparent
> huge pages are enabled.
> 
> 2. Writes to a byte in each 4K page so they are all populated. 
> /proc/meminfo shows eight 2 Meg pages have been allocated.
> 
> 3. Creates an mmu notifier for the allocated 16 Megs, using an ioctl
> hacked into the kernel for experimentation purposes.
> 
> 4. Uses madvise() with the DONTNEED option to free 32 Kbytes on a 4K
> page boundary somewhere in the 16 Meg allocation. This results in an mmu
> notifier invalidate callback for that 32 Kbytes. Then there's a second invalidate
> callback covering the entire 2 Meg page that contains the 32 Kbyte range.
> Kernel stack traces for the two invalidate callbacks show them originating
> in zap_page_range_single_batched().
> 
> 5. Sleeps for 60 seconds. During that time, khugepaged wakes up and does
> hpage_collapse_scan_pmd() -> collapse_huge_page(), which generates a third
> .invalidate callback for the 2 Meg page. I haven't investigated what this is
> all about.
> 
> 6. Interestingly, if Step 4 above does a slightly different operation using
> mprotect() with PROT_READ instead of madvise(), the 2 Meg page is split first.
> The .invalidate callback for the full 2 Meg happens before the .invalidate
> callback for the specified range.
> 
> The root partition probably isn't doing madvise() with DONTNEED for memory
> allocated for guests. But regardless of what user space does or doesn't do, MSHV's
> invalidate callback path should be made safe for this case. Maybe that's just
> detecting it and returning an error (and maybe a WARN_ON) if user space
> doesn't need it to work.
> 
> Michael
> 

The issue is addressed by the "mshv: Align huge page stride with guest
mapping" patch.

Thanks a lot once again for your help in identifying it,
Stanislav

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2025-12-17  0:54 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
2025-11-26  2:08 ` [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
2025-12-01 11:20   ` Anirudh Rayabharam
2025-11-26  2:08 ` [PATCH v7 2/7] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
2025-12-01 11:12   ` Anirudh Rayabharam
2025-11-26  2:09 ` [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c Stanislav Kinsburskii
2025-12-01 11:06   ` Anirudh Rayabharam
2025-12-01 16:46     ` Stanislav Kinsburskii
2025-12-03 18:13   ` Nuno Das Neves
2025-12-03 18:20     ` Stanislav Kinsburskii
2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
2025-11-27 10:59   ` kernel test robot
2025-12-01 15:09   ` Anirudh Rayabharam
2025-12-01 18:26     ` Stanislav Kinsburskii
2025-12-03 18:50   ` Nuno Das Neves
2025-12-04 16:03   ` Michael Kelley
2025-12-04 21:08     ` Stanislav Kinsburskii
2025-12-11 17:37       ` Michael Kelley
2025-12-15 20:12         ` Stanislav Kinsburskii
2025-12-17  0:54         ` Stanislav Kinsburskii
2025-11-26  2:09 ` [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create Stanislav Kinsburskii
2025-12-01 15:06   ` Anirudh Rayabharam
2025-12-02 18:39   ` Michael Kelley
2025-12-03 17:46     ` Stanislav Kinsburskii
2025-12-03 18:58   ` Nuno Das Neves
2025-12-03 19:36     ` Nuno Das Neves
2025-11-26  2:09 ` [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions Stanislav Kinsburskii
2025-12-04 16:48   ` Michael Kelley
2025-12-04 21:23     ` Stanislav Kinsburskii
2025-11-26  2:09 ` [PATCH v7 7/7] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).