public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nuno Das Neves <nunodasneves@linux.microsoft.com>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
	kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
Date: Wed, 3 Dec 2025 10:50:16 -0800	[thread overview]
Message-ID: <f483d08c-3ff8-4d8f-88b6-c5a89334cb4b@linux.microsoft.com> (raw)
In-Reply-To: <176412295155.447063.16512843211428609586.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On 11/25/2025 6:09 PM, Stanislav Kinsburskii wrote:
> The previous code assumed that if a region's first page was huge, the
> entire region consisted of huge pages and stored this in a large_pages
> flag. This premise is incorrect not only for movable regions (where
> pages can be split and merged on invalidate callbacks or page faults),
> but even for pinned regions: THPs can be split and merged during
> allocation, so a large, pinned region may contain a mix of huge and
> regular pages.
> 
> This change removes the large_pages flag and replaces region-wide
> assumptions with per-chunk inspection of the actual page size when
> mapping, unmapping, sharing, and unsharing. This makes huge page
> handling correct for mixed-page regions and avoids relying on stale
> metadata that can easily become invalid as memory is remapped.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c |  213 +++++++++++++++++++++++++++++++++++++++------
>  drivers/hv/mshv_root.h    |    3 -
>  2 files changed, 184 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index 35b866670840..d535d2e3e811 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -14,6 +14,124 @@
>  
>  #include "mshv_root.h"
>  
> +/**
> + * mshv_region_process_chunk - Processes a contiguous chunk of memory pages
> + *                             in a region.
> + * @region     : Pointer to the memory region structure.
> + * @flags      : Flags to pass to the handler.
> + * @page_offset: Offset into the region's pages array to start processing.
> + * @page_count : Number of pages to process.
> + * @handler    : Callback function to handle the chunk.
> + *
> + * This function scans the region's pages starting from @page_offset,
> + * checking for contiguous present pages of the same size (normal or huge).
> + * It invokes @handler for the chunk of contiguous pages found. Returns the
> + * number of pages handled, or a negative error code if the first page is
> + * not present or the handler fails.
> + *
> + * Note: The @handler callback must be able to handle both normal and huge
> + * pages.
> + *
> + * Return: Number of pages handled, or negative error code.
> + */
> +static long mshv_region_process_chunk(struct mshv_mem_region *region,
> +				      u32 flags,
> +				      u64 page_offset, u64 page_count,
> +				      int (*handler)(struct mshv_mem_region *region,
> +						     u32 flags,
> +						     u64 page_offset,
> +						     u64 page_count))
> +{
> +	u64 count, stride;
> +	unsigned int page_order;
> +	struct page *page;
> +	int ret;
> +
> +	page = region->pages[page_offset];
> +	if (!page)
> +		return -EINVAL;
> +
> +	page_order = folio_order(page_folio(page));
> +	/* 1G huge pages aren't supported by the hypercalls */
> +	if (page_order == PUD_ORDER)
> +		return -EINVAL;

I'd prefer to be explicit about exactly which page_orders we *do*
support instead of just disallowing PUD_ORDER.

Without looking up folio_order(), there's an implication here that
page_order can be anything except PUD_ORDER, but that's not the case;
there's only 2 valid values for page_order.

The comment can instead read something like:
"The hypervisor only supports 4K and 2M page sizes"

> +
> +	stride = 1 << page_order;
> +> +	/* Start at stride since the first page is validated */
> +	for (count = stride; count < page_count; count += stride) {
> +		page = region->pages[page_offset + count];
> +
> +		/* Break if current page is not present */
> +		if (!page)
> +			break;
> +
> +		/* Break if page size changes */
> +		if (page_order != folio_order(page_folio(page)))
> +			break;
> +	}
> +
> +	ret = handler(region, flags, page_offset, count);
> +	if (ret)
> +		return ret;
> +
> +	return count;
> +}
> +
> +/**
> + * mshv_region_process_range - Processes a range of memory pages in a
> + *                             region.
> + * @region     : Pointer to the memory region structure.
> + * @flags      : Flags to pass to the handler.
> + * @page_offset: Offset into the region's pages array to start processing.
> + * @page_count : Number of pages to process.
> + * @handler    : Callback function to handle each chunk of contiguous
> + *               pages.
> + *
> + * Iterates over the specified range of pages in @region, skipping
> + * non-present pages. For each contiguous chunk of present pages, invokes
> + * @handler via mshv_region_process_chunk.
> + *
> + * Note: The @handler callback must be able to handle both normal and huge
> + * pages.
> + *
> + * Returns 0 on success, or a negative error code on failure.
> + */
> +static int mshv_region_process_range(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 page_offset, u64 page_count,
> +				     int (*handler)(struct mshv_mem_region *region,
> +						    u32 flags,
> +						    u64 page_offset,
> +						    u64 page_count))
> +{
> +	long ret;
> +
> +	if (page_offset + page_count > region->nr_pages)
> +		return -EINVAL;
> +
> +	while (page_count) {
> +		/* Skip non-present pages */
> +		if (!region->pages[page_offset]) {
> +			page_offset++;
> +			page_count--;
> +			continue;
> +		}
> +
> +		ret = mshv_region_process_chunk(region, flags,
> +						page_offset,
> +						page_count,
> +						handler);
> +		if (ret < 0)
> +			return ret;
> +
> +		page_offset += ret;
> +		page_count -= ret;
> +	}
> +
> +	return 0;
> +}
> +
>  struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  					   u64 uaddr, u32 flags,
>  					   bool is_mmio)
> @@ -33,55 +151,80 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  	if (flags & BIT(MSHV_SET_MEM_BIT_EXECUTABLE))
>  		region->hv_map_flags |= HV_MAP_GPA_EXECUTABLE;
>  
> -	/* Note: large_pages flag populated when we pin the pages */
>  	if (!is_mmio)
>  		region->flags.range_pinned = true;
>  
>  	return region;
>  }
>  
> +static int mshv_region_chunk_share(struct mshv_mem_region *region,
> +				   u32 flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))

PageTransCompound() returns false if CONFIG_TRANSPARENT_HUGEPAGE is not
enabled. This won't work for hugetlb pages, will it?

Do we need to check if (PageHuge(page) || PageTransCompound(page)) ?

> +		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> +
> +	return hv_call_modify_spa_host_access(region->partition->pt_id,
> +					      region->pages + page_offset,
> +					      page_count,
> +					      HV_MAP_GPA_READABLE |
> +					      HV_MAP_GPA_WRITABLE,
> +					      flags, true);
> +}
> +
>  int mshv_region_share(struct mshv_mem_region *region)
>  {
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_SHARED;
>  
> -	if (region->flags.large_pages)
> +	return mshv_region_process_range(region, flags,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_share);
> +}
> +
> +static int mshv_region_chunk_unshare(struct mshv_mem_region *region,
> +				     u32 flags,
> +				     u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))
>  		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
>  
>  	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -			region->pages, region->nr_pages,
> -			HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE,
> -			flags, true);
> +					      region->pages + page_offset,
> +					      page_count, 0,
> +					      flags, false);
>  }
>  
>  int mshv_region_unshare(struct mshv_mem_region *region)
>  {
>  	u32 flags = HV_MODIFY_SPA_PAGE_HOST_ACCESS_MAKE_EXCLUSIVE;
>  
> -	if (region->flags.large_pages)
> -		flags |= HV_MODIFY_SPA_PAGE_HOST_ACCESS_LARGE_PAGE;
> -
> -	return hv_call_modify_spa_host_access(region->partition->pt_id,
> -			region->pages, region->nr_pages,
> -			0,
> -			flags, false);
> +	return mshv_region_process_range(region, flags,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_unshare);
>  }
>  
> -static int mshv_region_remap_pages(struct mshv_mem_region *region,
> -				   u32 map_flags,
> +static int mshv_region_chunk_remap(struct mshv_mem_region *region,
> +				   u32 flags,
>  				   u64 page_offset, u64 page_count)
nit: Why the name change from map_flags to flags? It seems to create
some noise here.

>  {
> -	if (page_offset + page_count > region->nr_pages)
> -		return -EINVAL;
> -
> -	if (region->flags.large_pages)
> -		map_flags |= HV_MAP_GPA_LARGE_PAGE;
> +	if (PageTransCompound(region->pages[page_offset]))
> +		flags |= HV_MAP_GPA_LARGE_PAGE;
>  
>  	return hv_call_map_gpa_pages(region->partition->pt_id,
>  				     region->start_gfn + page_offset,
> -				     page_count, map_flags,
> +				     page_count, flags,
>  				     region->pages + page_offset);
>  }
>  
> +static int mshv_region_remap_pages(struct mshv_mem_region *region,
> +				   u32 map_flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	return mshv_region_process_range(region, map_flags,
> +					 page_offset, page_count,
> +					 mshv_region_chunk_remap);
> +}
> +
>  int mshv_region_map(struct mshv_mem_region *region)
>  {
>  	u32 map_flags = region->hv_map_flags;
> @@ -134,9 +277,6 @@ int mshv_region_pin(struct mshv_mem_region *region)
>  			goto release_pages;
>  	}
>  
> -	if (PageHuge(region->pages[0]))
> -		region->flags.large_pages = true;
> -
>  	return 0;
>  
>  release_pages:
> @@ -144,10 +284,28 @@ int mshv_region_pin(struct mshv_mem_region *region)
>  	return ret;
>  }
>  
> +static int mshv_region_chunk_unmap(struct mshv_mem_region *region,
> +				   u32 flags,
> +				   u64 page_offset, u64 page_count)
> +{
> +	if (PageTransCompound(region->pages[page_offset]))
> +		flags |= HV_UNMAP_GPA_LARGE_PAGE;
> +
> +	return hv_call_unmap_gpa_pages(region->partition->pt_id,
> +				       region->start_gfn + page_offset,
> +				       page_count, 0);
> +}
> +
> +static int mshv_region_unmap(struct mshv_mem_region *region)
> +{
> +	return mshv_region_process_range(region, 0,
> +					 0, region->nr_pages,
> +					 mshv_region_chunk_unmap);
> +}
> +
>  void mshv_region_destroy(struct mshv_mem_region *region)
>  {
>  	struct mshv_partition *partition = region->partition;
> -	u32 unmap_flags = 0;
>  	int ret;
>  
>  	hlist_del(&region->hnode);
> @@ -162,12 +320,7 @@ void mshv_region_destroy(struct mshv_mem_region *region)
>  		}
>  	}
>  
> -	if (region->flags.large_pages)
> -		unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> -
> -	/* ignore unmap failures and continue as process may be exiting */
> -	hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> -				region->nr_pages, unmap_flags);
> +	mshv_region_unmap(region);
>  
>  	mshv_region_invalidate(region);
>  
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 0366f416c2f0..ff3374f13691 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -77,9 +77,8 @@ struct mshv_mem_region {
>  	u64 start_uaddr;
>  	u32 hv_map_flags;
>  	struct {
> -		u64 large_pages:  1; /* 2MiB */
>  		u64 range_pinned: 1;
> -		u64 reserved:	 62;
> +		u64 reserved:	 63;
>  	} flags;
>  	struct mshv_partition *partition;
>  	struct page *pages[];
> 
> 


  parent reply	other threads:[~2025-12-03 18:50 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-26  2:08 [PATCH v7 0/7] Introduce movable pages for Hyper-V guests Stanislav Kinsburskii
2025-11-26  2:08 ` [PATCH v7 1/7] Drivers: hv: Refactor and rename memory region handling functions Stanislav Kinsburskii
2025-12-01 11:20   ` Anirudh Rayabharam
2025-11-26  2:08 ` [PATCH v7 2/7] Drivers: hv: Centralize guest memory region destruction Stanislav Kinsburskii
2025-12-01 11:12   ` Anirudh Rayabharam
2025-11-26  2:09 ` [PATCH v7 3/7] Drivers: hv: Move region management to mshv_regions.c Stanislav Kinsburskii
2025-12-01 11:06   ` Anirudh Rayabharam
2025-12-01 16:46     ` Stanislav Kinsburskii
2025-12-03 18:13   ` Nuno Das Neves
2025-12-03 18:20     ` Stanislav Kinsburskii
2025-11-26  2:09 ` [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal Stanislav Kinsburskii
2025-11-27 10:59   ` kernel test robot
2025-12-01 15:09   ` Anirudh Rayabharam
2025-12-01 18:26     ` Stanislav Kinsburskii
2025-12-03 18:50   ` Nuno Das Neves [this message]
2025-12-04 16:03   ` Michael Kelley
2025-12-04 21:08     ` Stanislav Kinsburskii
2025-12-11 17:37       ` Michael Kelley
2025-12-15 20:12         ` Stanislav Kinsburskii
2025-12-17  0:54         ` Stanislav Kinsburskii
2025-11-26  2:09 ` [PATCH v7 5/7] Drivers: hv: Improve region overlap detection in partition create Stanislav Kinsburskii
2025-12-01 15:06   ` Anirudh Rayabharam
2025-12-02 18:39   ` Michael Kelley
2025-12-03 17:46     ` Stanislav Kinsburskii
2025-12-03 18:58   ` Nuno Das Neves
2025-12-03 19:36     ` Nuno Das Neves
2025-11-26  2:09 ` [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem regions Stanislav Kinsburskii
2025-12-04 16:48   ` Michael Kelley
2025-12-04 21:23     ` Stanislav Kinsburskii
2025-11-26  2:09 ` [PATCH v7 7/7] Drivers: hv: Add support for movable memory regions Stanislav Kinsburskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f483d08c-3ff8-4d8f-88b6-c5a89334cb4b@linux.microsoft.com \
    --to=nunodasneves@linux.microsoft.com \
    --cc=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=skinsburskii@linux.microsoft.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox