Re: [PATCH] mshv: Align huge page stride with guest mapping

Linux-HyperV List

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>
Cc: "kys@microsoft.com" <kys@microsoft.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	"decui@microsoft.com" <decui@microsoft.com>,
	"longli@microsoft.com" <longli@microsoft.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mshv: Align huge page stride with guest mapping
Date: Tue, 23 Dec 2025 08:26:23 -0800
Message-ID: <aUrCr5wBSTrGm-IM@skinsburskii.localdomain>
In-Reply-To: <SN6PR02MB4157AAFDD8BD5BDCD2D3DB99D4B5A@SN6PR02MB4157.namprd02.prod.outlook.com>

On Tue, Dec 23, 2025 at 03:51:22PM +0000, Michael Kelley wrote:
> From: Michael Kelley Sent: Monday, December 22, 2025 10:25 AM
> > 
> [snip]
> > 
> > Separately, in looking at this, I spotted another potential problem with
> > 2 Meg mappings that somewhat depends on hypervisor behavior that I'm
> > not clear on. To create a new region, the user space VMM issues the
> > MSHV_SET_GUEST_MEMORY ioctl, specifying the userspace address, the
> > size, and the guest PFN. The only requirement on these values is that the
> > userspace address and size be page aligned. But suppose a 4 Meg region is
> > specified where the userspace address and the guest PFN have different
> > offsets modulo 2 Meg. The userspace address range gets populated first,
> > and may contain a 2 Meg large page. Then when mshv_chunk_stride()
> > detects a 2 Meg aligned guest PFN, so that HVCALL_MAP_GPA_PAGES is told
> > to create a 2 Meg mapping for the guest, the corresponding system PFN in
> > the page array may not be 2 Meg aligned. What does the hypervisor do in
> > this case? It can't create a 2 Meg mapping, right? So does it silently fall
> > back to creating 4K mappings, or does it return an error? Returning an error
> > would seem to be problematic for movable pages because the error wouldn't
> > occur until the guest VM is running and takes a range fault on the region.
> > Silently falling back to creating 4K mappings has performance implications,
> > though I guess it would work. My question is whether the
> > MSHV_SET_GUEST_MEMORY ioctl should detect this case and return an
> > error immediately.
> > 
> 
> In thinking about this more, I can answer my own question about the
> hypervisor behavior. When HVCALL_MAP_GPA_PAGES is asked for a 2 Meg
> mapping, the full list of 4K system PFNs is not provided as input to
> the hypercall, so the hypervisor cannot silently fall back to 4K
> mappings. Assuming sequential PFNs would be wrong, so it must return
> an error if the system PFN isn't aligned on a 2 Meg boundary.
> 
> For a pinned region, this error happens in mshv_region_map() as
> called from mshv_prepare_pinned_region(), so it will propagate back
> to the ioctl. But the error happens only if pin_user_pages_fast()
> allocates one or more 2 Meg pages. So creating a pinned region
> where the guest PFN and userspace address have different offsets
> modulo 2 Meg might or might not succeed.
> 
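The reasoning in the two paragraphs above can be modeled as a per-chunk decision. This is a minimal userspace sketch, not the mshv driver's code. `map_chunk()`, `enum map_result` and `PAGES_PER_2M` are hypothetical names. The model assumes that a 2 Meg request passes only the first system PFN of the chunk, and that the stride is chosen from the guest PFN alignment as described for mshv_chunk_stride().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_2M 512ULL /* 2 MiB / 4 KiB (hypothetical name) */

/* Hypothetical outcomes of mapping one chunk. */
enum map_result {
	MAP_4K,     /* chunk mapped with individual 4K entries */
	MAP_2M,     /* chunk mapped with a single 2M entry */
	MAP_EINVAL, /* 2M requested, but the first system PFN is misaligned */
};

/*
 * Hypothetical model of mapping one chunk starting at guest PFN 'gfn'.
 * 'huge' says whether the user memory backing the chunk is a 2M page.
 * A 2M request carries only pfns[0], so there is no per-4K list for the
 * hypervisor to fall back to: a misaligned pfns[0] can only be an error.
 */
static enum map_result map_chunk(uint64_t gfn, const uint64_t *pfns,
				 size_t nr_pfns, bool huge)
{
	assert(pfns != NULL && nr_pfns >= 1);

	/* 2M stride only for a huge-page-backed, guest-aligned full chunk. */
	if (!huge || gfn % PAGES_PER_2M != 0 || nr_pfns < PAGES_PER_2M)
		return MAP_4K;

	if (pfns[0] % PAGES_PER_2M != 0)
		return MAP_EINVAL;

	return MAP_2M;
}
```

In this model a pinned region whose userspace address and guest PFN differ modulo 2 Meg fails only when some chunk happens to be backed by a huge page. That matches the "might or might not succeed" behavior described above.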
> For a movable region, the error probably can't occur.
> mshv_region_handle_gfn_fault() builds an aligned 2 Meg chunk
> around the faulting guest PFN. mshv_region_range_fault() then
> determines the corresponding userspace addr, which won't be on
> a 2 Meg boundary, so the allocated memory won't contain a 2 Meg
> page. With no 2 Meg pages, mshv_region_remap_pages() will
> always do 4K mappings and will succeed. The downside is that a
> movable region with a guest PFN and userspace address with
> different offsets never gets any 2 Meg pages or mappings.
> 
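The movable-region argument reduces to address arithmetic. The following is a minimal sketch, assuming a hypothetical `struct region` that holds the starting guest PFN, the size in 4K pages and the backing userspace address. It also assumes the fault chunk is clamped to the region start. The 2 Meg chunk around a faulting guest PFN starts at a 512-page-aligned guest PFN. The userspace address backing that PFN is 2 Meg aligned only if the region's two offsets agree modulo 2 Meg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_4K_SHIFT 12
#define PAGES_PER_2M  512ULL
#define SIZE_2M       (1ULL << 21)

/* Hypothetical region descriptor: a guest PFN range backed by user memory. */
struct region {
	uint64_t start_gfn; /* first guest PFN of the region */
	uint64_t nr_pages;  /* region size in 4K pages */
	uint64_t uaddr;     /* userspace address backing start_gfn */
};

/* Userspace address backing a guest PFN inside the region. */
static uint64_t gfn_to_uaddr(const struct region *r, uint64_t gfn)
{
	assert(gfn >= r->start_gfn && gfn < r->start_gfn + r->nr_pages);
	return r->uaddr + ((gfn - r->start_gfn) << PAGE_4K_SHIFT);
}

/*
 * Could the fault chunk containing 'gfn' (assumed inside the region) be
 * backed by a 2M page and mapped with a 2M entry? Both the chunk's guest
 * PFN and its backing user address must be 2M aligned.
 */
static bool fault_chunk_2m_possible(const struct region *r, uint64_t gfn)
{
	uint64_t chunk_gfn = gfn & ~(PAGES_PER_2M - 1);

	/* Clamp to the region; a clamped start is never 2M aligned. */
	if (chunk_gfn < r->start_gfn)
		chunk_gfn = r->start_gfn;

	return chunk_gfn % PAGES_PER_2M == 0 &&
	       gfn_to_uaddr(r, chunk_gfn) % SIZE_2M == 0;
}
```

With a 4K difference between the offsets, every chunk's user address sits 4K past a 2 Meg boundary. The fault path therefore never lands on a 2 Meg page, which is the "never gets any 2 Meg pages" outcome described above.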
> My conclusion is the same -- such misalignment should not be
> allowed when creating a region that has the potential to use 2 Meg
> pages. Regions less than 2 Meg in size could be excluded from such
> a requirement if there is benefit in doing so. It's possible to have
> regions up to (but not including) 4 Meg where the alignment prevents
> having a 2 Meg page, and those could also be excluded from the
> requirement.
> 
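The proposed rule could be expressed as a creation-time check. This is a minimal sketch, not the driver's code, and `region_alignment_ok()` is hypothetical. It assumes a region "has the potential to use 2 Meg pages" exactly when at least one full 2 Meg-aligned guest chunk fits inside it. That single test exempts both cases mentioned: regions under 2 Meg, and regions under 4 Meg whose placement rules out a 2 Meg page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_4K_SHIFT 12
#define SIZE_2M       (1ULL << 21)

/*
 * Hypothetical creation-time check. Returns true if the region may be
 * created: either no full 2M-aligned guest chunk fits inside it, or the
 * userspace address and the guest physical address have the same offset
 * modulo 2M. Overflow near the top of the address space is ignored.
 */
static bool region_alignment_ok(uint64_t uaddr, uint64_t size, uint64_t gfn)
{
	uint64_t gpa = gfn << PAGE_4K_SHIFT;
	uint64_t first_2m = (gpa + SIZE_2M - 1) & ~(SIZE_2M - 1);

	/* The existing page-alignment requirement, assumed already enforced. */
	assert(uaddr % 4096 == 0 && size % 4096 == 0);

	if (first_2m + SIZE_2M > gpa + size)
		return true; /* no full 2M guest chunk fits: exempt */

	return uaddr % SIZE_2M == gpa % SIZE_2M;
}
```

For example, a 3 Meg region starting at guest address 0.5 Meg contains no aligned 2 Meg chunk and is exempt. A 4 Meg region starting at 2 Meg with a userspace address 4K past a 2 Meg boundary would be rejected.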

I'm not sure I understand the problem.
There are three cases to consider:
1. Guest mapping, where page sizes are controlled by the guest.
2. Host mapping, where page sizes are controlled by the host.
3. Hypervisor mapping, where page sizes are controlled by the hypervisor.

The first case is not relevant here and is included only for completeness.

The second and third cases (host and hypervisor) share the same memory
layout, but each entity decides independently which page sizes to use.
For example, the host might map the proposed 4M region with only 4K
pages, even if a 2M page is available in the middle. In that case the
host maps the memory with 4K pages, but the hypervisor can still
discover the 2M page in the middle and adjust its own page tables to use
a 2M mapping.

This adjustment happens at runtime. Could this be the missing detail here?  
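The runtime adjustment described above can be modeled as a promotion test over a 2 Meg-aligned guest chunk that is currently mapped with 4K entries. This is a hypothetical sketch of the idea only, not the hypervisor's actual algorithm. `can_promote_2m()` is an illustrative name, and the sketch assumes the mapper can see the 512 system PFNs backing the chunk. It promotes only when those PFNs form one aligned, physically contiguous 2M page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_2M 512ULL

/*
 * Hypothetical promotion test: may the 4K entries for the guest chunk
 * starting at 'gfn' be replaced by a single 2M entry? Requires a
 * 2M-aligned guest PFN, a 2M-aligned first system PFN, and 512
 * physically contiguous system PFNs.
 */
static bool can_promote_2m(uint64_t gfn, const uint64_t *pfns, size_t nr)
{
	assert(pfns != NULL);

	if (nr != PAGES_PER_2M || gfn % PAGES_PER_2M != 0 ||
	    pfns[0] % PAGES_PER_2M != 0)
		return false;

	for (size_t i = 1; i < nr; i++)
		if (pfns[i] != pfns[0] + i)
			return false; /* not physically contiguous */

	return true;
}
```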

Thanks,  
Stanislav  

> Michael


Thread overview: 18+ messages
2025-12-17  0:41 [PATCH] mshv: Align huge page stride with guest mapping Stanislav Kinsburskii
2025-12-18 19:41 ` Michael Kelley
2025-12-19 22:53   ` Stanislav Kinsburskii
2025-12-22 18:25     ` Michael Kelley
2025-12-23 15:51       ` Michael Kelley
2025-12-23 16:26         ` Stanislav Kinsburskii [this message]
2025-12-23 19:17           ` Michael Kelley
2026-01-02 17:42             ` Stanislav Kinsburskii
2026-01-02 18:04               ` Michael Kelley
2026-01-02 20:03                 ` Stanislav Kinsburskii
2026-01-02 21:13                   ` Michael Kelley
2026-01-02 23:35                     ` Stanislav Kinsburskii
2026-01-03  1:16                       ` Michael Kelley
2026-01-05 17:25                         ` Stanislav Kinsburskii
2026-01-05 18:07                           ` Michael Kelley
2026-01-05 19:47                             ` Stanislav Kinsburskii
2026-01-07 18:39                               ` Stanislav Kinsburskii
2025-12-23 16:27       ` Stanislav Kinsburskii
