Linux-HyperV List
From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>
Cc: "kys@microsoft.com" <kys@microsoft.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	"decui@microsoft.com" <decui@microsoft.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7 4/7] Drivers: hv: Fix huge page handling in memory region traversal
Date: Tue, 16 Dec 2025 16:54:34 -0800
Message-ID: <aUH_Sh6Mvta7AH2Q@skinsburskii.localdomain>
In-Reply-To: <SN6PR02MB4157978DFAA6C2584D0678E1D4A1A@SN6PR02MB4157.namprd02.prod.outlook.com>

On Thu, Dec 11, 2025 at 05:37:26PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, December 4, 2025 1:09 PM

<snip>


> I've been playing around with mmu notifiers and 2 Meg pages. At least in my
> experiment, there's a case where the .invalidate callback is invoked on a
> range *before* the 2 Meg page is split. The kernel code that does this is
> in zap_page_range_single_batched(). Early on this function calls
> mmu_notifier_invalidate_range_start(), which invokes the .invalidate
> callback on the initial range. Later on, unmap_single_vma() is called, which
> does the split and eventually makes a second .invalidate callback for the
> entire 2 Meg page.
> 
> Details:  My experiment is a user space program that does the following:
> 
> 1. Allocates 16 Megs of memory on a 16 Meg boundary using
> posix_memalign(). So this is private anonymous memory. Transparent
> huge pages are enabled.
> 
> 2. Writes to a byte in each 4K page so they are all populated. 
> /proc/meminfo shows eight 2 Meg pages have been allocated.
> 
> 3. Creates an mmu notifier for the allocated 16 Megs, using an ioctl
> hacked into the kernel for experimentation purposes.
> 
> 4. Uses madvise() with the DONTNEED option to free 32 Kbytes on a 4K
> page boundary somewhere in the 16 Meg allocation. This results in an mmu
> notifier invalidate callback for that 32 Kbytes. Then there's a second invalidate
> callback covering the entire 2 Meg page that contains the 32 Kbyte range.
> Kernel stack traces for the two invalidate callbacks show them originating
> in zap_page_range_single_batched().
> 
> 5. Sleeps for 60 seconds. During that time, khugepaged wakes up and does
> hpage_collapse_scan_pmd() -> collapse_huge_page(), which generates a third
> .invalidate callback for the 2 Meg page. I haven't investigated what this is
> all about.
> 
> 6. Interestingly, if Step 4 above does a slightly different operation using
> mprotect() with PROT_READ instead of madvise(), the 2 Meg page is split first.
> The .invalidate callback for the full 2 Meg happens before the .invalidate
> callback for the specified range.
> 
> The root partition probably isn't doing madvise() with DONTNEED for memory
> allocated for guests. But regardless of what user space does or doesn't do, MSHV's
> invalidate callback path should be made safe for this case. Maybe that's just
> detecting it and returning an error (and maybe a WARN_ON) if user space
> doesn't need it to work.
> 
> Michael
> 
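
For anyone who wants to retry the user space side of the experiment quoted
above, here is a minimal sketch of steps 1, 2 and 4 only. It is not Michael's
actual test program. Step 3 depends on an experimental ioctl that is not in
any tree, so it is left out, and without a notifier registered this only shows
the madvise() behaviour, not the callbacks. The function names, the offset of
the 32 Kbyte range and the 4K base page size are assumptions made for
illustration. THP must be enabled system-wide for the 2 Meg pages to appear.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE (16UL << 20)   /* 16 Megs, also the alignment (step 1) */
#define HPAGE_SIZE  (2UL << 20)    /* 2 Meg huge page */
#define SMALL_PAGE  4096UL         /* assumed base page size */
#define ZAP_SIZE    (32UL << 10)   /* 32 Kbytes freed in step 4 */
/* Hypothetical offset: any 4K-aligned spot inside the region would do. */
#define ZAP_OFFSET  (HPAGE_SIZE + 64 * SMALL_PAGE)

static_assert(ZAP_OFFSET % SMALL_PAGE == 0, "zap range must be 4K aligned");
static_assert(ZAP_OFFSET + ZAP_SIZE < REGION_SIZE, "zap range must fit");

/* Start of the 2 Meg page that contains addr. */
uintptr_t huge_page_base(uintptr_t addr)
{
	return addr & ~(uintptr_t)(HPAGE_SIZE - 1);
}

/* Returns 0 if the zapped range reads back as zero and its neighbour is intact. */
int thp_zap_experiment(void)
{
	void *buf = NULL;
	volatile char *p;
	size_t off;
	int ret, zeroed, kept;

	/* Step 1: private anonymous memory on a 16 Meg boundary. */
	ret = posix_memalign(&buf, REGION_SIZE, REGION_SIZE);
	if (ret) {
		fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
		return -1;
	}
	p = buf;

	/* Step 2: write one byte per 4K page so everything is populated. */
	for (off = 0; off < REGION_SIZE; off += SMALL_PAGE)
		p[off] = 1;

	/* Step 3 (mmu notifier registration via a private ioctl) is omitted. */

	/* Step 4: free 32 Kbytes on a 4K boundary inside one 2 Meg page. */
	if (madvise((char *)buf + ZAP_OFFSET, ZAP_SIZE, MADV_DONTNEED)) {
		perror("madvise(MADV_DONTNEED)");
		free(buf);
		return -1;
	}

	/* The two ranges the quoted report saw invalidated, in that order. */
	printf("zapped range:  %p + %lu bytes\n",
	       (void *)((char *)buf + ZAP_OFFSET), ZAP_SIZE);
	printf("enclosing THP: %p + %lu bytes\n",
	       (void *)huge_page_base((uintptr_t)buf + ZAP_OFFSET), HPAGE_SIZE);

	/* Private anonymous pages read back as zero after MADV_DONTNEED. */
	zeroed = (p[ZAP_OFFSET] == 0);
	kept = (p[ZAP_OFFSET + ZAP_SIZE] == 1);

	free(buf);
	return (zeroed && kept) ? 0 : -1;
}
```

Calling thp_zap_experiment() from main() and checking AnonHugePages in
/proc/meminfo between steps 2 and 4 should show whether the eight 2 Meg pages
were really allocated. The 60 second sleep from step 5 is left out and can be
added before free() to give khugepaged time to run.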

The issue is addressed by the "mshv: Align huge page stride with guest
mapping" patch.

Thanks a lot once again for your help in identifying it,
Stanislav
