From: Marcelo Tosatti <mtosatti@redhat.com>
To: Florian Schmidt <flosch@nutanix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Zhao Liu <zhao1.liu@intel.com>,
qemu-devel@nongnu.org
Subject: Re: [RFC PATCH 2/2] Add HvExtCallGetBootZeroedMemory
Date: Wed, 22 Apr 2026 23:16:59 -0300
Message-ID: <aemBG7BJemipU/8I@tpad>
In-Reply-To: <f878fd12-17bc-4249-beca-cd134bb93d2c@nutanix.com>
On Thu, Apr 16, 2026 at 04:33:20PM +0100, Florian Schmidt wrote:
> Hi Paolo, thank you for your reply!
>
> On 2026-04-16 13:47, Paolo Bonzini wrote:
> > As discussed on IRC, there are multiple sources of writes and each of
> > them needs to be tracked, and we can say that any write potentially
> > makes the page nonzero.
> >
> > But as far as QEMU is concerned you could indeed add a fourth dirty
> > memory bitmap, DIRTY_MEMORY_NONZERO, and turn it off after the first
> > call to the hypercall (hopefully Windows only calls it once)?
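
(Spelling out what that could look like on the qemu side; DIRTY_MEMORY_NONZERO
and report_boot_zeroed() are hypothetical names of mine, while
cpu_physical_memory_get_dirty() and TARGET_PAGE_SIZE are qemu's existing
helpers:)

/* Hedged sketch, not the actual patch: a fourth dirty-bitmap client.
 * Pages whose NONZERO bit was never set are still zero and can be
 * reported to the guest on the first hypercall. */
#define DIRTY_MEMORY_NONZERO   3   /* hypothetical; would bump DIRTY_MEMORY_NUM */

static void report_boot_zeroed(ram_addr_t ram_size)
{
    ram_addr_t addr;

    for (addr = 0; addr < ram_size; addr += TARGET_PAGE_SIZE) {
        if (!cpu_physical_memory_get_dirty(addr, TARGET_PAGE_SIZE,
                                           DIRTY_MEMORY_NONZERO)) {
            /* still zero: coalesce into a range for the hypercall reply */
        }
    }
    /* then turn the client off, per the "after the first call" idea */
}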
>
> I'm not sure we can rely on that. I'd have to double-check. Crucially, I'm
> pretty sure some Windows versions may call this more than once, and which of
> those results it then uses is the big question: if it's always the first
> one, we could stop there, return "nothing" as answer for further ones, and
> be good. If it's not the first one... that's a problem, because I don't
> think we want to do this kind of tracking for the whole lifetime of the
> guest?
>
>
> > As an alternative to tracking dirty pages in KVM, it would be possible
> > to ask KVM to build a bitmap of pages that were mapped, even at high
> > granularity (e.g. with 32MB you'd fit the bitmap for 1TB of memory in a
> > single page!). QEMU could use dirty page tracking, and build a
> > combination (NOR) of the bitmap from KVM and QEMU's dirty page bitmap.
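
(Checking the arithmetic: 1 TB / 32 MB = 32768 regions, i.e. 32768 bits
= 4 KB of bitmap, exactly one x86 page.)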
> >
> > QEMU could also do the same high-granularity tracking, but that leaves
> > out other sources of writes like VFIO or vhost, both of which probably
> > matter to Nutanix :) and which would be stuck with 4k-granularity dirty
> > page tracking.
>
> Are you thinking of creating a new KVM ioctl that would do that for us?
> That's possible, but... isn't good old mincore enough in this case, since
> qemu knows the host-virtual addresses of the guest memory? At the cost of 8
> times as much memory for the (temporary) data structure, since it's a byte
> per page.
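
A sketch of that mincore() variant, assuming hva/len stand for the
page-aligned HVA range qemu registered for guest RAM (both names are
placeholders):

#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

long page = sysconf(_SC_PAGESIZE);
size_t npages = len / page;
unsigned char *vec = malloc(npages);        /* one byte per page, as you note */

if (vec && mincore(hva, len, vec) == 0) {
    for (size_t i = 0; i < npages; i++) {
        if (!(vec[i] & 1)) {
            /* not resident: never touched, or swapped out -- mincore
             * alone cannot distinguish the two cases */
        }
    }
}
free(vec);

One caveat: an anonymous page that was written and then swapped out also
reports as not resident, so mincore alone would over-report zeroed pages;
that is one reason to prefer pagemap (below), which has a separate swap bit.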
>
> One thing I was wondering about was races. Unless we pause the guest while
> we're scanning the tables, the guest could touch pages as we scan. But: at
> the point the hypercall is invoked by the guest, the guest OS is up by
> definition. So at this point, the guest OS must be aware of any memory that
> is put to use, correct? Even including vfio/vhost stuff, since the buffers
> used to write data into the guest would have been set aside for that purpose
> by the guest. So even if we overestimate and announce some pages as
> pre-zeroed, that shouldn't matter if the guest OS already handed them out
> for some usage (and pre-zeroed them in the meantime). What we really care
> about is not announcing any pages as pre-zeroed that are in fact dirty,
> *and* that the guest OS does not realise were ever dirtied.
>
> ... Famous last words and I'm not 100% sure, I appreciate any thoughts on
> this.
>
> Cheers,
> Florian
>
Two definitions. The first one, section 9.4.7 of the TLFS PDF:

****************************************************************************************************
Hyper-V allocates zero-filled pages to a VM at creation time. The HvExtCallGetBootZeroedMemory
hypercall can be used to query which GPA pages were zeroed by Hyper-V during creation. This can
prevent the guest memory manager from having to redundantly zero GPA pages, which can reduce
utilization and increase performance.

This is an extended hypercall; its availability must be queried using HvExtCallQueryCapabilities.

Wrapper Interface

HV_STATUS
HvExtCallGetBootZeroedMemory(
    __out UINT64 StartGpa,
    __out UINT64 PageCount
    );

Native Interface

HvExtCallGetBootZeroedMemory
Call Code = 0x8002

Input Parameters
None.

Output Parameters
0  StartGpa  (8 bytes)
8  PageCount (8 bytes)

StartGpa  – the GPA address where the zeroed memory region begins.
PageCount – the number of pages included in the zeroed memory region.
****************************************************************************************************
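
(Reading the native interface literally, the output area is just two
8-byte fields; a minimal C rendering, field and struct names mine:)

#include <stdint.h>

/* Output parameter layout per the native interface above (call code
 * 0x8002). */
struct hv_ext_boot_zeroed_output {
    uint64_t start_gpa;    /* offset 0: first GPA of the zeroed region */
    uint64_t page_count;   /* offset 8: number of pages in the region */
};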
The second definition, from the web:
https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercalls/hvextcallgetbootzeroedmemory

The hypercall returns ranges that are known to be zeroed at the time the hypercall is made. Cacheable reads from reported ranges must return all zeroes.

Querying zeroed ranges may allow the virtual machine to avoid zeroing memory that was already zeroed by the hypervisor.

Ranges can include memory that doesn't exist and can overlap. The hypervisor should attempt to report the "best" / biggest zeroed ranges earlier in the list for optimal performance.
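
(Since this second definition allows multiple, possibly overlapping ranges
and wants the biggest reported first, the reporting side would coalesce runs
of still-zero pages into ranges and sort them; a hedged sketch, types and
names hypothetical:)

#include <stdint.h>
#include <stdlib.h>

struct zeroed_range {
    uint64_t start_gpa;
    uint64_t page_count;
};

/* qsort comparator: biggest range first */
static int by_size_desc(const void *a, const void *b)
{
    const struct zeroed_range *ra = a, *rb = b;
    return (rb->page_count > ra->page_count) -
           (rb->page_count < ra->page_count);
}

/* after collecting ranges[0..n) from runs of still-zero pages: */
qsort(ranges, n, sizeof(*ranges), by_size_desc);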
====
If qemu uses mmap(MAP_ANONYMOUS), can you report pages which have not
yet been touched (never faulted in)?
You can inspect /proc/<pid>/pagemap to find which virtual pages within the mmap region have no physical page backing (i.e., have never been written to and are still zero-fill-on-demand).
Alternatively, /proc/<pid>/smaps gives a per-VMA summary, but for fine-grained per-page resolution within a single large VMA, pagemap is the tool:
// For each page in the region, read the 8-byte pagemap entry
// (format documented in Documentation/admin-guide/mm/pagemap.rst):
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

int fd = open("/proc/<pid>/pagemap", O_RDONLY);   // <pid> = qemu's pid
uint64_t entry;
uintptr_t addr;

for (addr = start; addr < end; addr += PAGE_SIZE) {
    if (pread(fd, &entry, 8, (addr / PAGE_SIZE) * 8) != 8)
        break;                              // short read: bail out
    bool present = entry & (1ULL << 63);    // bit 63 = page present in RAM
    bool swapped = entry & (1ULL << 62);    // bit 62 = swapped out
    if (!present && !swapped) {
        // Page never faulted in -- still unallocated (zero-fill-on-demand)
    }
}
close(fd);
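
(If I remember correctly, since Linux ~4.2 unprivileged reads of pagemap
return the PFN field zeroed out, but the present/swapped flags in bits
63/62 stay visible, which is all the scan above needs.)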
I suppose the OS is responsible for handling races?
Or rather, Windows assumes that nothing other than the CPUs (which it
controls) has visibility into such memory regions.