From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Florian Schmidt <flosch@nutanix.com>,
	Zhao Liu <zhao1.liu@intel.com>,
	qemu-devel@nongnu.org
Subject: Re: [RFC PATCH 2/2] Add HvExtCallGetBootZeroedMemory
Date: Fri, 17 Apr 2026 02:51:56 -0300
Message-ID: <aeHKfOmm2b7cY70i@tpad>
In-Reply-To: <a28bd807-6e98-4393-a354-b7ed1f7a08ae@redhat.com>

On Thu, Apr 16, 2026 at 02:47:06PM +0200, Paolo Bonzini wrote:
> On 1/12/26 12:31, Florian Schmidt wrote:
> > This call allows a guest to ask the hypervisor which of its (guest
> > physical) memory ranges were already zeroed out by the hypervisor, which
> > means there's no need for the guest to zero them out again at boot.
> > 
> > For now, we simply send one entry back that says "all
> > 64-bit-addressable memory is zeroed out". This seems to work fine.
> 
> This is risky and depends on the firmware.
> 
> As discussed on IRC, there are multiple sources of writes and each of them
> needs to be tracked, and we can say that any write potentially makes the
> page nonzero.
> 
> The main two are QEMU and the guest, i.e. KVM.
> 
> Dirty page tracking is suitable for QEMU, but not so much for KVM.  When
> QEMU enables dirty page tracking in KVM, it does so lazily, i.e. all pages
> are assumed dirty at the beginning.  This is exactly the opposite of what
> you want here (almost all pages are zero at the beginning).  So maybe only
> enable this hypercall when dirty page tracking uses a ring buffer?
> 
> But as far as QEMU is concerned you could indeed add a fourth dirty memory
> bitmap, DIRTY_MEMORY_NONZERO, and turn it off after the first call to the
> hypercall (hopefully Windows only calls it once)?
> 
> As an alternative to tracking dirty pages in KVM, it would be possible to
> ask KVM to build a bitmap of pages that were mapped, even at coarse
> granularity (e.g. with 32MB chunks you'd fit the bitmap for 1TB of memory in a
> single page!).  QEMU could use dirty page tracking, and build a combination
> (NOR) of the bitmap from KVM and QEMU's dirty page bitmap.
> 
> QEMU could also do the same coarse-granularity tracking, but that leaves out
> other sources of writes like VFIO or vhost, both of which probably matter to
> Nutanix :) and which would be stuck with 4k-granularity dirty page tracking.
> 
> Sorry for the delay, I was hoping to have some better ideas.
> 
> Paolo

Hi,

Is this to improve boot time of very large guests?

> > Note that memory devices hotplugged later will still be zeroed out.
> > 
> > Signed-off-by: Florian Schmidt <flosch@nutanix.com>
> > ---
> >   docs/system/i386/hyperv.rst      |  5 +++++
> >   hw/hyperv/hyperv.c               | 34 ++++++++++++++++++++++++++++++++
> >   include/hw/hyperv/hyperv-proto.h | 11 +++++++++++
> >   include/hw/hyperv/hyperv.h       |  5 +++++
> >   target/i386/cpu.c                |  2 ++
> >   target/i386/cpu.h                |  1 +
> >   target/i386/kvm/hyperv-proto.h   |  5 +++++
> >   target/i386/kvm/hyperv.c         | 12 ++++++++++-
> >   8 files changed, 74 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst
> > index 1c1de77feb..7d2be2109e 100644
> > --- a/docs/system/i386/hyperv.rst
> > +++ b/docs/system/i386/hyperv.rst
> > @@ -256,6 +256,11 @@ Existing enlightenments
> >     Recommended: ``hv-evmcs`` (Intel)
> > +``hv-boot-zeroed-mem``
> > +  Enables the HvExtGetBootZeroedMemory hypercall. This allows a Windows guest to
> > +  inquire which memory has already been zeroed out by the host and thus doesn't
> > +  need to be zeroed out at boot again.
> > +
> >   Supplementary features
> >   ----------------------
> > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > index 13a42a68b2..d1b15c089e 100644
> > --- a/hw/hyperv/hyperv.c
> > +++ b/hw/hyperv/hyperv.c
> > @@ -727,6 +727,40 @@ cleanup:
> >       return ret;
> >   }
> > +uint16_t hyperv_ext_hcall_get_boot_zeroed_memory(uint64_t outgpa, bool fast)
> > +{
> > +    uint16_t ret;
> > +    struct hyperv_get_boot_zeroed_memory_output *zero_ranges = NULL;
> > +    hwaddr len;
> > +
> > +    if (fast) {
> > +        ret = HV_STATUS_INVALID_HYPERCALL_CODE;
> > +        goto cleanup;
> > +    }
> > +
> > +    len = sizeof(*zero_ranges);
> > +    zero_ranges = cpu_physical_memory_map(outgpa, &len, 1);
> > +    if (!zero_ranges || len < sizeof(*zero_ranges)) {
> > +        ret = HV_STATUS_INSUFFICIENT_MEMORY;
> > +        goto cleanup;
> > +    }
> > +
> > +    /*
> > +     * All memory we pass through will always be zeroed.
> > +     * (Check if that's actually true!)
> > +     */
> > +    zero_ranges->range_count = 1;
> > +    zero_ranges->ranges[0].start_pfn = 0x0;
> > +    zero_ranges->ranges[0].page_count = 0x10000000000000;
> > +    ret = HV_STATUS_SUCCESS;
> > +
> > +cleanup:
> > +    if (zero_ranges) {
> > +        cpu_physical_memory_unmap(zero_ranges, sizeof(*zero_ranges), 1, len);
> > +    }
> > +    return ret;
> > +}
> > +
> >   uint16_t hyperv_hcall_signal_event(uint64_t param, bool fast)
> >   {
> >       EventFlagHandler *handler;
> > diff --git a/include/hw/hyperv/hyperv-proto.h b/include/hw/hyperv/hyperv-proto.h
> > index f1d1d2eb26..5bf5684d11 100644
> > --- a/include/hw/hyperv/hyperv-proto.h
> > +++ b/include/hw/hyperv/hyperv-proto.h
> > @@ -36,6 +36,7 @@
> >   #define HV_RETRIEVE_DEBUG_DATA                0x006a
> >   #define HV_RESET_DEBUG_SESSION                0x006b
> >   #define HV_EXT_CALL_QUERY_CAPABILITIES        0x8001
> > +#define HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY    0x8002
> >   #define HV_HYPERCALL_FAST                     (1u << 16)
> >   /*
> > @@ -192,4 +193,14 @@ struct hyperv_retrieve_debug_data_output {
> >       uint32_t retrieved_count;
> >       uint32_t remaining_count;
> >   } __attribute__ ((__packed__));
> > +
> > +struct hyperv_get_boot_zeroed_memory_range {
> > +    uint64_t start_pfn;
> > +    uint64_t page_count;
> > +} __attribute__ ((__packed__));
> > +
> > +struct hyperv_get_boot_zeroed_memory_output {
> > +    uint64_t range_count;
> > +    struct hyperv_get_boot_zeroed_memory_range ranges[255];
> > +} __attribute__ ((__packed__));
> >   #endif
> > diff --git a/include/hw/hyperv/hyperv.h b/include/hw/hyperv/hyperv.h
> > index 921e1623f7..54cd2fff72 100644
> > --- a/include/hw/hyperv/hyperv.h
> > +++ b/include/hw/hyperv/hyperv.h
> > @@ -101,6 +101,11 @@ uint16_t hyperv_hcall_post_dbg_data(uint64_t ingpa, uint64_t outgpa, bool fast);
> >    */
> >   uint16_t hyperv_ext_hcall_query_caps(uint64_t sup, uint64_t outgpa, bool fast);
> > +/*
> > + * Process HVCALL_EXT_GET_BOOT_ZEROED_MEMORY hypercall.
> > + */
> > +uint16_t hyperv_ext_hcall_get_boot_zeroed_memory(uint64_t outgpa, bool fast);
> > +
> >   uint32_t hyperv_syndbg_send(uint64_t ingpa, uint32_t count);
> >   uint32_t hyperv_syndbg_recv(uint64_t ingpa, uint32_t count);
> >   void hyperv_syndbg_set_pending_page(uint64_t ingpa);
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 37803cd724..d4160f3334 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -10492,6 +10492,8 @@ static const Property x86_cpu_properties[] = {
> >                         HYPERV_FEAT_TLBFLUSH_EXT, 0),
> >       DEFINE_PROP_BIT64("hv-tlbflush-direct", X86CPU, hyperv_features,
> >                         HYPERV_FEAT_TLBFLUSH_DIRECT, 0),
> > +    DEFINE_PROP_BIT64("hv-boot-zeroed-mem", X86CPU, hyperv_features,
> > +                      HYPERV_FEAT_BOOT_ZEROED_MEMORY, 0),
> >       DEFINE_PROP_ON_OFF_AUTO("hv-no-nonarch-coresharing", X86CPU,
> >                               hyperv_no_nonarch_cs, ON_OFF_AUTO_OFF),
> >   #ifdef CONFIG_SYNDBG
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 2bbc977d90..a42eacd800 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1463,6 +1463,7 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w);
> >   #define HYPERV_FEAT_XMM_INPUT           18
> >   #define HYPERV_FEAT_TLBFLUSH_EXT        19
> >   #define HYPERV_FEAT_TLBFLUSH_DIRECT     20
> > +#define HYPERV_FEAT_BOOT_ZEROED_MEMORY  21
> >   #ifndef HYPERV_SPINLOCK_NEVER_NOTIFY
> >   #define HYPERV_SPINLOCK_NEVER_NOTIFY             0xFFFFFFFF
> > diff --git a/target/i386/kvm/hyperv-proto.h b/target/i386/kvm/hyperv-proto.h
> > index 4eb2955ac5..ec38b717e4 100644
> > --- a/target/i386/kvm/hyperv-proto.h
> > +++ b/target/i386/kvm/hyperv-proto.h
> > @@ -94,6 +94,11 @@
> >   #define HV_NESTED_DIRECT_FLUSH              (1u << 17)
> >   #define HV_NESTED_MSR_BITMAP                (1u << 19)
> > +/*
> > + * HV_EXT_CALL_QUERY_CAPABILITIES bits
> > + */
> > +#define HV_EXT_CAP_GET_BOOT_ZEROED_MEMORY   (1u << 0)
> > +
> >   /*
> >    * Basic virtualized MSRs
> >    */
> > diff --git a/target/i386/kvm/hyperv.c b/target/i386/kvm/hyperv.c
> > index 1ac5c26799..f92ea7e0a2 100644
> > --- a/target/i386/kvm/hyperv.c
> > +++ b/target/i386/kvm/hyperv.c
> > @@ -55,7 +55,9 @@ static uint64_t calc_supported_ext_hypercalls(X86CPU *cpu)
> >   {
> >       uint64_t ret = 0;
> > -    /* For now, no extended hypercalls are supported. */
> > +    if (hyperv_feat_enabled(cpu, HYPERV_FEAT_BOOT_ZEROED_MEMORY)) {
> > +        ret |= HV_EXT_CAP_GET_BOOT_ZEROED_MEMORY;
> > +    }
> >       return ret;
> >   }
> > @@ -122,6 +124,14 @@ int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit)
> >                   hyperv_ext_hcall_query_caps(calc_supported_ext_hypercalls(cpu),
> >                                               out_param, fast);
> >               break;
> > +        case HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY:
> > +            if (!hyperv_feat_enabled(cpu, HYPERV_FEAT_BOOT_ZEROED_MEMORY)) {
> > +                exit->u.hcall.result = HV_STATUS_INVALID_HYPERCALL_CODE;
> > +            } else {
> > +                exit->u.hcall.result =
> > +                    hyperv_ext_hcall_get_boot_zeroed_memory(out_param, fast);
> > +            }
> > +            break;
> >           default:
> >               exit->u.hcall.result = HV_STATUS_INVALID_HYPERCALL_CODE;
> >           }

Thread overview: 15+ messages
2026-01-12 11:31 [RFC PATCH 0/2] Support for Hyper-V's HvExtCallGetBootZeroedMemory() Florian Schmidt
2026-01-12 11:31 ` [RFC PATCH 1/2] Add HvExtCallQueryCapabilities Florian Schmidt
2026-01-12 11:31 ` [RFC PATCH 2/2] Add HvExtCallGetBootZeroedMemory Florian Schmidt
2026-04-16 12:47   ` Paolo Bonzini
2026-04-16 15:33     ` Florian Schmidt
2026-04-16 16:04       ` Paolo Bonzini
2026-04-20 10:51         ` Florian Schmidt
2026-04-21 22:46           ` Paolo Bonzini
2026-05-05 15:57             ` Florian Schmidt
2026-04-23  2:16       ` Marcelo Tosatti
2026-04-17  5:51     ` Marcelo Tosatti [this message]
2026-04-20  9:25       ` Florian Schmidt
2026-04-20 15:01         ` Marcelo Tosatti
2026-02-02 14:26 ` [RFC PATCH 0/2] Support for Hyper-V's HvExtCallGetBootZeroedMemory() Florian Schmidt
2026-02-23 11:23   ` Florian Schmidt
