From: Jonathan Cameron via <qemu-devel@nongnu.org>
To: David Woodhouse <dwmw2@infradead.org>
Cc: <qemu-devel@nongnu.org>, "Michael S. Tsirkin" <mst@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
Ani Sinha <anisinha@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
Richard Henderson <richard.henderson@linaro.org>,
Eduardo Habkost <eduardo@habkost.net>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Peter Hilber <peter.hilber@opensynergy.com>,
"Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>,
paul <paul@xen.org>
Subject: Re: [PATCH v4] hw/acpi: Add vmclock device
Date: Tue, 8 Oct 2024 12:04:29 +0100 [thread overview]
Message-ID: <20241008120429.00000603@Huawei.com> (raw)
In-Reply-To: <64144cfa550c54bee25c7be0cc3101cb1d1b1967.camel@infradead.org>
On Mon, 07 Oct 2024 14:53:54 +0100
David Woodhouse <dwmw2@infradead.org> wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The vmclock device addresses the problem of live migration with
> precision clocks. The tolerances of a hardware counter (e.g. TSC) are
> typically around ±50PPM. A guest will use NTP/PTP/PPS to discipline that
> counter against an external source of 'real' time, and track the precise
> frequency of the counter as it changes with environmental conditions.
>
> When a guest is live migrated, anything it knows about the frequency of
> the underlying counter becomes invalid. It may move from a host where
> the counter running at -50PPM of its nominal frequency, to a host where
> it runs at +50PPM. There will also be a step change in the value of the
> counter, as the correctness of its absolute value at migration is
> limited by the accuracy of the source and destination host's time
> synchronization.
>
> The device exposes a shared memory region to guests, which can be mapped
> all the way to userspace. In the first phase, this merely advertises a
> 'disruption_marker', which indicates that the guest should throw away any
> NTP synchronization it thinks it has, and start again.
>
> Because the region can be exposed all the way to userspace, applications
> can still use time from a fast vDSO 'system call', and check the
> disruption marker to be sure that their timestamp is indeed truthful.
>
> The structure also allows for the precise time, as known by the host, to
> be exposed directly to guests so that they don't have to wait for NTP to
> resync from scratch.
>
> The values and fields are based on the nascent virtio-rtc specification,
> and the intent is that a version (hopefully precisely this version) of
> this structure will be included as an optional part of that spec. In the
> meantime, a simple ACPI device along the lines of VMGENID is perfectly
> sufficient and is compatible with what's being shipped in certain
> commercial hypervisors.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Hi David,
A couple of drive by comments (as we all love those!)
Jonathan
> ---
> diff --git a/hw/acpi/vmclock-abi.h b/hw/acpi/vmclock-abi.h
> new file mode 100644
> index 0000000000..19cbf85efd
> --- /dev/null
> +++ b/hw/acpi/vmclock-abi.h
> @@ -0,0 +1,186 @@
> +#endif /* __VMCLOCK_ABI_H__ */
> diff --git a/hw/acpi/vmclock.c b/hw/acpi/vmclock.c
> new file mode 100644
> index 0000000000..e7df047c33
> --- /dev/null
> +++ b/hw/acpi/vmclock.c
> +
> +void vmclock_build_acpi(VmclockState *vms, GArray *table_data,
> + BIOSLinker *linker, const char *oem_id)
> +{
> + Aml *ssdt, *dev, *scope, *method, *addr, *crs;
> + AcpiTable table = { .sig = "SSDT", .rev = 1,
> + .oem_id = oem_id, .oem_table_id = "VMCLOCK" };
> +
> + /* Put VMCLOCK into a separate SSDT table */
> + acpi_table_begin(&table, table_data);
> + ssdt = init_aml_allocator();
> +
> + scope = aml_scope("\\_SB");
> + dev = aml_device("VCLK");
> + aml_append(dev, aml_name_decl("_HID", aml_string("AMZNC10C")));
Nice _HID :)
> + aml_append(dev, aml_name_decl("_CID", aml_string("VMCLOCK")));
> + aml_append(dev, aml_name_decl("_DDN", aml_string("VMCLOCK")));
> +
> + /* Simple status method */
> + method = aml_method("_STA", 0, AML_NOTSERIALIZED);
> + addr = aml_local(0);
> + aml_append(method, aml_store(aml_int(0xf), addr));
> + aml_append(method, aml_return(addr));
> + aml_append(dev, method);
It's static so you should be able to do:
aml_append(dev, aml_name_decl("_STA", aml_int(0xf)));
> +
> + crs = aml_resource_template();
> + aml_append(crs, aml_qword_memory(AML_POS_DECODE,
> + AML_MIN_FIXED, AML_MAX_FIXED,
> + AML_CACHEABLE, AML_READ_ONLY,
> + 0xffffffffffffffffULL,
> + vms->physaddr,
> + vms->physaddr + VMCLOCK_SIZE - 1,
> + 0, VMCLOCK_SIZE));
> + aml_append(dev, aml_name_decl("_CRS", crs));
> + aml_append(scope, dev);
> + aml_append(ssdt, scope);
> +
> + g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> + acpi_table_end(linker, &table);
> + free_aml_allocator();
> +}
> +static Property vmclock_device_properties[] = {
> + DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void vmclock_device_class_init(ObjectClass *klass, void *data)
> +{
> + DeviceClass *dc = DEVICE_CLASS(klass);
> +
> + dc->vmsd = &vmstate_vmclock;
> + dc->realize = vmclock_realize;
> + device_class_set_props(dc, vmclock_device_properties);
Probably a silly question but why register no properties?
Is that any different to not registering them?
+ a quick look suggests not all class_init functions
set them.
> + dc->hotpluggable = false;
> + set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +}
> +
> +static const TypeInfo vmclock_device_info = {
> + .name = TYPE_VMCLOCK,
> + .parent = TYPE_DEVICE,
> + .instance_size = sizeof(VmclockState),
> + .class_init = vmclock_device_class_init,
> +};
> +
> +static void vmclock_register_types(void)
> +{
> + type_register_static(&vmclock_device_info);
> +}
> +
> +type_init(vmclock_register_types)
next prev parent reply other threads:[~2024-10-08 11:05 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-07 13:53 [PATCH v4] hw/acpi: Add vmclock device David Woodhouse
2024-10-08 11:04 ` Jonathan Cameron via [this message]
2024-10-08 11:21 ` David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241008120429.00000603@Huawei.com \
--to=qemu-devel@nongnu.org \
--cc=Jonathan.Cameron@Huawei.com \
--cc=abuehaze@amazon.com \
--cc=anisinha@redhat.com \
--cc=dwmw2@infradead.org \
--cc=eduardo@habkost.net \
--cc=imammedo@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mst@redhat.com \
--cc=paul@xen.org \
--cc=pbonzini@redhat.com \
--cc=peter.hilber@opensynergy.com \
--cc=richard.henderson@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).