* [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification
@ 2025-12-03 12:35 Chalios, Babis
2025-12-03 12:35 ` [PATCH v3 1/4] ptp: vmclock: add vm generation counter Chalios, Babis
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Chalios, Babis @ 2025-12-03 12:35 UTC
To: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
richardcochran@gmail.com, dwmw2@infradead.org,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Chalios, Babis,
Graf (AWS), Alexander, mzxreary@0pointer.de, Cali, Marco
Similarly to live migration, starting a VM from some serialized state
(aka snapshot) is an event which calls for adjusting guest clocks, hence
a hypervisor should increase the disruption_marker before resuming the
VM vCPUs, letting the guest know.
However, loading a snapshot is slightly different from live migration,
especially since we can start multiple VMs from the same serialized
state. Apart from adjusting clocks, the guest needs to take additional
action during such events, e.g. recreate UUIDs, reset network
adapters/connections, reseed entropy pools, etc. These actions are not
necessary during live migration. This calls for a differentiation
between the two triggering events.
We differentiate between the two events via an extra field in the
vmclock_abi, called vm_generation_counter. Whereas hypervisors should
increase the disruption marker in both cases, they should only increase
vm_generation_counter when a snapshot is loaded in a VM (not during live
migration).
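A minimal sketch of how a guest agent might tell the two events apart,
assuming it keeps the last values it saw of both fields and that it has
already confirmed the hypervisor advertises vm_generation_counter. The
names vmclock_seen, vmclock_event and vmclock_classify are hypothetical
and not part of this series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the two fields as last observed by the guest. */
struct vmclock_seen {
	uint64_t disruption_marker;
	uint64_t vm_generation_counter;
};

enum vmclock_event {
	VMCLOCK_EVENT_NONE,
	VMCLOCK_EVENT_DISRUPTION,	/* e.g. live migration: adjust clocks only */
	VMCLOCK_EVENT_SNAPSHOT,		/* snapshot load: also reseed, new UUIDs, ... */
};

/*
 * A changed generation counter implies a snapshot load, which is also a
 * clock disruption. A changed disruption marker alone is a plain disruption.
 */
enum vmclock_event vmclock_classify(const struct vmclock_seen *prev,
				    const struct vmclock_seen *cur)
{
	assert(prev != NULL && cur != NULL);

	if (cur->vm_generation_counter != prev->vm_generation_counter)
		return VMCLOCK_EVENT_SNAPSHOT;
	if (cur->disruption_marker != prev->disruption_marker)
		return VMCLOCK_EVENT_DISRUPTION;
	return VMCLOCK_EVENT_NONE;
}
```

The comparison uses inequality rather than "greater than" so that it does
not depend on how the hypervisor picks the next value.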
Additionally, we attach an ACPI notification to VMClock. Implementing
the notification is optional for the device. VMClock device will declare
that it implements the notification by setting
VMCLOCK_FLAG_NOTIFICATION_PRESENT bit in vmclock_abi flags. Hypervisors
that implement the notification must send an ACPI notification every
time seq_count changes to an even number. The driver will propagate
these notifications to userspace via the poll() interface.
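The "even number" rule follows the usual sequence-count convention: an odd
seq_count means an update is in progress. A minimal reader sketch in
portable C11, assuming a simplified stand-in for the shared page (demo_page
and demo_read_flags are hypothetical and do not match the real vmclock_abi
layout):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page; not the real vmclock_abi. */
struct demo_page {
	_Atomic uint32_t seq_count;
	_Atomic uint64_t flags;
};

/* Returns true once a consistent copy of flags was read, false otherwise. */
bool demo_read_flags(struct demo_page *pg, uint64_t *flags_out, int max_tries)
{
	assert(pg != NULL && flags_out != NULL);

	for (int i = 0; i < max_tries; i++) {
		/* Clearing bit 0 means an odd (in-progress) value never matches below. */
		uint32_t seq = atomic_load_explicit(&pg->seq_count,
						    memory_order_acquire) & ~1U;
		uint64_t flags = atomic_load_explicit(&pg->flags,
						      memory_order_relaxed);

		/* Order the data load before re-checking seq_count. */
		atomic_thread_fence(memory_order_acquire);

		if (seq == atomic_load_explicit(&pg->seq_count,
						memory_order_relaxed)) {
			*flags_out = flags;
			return true;
		}
	}
	return false;
}
```

The bounded retry count stands in for the deadline used by the driver's
read path; a real consumer would pick its own policy.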
Changes:
* RFC -> v1:
- Made the notification support optional. Hypervisor needs to
advertise support for the notification via a flag in vmclock_abi.
Subsequently, poll() will return POLLHUP when the feature is not
supported, to avoid having userspace blocking indefinitely waiting
for events that won't arrive
- Reworded the comment around vm_generation_counter field to avoid
speaking about "jumping forward in time".
* v1 -> v2:
- Correctly handle failures when calling vmclock_setup_notification to
set up notifications.
- Use atomic_t for fst->seq and handle the case of concurrent
read()/poll() accesses.
- Initialize fst->seq to 0 rather than what is currently stored in the
shared page. This is to avoid reading odd numbers.
- Add DT bindings similar to existing VMGenID ones.
* v2 -> v3:
- Include missing header file and drop unused variables in PATCH 2/4.
- Include missing Reviewed-by in PATCH 1/4.
- Fix DT node name to be generic (s/vmclock/ptp).
- Include missing maintainers.
Chalios, Babis (2):
ptp: vmclock: add vm generation counter
ptp: vmclock: support device notifications
David Woodhouse (2):
dt-bindings: ptp: Add amazon,vmclock
ptp: ptp_vmclock: Add device tree support
.../bindings/ptp/amazon,vmclock.yaml | 46 +++++
drivers/ptp/ptp_vmclock.c | 185 +++++++++++++++++-
include/uapi/linux/vmclock-abi.h | 20 ++
3 files changed, 243 insertions(+), 8 deletions(-)
create mode 100644 Documentation/devicetree/bindings/ptp/amazon,vmclock.yaml
--
2.34.1
* [PATCH v3 1/4] ptp: vmclock: add vm generation counter
2025-12-03 12:35 [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Chalios, Babis
@ 2025-12-03 12:35 ` Chalios, Babis
2025-12-03 12:36 ` [PATCH v3 2/4] ptp: vmclock: support device notifications Chalios, Babis
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Chalios, Babis @ 2025-12-03 12:35 UTC
To: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
richardcochran@gmail.com, dwmw2@infradead.org,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Chalios, Babis,
Graf (AWS), Alexander, mzxreary@0pointer.de, Cali, Marco,
Woodhouse, David
From: "Chalios, Babis" <bchalios@amazon.es>
Similar to live migration, loading a VM from some saved state (aka
snapshot) is also an event that calls for clock adjustments in the
guest. However, guests might want to take more actions as a response to
such events, e.g. discarding UUIDs, resetting network connections,
reseeding entropy pools, etc. These are actions that guests don't
typically take during live migration, so add a new field in the
vmclock_abi called vm_generation_counter which informs the guest about
such events.
The hypervisor advertises support for vm_generation_counter through the
VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check for the
presence of this bit in the vmclock_abi flags field before using this field.
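A minimal sketch of that check from the consumer side, assuming the caller
has already located the raw little-endian bytes of the flags and
vm_generation_counter fields in the shared page (field offsets are
deliberately not hard-coded here). The helpers vmclock_le64 and
vmclock_generation are hypothetical; only the flag value (bit 8) comes
from this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 8, as defined by this patch. */
#define VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT (1ULL << 8)

/* Decode a little-endian 64-bit value regardless of host endianness. */
uint64_t vmclock_le64(const unsigned char p[8])
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Store the generation counter in *gen_out and return true only if the
 * hypervisor advertises it; otherwise leave *gen_out untouched.
 */
bool vmclock_generation(const unsigned char flags_le[8],
			const unsigned char gen_le[8], uint64_t *gen_out)
{
	assert(gen_out != NULL);

	if (!(vmclock_le64(flags_le) & VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT))
		return false;

	*gen_out = vmclock_le64(gen_le);
	return true;
}
```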
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
include/uapi/linux/vmclock-abi.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/uapi/linux/vmclock-abi.h b/include/uapi/linux/vmclock-abi.h
index 2d99b29ac44a..937fe00e4f33 100644
--- a/include/uapi/linux/vmclock-abi.h
+++ b/include/uapi/linux/vmclock-abi.h
@@ -115,6 +115,12 @@ struct vmclock_abi {
* bit again after the update, using the about-to-be-valid fields.
*/
#define VMCLOCK_FLAG_TIME_MONOTONIC (1 << 7)
+ /*
+ * If the VM_GEN_COUNTER_PRESENT flag is set, the hypervisor will
+ * bump the vm_generation_counter field every time the guest is
+ * loaded from some save state (restored from a snapshot).
+ */
+#define VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT (1 << 8)
__u8 pad[2];
__u8 clock_status;
@@ -177,6 +183,15 @@ struct vmclock_abi {
__le64 time_frac_sec; /* Units of 1/2^64 of a second */
__le64 time_esterror_nanosec;
__le64 time_maxerror_nanosec;
+
+ /*
+ * This field changes to another non-repeating value when the guest
+ * has been loaded from a snapshot. In addition to handling a
+ * disruption in time (which will also be signalled through the
+ * disruption_marker field), a guest may wish to discard UUIDs,
+ * reset network connections, reseed entropy, etc.
+ */
+ __le64 vm_generation_counter;
};
#endif /* __VMCLOCK_ABI_H__ */
--
2.34.1
* [PATCH v3 2/4] ptp: vmclock: support device notifications
2025-12-03 12:35 [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Chalios, Babis
2025-12-03 12:35 ` [PATCH v3 1/4] ptp: vmclock: add vm generation counter Chalios, Babis
@ 2025-12-03 12:36 ` Chalios, Babis
2025-12-03 16:52 ` David Woodhouse
2025-12-03 12:36 ` [PATCH v3 3/4] dt-bindings: ptp: Add amazon,vmclock Chalios, Babis
` (2 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Chalios, Babis @ 2025-12-03 12:36 UTC
To: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
richardcochran@gmail.com, dwmw2@infradead.org,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Chalios, Babis,
Graf (AWS), Alexander, mzxreary@0pointer.de, Cali, Marco
From: "Chalios, Babis" <bchalios@amazon.es>
Add optional support for device notifications in VMClock. When
supported, the hypervisor will send a device notification every time it
updates the seq_count to a new even value.
Moreover, add support for poll() in VMClock as a means to propagate this
notification to user space. poll() will return a POLLIN event to
listeners every time seq_count changes to a value different than the one
last seen (since open() or last read()/pread()). This means that when
poll() returns a POLLIN event, listeners need to use read() to observe
what has changed and update the reader's view of seq_count. In other
words, once poll() has returned POLLIN, every subsequent call to poll()
immediately returns POLLIN again until the listener calls read().
The device advertises support for the notification mechanism by setting
the VMCLOCK_FLAG_NOTIFICATION_PRESENT flag in the vmclock_abi flags field.
If the flag is not present, the driver won't set up the ACPI notification
handler and poll() will always immediately return POLLHUP.
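From user space, one wait/acknowledge step might look like the sketch
below. It assumes the caller has already opened the vmclock misc device
(the device path is not stated in this series, so none is given here) and
vmclock_wait_event is a hypothetical helper. pread() at offset 0 is used
so that the call does not depend on the current file position:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait for one notification on an open vmclock file descriptor.
 * Returns 1 if a change was reported and acknowledged by reading,
 * 0 on timeout, -ENOTSUP if the device hangs up (no notification
 * support), or another negative errno value on failure.
 */
int vmclock_wait_event(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(buf != NULL);

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return 0;

	if (pfd.revents & POLLIN) {
		/* Reading refreshes this file's view of seq_count. */
		if (pread(fd, buf, len, 0) < 0)
			return -errno;
		return 1;
	}

	if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL))
		return -ENOTSUP;

	return 0;
}
```

Treating POLLHUP as "not supported" lets a caller fall back to periodic
polling of the shared page instead of blocking forever.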
Signed-off-by: Babis Chalios <bchalios@amazon.es>
---
drivers/ptp/ptp_vmclock.c | 126 +++++++++++++++++++++++++++++--
include/uapi/linux/vmclock-abi.h | 5 ++
2 files changed, 124 insertions(+), 7 deletions(-)
diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
index b3a83b03d9c1..49a17435bd35 100644
--- a/drivers/ptp/ptp_vmclock.c
+++ b/drivers/ptp/ptp_vmclock.c
@@ -5,6 +5,9 @@
* Copyright © 2024 Amazon.com, Inc. or its affiliates.
*/
+#include <linux/poll.h>
+#include <linux/types.h>
+#include <linux/wait.h>
#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/err.h>
@@ -39,6 +42,7 @@ struct vmclock_state {
struct resource res;
struct vmclock_abi *clk;
struct miscdevice miscdev;
+ wait_queue_head_t disrupt_wait;
struct ptp_clock_info ptp_clock_info;
struct ptp_clock *ptp_clock;
enum clocksource_ids cs_id, sys_cs_id;
@@ -357,10 +361,15 @@ static struct ptp_clock *vmclock_ptp_register(struct device *dev,
return ptp_clock_register(&st->ptp_clock_info, dev);
}
+struct vmclock_file_state {
+ struct vmclock_state *st;
+ atomic_t seq;
+};
+
static int vmclock_miscdev_mmap(struct file *fp, struct vm_area_struct *vma)
{
- struct vmclock_state *st = container_of(fp->private_data,
- struct vmclock_state, miscdev);
+ struct vmclock_file_state *fst = fp->private_data;
+ struct vmclock_state *st = fst->st;
if ((vma->vm_flags & (VM_READ|VM_WRITE)) != VM_READ)
return -EROFS;
@@ -379,11 +388,12 @@ static int vmclock_miscdev_mmap(struct file *fp, struct vm_area_struct *vma)
static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
size_t count, loff_t *ppos)
{
- struct vmclock_state *st = container_of(fp->private_data,
- struct vmclock_state, miscdev);
+ struct vmclock_file_state *fst = fp->private_data;
+ struct vmclock_state *st = fst->st;
+
ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT);
size_t max_count;
- uint32_t seq;
+ uint32_t seq, old_seq;
if (*ppos >= PAGE_SIZE)
return 0;
@@ -392,6 +402,7 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
if (count > max_count)
count = max_count;
+ old_seq = atomic_read(&fst->seq);
while (1) {
seq = le32_to_cpu(st->clk->seq_count) & ~1U;
/* Pairs with hypervisor wmb */
@@ -402,8 +413,16 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
/* Pairs with hypervisor wmb */
virt_rmb();
- if (seq == le32_to_cpu(st->clk->seq_count))
- break;
+ if (seq == le32_to_cpu(st->clk->seq_count)) {
+ /*
+ * Either we updated fst->seq to seq (the latest version we observed)
+ * or someone else did (old_seq == seq), so we can break.
+ */
+ if (atomic_try_cmpxchg(&fst->seq, &old_seq, seq) ||
+ old_seq == seq) {
+ break;
+ }
+ }
if (ktime_after(ktime_get(), deadline))
return -ETIMEDOUT;
@@ -413,10 +432,58 @@ static ssize_t vmclock_miscdev_read(struct file *fp, char __user *buf,
return count;
}
+static __poll_t vmclock_miscdev_poll(struct file *fp, poll_table *wait)
+{
+ struct vmclock_file_state *fst = fp->private_data;
+ struct vmclock_state *st = fst->st;
+ uint32_t seq;
+
+ /*
+ * Hypervisor will not send us any notifications, so fail immediately
+ * to avoid having caller sleeping for ever.
+ */
+ if (!(st->clk->flags & VMCLOCK_FLAG_NOTIFICATION_PRESENT))
+ return POLLHUP;
+
+ poll_wait(fp, &st->disrupt_wait, wait);
+
+ seq = le32_to_cpu(st->clk->seq_count);
+ if (atomic_read(&fst->seq) != seq)
+ return POLLIN | POLLRDNORM;
+
+ return 0;
+}
+
+static int vmclock_miscdev_open(struct inode *inode, struct file *fp)
+{
+ struct vmclock_state *st = container_of(fp->private_data,
+ struct vmclock_state, miscdev);
+ struct vmclock_file_state *fst = kzalloc(sizeof(*fst), GFP_KERNEL);
+
+ if (!fst)
+ return -ENOMEM;
+
+ fst->st = st;
+ atomic_set(&fst->seq, 0);
+
+ fp->private_data = fst;
+
+ return 0;
+}
+
+static int vmclock_miscdev_release(struct inode *inode, struct file *fp)
+{
+ kfree(fp->private_data);
+ return 0;
+}
+
static const struct file_operations vmclock_miscdev_fops = {
.owner = THIS_MODULE,
+ .open = vmclock_miscdev_open,
+ .release = vmclock_miscdev_release,
.mmap = vmclock_miscdev_mmap,
.read = vmclock_miscdev_read,
+ .poll = vmclock_miscdev_poll,
};
/* module operations */
@@ -459,6 +526,44 @@ static acpi_status vmclock_acpi_resources(struct acpi_resource *ares, void *data
return AE_ERROR;
}
+static void
+vmclock_acpi_notification_handler(acpi_handle __always_unused handle,
+ u32 __always_unused event, void *dev)
+{
+ struct device *device = dev;
+ struct vmclock_state *st = device->driver_data;
+
+ wake_up_interruptible(&st->disrupt_wait);
+}
+
+static int vmclock_setup_notification(struct device *dev, struct vmclock_state *st)
+{
+ struct acpi_device *adev = ACPI_COMPANION(dev);
+ acpi_status status;
+
+ /*
+ * This should never happen as this function is only called when
+ * has_acpi_companion(dev) is true, but the logic is sufficiently
+ * complex that Coverity can't see the tautology.
+ */
+ if (!adev)
+ return -ENODEV;
+
+ /* The device does not support notifications. Nothing else to do */
+ if (!(le64_to_cpu(st->clk->flags) & VMCLOCK_FLAG_NOTIFICATION_PRESENT))
+ return 0;
+
+ status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY,
+ vmclock_acpi_notification_handler,
+ dev);
+ if (ACPI_FAILURE(status)) {
+ dev_err(dev, "failed to install notification handler\n");
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
static int vmclock_probe_acpi(struct device *dev, struct vmclock_state *st)
{
struct acpi_device *adev = ACPI_COMPANION(dev);
@@ -549,6 +654,11 @@ static int vmclock_probe(struct platform_device *pdev)
if (ret)
return ret;
+ init_waitqueue_head(&st->disrupt_wait);
+ ret = vmclock_setup_notification(dev, st);
+ if (ret)
+ return ret;
+
/*
* If the structure is big enough, it can be mapped to userspace.
* Theoretically a guest OS even using larger pages could still
@@ -581,6 +691,8 @@ static int vmclock_probe(struct platform_device *pdev)
return -ENODEV;
}
+ dev->driver_data = st;
+
dev_info(dev, "%s: registered %s%s%s\n", st->name,
st->miscdev.minor ? "miscdev" : "",
(st->miscdev.minor && st->ptp_clock) ? ", " : "",
diff --git a/include/uapi/linux/vmclock-abi.h b/include/uapi/linux/vmclock-abi.h
index 937fe00e4f33..d320623b0118 100644
--- a/include/uapi/linux/vmclock-abi.h
+++ b/include/uapi/linux/vmclock-abi.h
@@ -121,6 +121,11 @@ struct vmclock_abi {
* loaded from some save state (restored from a snapshot).
*/
#define VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT (1 << 8)
+ /*
+ * If the NOTIFICATION_PRESENT flag is set, the hypervisor will send
+ * a notification every time it updates seq_count to a new even number.
+ */
+#define VMCLOCK_FLAG_NOTIFICATION_PRESENT (1 << 9)
__u8 pad[2];
__u8 clock_status;
--
2.34.1
* [PATCH v3 3/4] dt-bindings: ptp: Add amazon,vmclock
2025-12-03 12:35 [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Chalios, Babis
2025-12-03 12:35 ` [PATCH v3 1/4] ptp: vmclock: add vm generation counter Chalios, Babis
2025-12-03 12:36 ` [PATCH v3 2/4] ptp: vmclock: support device notifications Chalios, Babis
@ 2025-12-03 12:36 ` Chalios, Babis
2025-12-03 13:46 ` Krzysztof Kozlowski
2025-12-03 12:36 ` [PATCH v3 4/4] ptp: ptp_vmclock: Add device tree support Chalios, Babis
2025-12-04 11:09 ` [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Paolo Abeni
4 siblings, 1 reply; 9+ messages in thread
From: Chalios, Babis @ 2025-12-03 12:36 UTC
To: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
richardcochran@gmail.com, dwmw2@infradead.org,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Chalios, Babis,
Graf (AWS), Alexander, mzxreary@0pointer.de, Cali, Marco,
Woodhouse, David
From: David Woodhouse <dwmw@amazon.co.uk>
The vmclock device provides a PTP clock source and precise timekeeping
across live migration and snapshot/restore operations.
The binding has a required memory region containing the vmclock_abi
structure and an optional interrupt for clock disruption notifications.
The full specification is at https://david.woodhou.se/VMClock.pdf
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
---
.../bindings/ptp/amazon,vmclock.yaml | 46 +++++++++++++++++++
1 file changed, 46 insertions(+)
create mode 100644 Documentation/devicetree/bindings/ptp/amazon,vmclock.yaml
diff --git a/Documentation/devicetree/bindings/ptp/amazon,vmclock.yaml b/Documentation/devicetree/bindings/ptp/amazon,vmclock.yaml
new file mode 100644
index 000000000000..b98fee20ce5f
--- /dev/null
+++ b/Documentation/devicetree/bindings/ptp/amazon,vmclock.yaml
@@ -0,0 +1,46 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/ptp/amazon,vmclock.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Virtual Machine Clock
+
+maintainers:
+ - David Woodhouse <dwmw2@infradead.org>
+
+description:
+ The vmclock device provides a precise clock source and allows for
+ accurate timekeeping across live migration and snapshot/restore
+ operations. The full specification of the shared data structure
+ is available at https://david.woodhou.se/VMClock.pdf
+
+properties:
+ compatible:
+ const: amazon,vmclock
+
+ reg:
+ description:
+ Specifies the shared memory region containing the vmclock_abi structure.
+ maxItems: 1
+
+ interrupts:
+ description:
+ Interrupt used to notify when the contents of the vmclock_abi structure
+ have been updated.
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ ptp@80000000 {
+ compatible = "amazon,vmclock";
+ reg = <0x80000000 0x1000>;
+ interrupts = <GIC_SPI 36 IRQ_TYPE_EDGE_RISING>;
+ };
--
2.34.1
* [PATCH v3 4/4] ptp: ptp_vmclock: Add device tree support
2025-12-03 12:35 [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Chalios, Babis
` (2 preceding siblings ...)
2025-12-03 12:36 ` [PATCH v3 3/4] dt-bindings: ptp: Add amazon,vmclock Chalios, Babis
@ 2025-12-03 12:36 ` Chalios, Babis
2025-12-11 19:30 ` Sai Krishna Gajula
2025-12-04 11:09 ` [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Paolo Abeni
4 siblings, 1 reply; 9+ messages in thread
From: Chalios, Babis @ 2025-12-03 12:36 UTC
To: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
richardcochran@gmail.com, dwmw2@infradead.org,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Chalios, Babis,
Graf (AWS), Alexander, mzxreary@0pointer.de, Cali, Marco,
Woodhouse, David
From: David Woodhouse <dwmw@amazon.co.uk>
Add device tree support to the ptp_vmclock driver, allowing it to probe
via device tree in addition to ACPI.
Handle optional interrupt for clock disruption notifications, mirroring
the ACPI notification behavior.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
---
drivers/ptp/ptp_vmclock.c | 69 +++++++++++++++++++++++++++++++++++----
1 file changed, 63 insertions(+), 6 deletions(-)
diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
index 49a17435bd35..349582f1ccc3 100644
--- a/drivers/ptp/ptp_vmclock.c
+++ b/drivers/ptp/ptp_vmclock.c
@@ -14,10 +14,12 @@
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/init.h>
+#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/miscdevice.h>
#include <linux/mm.h>
#include <linux/module.h>
+#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
@@ -536,7 +538,7 @@ vmclock_acpi_notification_handler(acpi_handle __always_unused handle,
wake_up_interruptible(&st->disrupt_wait);
}
-static int vmclock_setup_notification(struct device *dev, struct vmclock_state *st)
+static int vmclock_setup_acpi_notification(struct device *dev)
{
struct acpi_device *adev = ACPI_COMPANION(dev);
acpi_status status;
@@ -549,10 +551,6 @@ static int vmclock_setup_notification(struct device *dev, struct vmclock_state *
if (!adev)
return -ENODEV;
- /* The device does not support notifications. Nothing else to do */
- if (!(le64_to_cpu(st->clk->flags) & VMCLOCK_FLAG_NOTIFICATION_PRESENT))
- return 0;
-
status = acpi_install_notify_handler(adev->handle, ACPI_DEVICE_NOTIFY,
vmclock_acpi_notification_handler,
dev);
@@ -587,6 +585,58 @@ static int vmclock_probe_acpi(struct device *dev, struct vmclock_state *st)
return 0;
}
+static irqreturn_t vmclock_of_irq_handler(int __always_unused irq, void *dev)
+{
+ struct device *device = dev;
+ struct vmclock_state *st = device->driver_data;
+
+ wake_up_interruptible(&st->disrupt_wait);
+ return IRQ_HANDLED;
+}
+
+static int vmclock_probe_dt(struct device *dev, struct vmclock_state *st)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+ struct resource *res;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res)
+ return -ENODEV;
+
+ st->res = *res;
+
+ return 0;
+}
+
+static int vmclock_setup_of_notification(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+ int irq;
+
+ irq = platform_get_irq(pdev, 0);
+ if (irq < 0)
+ return irq;
+
+ return devm_request_irq(dev, irq, vmclock_of_irq_handler, IRQF_SHARED,
+ "vmclock", dev);
+}
+
+static int vmclock_setup_notification(struct device *dev,
+ struct vmclock_state *st)
+{
+ /* The device does not support notifications. Nothing else to do */
+ if (!(le64_to_cpu(st->clk->flags) & VMCLOCK_FLAG_NOTIFICATION_PRESENT))
+ return 0;
+
+ if (has_acpi_companion(dev)) {
+ return vmclock_setup_acpi_notification(dev);
+ } else {
+ return vmclock_setup_of_notification(dev);
+ }
+
+}
+
+
static void vmclock_put_idx(void *data)
{
struct vmclock_state *st = data;
@@ -607,7 +657,7 @@ static int vmclock_probe(struct platform_device *pdev)
if (has_acpi_companion(dev))
ret = vmclock_probe_acpi(dev, st);
else
- ret = -EINVAL; /* Only ACPI for now */
+ ret = vmclock_probe_dt(dev, st);
if (ret) {
dev_info(dev, "Failed to obtain physical address: %d\n", ret);
@@ -707,11 +757,18 @@ static const struct acpi_device_id vmclock_acpi_ids[] = {
};
MODULE_DEVICE_TABLE(acpi, vmclock_acpi_ids);
+static const struct of_device_id vmclock_of_ids[] = {
+ { .compatible = "amazon,vmclock", },
+ { },
+};
+MODULE_DEVICE_TABLE(of, vmclock_of_ids);
+
static struct platform_driver vmclock_platform_driver = {
.probe = vmclock_probe,
.driver = {
.name = "vmclock",
.acpi_match_table = vmclock_acpi_ids,
+ .of_match_table = vmclock_of_ids,
},
};
--
2.34.1
* Re: [PATCH v3 3/4] dt-bindings: ptp: Add amazon,vmclock
2025-12-03 12:36 ` [PATCH v3 3/4] dt-bindings: ptp: Add amazon,vmclock Chalios, Babis
@ 2025-12-03 13:46 ` Krzysztof Kozlowski
0 siblings, 0 replies; 9+ messages in thread
From: Krzysztof Kozlowski @ 2025-12-03 13:46 UTC
To: Chalios, Babis, robh@kernel.org, krzk+dt@kernel.org,
conor+dt@kernel.org, richardcochran@gmail.com,
dwmw2@infradead.org, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Graf (AWS), Alexander,
mzxreary@0pointer.de, Cali, Marco, Woodhouse, David
On 03/12/2025 13:36, Chalios, Babis wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The vmclock device provides a PTP clock source and precise timekeeping
> across live migration and snapshot/restore operations.
>
> The binding has a required memory region containing the vmclock_abi
> structure and an optional interrupt for clock disruption notifications.
>
> The full specification is at https://david.woodhou.se/VMClock.pdf
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Babis Chalios <bchalios@amazon.es>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Best regards,
Krzysztof
* Re: [PATCH v3 2/4] ptp: vmclock: support device notifications
2025-12-03 12:36 ` [PATCH v3 2/4] ptp: vmclock: support device notifications Chalios, Babis
@ 2025-12-03 16:52 ` David Woodhouse
0 siblings, 0 replies; 9+ messages in thread
From: David Woodhouse @ 2025-12-03 16:52 UTC
To: Chalios, Babis, robh@kernel.org, krzk+dt@kernel.org,
conor+dt@kernel.org, richardcochran@gmail.com,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Graf (AWS), Alexander,
mzxreary@0pointer.de, Cali, Marco
On Wed, 2025-12-03 at 12:36 +0000, Chalios, Babis wrote:
> From: "Chalios, Babis" <bchalios@amazon.es>
>
> Add optional support for device notifications in VMClock. When
> supported, the hypervisor will send a device notification every time it
> updates the seq_count to a new even value.
>
> Moreover, add support for poll() in VMClock as a means to propagate this
> notification to user space. poll() will return a POLLIN event to
> listeners every time seq_count changes to a value different than the one
> last seen (since open() or last read()/pread()). This means that when
> poll() returns a POLLIN event, listeners need to use read() to observe
> what has changed and update the reader's view of seq_count. In other
> words, after a poll() returned, all subsequent calls to poll() will
> immediately return with a POLLIN event until the listener calls read().
>
> The device advertises support for the notification mechanism by setting
> flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If
> the flag is not present the driver won't setup the ACPI notification
> handler and poll() will always immediately return POLLHUP.
>
> Signed-off-by: Babis Chalios <bchalios@amazon.es>
Mostly looks good to me; thanks. However...
> +static __poll_t vmclock_miscdev_poll(struct file *fp, poll_table *wait)
> +{
> + struct vmclock_file_state *fst = fp->private_data;
> + struct vmclock_state *st = fst->st;
> + uint32_t seq;
> +
> + /*
> + * Hypervisor will not send us any notifications, so fail immediately
> + * to avoid having caller sleeping for ever.
> + */
> + if (!(st->clk->flags & VMCLOCK_FLAG_NOTIFICATION_PRESENT))
> + return POLLHUP;
Missing le64_to_cpu() there. And I guess *strictly* speaking we should
do the seq_lock dance whenever we read the dynamic fields, although
that only ever matters if a hypervisor were to bump seq_count to an odd
value, *then* set the VMCLOCK_FLAG_NOTIFICATION_PRESENT flag, then
clear the flag again before bumping seq_count to even, and blame the
*guest* for looking at the wrong time. Which is frankly insane, and I
don't think I care.
So with the le64_to_cpu() added,
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Don't post again for a few days though; three versions in 48 hours is
enough :)
> +static int vmclock_miscdev_release(struct inode *inode, struct file *fp)
> +{
> + kfree(fp->private_data);
> + return 0;
> +}
Still bugs me a little. I note simple_attr_release() is exported and
does the same, but I guess we'd want to move and rename that before we
try to use it from places like this.
* Re: [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification
2025-12-03 12:35 [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Chalios, Babis
` (3 preceding siblings ...)
2025-12-03 12:36 ` [PATCH v3 4/4] ptp: ptp_vmclock: Add device tree support Chalios, Babis
@ 2025-12-04 11:09 ` Paolo Abeni
4 siblings, 0 replies; 9+ messages in thread
From: Paolo Abeni @ 2025-12-04 11:09 UTC
To: Chalios, Babis, robh@kernel.org, krzk+dt@kernel.org,
conor+dt@kernel.org, richardcochran@gmail.com,
dwmw2@infradead.org, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Graf (AWS), Alexander,
mzxreary@0pointer.de, Cali, Marco
On 12/3/25 1:35 PM, Chalios, Babis wrote:
> Similarly to live migration, starting a VM from some serialized state
> (aka snapshot) is an event which calls for adjusting guest clocks, hence
> a hypervisor should increase the disruption_marker before resuming the
> VM vCPUs, letting the guest know.
>
> However, loading a snapshot, is slightly different than live migration,
> especially since we can start multiple VMs from the same serialized
> state. Apart from adjusting clocks, the guest needs to take additional
> action during such events, e.g. recreate UUIDs, reset network
> adapters/connections, reseed entropy pools, etc. These actions are not
> necessary during live migration. This calls for a differentiation
> between the two triggering events.
>
> We differentiate between the two events via an extra field in the
> vmclock_abi, called vm_generation_counter. Whereas hypervisors should
> increase the disruption marker in both cases, they should only increase
> vm_generation_counter when a snapshot is loaded in a VM (not during live
> migration).
>
> Additionally, we attach an ACPI notification to VMClock. Implementing
> the notification is optional for the device. VMClock device will declare
> that it implements the notification by setting
> VMCLOCK_FLAG_NOTIFICATION_PRESENT bit in vmclock_abi flags. Hypervisors
> that implement the notification must send an ACPI notification every
> time seq_count changes to an even number. The driver will propagate
> these notifications to userspace via the poll() interface.
Linux tagged 6.18 final, so net-next is closed for new code submissions
per the announcement at
https://lore.kernel.org/20251130174502.3908e3ee@kernel.org
/P
* RE: [PATCH v3 4/4] ptp: ptp_vmclock: Add device tree support
2025-12-03 12:36 ` [PATCH v3 4/4] ptp: ptp_vmclock: Add device tree support Chalios, Babis
@ 2025-12-11 19:30 ` Sai Krishna Gajula
0 siblings, 0 replies; 9+ messages in thread
From: Sai Krishna Gajula @ 2025-12-11 19:30 UTC
To: Chalios, Babis, robh@kernel.org, krzk+dt@kernel.org,
conor+dt@kernel.org, richardcochran@gmail.com,
dwmw2@infradead.org, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: devicetree@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Graf (AWS), Alexander,
mzxreary@0pointer.de, Cali, Marco, Woodhouse, David
> -----Original Message-----
> From: Chalios, Babis <bchalios@amazon.es>
> Sent: Wednesday, December 3, 2025 6:06 PM
> To: robh@kernel.org; krzk+dt@kernel.org; conor+dt@kernel.org;
> richardcochran@gmail.com; dwmw2@infradead.org;
> andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com
> Cc: devicetree@vger.kernel.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Chalios, Babis <bchalios@amazon.es>; Graf (AWS),
> Alexander <graf@amazon.de>; mzxreary@0pointer.de; Cali, Marco
> <xmarcalx@amazon.co.uk>; Woodhouse, David <dwmw@amazon.co.uk>
> Subject: [PATCH v3 4/4] ptp: ptp_vmclock: Add device tree
> support
>
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Add device tree support to the ptp_vmclock driver, allowing it to probe via
> device tree in addition to ACPI.
>
> Handle optional interrupt for clock disruption notifications, mirroring the ACPI
> notification behavior.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Babis Chalios <bchalios@amazon.es>
> ---
> drivers/ptp/ptp_vmclock.c | 69 +++++++++++++++++++++++++++++++++++----
> 1 file changed, 63 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/ptp/ptp_vmclock.c b/drivers/ptp/ptp_vmclock.c
> index 49a17435bd35..349582f1ccc3 100644
> --- a/drivers/ptp/ptp_vmclock.c
> +++ b/drivers/ptp/ptp_vmclock.c
> @@ -14,10 +14,12 @@
> #include <linux/file.h>
> #include <linux/fs.h>
> #include <linux/init.h>
> +#include <linux/interrupt.h>
> #include <linux/kernel.h>
> #include <linux/miscdevice.h>
> #include <linux/mm.h>
> #include <linux/module.h>
> +#include <linux/of.h>
> #include <linux/platform_device.h>
> #include <linux/slab.h>
>
> @@ -536,7 +538,7 @@ vmclock_acpi_notification_handler(acpi_handle __always_unused handle,
> wake_up_interruptible(&st->disrupt_wait);
> }
>
> -static int vmclock_setup_notification(struct device *dev, struct vmclock_state *st)
> +static int vmclock_setup_acpi_notification(struct device *dev)
> {
> struct acpi_device *adev = ACPI_COMPANION(dev);
> acpi_status status;
> @@ -549,10 +551,6 @@ static int vmclock_setup_notification(struct device *dev, struct vmclock_state *
> if (!adev)
> return -ENODEV;
>
> - /* The device does not support notifications. Nothing else to do */
> - if (!(le64_to_cpu(st->clk->flags) & VMCLOCK_FLAG_NOTIFICATION_PRESENT))
> - return 0;
> -
> status = acpi_install_notify_handler(adev->handle,
> ACPI_DEVICE_NOTIFY,
> vmclock_acpi_notification_handler,
> dev);
> @@ -587,6 +585,58 @@ static int vmclock_probe_acpi(struct device *dev, struct vmclock_state *st)
> return 0;
> }
>
> +static irqreturn_t vmclock_of_irq_handler(int __always_unused irq, void
> +*dev) {
> + struct device *device = dev;
> + struct vmclock_state *st = device->driver_data;
> +
> + wake_up_interruptible(&st->disrupt_wait);
> + return IRQ_HANDLED;
> +}
Minor nit: for clarity and type safety, it would be better to pass st as the
dev_id of the IRQ handler and use it directly:

static irqreturn_t vmclock_of_irq_handler(int __always_unused irq, void *dev_id)
{
	struct vmclock_state *st = dev_id;
	...
}

static int vmclock_setup_of_notification(struct device *dev)
{
	struct platform_device *pdev = to_platform_device(dev);
	struct vmclock_state *st = dev_get_drvdata(dev);
	....
	return devm_request_irq(dev, irq, vmclock_of_irq_handler, IRQF_SHARED,
				"vmclock", st); /* Pass st directly */
}
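
To make the difference concrete outside the kernel, here is a minimal
userspace sketch of the two ways of using the opaque dev_id cookie. It is
only an illustration under stated assumptions: demo_state, demo_device,
demo_handler_t and demo_fire are hypothetical stand-ins for
struct vmclock_state, struct device, the IRQ handler type and the interrupt
core. None of them are kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vmclock_state: just counts wakeups. */
struct demo_state {
	int wakeups;
};

/* Hypothetical stand-in for struct device and its driver_data pointer. */
struct demo_device {
	void *driver_data;
};

/* Hypothetical handler type: an IRQ number plus the registered cookie. */
typedef int (*demo_handler_t)(int irq, void *dev_id);

/* Variant used in the patch: the cookie is the device, state is looked up. */
static int handler_via_device(int irq, void *dev_id)
{
	struct demo_device *device = dev_id;
	struct demo_state *st = device->driver_data;

	(void)irq;
	st->wakeups++;
	return 1;
}

/* Variant suggested above: the cookie is the state itself. */
static int handler_via_state(int irq, void *dev_id)
{
	struct demo_state *st = dev_id;

	(void)irq;
	st->wakeups++;
	return 1;
}

/* Simulates the interrupt core invoking a handler with its cookie. */
static int demo_fire(demo_handler_t handler, int irq, void *dev_id)
{
	assert(handler != NULL);
	return handler(irq, dev_id);
}
```

Both variants reach the same state. The second one saves an indirection and
does not depend on driver_data already being set when the handler first runs.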
> +
> +static int vmclock_probe_dt(struct device *dev, struct vmclock_state
> +*st) {
Also, all functions should have the opening brace on the next line, to satisfy
the kernel coding style:

static int vmclock_probe_dt(struct device *dev, struct vmclock_state *st)
{
	...
}
> + struct platform_device *pdev = to_platform_device(dev);
> + struct resource *res;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res)
> + return -ENODEV;
> +
> + st->res = *res;
> +
> + return 0;
> +}
> +
> +static int vmclock_setup_of_notification(struct device *dev) {
> + struct platform_device *pdev = to_platform_device(dev);
> + int irq;
> +
> + irq = platform_get_irq(pdev, 0);
> + if (irq < 0)
> + return irq;
> +
> + return devm_request_irq(dev, irq, vmclock_of_irq_handler, IRQF_SHARED,
> + "vmclock", dev);
> +}
> +
> +static int vmclock_setup_notification(struct device *dev,
> + struct vmclock_state *st)
> +{
> + /* The device does not support notifications. Nothing else to do */
> + if (!(le64_to_cpu(st->clk->flags) & VMCLOCK_FLAG_NOTIFICATION_PRESENT))
> + return 0;
> +
> + if (has_acpi_companion(dev)) {
> + return vmclock_setup_acpi_notification(dev);
> + } else {
> + return vmclock_setup_of_notification(dev);
> + }
> +
> +}
> +
> +
> static void vmclock_put_idx(void *data) {
> struct vmclock_state *st = data;
> @@ -607,7 +657,7 @@ static int vmclock_probe(struct platform_device *pdev)
> if (has_acpi_companion(dev))
> ret = vmclock_probe_acpi(dev, st);
> else
> - ret = -EINVAL; /* Only ACPI for now */
> + ret = vmclock_probe_dt(dev, st);
>
> if (ret) {
> dev_info(dev, "Failed to obtain physical address: %d\n", ret);
> @@ -707,11 +757,18 @@ static const struct acpi_device_id vmclock_acpi_ids[] = {
> };
> MODULE_DEVICE_TABLE(acpi, vmclock_acpi_ids);
>
> +static const struct of_device_id vmclock_of_ids[] = {
> + { .compatible = "amazon,vmclock", },
> + { },
> +};
> +MODULE_DEVICE_TABLE(of, vmclock_of_ids);
> +
> static struct platform_driver vmclock_platform_driver = {
> .probe = vmclock_probe,
> .driver = {
> .name = "vmclock",
> .acpi_match_table = vmclock_acpi_ids,
> + .of_match_table = vmclock_of_ids,
> },
> };
>
> --
> 2.34.1
>
Thread overview: 9+ messages
2025-12-03 12:35 [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Chalios, Babis
2025-12-03 12:35 ` [PATCH v3 1/4] ptp: vmclock: add vm generation counter Chalios, Babis
2025-12-03 12:36 ` [PATCH v3 2/4] ptp: vmclock: support device notifications Chalios, Babis
2025-12-03 16:52 ` David Woodhouse
2025-12-03 12:36 ` [PATCH v3 3/4] dt-bindings: ptp: Add amazon,vmclock Chalios, Babis
2025-12-03 13:46 ` Krzysztof Kozlowski
2025-12-03 12:36 ` [PATCH v3 4/4] ptp: ptp_vmclock: Add device tree support Chalios, Babis
2025-12-11 19:30 ` Sai Krishna Gajula
2025-12-04 11:09 ` [PATCH v3 0/4] ptp: vmclock: Add VM generation counter and ACPI notification Paolo Abeni