* [PATCH net-next v2 0/2] net: extend ndo_get_tstamp and implement in gve
@ 2026-01-21 16:04 Kevin Yang
2026-01-21 16:04 ` [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types Kevin Yang
2026-01-21 16:04 ` [PATCH net-next v2 2/2] gve: implement ndo_get_tstamp Kevin Yang
0 siblings, 2 replies; 8+ messages in thread
From: Kevin Yang @ 2026-01-21 16:04 UTC (permalink / raw)
To: Willem de Bruijn, Harshitha Ramamurthy, Andrew Lunn, David Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Joshua Washington,
Gerhard Engleder, Richard Cochran
Cc: netdev, yyd
This series extends ndo_get_tstamp to support other timestamp types
and implements it in the gve driver.
Changes in v2:
- Fixed 32-bit compile issues in gve_ptp.c by using div64_u64 helpers.
- Fixed a div by zero case in gve_hwts_realtime_update.
- Added kdoc for ndo_get_tstamp.
Kevin Yang (2):
net: extend ndo_get_tstamp for other timestamp types
gve: implement ndo_get_tstamp
Documentation/networking/timestamping.rst | 21 ++++
drivers/net/ethernet/engleder/tsnep_main.c | 8 +-
drivers/net/ethernet/google/gve/gve.h | 8 ++
drivers/net/ethernet/google/gve/gve_adminq.h | 4 +-
drivers/net/ethernet/google/gve/gve_main.c | 27 +++++
drivers/net/ethernet/google/gve/gve_ptp.c | 107 ++++++++++++++++++-
drivers/net/ethernet/intel/igc/igc_main.c | 8 +-
include/linux/netdevice.h | 21 ++--
8 files changed, 188 insertions(+), 16 deletions(-)
--
2.52.0.457.g6b5491de43-goog
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types
2026-01-21 16:04 [PATCH net-next v2 0/2] net: extend ndo_get_tstamp and implement in gve Kevin Yang
@ 2026-01-21 16:04 ` Kevin Yang
2026-01-22 20:04 ` Gerhard Engleder
2026-01-21 16:04 ` [PATCH net-next v2 2/2] gve: implement ndo_get_tstamp Kevin Yang
1 sibling, 1 reply; 8+ messages in thread
From: Kevin Yang @ 2026-01-21 16:04 UTC (permalink / raw)
To: Willem de Bruijn, Harshitha Ramamurthy, Andrew Lunn, David Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Joshua Washington,
Gerhard Engleder, Richard Cochran
Cc: netdev, yyd
Network device hardware timestamps (hwtstamps) and the system's
clock (ktime) often originate from different clock domains.
This makes it hard to directly calculate the duration between
a hardware-timestamped event and a system-time event by simple
subtraction.
This patch extends ndo_get_tstamp to allow a netdev to convert
a hwtstamp into the system's CLOCK_REALTIME domain. A driver can
either estimate the conversion or, if the clocks are kept
synchronized, return the original timestamp directly.
Other clock domains, e.g. CLOCK_MONOTONIC_RAW, can also be added when
a use case surfaces.
This is useful for features that need to measure the delay between
a packet's hardware arrival/departure and a later software event.
For example, the TCP stack can use this to measure precise
packet receive delays, which is a requirement for the upcoming
TCP Swift [1] congestion control algorithm.
[1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
for congestion control in the datacenter." Proceedings of the
Annual conference of the ACM Special Interest Group on Data
Communication on the applications, technologies, architectures,
and protocols for computer communication. 2020.
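As a rough illustration of the use case above, here is a minimal
userspace sketch of how a consumer could derive a receive delay from a
realtime-converted hardware timestamp. Only the enum mirrors this patch.
The model_dev struct, model_get_tstamp() and model_rx_delay_ns() are
hypothetical stand-ins, and the fixed-offset conversion is an assumption
chosen for brevity, not what any driver necessarily does.

```c
#include <stdint.h>

/* Mirrors the enum added to include/linux/netdevice.h by this patch. */
enum netdev_tstamp_type {
	NETDEV_TSTAMP_PHC = 0,
	NETDEV_TSTAMP_CYCLE,
	NETDEV_TSTAMP_REALTIME,
};

/* Hypothetical device model: a NIC clock with a known offset to realtime. */
struct model_dev {
	int64_t phc_to_real_off_ns;	/* assumed constant for this sketch */
	int synced;			/* 0 until the first calibration */
};

/* Hypothetical stand-in for a driver's ndo_get_tstamp implementation.
 * Returns 0 when the requested type is unsupported or not yet available,
 * matching the convention documented in the patch.
 */
static int64_t model_get_tstamp(const struct model_dev *dev,
				int64_t hwtstamp_ns,
				enum netdev_tstamp_type type)
{
	switch (type) {
	case NETDEV_TSTAMP_PHC:
		return hwtstamp_ns;
	case NETDEV_TSTAMP_REALTIME:
		if (!dev->synced)
			return 0;
		return hwtstamp_ns + dev->phc_to_real_off_ns;
	default:
		return 0;
	}
}

/* Delay between hardware arrival and a later software event, both in the
 * realtime domain. Returns -1 when no realtime timestamp is available.
 */
static int64_t model_rx_delay_ns(const struct model_dev *dev,
				 int64_t hwtstamp_ns, int64_t now_real_ns)
{
	int64_t rx_real_ns = model_get_tstamp(dev, hwtstamp_ns,
					      NETDEV_TSTAMP_REALTIME);

	if (!rx_real_ns)
		return -1;
	return now_real_ns - rx_real_ns;
}
```

The caller has to treat a zero return as "no usable timestamp" rather
than as a time value, which is why the delay helper reports -1 in that
case.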
Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
Documentation/networking/timestamping.rst | 21 +++++++++++++++++++++
drivers/net/ethernet/engleder/tsnep_main.c | 8 +++++---
drivers/net/ethernet/intel/igc/igc_main.c | 8 +++++---
include/linux/netdevice.h | 21 ++++++++++++++-------
4 files changed, 45 insertions(+), 13 deletions(-)
diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
index 2162c4f2b28a6..05e3607b43551 100644
--- a/Documentation/networking/timestamping.rst
+++ b/Documentation/networking/timestamping.rst
@@ -671,6 +671,27 @@ Time stamps for outgoing packets are to be generated as follows:
software time stamping and therefore could lead to unexpected deltas
between time stamps.
+3.1.1 netdev_tstamp_type and ndo_get_tstamp
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``ndo_get_tstamp`` operation allows the stack to convert a hardware timestamp
+to a specific time domain or format. The ``type`` argument specifies the
+requested timestamp type:
+
+- ``NETDEV_TSTAMP_PHC``: The hardware timestamp in its PTP Hardware Clock
+ domain.
+- ``NETDEV_TSTAMP_CYCLE``: The hardware timestamp as a cycle counter.
+- ``NETDEV_TSTAMP_REALTIME``: The hardware timestamp converted to system time
+ (``CLOCK_REALTIME``).
+
+For ``NETDEV_TSTAMP_REALTIME``, the driver is responsible for converting the
+hardware timestamp to system time. In that case, ``ndo_get_tstamp`` does not
+provide an accuracy guarantee. A device might use a disciplined clock that is
+synchronized with ``CLOCK_REALTIME``. Or, the driver might estimate the system
+time. The accuracy primarily depends on the physical hardware and driver
+implementation.
+
+
3.2 Special considerations for stacked PTP Hardware Clocks
----------------------------------------------------------
diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index b118407c30e87..d1c0cbaba06e7 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -2275,15 +2275,17 @@ static int tsnep_netdev_set_features(struct net_device *netdev,
static ktime_t tsnep_netdev_get_tstamp(struct net_device *netdev,
const struct skb_shared_hwtstamps *hwtstamps,
- bool cycles)
+ enum netdev_tstamp_type type)
{
struct tsnep_rx_inline *rx_inline = hwtstamps->netdev_data;
u64 timestamp;
- if (cycles)
+ if (type == NETDEV_TSTAMP_CYCLE)
timestamp = __le64_to_cpu(rx_inline->counter);
- else
+ else if (type == NETDEV_TSTAMP_PHC)
timestamp = __le64_to_cpu(rx_inline->timestamp);
+ else
+ return 0;
return ns_to_ktime(timestamp);
}
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 7aafa60ba0c86..3bbb9098c6fc4 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6947,7 +6947,7 @@ int igc_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
static ktime_t igc_get_tstamp(struct net_device *dev,
const struct skb_shared_hwtstamps *hwtstamps,
- bool cycles)
+ enum netdev_tstamp_type type)
{
struct igc_adapter *adapter = netdev_priv(dev);
struct igc_inline_rx_tstamps *tstamp;
@@ -6955,10 +6955,12 @@ static ktime_t igc_get_tstamp(struct net_device *dev,
tstamp = hwtstamps->netdev_data;
- if (cycles)
+ if (type == NETDEV_TSTAMP_CYCLE)
timestamp = igc_ptp_rx_pktstamp(adapter, tstamp->timer1);
- else
+ else if (type == NETDEV_TSTAMP_PHC)
timestamp = igc_ptp_rx_pktstamp(adapter, tstamp->timer0);
+ else
+ return 0;
return timestamp;
}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d99b0fbc1942a..bfbdeca150c03 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1062,6 +1062,12 @@ struct netdev_net_notifier {
struct notifier_block *nb;
};
+enum netdev_tstamp_type {
+ NETDEV_TSTAMP_PHC = 0,
+ NETDEV_TSTAMP_CYCLE,
+ NETDEV_TSTAMP_REALTIME,
+};
+
/*
* This structure defines the management hooks for network devices.
* The following hooks can be defined; unless noted otherwise, they are
@@ -1406,11 +1412,10 @@ struct netdev_net_notifier {
* Get the forwarding path to reach the real device from the HW destination address
* ktime_t (*ndo_get_tstamp)(struct net_device *dev,
* const struct skb_shared_hwtstamps *hwtstamps,
- * bool cycles);
- * Get hardware timestamp based on normal/adjustable time or free running
- * cycle counter. This function is required if physical clock supports a
- * free running cycle counter.
- *
+ * enum netdev_tstamp_type type);
+ * Get hardware timestamp based on the type requested, or return 0 if the
+ * requested type is not supported. This function is required if physical
+ * clock supports a free running cycle counter.
* int (*ndo_hwtstamp_get)(struct net_device *dev,
* struct kernel_hwtstamp_config *kernel_config);
* Get the currently configured hardware timestamping parameters for the
@@ -1661,7 +1666,7 @@ struct net_device_ops {
struct net_device_path *path);
ktime_t (*ndo_get_tstamp)(struct net_device *dev,
const struct skb_shared_hwtstamps *hwtstamps,
- bool cycles);
+ enum netdev_tstamp_type type);
int (*ndo_hwtstamp_get)(struct net_device *dev,
struct kernel_hwtstamp_config *kernel_config);
int (*ndo_hwtstamp_set)(struct net_device *dev,
@@ -5236,9 +5241,11 @@ static inline ktime_t netdev_get_tstamp(struct net_device *dev,
bool cycles)
{
const struct net_device_ops *ops = dev->netdev_ops;
+ enum netdev_tstamp_type type = cycles ? NETDEV_TSTAMP_CYCLE :
+ NETDEV_TSTAMP_PHC;
if (ops->ndo_get_tstamp)
- return ops->ndo_get_tstamp(dev, hwtstamps, cycles);
+ return ops->ndo_get_tstamp(dev, hwtstamps, type);
return hwtstamps->hwtstamp;
}
--
2.52.0.457.g6b5491de43-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v2 2/2] gve: implement ndo_get_tstamp
2026-01-21 16:04 [PATCH net-next v2 0/2] net: extend ndo_get_tstamp and implement in gve Kevin Yang
2026-01-21 16:04 ` [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types Kevin Yang
@ 2026-01-21 16:04 ` Kevin Yang
1 sibling, 0 replies; 8+ messages in thread
From: Kevin Yang @ 2026-01-21 16:04 UTC (permalink / raw)
To: Willem de Bruijn, Harshitha Ramamurthy, Andrew Lunn, David Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Joshua Washington,
Gerhard Engleder, Richard Cochran
Cc: netdev, yyd
This patch implements ndo_get_tstamp in gve to support converting a
hwtstamp to the system's realtime clock.
The implementation does not assume the NIC clock is disciplined;
in other words, the NIC clock can be free-running. A periodic
job, embedded in gve's ptp_aux_work, updates the offset and slope
for the conversion.
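The slope update can be sketched in userspace as below. This is a
simplified model of the arithmetic in gve_hwts_realtime_update(), not
the driver code itself: the MODEL_ names and both helpers are
hypothetical, plain 64-bit division stands in for the div64_u64
helpers, and the offset adjustment and locking are left out.

```c
#include <stdint.h>

/* Same scaling as GVE_HWTS_REAL_CC_SHIFT in the patch: mult is the
 * ns-per-tick ratio multiplied by 1 << 31.
 */
#define MODEL_CC_SHIFT		31
#define MODEL_CC_NOMINAL	(1ULL << MODEL_CC_SHIFT)

/* Compute the new multiplier from the realtime and NIC clock deltas
 * between two syncs. Falls back to the nominal 1 tick per nanosecond
 * when there is no NIC delta or the ratio is off by more than 1%.
 */
static uint64_t model_update_mult(uint64_t real_delta_ns, uint64_t nic_delta)
{
	const uint64_t lower = MODEL_CC_NOMINAL * 99 / 100;
	const uint64_t upper = MODEL_CC_NOMINAL * 101 / 100;
	uint64_t quot, rem, mult;

	if (!nic_delta)
		return MODEL_CC_NOMINAL;

	/* NOMINAL * real_delta / nic_delta, split into quotient and
	 * remainder so the intermediate product stays within 64 bits
	 * for deltas on the order of the sync interval.
	 */
	quot = MODEL_CC_NOMINAL / nic_delta;
	rem = MODEL_CC_NOMINAL % nic_delta;
	mult = quot * real_delta_ns + rem * real_delta_ns / nic_delta;

	if (mult < lower || mult > upper)
		mult = MODEL_CC_NOMINAL;
	return mult;
}

/* Convert a NIC tick delta to nanoseconds with the scaled multiplier. */
static uint64_t model_cyc2ns(uint64_t cycles, uint64_t mult)
{
	return (cycles * mult) >> MODEL_CC_SHIFT;
}
```

With equal deltas the multiplier stays at the nominal value, and a
ratio far from 1:1 (for example after a sudden NIC clock change) is
discarded for that interval only.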
Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
---
drivers/net/ethernet/google/gve/gve.h | 8 ++
drivers/net/ethernet/google/gve/gve_adminq.h | 4 +-
drivers/net/ethernet/google/gve/gve_main.c | 27 +++++
drivers/net/ethernet/google/gve/gve_ptp.c | 107 ++++++++++++++++++-
4 files changed, 143 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 970d5ca8cddee..13a4c450e7635 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -774,6 +774,13 @@ struct gve_flow_rule {
struct gve_flow_spec mask;
};
+struct gve_tstamp_conversion {
+ u64 last_sync_ns;
+ seqlock_t lock; /* protects tc and cc */
+ struct timecounter tc;
+ struct cyclecounter cc;
+};
+
struct gve_flow_rules_cache {
bool rules_cache_synced; /* False if the driver's rules_cache is outdated */
struct gve_adminq_queried_flow_rule *rules_cache;
@@ -925,6 +932,7 @@ struct gve_priv {
struct gve_nic_ts_report *nic_ts_report;
dma_addr_t nic_ts_report_bus;
u64 last_sync_nic_counter; /* Clock counter from last NIC TS report */
+ struct gve_tstamp_conversion ts_real;
};
enum gve_service_task_flags_bit {
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index 22a74b6aa17ea..812160b87b143 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -411,8 +411,8 @@ static_assert(sizeof(struct gve_adminq_report_nic_ts) == 16);
struct gve_nic_ts_report {
__be64 nic_timestamp; /* NIC clock in nanoseconds */
- __be64 reserved1;
- __be64 reserved2;
+ __be64 cycle_pre;
+ __be64 cycle_post;
__be64 reserved3;
__be64 reserved4;
};
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index c42640da15a5a..c44b4526ccced 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -2198,6 +2198,32 @@ static int gve_set_ts_config(struct net_device *dev,
return 0;
}
+static ktime_t gve_get_tstamp(struct net_device *dev,
+ const struct skb_shared_hwtstamps *hwtstamps,
+ enum netdev_tstamp_type type)
+{
+ struct gve_priv *priv = netdev_priv(dev);
+ unsigned int seq;
+ u64 ns;
+
+ if (type == NETDEV_TSTAMP_PHC)
+ return hwtstamps->hwtstamp;
+
+ if (type != NETDEV_TSTAMP_REALTIME)
+ return 0;
+
+ /* Skip if never synced */
+ if (!READ_ONCE(priv->ts_real.last_sync_ns))
+ return 0;
+
+ do {
+ seq = read_seqbegin(&priv->ts_real.lock);
+ ns = timecounter_cyc2time(&priv->ts_real.tc,
+ hwtstamps->hwtstamp);
+ } while (read_seqretry(&priv->ts_real.lock, seq));
+ return ns_to_ktime(ns);
+}
+
static const struct net_device_ops gve_netdev_ops = {
.ndo_start_xmit = gve_start_xmit,
.ndo_features_check = gve_features_check,
@@ -2209,6 +2235,7 @@ static const struct net_device_ops gve_netdev_ops = {
.ndo_bpf = gve_xdp,
.ndo_xdp_xmit = gve_xdp_xmit,
.ndo_xsk_wakeup = gve_xsk_wakeup,
+ .ndo_get_tstamp = gve_get_tstamp,
.ndo_hwtstamp_get = gve_get_ts_config,
.ndo_hwtstamp_set = gve_set_ts_config,
};
diff --git a/drivers/net/ethernet/google/gve/gve_ptp.c b/drivers/net/ethernet/google/gve/gve_ptp.c
index 073677d82ee8e..dfe353ae75fb1 100644
--- a/drivers/net/ethernet/google/gve/gve_ptp.c
+++ b/drivers/net/ethernet/google/gve/gve_ptp.c
@@ -10,10 +10,92 @@
/* Interval to schedule a nic timestamp calibration, 250ms. */
#define GVE_NIC_TS_SYNC_INTERVAL_MS 250
+/* Scale ts_real.cc.mult by 1 << 31. Maximize mult for finer adjustment
+ * granularity, but ensure (mult * cycle) does not overflow in
+ * cyclecounter_cyc2ns.
+ */
+#define GVE_HWTS_REAL_CC_SHIFT 31
+#define GVE_HWTS_REAL_CC_NOMINAL BIT_ULL(GVE_HWTS_REAL_CC_SHIFT)
+
+/* Get the cross time stamp info */
+static int gve_get_cross_time(ktime_t *device,
+ struct system_counterval_t *system, void *ctx)
+{
+ struct gve_priv *priv = ctx;
+
+ *device = ns_to_ktime(be64_to_cpu(priv->nic_ts_report->nic_timestamp));
+ system->cycles = be64_to_cpu(priv->nic_ts_report->cycle_pre) +
+ (be64_to_cpu(priv->nic_ts_report->cycle_post) -
+ be64_to_cpu(priv->nic_ts_report->cycle_pre)) / 2;
+ system->use_nsecs = false;
+ if (IS_ENABLED(CONFIG_X86))
+ system->cs_id = CSID_X86_TSC;
+ else if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER))
+ system->cs_id = CSID_ARM_ARCH_COUNTER;
+ else
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static int gve_hwts_realtime_update(struct gve_priv *priv, u64 prev_nic)
+{
+ u64 nic_delta = priv->last_sync_nic_counter - prev_nic;
+ struct system_device_crosststamp cts = {};
+ struct system_time_snapshot history = {};
+ s64 nic_real_off_ns;
+ u64 real_ns;
+ int ret;
+
+ /* Step 1: Get the realtime of when NIC clock was read */
+ ktime_get_snapshot(&history);
+ ret = get_device_system_crosststamp(gve_get_cross_time, priv, &history,
+ &cts);
+ if (ret) {
+ dev_err_ratelimited(&priv->pdev->dev,
+ "%s crosststamp err %d\n", __func__, ret);
+ return ret;
+ }
+
+ real_ns = ktime_to_ns(cts.sys_realtime);
+
+ /* Step 2: Adjust NIC clock's offset */
+ /* Read-side ndo_get_tstamp can be called from TCP rx softirq */
+ write_seqlock_bh(&priv->ts_real.lock);
+ nic_real_off_ns = real_ns - timecounter_read(&priv->ts_real.tc);
+ timecounter_adjtime(&priv->ts_real.tc, nic_real_off_ns);
+
+ /* Step 3: Adjust NIC clock's ratio (when this is not the first sync).
+ * The NIC clock's nominal tick ratio is 1 tick per nanosecond,
+ * scaled by 1 << GVE_HWTS_REAL_CC_SHIFT. Adjust it to
+ * (ktime - prev_ktime) / (nic - prev_nic). The ratio should not
+ * deviate more than 1% from the nominal, otherwise it may suggest
+ * there was a sudden change on NIC clock. In that case, reset ratio
+ * to nominal. And since each sync only compares to the previous read,
+ * this is a one-time error, not a persistent failure.
+ */
+ if (prev_nic && nic_delta) {
+ const u64 lower = GVE_HWTS_REAL_CC_NOMINAL * 99 / 100;
+ const u64 upper = GVE_HWTS_REAL_CC_NOMINAL * 101 / 100;
+ u64 numer = real_ns - priv->ts_real.last_sync_ns;
+ u64 mult, quot, rem;
+
+ quot = div64_u64_rem(GVE_HWTS_REAL_CC_NOMINAL, nic_delta, &rem);
+ mult = (quot * numer) + div64_u64(rem * numer, nic_delta);
+ if (mult < lower || mult > upper)
+ mult = GVE_HWTS_REAL_CC_NOMINAL;
+ priv->ts_real.cc.mult = mult;
+ }
+
+ write_sequnlock_bh(&priv->ts_real.lock);
+ WRITE_ONCE(priv->ts_real.last_sync_ns, real_ns);
+ return 0;
+}
+
/* Read the nic timestamp from hardware via the admin queue. */
int gve_clock_nic_ts_read(struct gve_priv *priv)
{
- u64 nic_raw;
+ u64 nic_raw, prev_nic;
int err;
err = gve_adminq_report_nic_ts(priv, priv->nic_ts_report_bus);
@@ -21,7 +103,11 @@ int gve_clock_nic_ts_read(struct gve_priv *priv)
return err;
nic_raw = be64_to_cpu(priv->nic_ts_report->nic_timestamp);
+ prev_nic = priv->last_sync_nic_counter;
WRITE_ONCE(priv->last_sync_nic_counter, nic_raw);
+ err = gve_hwts_realtime_update(priv, prev_nic);
+ if (err)
+ return err;
return 0;
}
@@ -57,6 +143,14 @@ static long gve_ptp_do_aux_work(struct ptp_clock_info *info)
return msecs_to_jiffies(GVE_NIC_TS_SYNC_INTERVAL_MS);
}
+static u64 gve_cycles_read(struct cyclecounter *cc)
+{
+ const struct gve_priv *priv = container_of(cc, struct gve_priv,
+ ts_real.cc);
+
+ return READ_ONCE(priv->last_sync_nic_counter);
+}
+
static const struct ptp_clock_info gve_ptp_caps = {
.owner = THIS_MODULE,
.name = "gve clock",
@@ -89,6 +183,17 @@ static int gve_ptp_init(struct gve_priv *priv)
goto free_ptp;
}
+ priv->last_sync_nic_counter = 0;
+ priv->ts_real.last_sync_ns = 0;
+ seqlock_init(&priv->ts_real.lock);
+ memset(&priv->ts_real.cc, 0, sizeof(priv->ts_real.cc));
+ priv->ts_real.cc.mask = U32_MAX;
+ priv->ts_real.cc.shift = GVE_HWTS_REAL_CC_SHIFT;
+ priv->ts_real.cc.mult = GVE_HWTS_REAL_CC_NOMINAL;
+ priv->ts_real.cc.read = gve_cycles_read;
+ timecounter_init(&priv->ts_real.tc, &priv->ts_real.cc,
+ ktime_get_real_ns());
+
ptp->priv = priv;
return 0;
--
2.52.0.457.g6b5491de43-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types
2026-01-21 16:04 ` [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types Kevin Yang
@ 2026-01-22 20:04 ` Gerhard Engleder
2026-01-22 22:28 ` Willem de Bruijn
0 siblings, 1 reply; 8+ messages in thread
From: Gerhard Engleder @ 2026-01-22 20:04 UTC (permalink / raw)
To: Kevin Yang, Jakub Kicinski
Cc: netdev, Willem de Bruijn, Harshitha Ramamurthy, Andrew Lunn,
David Miller, Eric Dumazet, Paolo Abeni, Joshua Washington,
Richard Cochran
On 21.01.26 17:04, Kevin Yang wrote:
> Network device hardware timestamps (hwtstamps) and the system's
> clock (ktime) often originate from different clock domains.
> This makes it hard to directly calculate the duration between
> a hardware-timestamped event and a system-time event by simple
> subtraction.
>
> This patch extends ndo_get_tstamp to allow a netdev to provide
> a hwtstamp into the system's CLOCK_REALTIME domain. This allows a
> driver to either perform a conversion by estimating or, if the
> clocks are kept synchronized, return the original timestamp directly.
> Other clock domains, e.g. CLOCK_MONOTONIC_RAW can also be added when
> a use surfaces.
>
> This is useful for features that need to measure the delay between
> a packet's hardware arrival/departure and a later software event.
> For example, the TCP stack can use this to measure precise
> packet receive delays, which is a requirement for the upcoming
> TCP Swift [1] congestion control algorithm.
>
> [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
> for congestion control in the datacenter." Proceedings of the
> Annual conference of the ACM Special Interest Group on Data
> Communication on the applications, technologies, architectures,
> and protocols for computer communication. 2020.
>
> Signed-off-by: Kevin Yang <yyd@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
Like Jakub in his reply
https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
I also wondered why this is implemented in the driver.
With vclocks it is already possible to get timestamps for arbitrary
clock domains in parallel. So it is already possible to synchronize
the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
For that, user space synchronisation is needed, but e.g. ptp4l does
a much better synchronisation job than your solution.
Maybe CLOCK_REALTIME is not supported by ptp4l, because this clock jumps
due to daylight saving time. IMO these jumps will also be a problem for
your solution, as they will lead to wrong delays twice a year.
So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
To sum up: IMO you suggest a driver specific in-kernel solution where
already a driver independent user space solution with higher accuracy
exists.
Gerhard
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types
2026-01-22 20:04 ` Gerhard Engleder
@ 2026-01-22 22:28 ` Willem de Bruijn
2026-01-25 19:45 ` Gerhard Engleder
0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2026-01-22 22:28 UTC (permalink / raw)
To: Gerhard Engleder, Kevin Yang, Jakub Kicinski
Cc: netdev, Willem de Bruijn, Harshitha Ramamurthy, Andrew Lunn,
David Miller, Eric Dumazet, Paolo Abeni, Joshua Washington,
Richard Cochran
Gerhard Engleder wrote:
> On 21.01.26 17:04, Kevin Yang wrote:
> > Network device hardware timestamps (hwtstamps) and the system's
> > clock (ktime) often originate from different clock domains.
> > This makes it hard to directly calculate the duration between
> > a hardware-timestamped event and a system-time event by simple
> > subtraction.
> >
> > This patch extends ndo_get_tstamp to allow a netdev to provide
> > a hwtstamp into the system's CLOCK_REALTIME domain. This allows a
> > driver to either perform a conversion by estimating or, if the
> > clocks are kept synchronized, return the original timestamp directly.
> > Other clock domains, e.g. CLOCK_MONOTONIC_RAW can also be added when
> > a use surfaces.
> >
> > This is useful for features that need to measure the delay between
> > a packet's hardware arrival/departure and a later software event.
> > For example, the TCP stack can use this to measure precise
> > packet receive delays, which is a requirement for the upcoming
> > TCP Swift [1] congestion control algorithm.
> >
> > [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
> > for congestion control in the datacenter." Proceedings of the
> > Annual conference of the ACM Special Interest Group on Data
> > Communication on the applications, technologies, architectures,
> > and protocols for computer communication. 2020.
> >
> > Signed-off-by: Kevin Yang <yyd@google.com>
> > Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> Like Jakub in his reply
> https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
> for me also the question why this is a driver implementation came to my
> mind.
>
> With vclocks it is already possible to get timestamps for arbitrary
> clock domains in parallel. So it is already possible to synchronize
> the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
> Therefore, user space synchronisation is needed, but e.g. ptp4l does
> a much better synchronisation job than your solution.
>
> Maybe CLOCK_REALTIME is not supported by ptp4l, because due to daytime
> saving this clock jumps. IMO these jumps will also be problem for
> your solution, as it will lead to wrong delays two times a year.
> So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
>
> To sum up: IMO you suggest a driver specific in-kernel solution where
> already a driver independent user space solution with higher accuracy
> exists.
Definitely a promising alternative.
With multiple netdevices, a TCP listener socket may receive packets
from all devices. This would need new infrastructure to look up the
correct vclock for a given net_device; a choice cannot be hardcoded
with SOF_TIMESTAMPING_BIND_PHC.
And this needs to happen for every packet, so with minimal overhead.
Though for established connections the expectation will be that
packets generally arrive on the same netdevice. Bar infrequent path
changes such as from sk_rethink_txhash on the peer. So there this
value can perhaps be cached.
It would still have to be learned by the kernel, no explicit
setsockopt.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types
2026-01-22 22:28 ` Willem de Bruijn
@ 2026-01-25 19:45 ` Gerhard Engleder
2026-01-25 21:41 ` Willem de Bruijn
0 siblings, 1 reply; 8+ messages in thread
From: Gerhard Engleder @ 2026-01-25 19:45 UTC (permalink / raw)
To: Willem de Bruijn, Kevin Yang, Jakub Kicinski
Cc: netdev, Willem de Bruijn, Harshitha Ramamurthy, Andrew Lunn,
David Miller, Eric Dumazet, Paolo Abeni, Joshua Washington,
Richard Cochran
On 22.01.26 23:28, Willem de Bruijn wrote:
> Gerhard Engleder wrote:
>> On 21.01.26 17:04, Kevin Yang wrote:
>>> Network device hardware timestamps (hwtstamps) and the system's
>>> clock (ktime) often originate from different clock domains.
>>> This makes it hard to directly calculate the duration between
>>> a hardware-timestamped event and a system-time event by simple
>>> subtraction.
>>>
>>> This patch extends ndo_get_tstamp to allow a netdev to provide
>>> a hwtstamp into the system's CLOCK_REALTIME domain. This allows a
>>> driver to either perform a conversion by estimating or, if the
>>> clocks are kept synchronized, return the original timestamp directly.
>>> Other clock domains, e.g. CLOCK_MONOTONIC_RAW can also be added when
>>> a use surfaces.
>>>
>>> This is useful for features that need to measure the delay between
>>> a packet's hardware arrival/departure and a later software event.
>>> For example, the TCP stack can use this to measure precise
>>> packet receive delays, which is a requirement for the upcoming
>>> TCP Swift [1] congestion control algorithm.
>>>
>>> [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
>>> for congestion control in the datacenter." Proceedings of the
>>> Annual conference of the ACM Special Interest Group on Data
>>> Communication on the applications, technologies, architectures,
>>> and protocols for computer communication. 2020.
>>>
>>> Signed-off-by: Kevin Yang <yyd@google.com>
>>> Reviewed-by: Willem de Bruijn <willemb@google.com>
>>
>> Like Jakub in his reply
>> https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
>> for me also the question why this is a driver implementation came to my
>> mind.
>>
>> With vclocks it is already possible to get timestamps for arbitrary
>> clock domains in parallel. So it is already possible to synchronize
>> the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
>> Therefore, user space synchronisation is needed, but e.g. ptp4l does
>> a much better synchronisation job than your solution.
>>
>> Maybe CLOCK_REALTIME is not supported by ptp4l, because due to daytime
>> saving this clock jumps. IMO these jumps will also be problem for
>> your solution, as it will lead to wrong delays two times a year.
>> So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
>>
>> To sum up: IMO you suggest a driver specific in-kernel solution where
>> already a driver independent user space solution with higher accuracy
>> exists.
>
> Definitely a promising alternative.
>
> With multiple netdevices, a TCP listener socket may receive packets
> from all devices. This would need new infrastructure to lookup the
> correct vclock for a given net_device, cannot hardcode a choice with
> SOF_TIMESTAMPING_BIND_PHC.
>
> And this needs to happen for every packet, so with minimal overhead.
>
> Though for established connections the expectation will be that
> packets generally arrive on the same netdevice. Bar infrequent path
> changes such as from sk_rethink_txhash on the peer. So there this
> value can perhaps be cached.
>
> It would still have to be learned by the kernel, no explicit
> setsockopt.
Maybe it would also be an option, that the kernel learns with which
clock domain the timestamps of the PHC and vclocks correlate. Then
the TCP stack could calculate the delay if it finds a valid e.g.
CLOCK_MONOTONIC timestamp in the packet. This would make the
TCP listener socket independent from the devices. Just an idea, without
thinking about implementation details.
Gerhard
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types
2026-01-25 19:45 ` Gerhard Engleder
@ 2026-01-25 21:41 ` Willem de Bruijn
2026-01-27 23:13 ` Kevin Yang
0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2026-01-25 21:41 UTC (permalink / raw)
To: Gerhard Engleder, Willem de Bruijn, Kevin Yang, Jakub Kicinski
Cc: netdev, Willem de Bruijn, Harshitha Ramamurthy, Andrew Lunn,
David Miller, Eric Dumazet, Paolo Abeni, Joshua Washington,
Richard Cochran
Gerhard Engleder wrote:
> On 22.01.26 23:28, Willem de Bruijn wrote:
> > Gerhard Engleder wrote:
> >> On 21.01.26 17:04, Kevin Yang wrote:
> >>> Network device hardware timestamps (hwtstamps) and the system's
> >>> clock (ktime) often originate from different clock domains.
> >>> This makes it hard to directly calculate the duration between
> >>> a hardware-timestamped event and a system-time event by simple
> >>> subtraction.
> >>>
> >>> This patch extends ndo_get_tstamp to allow a netdev to provide
> >>> a hwtstamp into the system's CLOCK_REALTIME domain. This allows a
> >>> driver to either perform a conversion by estimating or, if the
> >>> clocks are kept synchronized, return the original timestamp directly.
> >>> Other clock domains, e.g. CLOCK_MONOTONIC_RAW can also be added when
> >>> a use surfaces.
> >>>
> >>> This is useful for features that need to measure the delay between
> >>> a packet's hardware arrival/departure and a later software event.
> >>> For example, the TCP stack can use this to measure precise
> >>> packet receive delays, which is a requirement for the upcoming
> >>> TCP Swift [1] congestion control algorithm.
> >>>
> >>> [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
> >>> for congestion control in the datacenter." Proceedings of the
> >>> Annual conference of the ACM Special Interest Group on Data
> >>> Communication on the applications, technologies, architectures,
> >>> and protocols for computer communication. 2020.
> >>>
> >>> Signed-off-by: Kevin Yang <yyd@google.com>
> >>> Reviewed-by: Willem de Bruijn <willemb@google.com>
> >>
> >> Like Jakub in his reply
> >> https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
> >> for me also the question why this is a driver implementation came to my
> >> mind.
> >>
> >> With vclocks it is already possible to get timestamps for arbitrary
> >> clock domains in parallel. So it is already possible to synchronize
> >> the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
> >> Therefore, user space synchronisation is needed, but e.g. ptp4l does
> >> a much better synchronisation job than your solution.
> >>
> >> Maybe CLOCK_REALTIME is not supported by ptp4l, because due to daytime
> >> saving this clock jumps. IMO these jumps will also be problem for
> >> your solution, as it will lead to wrong delays two times a year.
> >> So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
> >>
> >> To sum up: IMO you suggest a driver specific in-kernel solution where
> >> already a driver independent user space solution with higher accuracy
> >> exists.
> >
> > Definitely a promising alternative.
> >
> > With multiple netdevices, a TCP listener socket may receive packets
> > from all devices. This would need new infrastructure to lookup the
> > correct vclock for a given net_device, cannot hardcode a choice with
> > SOF_TIMESTAMPING_BIND_PHC.
> >
> > And this needs to happen for every packet, so with minimal overhead.
> >
> > Though for established connections the expectation will be that
> > packets generally arrive on the same netdevice. Bar infrequent path
> > changes such as from sk_rethink_txhash on the peer. So there this
> > value can perhaps be cached.
> >
> > It would still have to be learned by the kernel, no explicit
> > setsockopt.
>
> Maybe it would also be an option, that the kernel learns with which
> clock domain the timestamps of the PHC and vclocks correlate. Then
> the TCP stack could calculate the delay if it finds a valid e.g.
> CLOCK_MONOTONIC timestamp in the packet. This would make the
> TCP listener socket independent from the devices. Just an idea, without
> thinking about implementation details.
I think we're on the same page.

- use the existing vclocks
- look up the right vclock based on the original incoming iface
- cache this known clock with an established socket

But I also have not looked at how/whether the lookup infra can be
implemented to find a vclock automatically, i.e., without userspace
admin.

In some cases shinfo hwtstamp raw format may actually be the
CLOCK_REALTIME that TCP requires. But if the raw clock is not
realtime, we'll have to adjust based on timecounter/cyclecounter.
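A rough user-space sketch of the timecounter/cyclecounter adjustment
mentioned above, assuming a fixed mult/shift ratio and a single sync
point. The struct and function names are hypothetical and only loosely
modelled on the kernel's struct timecounter; this is not the kernel
implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a cyclecounter/timecounter pair. */
struct sketch_timecounter {
	uint64_t cycle_last; /* raw counter value at the last sync point */
	uint64_t nsec;       /* target-clock time in ns at that sync point */
	uint32_t mult;       /* ns = (cycles * mult) >> shift */
	uint32_t shift;
};

/*
 * Convert a raw hardware timestamp (in counter cycles) to the target
 * clock domain. Timestamps taken before the sync point are handled by
 * subtracting the scaled delta instead of adding it.
 * Assumption: the delta is small enough that delta * mult fits in 64 bits.
 */
static uint64_t sketch_cyc2time(const struct sketch_timecounter *tc,
				uint64_t cycles)
{
	if (cycles >= tc->cycle_last) {
		uint64_t delta = cycles - tc->cycle_last;

		return tc->nsec + ((delta * tc->mult) >> tc->shift);
	} else {
		uint64_t delta = tc->cycle_last - cycles;

		return tc->nsec - ((delta * tc->mult) >> tc->shift);
	}
}
```

With mult = 1024 and shift = 8 (4 ns per cycle), a timestamp 100 cycles
after the sync point lands 400 ns after the synced time. If the raw
clock already is the required clock, no such step is needed.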
* Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types
2026-01-25 21:41 ` Willem de Bruijn
@ 2026-01-27 23:13 ` Kevin Yang
0 siblings, 0 replies; 8+ messages in thread
From: Kevin Yang @ 2026-01-27 23:13 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Gerhard Engleder, Jakub Kicinski, netdev, Willem de Bruijn,
Harshitha Ramamurthy, Andrew Lunn, David Miller, Eric Dumazet,
Paolo Abeni, Joshua Washington, Richard Cochran
Just to clarify, the ptp vclocks approach is not contradictory to this patch.

I think converting the timestamp should be a net_device ndo function, since
- a TCP socket may receive packets from multiple net_devices
- only the driver is aware of the device clock's details and whether
  a conversion is actually required

That conversion is implemented per device:
- Some devices can identify the correct vclock and call
  ptp_convert_timestamp(&hwtstamp, vclock). This is a valid approach,
  but it requires the system admin to run phc2sys to sync the vclock to
  REALTIME (or MONOTONIC), alongside the necessary lookup infrastructure.
- Some devices may already sync their clock to REALTIME natively.
  In this case, hwtstamp is returned as-is without conversion.
- Some devices may handle conversion internally without using PTP.
  This is the case for our current GVE patch.

With that, I think this patch (extending ndo_get_tstamp) has value
regardless of the specific driver implementation.

As for the second patch and the question of why GVE does not use vclocks:
The current GVE patch is simple and self-contained. Switching to vclocks
would likely require adding PTP infrastructure to look up a vclock without
user interaction, which complicates development compared to the current
approach. Also, ptp_convert_timestamp involves a mutex lock, which might
raise performance concerns since it would be used on the TCP RX fast path.
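
The three per-device cases described above can be sketched roughly as
follows. This is a user-space illustration only: every name is
hypothetical, and each conversion is reduced to a plain offset, whereas
a real vclock conversion goes through a timecounter and the GVE
conversion is whatever the driver patch implements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum sketch_clock_mode {
	SKETCH_VCLOCK,   /* convert through a vclock synced by user space */
	SKETCH_NATIVE,   /* device clock already tracks CLOCK_REALTIME */
	SKETCH_INTERNAL, /* driver keeps its own estimate, no PTP involved */
};

struct sketch_netdev {
	enum sketch_clock_mode mode;
	int64_t vclock_offset_ns; /* stand-in for the vclock conversion */
	int64_t driver_offset_ns; /* stand-in for a driver-internal estimate */
};

/* Per-device hook: return the hardware timestamp in the realtime domain. */
static int64_t sketch_get_realtime_tstamp(const struct sketch_netdev *dev,
					  int64_t hwtstamp_ns)
{
	switch (dev->mode) {
	case SKETCH_VCLOCK:
		return hwtstamp_ns + dev->vclock_offset_ns;
	case SKETCH_NATIVE:
		return hwtstamp_ns; /* returned as-is, no conversion */
	case SKETCH_INTERNAL:
		return hwtstamp_ns + dev->driver_offset_ns;
	}
	return 0;
}

/* Caller side: delay between hardware arrival and a later software event. */
static int64_t sketch_rx_delay_ns(const struct sketch_netdev *dev,
				  int64_t hwtstamp_ns, int64_t now_realtime_ns)
{
	return now_realtime_ns - sketch_get_realtime_tstamp(dev, hwtstamp_ns);
}
```

The caller does not need to know which case applies to a given device,
which is the argument for keeping the conversion behind a per-device
hook.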
On Sun, Jan 25, 2026 at 4:41 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Gerhard Engleder wrote:
> > On 22.01.26 23:28, Willem de Bruijn wrote:
> > > Gerhard Engleder wrote:
> > >> On 21.01.26 17:04, Kevin Yang wrote:
> > >>> Network device hardware timestamps (hwtstamps) and the system's
> > >>> clock (ktime) often originate from different clock domains.
> > >>> This makes it hard to directly calculate the duration between
> > >>> a hardware-timestamped event and a system-time event by simple
> > >>> subtraction.
> > >>>
> > >>> This patch extends ndo_get_tstamp to allow a netdev to provide
> > >>> a hwtstamp into the system's CLOCK_REALTIME domain. This allows a
> > >>> driver to either perform a conversion by estimating or, if the
> > >>> clocks are kept synchronized, return the original timestamp directly.
> > >>> Other clock domains, e.g. CLOCK_MONOTONIC_RAW can also be added when
> > >>> a use surfaces.
> > >>>
> > >>> This is useful for features that need to measure the delay between
> > >>> a packet's hardware arrival/departure and a later software event.
> > >>> For example, the TCP stack can use this to measure precise
> > >>> packet receive delays, which is a requirement for the upcoming
> > >>> TCP Swift [1] congestion control algorithm.
> > >>>
> > >>> [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
> > >>> for congestion control in the datacenter." Proceedings of the
> > >>> Annual conference of the ACM Special Interest Group on Data
> > >>> Communication on the applications, technologies, architectures,
> > >>> and protocols for computer communication. 2020.
> > >>>
> > >>> Signed-off-by: Kevin Yang <yyd@google.com>
> > >>> Reviewed-by: Willem de Bruijn <willemb@google.com>
> > >>
> > >> Like Jakub in his reply
> > >> https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
> > >> for me also the question why this is a driver implementation came to my
> > >> mind.
> > >>
> > >> With vclocks it is already possible to get timestamps for arbitrary
> > >> clock domains in parallel. So it is already possible to synchronize
> > >> the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
> > >> Therefore, user space synchronisation is needed, but e.g. ptp4l does
> > >> a much better synchronisation job than your solution.
> > >>
> > >> Maybe CLOCK_REALTIME is not supported by ptp4l, because due to daylight
> > >> saving this clock jumps. IMO these jumps will also be a problem for
> > >> your solution, as it will lead to wrong delays two times a year.
> > >> So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
> > >>
> > >> To sum up: IMO you suggest a driver specific in-kernel solution where
> > >> already a driver independent user space solution with higher accuracy
> > >> exists.
> > >
> > > Definitely a promising alternative.
> > >
> > > With multiple netdevices, a TCP listener socket may receive packets
> > > from all devices. This would need new infrastructure to lookup the
> > > correct vclock for a given net_device, cannot hardcode a choice with
> > > SOF_TIMESTAMPING_BIND_PHC.
> > >
> > > And this needs to happen for every packet, so with minimal overhead.
> > >
> > > Though for established connections the expectation will be that
> > > packets generally arrive on the same netdevice. Bar infrequent path
> > > changes such as from sk_rethink_txhash on the peer. So there this
> > > value can perhaps be cached.
> > >
> > > It would still have to be learned by the kernel, no explicit
> > > setsockopt.
> >
> > Maybe it would also be an option, that the kernel learns with which
> > clock domain the timestamps of the PHC and vclocks correlate. Then
> > the TCP stack could calculate the delay if it finds a valid e.g.
> > CLOCK_MONOTONIC timestamp in the packet. This would make the
> > TCP listener socket independent from the devices. Just an idea, without
> > thinking about implementation details.
>
> I think we're on the same page.
>
> - use the existing vclocks
> - look up the right vclock based on the original incoming iface
> - cache this known clock with an established socket
>
> But I also have not looked at how/whether the lookup infra can be
> implemented to find a vclock automatically, i.e., without userspace
> admin.
>
> In some cases shinfo hwtstamp raw format may actually be the
> CLOCK_REALTIME that TCP requires. But if the raw clock is not
> realtime, we'll have to adjust based on timecounter/cyclecounter.
Thread overview: 8+ messages
2026-01-21 16:04 [PATCH net-next v2 0/2] net: extend ndo_get_tstamp and implement in gve Kevin Yang
2026-01-21 16:04 ` [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types Kevin Yang
2026-01-22 20:04 ` Gerhard Engleder
2026-01-22 22:28 ` Willem de Bruijn
2026-01-25 19:45 ` Gerhard Engleder
2026-01-25 21:41 ` Willem de Bruijn
2026-01-27 23:13 ` Kevin Yang
2026-01-21 16:04 ` [PATCH net-next v2 2/2] gve: implement ndo_get_tstamp Kevin Yang