From: Paolo Abeni <pabeni@redhat.com>
To: Harshitha Ramamurthy <hramamurthy@google.com>, netdev@vger.kernel.org
Cc: joshwash@google.com, andrew+netdev@lunn.ch, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, richardcochran@gmail.com,
jstultz@google.com, tglx@kernel.org, sboyd@kernel.org,
willemb@google.com, nktgrg@google.com, jfraker@google.com,
ziweixiao@google.com, maolson@google.com, jordanrhee@google.com,
thostet@google.com, alok.a.tiwari@oracle.com,
pkaligineedi@google.com, horms@kernel.org, dwmw2@infradead.org,
jacob.e.keller@intel.com, yyd@google.com,
linux-kernel@vger.kernel.org,
Naman Gulati <namangulati@google.com>
Subject: Re: [PATCH net-next v5 3/3] gve: implement PTP gettimex64
Date: Tue, 5 May 2026 10:08:26 +0200 [thread overview]
Message-ID: <6259cf59-b054-42bb-9dad-5f62a54a2bc0@redhat.com> (raw)
In-Reply-To: <20260429012819.3102675-4-hramamurthy@google.com>
On 4/29/26 3:28 AM, Harshitha Ramamurthy wrote:
> From: Jordan Rhee <jordanrhee@google.com>
>
> Enable chrony and phc2sys to synchronize system clock to NIC clock.
>
> The system cycle counters are sampled by the device to minimize the
> uncertainty window. If the system times are sampled in the host, the
> delta between pre and post readings is 100us or more due to AQ command
> latency. The system times returned by the device have a delta of ~1us,
> which enables significantly more accurate clock synchronization.
>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> Reviewed-by: Kevin Yang <yyd@google.com>
> Reviewed-by: Naman Gulati <namangulati@google.com>
> Signed-off-by: Jordan Rhee <jordanrhee@google.com>
> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
> ---
> Changes in v5:
> - Reformulate retry loop in terms of total timeout (Jakub Kicinski)
>
> Changes in v3:
> - Take system time snapshot inside the mutex
> - Return -EOPNOTSUPP if cross-timestamp is requested on an arch other
> than x86 or arm64
>
> Changes in v2:
> - fix compilation warning on ARM by casting cycles_t to u64
> ---
> drivers/net/ethernet/google/gve/gve_adminq.h | 4 +-
> drivers/net/ethernet/google/gve/gve_ptp.c | 196 ++++++++++++++++++-
> 2 files changed, 191 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
> index 22a74b6aa17e..e6dcf6da9091 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.h
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.h
> @@ -411,8 +411,8 @@ static_assert(sizeof(struct gve_adminq_report_nic_ts) == 16);
>
> struct gve_nic_ts_report {
> __be64 nic_timestamp; /* NIC clock in nanoseconds */
> - __be64 reserved1;
> - __be64 reserved2;
> + __be64 pre_cycles; /* System cycle counter before NIC clock read */
> + __be64 post_cycles; /* System cycle counter after NIC clock read */
> __be64 reserved3;
> __be64 reserved4;
> };
> diff --git a/drivers/net/ethernet/google/gve/gve_ptp.c b/drivers/net/ethernet/google/gve/gve_ptp.c
> index ad15f1209a83..c6c98ef825aa 100644
> --- a/drivers/net/ethernet/google/gve/gve_ptp.c
> +++ b/drivers/net/ethernet/google/gve/gve_ptp.c
> @@ -10,28 +10,210 @@
> /* Interval to schedule a nic timestamp calibration, 250ms. */
> #define GVE_NIC_TS_SYNC_INTERVAL_MS 250
>
> +/*
> + * Stores cycle counter samples in get_cycles() units from a
> + * sandwiched NIC clock read
> + */
> +struct gve_sysclock_sample {
> + /* system time snapshot taken just before issuing AdminQ command */
> + struct system_time_snapshot snapshot;
> + /* Cycle counter from NIC before clock read */
> + u64 nic_pre_cycles;
> + /* Cycle counter from NIC after clock read */
> + u64 nic_post_cycles;
> + /* Cycle counter from host before issuing AQ command */
> + cycles_t host_pre_cycles;
> + /* Cycle counter from host after AQ command returns */
> + cycles_t host_post_cycles;
> +};
> +
> +/*
> + * Read NIC clock by issuing the AQ command. The command is subject to
> + * rate limiting and may need to be retried. Requires nic_ts_read_lock
> + * to be held.
> + */
> +static int gve_ptp_read_timestamp(struct gve_ptp *ptp, cycles_t *pre_cycles,
> + cycles_t *post_cycles,
> + struct system_time_snapshot *snap)
> +{
> + unsigned long deadline = jiffies + msecs_to_jiffies(100);
> + unsigned long delay_us = 1000;
> + int err;
> +
> + lockdep_assert_held(&ptp->nic_ts_read_lock);
> +
> + do {
> + if (snap)
> + ktime_get_snapshot(snap);
> +
> + *pre_cycles = get_cycles();
> + err = gve_adminq_report_nic_ts(ptp->priv,
> + ptp->nic_ts_report_bus);
> +
> + /* Prevent get_cycles() from being speculatively executed
> + * before the AdminQ command
> + */
> + rmb();
> + *post_cycles = get_cycles();
> + if (likely(err != -EAGAIN))
> + return err;
> +
> + fsleep(delay_us);
> +
> + /* Exponential backoff */
> + delay_us *= 2;
> + } while (time_before(jiffies, deadline));
> +
> + return -ETIMEDOUT;
> +}
> +
> /* Read the nic timestamp from hardware via the admin queue. */
> -static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw)
> +static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw,
> + struct gve_sysclock_sample *sysclock)
> {
> + cycles_t host_pre_cycles, host_post_cycles;
> + struct gve_nic_ts_report *ts_report;
> int err;
>
> mutex_lock(&ptp->nic_ts_read_lock);
> - err = gve_adminq_report_nic_ts(ptp->priv, ptp->nic_ts_report_bus);
> - if (err)
> + err = gve_ptp_read_timestamp(ptp, &host_pre_cycles, &host_post_cycles,
> + sysclock ? &sysclock->snapshot : NULL);
> + if (err) {
> + dev_err_ratelimited(&ptp->priv->pdev->dev,
> + "AdminQ timestamp read failed: %d\n", err);
> goto out;
> + }
>
> - *nic_raw = be64_to_cpu(ptp->nic_ts_report->nic_timestamp);
> + ts_report = ptp->nic_ts_report;
> + *nic_raw = be64_to_cpu(ts_report->nic_timestamp);
> +
> + if (sysclock) {
> + sysclock->nic_pre_cycles = be64_to_cpu(ts_report->pre_cycles);
> + sysclock->nic_post_cycles = be64_to_cpu(ts_report->post_cycles);
> + sysclock->host_pre_cycles = host_pre_cycles;
> + sysclock->host_post_cycles = host_post_cycles;
> + }
>
> out:
> mutex_unlock(&ptp->nic_ts_read_lock);
> return err;
> }
>
> +struct gve_cycles_to_clock_callback_ctx {
> + u64 cycles;
> +};
> +
> +static int gve_cycles_to_clock_fn(ktime_t *device_time,
> + struct system_counterval_t *system_counterval,
> + void *ctx)
> +{
> + struct gve_cycles_to_clock_callback_ctx *context = ctx;
> +
> + *device_time = 0;
> +
> + system_counterval->cycles = context->cycles;
> + system_counterval->use_nsecs = false;
> +
> + if (IS_ENABLED(CONFIG_X86))
> + system_counterval->cs_id = CSID_X86_TSC;
> + else if (IS_ENABLED(CONFIG_ARM64))
> + system_counterval->cs_id = CSID_ARM_ARCH_COUNTER;
> + else
> + return -EOPNOTSUPP;
> +
> + return 0;
> +}
> +
> +/*
> + * Convert a raw cycle count (e.g. from get_cycles()) to the system clock
> + * type specified by clockid. The system_time_snapshot must be taken before
> + * the cycle counter is sampled.
> + */
> +static int gve_cycles_to_timespec64(struct gve_priv *priv, clockid_t clockid,
> + struct system_time_snapshot *snap,
> + u64 cycles, struct timespec64 *ts)
> +{
> + struct gve_cycles_to_clock_callback_ctx ctx = {0};
> + struct system_device_crosststamp xtstamp;
> + int err;
> +
> + ctx.cycles = cycles;
> + err = get_device_system_crosststamp(gve_cycles_to_clock_fn, &ctx, snap,
> + &xtstamp);
> + if (err) {
> + dev_err_ratelimited(&priv->pdev->dev,
> + "get_device_system_crosststamp() failed to convert %lld cycles to system time: %d\n",
> + cycles,
> + err);
> + return err;
> + }
> +
> + switch (clockid) {
> + case CLOCK_REALTIME:
> + *ts = ktime_to_timespec64(xtstamp.sys_realtime);
> + break;
> + case CLOCK_MONOTONIC_RAW:
> + *ts = ktime_to_timespec64(xtstamp.sys_monoraw);
> + break;
> + default:
> + dev_err_ratelimited(&priv->pdev->dev,
> + "Cycle count conversion to clockid %d not supported\n",
> + clockid);
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}
> +
> static int gve_ptp_gettimex64(struct ptp_clock_info *info,
> struct timespec64 *ts,
> struct ptp_system_timestamp *sts)
> {
> - return -EOPNOTSUPP;
> + struct gve_ptp *ptp = container_of(info, struct gve_ptp, info);
> + struct gve_sysclock_sample sysclock = {0};
> + struct gve_priv *priv = ptp->priv;
> + u64 nic_ts;
> + int err;
> +
> + if (sts && !(IS_ENABLED(CONFIG_X86) || IS_ENABLED(CONFIG_ARM64)))
> + return -EOPNOTSUPP;
> +
> + err = gve_clock_nic_ts_read(ptp, &nic_ts, sts ? &sysclock : NULL);
> + if (err)
> + return err;
> +
> + if (sts) {
> + /* Reject samples with out of order system clock values */
> + if (!(sysclock.host_pre_cycles <= sysclock.nic_pre_cycles &&
> + sysclock.nic_pre_cycles <= sysclock.nic_post_cycles &&
> + sysclock.nic_post_cycles <= sysclock.host_post_cycles)) {
> + dev_err_ratelimited(&priv->pdev->dev,
> + "AdminQ system clock cycle counts out of order. Expecting %llu <= %llu <= %llu <= %llu\n",
> + (u64)sysclock.host_pre_cycles,
> + sysclock.nic_pre_cycles,
> + sysclock.nic_post_cycles,
> + (u64)sysclock.host_post_cycles);
> + return -EBADMSG;
Sashiko/gemini is reporting the following:
---
If older firmware does not support this feature and returns 0 for the
pre_cycles and post_cycles fields, won't host_pre_cycles <=
nic_pre_cycles evaluate to false since get_cycles() will be greater than
0? If this occurs, the driver returns -EBADMSG instead of -EOPNOTSUPP.
Does returning -EBADMSG prevent userspace tools from gracefully falling
back to legacy PTP ioctls like PTP_SYS_OFFSET, causing PTP
synchronization to fail completely on older firmwares?
---
which looks legit to me, or am I missing something? Note that
proactively triaging sashiko comments would help maintainers a lot.
Thanks,
Paolo
next prev parent reply other threads:[~2026-05-05 8:08 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-29 1:28 [PATCH net-next v5 0/3] gve: add support for PTP gettimex64 Harshitha Ramamurthy
2026-04-29 1:28 ` [PATCH net-next v5 1/3] gve: skip error logging for retryable AdminQ commands Harshitha Ramamurthy
2026-04-29 1:28 ` [PATCH net-next v5 2/3] gve: make nic clock reads thread safe Harshitha Ramamurthy
2026-05-05 21:38 ` Jordan Rhee
2026-04-29 1:28 ` [PATCH net-next v5 3/3] gve: implement PTP gettimex64 Harshitha Ramamurthy
2026-05-05 8:08 ` Paolo Abeni [this message]
2026-05-05 18:22 ` Jordan Rhee
2026-05-05 18:53 ` Jordan Rhee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6259cf59-b054-42bb-9dad-5f62a54a2bc0@redhat.com \
--to=pabeni@redhat.com \
--cc=alok.a.tiwari@oracle.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=dwmw2@infradead.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=hramamurthy@google.com \
--cc=jacob.e.keller@intel.com \
--cc=jfraker@google.com \
--cc=jordanrhee@google.com \
--cc=joshwash@google.com \
--cc=jstultz@google.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maolson@google.com \
--cc=namangulati@google.com \
--cc=netdev@vger.kernel.org \
--cc=nktgrg@google.com \
--cc=pkaligineedi@google.com \
--cc=richardcochran@gmail.com \
--cc=sboyd@kernel.org \
--cc=tglx@kernel.org \
--cc=thostet@google.com \
--cc=willemb@google.com \
--cc=yyd@google.com \
--cc=ziweixiao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox