public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Harshitha Ramamurthy <hramamurthy@google.com>, netdev@vger.kernel.org
Cc: joshwash@google.com, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, richardcochran@gmail.com,
	jstultz@google.com, tglx@kernel.org, sboyd@kernel.org,
	willemb@google.com, nktgrg@google.com, jfraker@google.com,
	ziweixiao@google.com, maolson@google.com, jordanrhee@google.com,
	thostet@google.com, alok.a.tiwari@oracle.com,
	pkaligineedi@google.com, horms@kernel.org, dwmw2@infradead.org,
	jacob.e.keller@intel.com, yyd@google.com,
	linux-kernel@vger.kernel.org,
	Naman Gulati <namangulati@google.com>
Subject: Re: [PATCH net-next v5 3/3] gve: implement PTP gettimex64
Date: Tue, 5 May 2026 10:08:26 +0200	[thread overview]
Message-ID: <6259cf59-b054-42bb-9dad-5f62a54a2bc0@redhat.com> (raw)
In-Reply-To: <20260429012819.3102675-4-hramamurthy@google.com>

On 4/29/26 3:28 AM, Harshitha Ramamurthy wrote:
> From: Jordan Rhee <jordanrhee@google.com>
> 
> Enable chrony and phc2sys to synchronize system clock to NIC clock.
> 
> The system cycle counters are sampled by the device to minimize the
> uncertainty window. If the system times are sampled in the host, the
> delta between pre and post readings is 100us or more due to AQ command
> latency. The system times returned by the device have a delta of ~1us,
> which enables significantly more accurate clock synchronization.
> 
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> Reviewed-by: Kevin Yang <yyd@google.com>
> Reviewed-by: Naman Gulati <namangulati@google.com>
> Signed-off-by: Jordan Rhee <jordanrhee@google.com>
> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
> ---
> Changes in v5:
> - Reformulate retry loop in terms of total timeout (Jakub Kicinski)
> 
> Changes in v3:
> - Take system time snapshot inside the mutex
> - Return -EOPNOTSUPP if cross-timestamp is requested on an arch other
>   than x86 or arm64
> 
> Changes in v2:
>  - fix compilation warning on ARM by casting cycles_t to u64
> ---
>  drivers/net/ethernet/google/gve/gve_adminq.h |   4 +-
>  drivers/net/ethernet/google/gve/gve_ptp.c    | 196 ++++++++++++++++++-
>  2 files changed, 191 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
> index 22a74b6aa17e..e6dcf6da9091 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.h
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.h
> @@ -411,8 +411,8 @@ static_assert(sizeof(struct gve_adminq_report_nic_ts) == 16);
>  
>  struct gve_nic_ts_report {
>  	__be64 nic_timestamp; /* NIC clock in nanoseconds */
> -	__be64 reserved1;
> -	__be64 reserved2;
> +	__be64 pre_cycles; /* System cycle counter before NIC clock read */
> +	__be64 post_cycles; /* System cycle counter after NIC clock read */
>  	__be64 reserved3;
>  	__be64 reserved4;
>  };
> diff --git a/drivers/net/ethernet/google/gve/gve_ptp.c b/drivers/net/ethernet/google/gve/gve_ptp.c
> index ad15f1209a83..c6c98ef825aa 100644
> --- a/drivers/net/ethernet/google/gve/gve_ptp.c
> +++ b/drivers/net/ethernet/google/gve/gve_ptp.c
> @@ -10,28 +10,210 @@
>  /* Interval to schedule a nic timestamp calibration, 250ms. */
>  #define GVE_NIC_TS_SYNC_INTERVAL_MS 250
>  
> +/*
> + * Stores cycle counter samples in get_cycles() units from a
> + * sandwiched NIC clock read
> + */
> +struct gve_sysclock_sample {
> +	/* system time snapshot taken just before issuing AdminQ command */
> +	struct system_time_snapshot snapshot;
> +	/* Cycle counter from NIC before clock read */
> +	u64 nic_pre_cycles;
> +	/* Cycle counter from NIC after clock read */
> +	u64 nic_post_cycles;
> +	/* Cycle counter from host before issuing AQ command */
> +	cycles_t host_pre_cycles;
> +	/* Cycle counter from host after AQ command returns */
> +	cycles_t host_post_cycles;
> +};
> +
> +/*
> + * Read NIC clock by issuing the AQ command. The command is subject to
> + * rate limiting and may need to be retried. Requires nic_ts_read_lock
> + * to be held.
> + */
> +static int gve_ptp_read_timestamp(struct gve_ptp *ptp, cycles_t *pre_cycles,
> +				  cycles_t *post_cycles,
> +				  struct system_time_snapshot *snap)
> +{
> +	unsigned long deadline = jiffies + msecs_to_jiffies(100);
> +	unsigned long delay_us = 1000;
> +	int err;
> +
> +	lockdep_assert_held(&ptp->nic_ts_read_lock);
> +
> +	do {
> +		if (snap)
> +			ktime_get_snapshot(snap);
> +
> +		*pre_cycles = get_cycles();
> +		err = gve_adminq_report_nic_ts(ptp->priv,
> +					       ptp->nic_ts_report_bus);
> +
> +		/* Prevent get_cycles() from being speculatively executed
> +		 * before the AdminQ command
> +		 */
> +		rmb();
> +		*post_cycles = get_cycles();
> +		if (likely(err != -EAGAIN))
> +			return err;
> +
> +		fsleep(delay_us);
> +
> +		/* Exponential backoff */
> +		delay_us *= 2;
> +	} while (time_before(jiffies, deadline));
> +
> +	return -ETIMEDOUT;
> +}
> +
>  /* Read the nic timestamp from hardware via the admin queue. */
> -static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw)
> +static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw,
> +				 struct gve_sysclock_sample *sysclock)
>  {
> +	cycles_t host_pre_cycles, host_post_cycles;
> +	struct gve_nic_ts_report *ts_report;
>  	int err;
>  
>  	mutex_lock(&ptp->nic_ts_read_lock);
> -	err = gve_adminq_report_nic_ts(ptp->priv, ptp->nic_ts_report_bus);
> -	if (err)
> +	err = gve_ptp_read_timestamp(ptp, &host_pre_cycles, &host_post_cycles,
> +				     sysclock ? &sysclock->snapshot : NULL);
> +	if (err) {
> +		dev_err_ratelimited(&ptp->priv->pdev->dev,
> +				    "AdminQ timestamp read failed: %d\n", err);
>  		goto out;
> +	}
>  
> -	*nic_raw = be64_to_cpu(ptp->nic_ts_report->nic_timestamp);
> +	ts_report = ptp->nic_ts_report;
> +	*nic_raw = be64_to_cpu(ts_report->nic_timestamp);
> +
> +	if (sysclock) {
> +		sysclock->nic_pre_cycles = be64_to_cpu(ts_report->pre_cycles);
> +		sysclock->nic_post_cycles = be64_to_cpu(ts_report->post_cycles);
> +		sysclock->host_pre_cycles = host_pre_cycles;
> +		sysclock->host_post_cycles = host_post_cycles;
> +	}
>  
>  out:
>  	mutex_unlock(&ptp->nic_ts_read_lock);
>  	return err;
>  }
>  
> +struct gve_cycles_to_clock_callback_ctx {
> +	u64 cycles;
> +};
> +
> +static int gve_cycles_to_clock_fn(ktime_t *device_time,
> +				  struct system_counterval_t *system_counterval,
> +				  void *ctx)
> +{
> +	struct gve_cycles_to_clock_callback_ctx *context = ctx;
> +
> +	*device_time = 0;
> +
> +	system_counterval->cycles = context->cycles;
> +	system_counterval->use_nsecs = false;
> +
> +	if (IS_ENABLED(CONFIG_X86))
> +		system_counterval->cs_id = CSID_X86_TSC;
> +	else if (IS_ENABLED(CONFIG_ARM64))
> +		system_counterval->cs_id = CSID_ARM_ARCH_COUNTER;
> +	else
> +		return -EOPNOTSUPP;
> +
> +	return 0;
> +}
> +
> +/*
> + * Convert a raw cycle count (e.g. from get_cycles()) to the system clock
> + * type specified by clockid. The system_time_snapshot must be taken before
> + * the cycle counter is sampled.
> + */
> +static int gve_cycles_to_timespec64(struct gve_priv *priv, clockid_t clockid,
> +				    struct system_time_snapshot *snap,
> +				    u64 cycles, struct timespec64 *ts)
> +{
> +	struct gve_cycles_to_clock_callback_ctx ctx = {0};
> +	struct system_device_crosststamp xtstamp;
> +	int err;
> +
> +	ctx.cycles = cycles;
> +	err = get_device_system_crosststamp(gve_cycles_to_clock_fn, &ctx, snap,
> +					    &xtstamp);
> +	if (err) {
> +		dev_err_ratelimited(&priv->pdev->dev,
> +				    "get_device_system_crosststamp() failed to convert %lld cycles to system time: %d\n",
> +				    cycles,
> +				    err);
> +		return err;
> +	}
> +
> +	switch (clockid) {
> +	case CLOCK_REALTIME:
> +		*ts = ktime_to_timespec64(xtstamp.sys_realtime);
> +		break;
> +	case CLOCK_MONOTONIC_RAW:
> +		*ts = ktime_to_timespec64(xtstamp.sys_monoraw);
> +		break;
> +	default:
> +		dev_err_ratelimited(&priv->pdev->dev,
> +				    "Cycle count conversion to clockid %d not supported\n",
> +				    clockid);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
>  static int gve_ptp_gettimex64(struct ptp_clock_info *info,
>  			      struct timespec64 *ts,
>  			      struct ptp_system_timestamp *sts)
>  {
> -	return -EOPNOTSUPP;
> +	struct gve_ptp *ptp = container_of(info, struct gve_ptp, info);
> +	struct gve_sysclock_sample sysclock = {0};
> +	struct gve_priv *priv = ptp->priv;
> +	u64 nic_ts;
> +	int err;
> +
> +	if (sts && !(IS_ENABLED(CONFIG_X86) || IS_ENABLED(CONFIG_ARM64)))
> +		return -EOPNOTSUPP;
> +
> +	err = gve_clock_nic_ts_read(ptp, &nic_ts, sts ? &sysclock : NULL);
> +	if (err)
> +		return err;
> +
> +	if (sts) {
> +		/* Reject samples with out of order system clock values */
> +		if (!(sysclock.host_pre_cycles <= sysclock.nic_pre_cycles &&
> +		      sysclock.nic_pre_cycles  <= sysclock.nic_post_cycles &&
> +		      sysclock.nic_post_cycles <= sysclock.host_post_cycles)) {
> +			dev_err_ratelimited(&priv->pdev->dev,
> +					    "AdminQ system clock cycle counts out of order. Expecting %llu <= %llu <= %llu <= %llu\n",
> +					    (u64)sysclock.host_pre_cycles,
> +					    sysclock.nic_pre_cycles,
> +					    sysclock.nic_post_cycles,
> +					    (u64)sysclock.host_post_cycles);
> +			return -EBADMSG;

Sashiko/gemini is reporting the following:

---
If older firmware does not support this feature and returns 0 for the
pre_cycles and post_cycles fields, won't host_pre_cycles <=
nic_pre_cycles evaluate to false since get_cycles() will be greater than
0? If this occurs, the driver returns -EBADMSG instead of -EOPNOTSUPP.
Does returning -EBADMSG prevent userspace tools from gracefully falling
back to legacy PTP ioctls like PTP_SYS_OFFSET, causing PTP
synchronization to fail completely on older firmwares?
---

which looks legit to me, or am I missing something? Note that
proactively triaging sashiko comments would help maintainers a lot.

Thanks,

Paolo


  reply	other threads:[~2026-05-05  8:08 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-29  1:28 [PATCH net-next v5 0/3] gve: add support for PTP gettimex64 Harshitha Ramamurthy
2026-04-29  1:28 ` [PATCH net-next v5 1/3] gve: skip error logging for retryable AdminQ commands Harshitha Ramamurthy
2026-04-29  1:28 ` [PATCH net-next v5 2/3] gve: make nic clock reads thread safe Harshitha Ramamurthy
2026-05-05 21:38   ` Jordan Rhee
2026-04-29  1:28 ` [PATCH net-next v5 3/3] gve: implement PTP gettimex64 Harshitha Ramamurthy
2026-05-05  8:08   ` Paolo Abeni [this message]
2026-05-05 18:22     ` Jordan Rhee
2026-05-05 18:53       ` Jordan Rhee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6259cf59-b054-42bb-9dad-5f62a54a2bc0@redhat.com \
    --to=pabeni@redhat.com \
    --cc=alok.a.tiwari@oracle.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=dwmw2@infradead.org \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=hramamurthy@google.com \
    --cc=jacob.e.keller@intel.com \
    --cc=jfraker@google.com \
    --cc=jordanrhee@google.com \
    --cc=joshwash@google.com \
    --cc=jstultz@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maolson@google.com \
    --cc=namangulati@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=nktgrg@google.com \
    --cc=pkaligineedi@google.com \
    --cc=richardcochran@gmail.com \
    --cc=sboyd@kernel.org \
    --cc=tglx@kernel.org \
    --cc=thostet@google.com \
    --cc=willemb@google.com \
    --cc=yyd@google.com \
    --cc=ziweixiao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox