From: Thomas Gleixner <tglx@kernel.org>
To: Harshitha Ramamurthy <hramamurthy@google.com>, netdev@vger.kernel.org
Cc: joshwash@google.com, hramamurthy@google.com,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com,
jstultz@google.com, sboyd@kernel.org, willemb@google.com,
nktgrg@google.com, jfraker@google.com, ziweixiao@google.com,
maolson@google.com, jordanrhee@google.com, thostet@google.com,
alok.a.tiwari@oracle.com, pkaligineedi@google.com,
horms@kernel.org, dwmw2@infradead.org, jacob.e.keller@intel.com,
yyd@google.com, jefrogers@google.com,
linux-kernel@vger.kernel.org,
Naman Gulati <namangulati@google.com>
Subject: Re: [PATCH net-next v8 3/3] gve: implement PTP gettimex64
Date: Wed, 20 May 2026 23:08:42 +0200 [thread overview]
Message-ID: <87mrxtwzz9.ffs@tglx> (raw)
In-Reply-To: <20260514225842.110706-4-hramamurthy@google.com>
On Thu, May 14 2026 at 22:58, Harshitha Ramamurthy wrote:
> From: Jordan Rhee <jordanrhee@google.com>
>
> Enable chrony and phc2sys to synchronize system clock to NIC clock.
>
> Two paths are implemented: a precise path using system counter values
> sampled by the device, and a fallback path using system counter values
> sampled in the driver using ptp_read_system_prets()/postts().
>
> To use the precise path, the current system clocksource must match the
> units returned by the device, which on x86 is X86_TSC and on ARM64 is
> ARM_ARCH_COUNTER. The clockid requested for the cross-timestamp must
> be either CLOCK_REALTIME or CLOCK_MONOTONIC_RAW. These conditions hold
> by default on GCP VMs using Chrony, so we expect the precise path to be
> used the vast majority of the time. If the system clocksource is changed
> to kvm-clock, it activates the fallback path. Ethtool counters have been
> added to count how many times each path is used.
>
> The uncertainty window in the precise path is typically around 1-2us,
> while in the fallback path is around 60-80us.
>
> Stub implementions of adjfine and adjtime are added to avoid NULL
> dereference when phc2sys tries to adjust the clock.
Sorry that I did not react to this earlier, but I was burried in
regressions and allowed myself to take a few days off.
> +/*
> + * Read NIC clock by issuing the AQ command. The command is subject to
> + * rate limiting and may need to be retried. Requires nic_ts_read_lock
> + * to be held.
> + */
> +static int gve_ptp_read_timestamp(struct gve_ptp *ptp, cycles_t *pre_cycles,
> + cycles_t *post_cycles)
> +{
> + unsigned long deadline = jiffies + msecs_to_jiffies(100);
> + unsigned long delay_us = 1000;
> + int err;
> +
> + lockdep_assert_held(&ptp->nic_ts_read_lock);
> +
> + do {
> + *pre_cycles = get_cycles();
This is a completely non-sensical hack. Why do you need get_cycles()
here, which is a horrible and ill-defined interface that others are
trying to get rid of?
Just because it's still there is not a good answer and it's actually not
required at all. See below.
> + err = gve_adminq_report_nic_ts(ptp->priv,
> + ptp->nic_ts_report_bus);
> +
> + /* Prevent get_cycles() from being speculatively executed
> + * before the AdminQ command
> + */
> + rmb();
> + *post_cycles = get_cycles();
> + if (likely(err != -EAGAIN))
> + return err;
> +
> + fsleep(delay_us);
> +
> + /* Exponential backoff */
> + delay_us *= 2;
> + } while (time_before(jiffies, deadline));
> +
> + return -ETIMEDOUT;
> +}
> +
> /* Read the nic timestamp from hardware via the admin queue. */
> -static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw)
> +static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw,
> + struct gve_sysclock_sample *sysclock)
> {
> + cycles_t host_pre_cycles, host_post_cycles;
> + struct gve_nic_ts_report *ts_report;
> int err;
>
> mutex_lock(&ptp->nic_ts_read_lock);
Why is this lock taken _after_ the regular (fallback) pre/post
timestamps are aquired?
What's wrong with doing:
do {
guard(mutex)(&ptp->nic_ts_read_lock);
ptp_read_system_prets(sts);
err = read_nic_timestamp(...);
ptp_read_system_postts(sts);
} while (err && !timeout(...));
return err;
which moves the "fallback" timestamp right next to the actual hardware
readout?
That would make the "fallback" case too precise, right?
After noticing the obvious, I just checked what our new AI review
overlord "sashiko" has to say about this:
"Does wrapping the timestamp read outside the locks and retry loops skew the
measurements?
In the fallback path, ptp_read_system_prets() is captured before calling
gve_clock_nic_ts_read(), and postts() is captured after it returns. However,
gve_clock_nic_ts_read() acquires ptp->nic_ts_read_lock and calls
gve_ptp_read_timestamp(), which can sleep using fsleep() during retries. This
means the prets/postts window could include mutex wait times and sleep
durations."
It _IS_ actually on spot. So why the heck did this garbage get merged
without addressing this obvious fail?
What "sashiko" actually saw aside of that, which I didn't see myself when
looking at this, is:
"Similarly, in gve_ptp_read_timestamp(), the get_cycles() bounds are taken
outside the gve_adminq_report_nic_ts() call, which acquires the shared
priv->adminq_lock. If there is lock contention, the cycle bounds could become
extremely wide."
Oh well.
So what you really want is:
do {
guard(mutex)(&ptp->nic_ts_read_lock);
guard(mutex)(&gpe_priv->adminq_lock);
ptp_read_system_prets(sts);
err = read_nic_timestamp(...) {
...
return gve_adminq_execute_cmd_locked(....);
}
ptp_read_system_postts(sts);
} while (err && !timeout(...));
return err;
Or just rely on gpe_priv->adminq_lock in the first place. No?
But what's worse and escaped the AI scrutiny is:
Your attempt to convert these 'get_cycles()' time stamps after the
fact and outside of the timekeeping core protection is fundamentally
wrong.
get_device_system_crosststamp() does
do {
seq = read_seqcount_begin(&tk_core.seq);
take_timestamps(...);
convert_timestamps(...)
} while (read_seqcount_retry(&tk_core.seq, seq));
which covers the whole sequence of taking the snapshots from the
hardware and the conversion.
Your abuse of this is completely out of spec. Just because it did not
blow up in your face yet does not make it in any way correct. You do not
even get bonus points for creative abuse.
Now let me look at the "firmware" provided timestamps.
That's the poor mans emulation of actual hardware timestamps aka PTM.
Why do you need all this pre_nic/post_nic muck there? Fix your firmware
to actually honor PTM and not provide some made up version of it.
Also the whole debug mechanism of comparing the time stamps of
get_cycles() with the timestamps provided by firmware tells me that your
firmware is a hacked together piece of garbage.
If you want to use "hardware" timestamps then populate the
ptp.info.getcrosststamp() callback.
If your firmware really can't do PTM and only provides pre and post
values, then you simply can take the midpoint in your getcrosststamp()
callback:
do {
guard(mutex)(&ptp->nic_ts_read_lock);
err = read_nic_timestamp(...);
system_counterval->cycles = (nic_pre + nic_post) / 2;
} while (err && !timeout(...));
If you do not even trust your firmware then you still can validate it
easily without creating horrible hacks:
do {
guard(mutex)(...);
ptp_read_system_prets(&ctx->sts);
err = read_nic_timestamp(...);
system_counterval->cycles = (nic_pre + nic_post) / 2;
ptp_read_system_postts(&ctx->sts);
} while (err && !timeout(...));
and then check the ctx->sts timestamps against the results provided by
get_device_system_crosststamp() after conversion in your
ptp.info.getcrosststamp() callback:
err = get_device_system_crosststamp(..., &ctx);
if (err)
return err;
if (ctx.sts.pre_ts > xtstamp.sys... || ctx.sts.post_ts < xtstamp.sys...)
return -EBROKENFIRMWARE;
no?
Jakub, please back out this hackery before it makes a precedence for others
to copy from.
Thanks,
tglx
next prev parent reply other threads:[~2026-05-20 21:08 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 22:58 [PATCH net-next v8 0/3] gve: add support for PTP gettimex64 Harshitha Ramamurthy
2026-05-14 22:58 ` [PATCH net-next v8 1/3] gve: skip error logging for retryable AdminQ commands Harshitha Ramamurthy
2026-05-14 22:58 ` [PATCH net-next v8 2/3] gve: make nic clock reads thread safe Harshitha Ramamurthy
2026-05-14 22:58 ` [PATCH net-next v8 3/3] gve: implement PTP gettimex64 Harshitha Ramamurthy
2026-05-20 21:08 ` Thomas Gleixner [this message]
2026-05-20 22:16 ` Jakub Kicinski
2026-05-20 1:30 ` [PATCH net-next v8 0/3] gve: add support for " patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mrxtwzz9.ffs@tglx \
--to=tglx@kernel.org \
--cc=alok.a.tiwari@oracle.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=dwmw2@infradead.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=hramamurthy@google.com \
--cc=jacob.e.keller@intel.com \
--cc=jefrogers@google.com \
--cc=jfraker@google.com \
--cc=jordanrhee@google.com \
--cc=joshwash@google.com \
--cc=jstultz@google.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maolson@google.com \
--cc=namangulati@google.com \
--cc=netdev@vger.kernel.org \
--cc=nktgrg@google.com \
--cc=pabeni@redhat.com \
--cc=pkaligineedi@google.com \
--cc=richardcochran@gmail.com \
--cc=sboyd@kernel.org \
--cc=thostet@google.com \
--cc=willemb@google.com \
--cc=yyd@google.com \
--cc=ziweixiao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox