From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BF04374721; Wed, 20 May 2026 21:08:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779311327; cv=none; b=rjmmvVPaFTl7Yc3/eRlSrUwLFqVEztEMGu2+BsyVud17gK0nmCNiVuiXkM5fJr3YylqIxk3qeTSvSCDruQZoD76vr+QEUpxwQ/ehgzbMi4j5MYyZoNQjrjvZi4yKmXtUAQyQjpKjYZ0nKZJNgiOacxQA5wUKOO6jRZe6VWD3EKw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779311327; c=relaxed/simple; bh=Ipv9L2oY2ZKvCKuBrESr6hIXmrYLVtmJWQQqoo5YJJA=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=cqlqKOYXQ6gTRQc1R70XixO9sXYTksP4m0JAUGtDrjb5EfObdAArbeEOxkO72/XPW5nX+0NRBLCOa9uz7eiVZBejah1cBNKgkmqKGl3WAwxjx6C8C4gJ870Hsk5PwHfn95SZnXBqQr1oNE1EKcAVzgVWD6RuZLBec0t6jVHhDJg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RqdqhYBQ; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RqdqhYBQ" Received: by smtp.kernel.org (Postfix) with UTF8SMTPSA id 6486A1F00A39; Wed, 20 May 2026 21:08:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779311326; bh=oLNLi08itDX7jR+qOyWa8710Li0EKz+b0ONR4MNoKgE=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=RqdqhYBQ56nQHSQTqimZ8o0JAqysXNys3c28ivdp3OIccaktMzHeSr1/j/JQo7k2b Bf3JxK9NhGDMrxsyUJ2msHy4dLs2rT2zJoAiX/YjlLwjuCE8JlosXdJpgv/bMx3nPt FLvhWjUdpxgiIxdZeLRE68NqQUA3chSj1+IJ4ZfF69iOifUDl8AGJ8mJrWAKfRtHwi 24aoLxD7S7+4UyhVCwxYVeIWPybTf5iVi4r8mzJKGIKAVC+ElejQR7EnzOGbuBn5D6 bvgG25SJAuIREqzDKZcgBghHEsnzS7lQbbjiQG1ajfjEHdP9PIggTdlpTveREhITJi 0kagzgvW07pqQ== From: Thomas Gleixner To: Harshitha Ramamurthy , netdev@vger.kernel.org Cc: joshwash@google.com, hramamurthy@google.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com, jstultz@google.com, sboyd@kernel.org, willemb@google.com, nktgrg@google.com, jfraker@google.com, ziweixiao@google.com, maolson@google.com, jordanrhee@google.com, thostet@google.com, alok.a.tiwari@oracle.com, pkaligineedi@google.com, horms@kernel.org, dwmw2@infradead.org, jacob.e.keller@intel.com, yyd@google.com, jefrogers@google.com, linux-kernel@vger.kernel.org, Naman Gulati Subject: Re: [PATCH net-next v8 3/3] gve: implement PTP gettimex64 In-Reply-To: <20260514225842.110706-4-hramamurthy@google.com> References: <20260514225842.110706-1-hramamurthy@google.com> <20260514225842.110706-4-hramamurthy@google.com> Date: Wed, 20 May 2026 23:08:42 +0200 Message-ID: <87mrxtwzz9.ffs@tglx> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain On Thu, May 14 2026 at 22:58, Harshitha Ramamurthy wrote: > From: Jordan Rhee > > Enable chrony and phc2sys to synchronize system clock to NIC clock. > > Two paths are implemented: a precise path using system counter values > sampled by the device, and a fallback path using system counter values > sampled in the driver using ptp_read_system_prets()/postts(). > > To use the precise path, the current system clocksource must match the > units returned by the device, which on x86 is X86_TSC and on ARM64 is > ARM_ARCH_COUNTER. The clockid requested for the cross-timestamp must > be either CLOCK_REALTIME or CLOCK_MONOTONIC_RAW. These conditions hold > by default on GCP VMs using Chrony, so we expect the precise path to be > used the vast majority of the time. If the system clocksource is changed > to kvm-clock, it activates the fallback path. Ethtool counters have been > added to count how many times each path is used. > > The uncertainty window in the precise path is typically around 1-2us, > while in the fallback path is around 60-80us. > > Stub implementions of adjfine and adjtime are added to avoid NULL > dereference when phc2sys tries to adjust the clock. Sorry that I did not react to this earlier, but I was burried in regressions and allowed myself to take a few days off. > +/* > + * Read NIC clock by issuing the AQ command. The command is subject to > + * rate limiting and may need to be retried. Requires nic_ts_read_lock > + * to be held. > + */ > +static int gve_ptp_read_timestamp(struct gve_ptp *ptp, cycles_t *pre_cycles, > + cycles_t *post_cycles) > +{ > + unsigned long deadline = jiffies + msecs_to_jiffies(100); > + unsigned long delay_us = 1000; > + int err; > + > + lockdep_assert_held(&ptp->nic_ts_read_lock); > + > + do { > + *pre_cycles = get_cycles(); This is a completely non-sensical hack. Why do you need get_cycles() here, which is a horrible and ill-defined interface that others are trying to get rid of? Just because it's still there is not a good answer and it's actually not required at all. See below. > + err = gve_adminq_report_nic_ts(ptp->priv, > + ptp->nic_ts_report_bus); > + > + /* Prevent get_cycles() from being speculatively executed > + * before the AdminQ command > + */ > + rmb(); > + *post_cycles = get_cycles(); > + if (likely(err != -EAGAIN)) > + return err; > + > + fsleep(delay_us); > + > + /* Exponential backoff */ > + delay_us *= 2; > + } while (time_before(jiffies, deadline)); > + > + return -ETIMEDOUT; > +} > + > /* Read the nic timestamp from hardware via the admin queue. */ > -static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw) > +static int gve_clock_nic_ts_read(struct gve_ptp *ptp, u64 *nic_raw, > + struct gve_sysclock_sample *sysclock) > { > + cycles_t host_pre_cycles, host_post_cycles; > + struct gve_nic_ts_report *ts_report; > int err; > > mutex_lock(&ptp->nic_ts_read_lock); Why is this lock taken _after_ the regular (fallback) pre/post timestamps are aquired? What's wrong with doing: do { guard(mutex)(&ptp->nic_ts_read_lock); ptp_read_system_prets(sts); err = read_nic_timestamp(...); ptp_read_system_postts(sts); } while (err && !timeout(...)); return err; which moves the "fallback" timestamp right next to the actual hardware readout? That would make the "fallback" case too precise, right? After noticing the obvious, I just checked what our new AI review overlord "sashiko" has to say about this: "Does wrapping the timestamp read outside the locks and retry loops skew the measurements? In the fallback path, ptp_read_system_prets() is captured before calling gve_clock_nic_ts_read(), and postts() is captured after it returns. However, gve_clock_nic_ts_read() acquires ptp->nic_ts_read_lock and calls gve_ptp_read_timestamp(), which can sleep using fsleep() during retries. This means the prets/postts window could include mutex wait times and sleep durations." It _IS_ actually on spot. So why the heck did this garbage get merged without addressing this obvious fail? What "sashiko" actually saw aside of that, which I didn't see myself when looking at this, is: "Similarly, in gve_ptp_read_timestamp(), the get_cycles() bounds are taken outside the gve_adminq_report_nic_ts() call, which acquires the shared priv->adminq_lock. If there is lock contention, the cycle bounds could become extremely wide." Oh well. So what you really want is: do { guard(mutex)(&ptp->nic_ts_read_lock); guard(mutex)(&gpe_priv->adminq_lock); ptp_read_system_prets(sts); err = read_nic_timestamp(...) { ... return gve_adminq_execute_cmd_locked(....); } ptp_read_system_postts(sts); } while (err && !timeout(...)); return err; Or just rely on gpe_priv->adminq_lock in the first place. No? But what's worse and escaped the AI scrutiny is: Your attempt to convert these 'get_cycles()' time stamps after the fact and outside of the timekeeping core protection is fundamentally wrong. get_device_system_crosststamp() does do { seq = read_seqcount_begin(&tk_core.seq); take_timestamps(...); convert_timestamps(...) } while (read_seqcount_retry(&tk_core.seq, seq)); which covers the whole sequence of taking the snapshots from the hardware and the conversion. Your abuse of this is completely out of spec. Just because it did not blow up in your face yet does not make it in any way correct. You do not even get bonus points for creative abuse. Now let me look at the "firmware" provided timestamps. That's the poor mans emulation of actual hardware timestamps aka PTM. Why do you need all this pre_nic/post_nic muck there? Fix your firmware to actually honor PTM and not provide some made up version of it. Also the whole debug mechanism of comparing the time stamps of get_cycles() with the timestamps provided by firmware tells me that your firmware is a hacked together piece of garbage. If you want to use "hardware" timestamps then populate the ptp.info.getcrosststamp() callback. If your firmware really can't do PTM and only provides pre and post values, then you simply can take the midpoint in your getcrosststamp() callback: do { guard(mutex)(&ptp->nic_ts_read_lock); err = read_nic_timestamp(...); system_counterval->cycles = (nic_pre + nic_post) / 2; } while (err && !timeout(...)); If you do not even trust your firmware then you still can validate it easily without creating horrible hacks: do { guard(mutex)(...); ptp_read_system_prets(&ctx->sts); err = read_nic_timestamp(...); system_counterval->cycles = (nic_pre + nic_post) / 2; ptp_read_system_postts(&ctx->sts); } while (err && !timeout(...)); and then check the ctx->sts timestamps against the results provided by get_device_system_crosststamp() after conversion in your ptp.info.getcrosststamp() callback: err = get_device_system_crosststamp(..., &ctx); if (err) return err; if (ctx.sts.pre_ts > xtstamp.sys... || ctx.sts.post_ts < xtstamp.sys...) return -EBROKENFIRMWARE; no? Jakub, please back out this hackery before it makes a precedence for others to copy from. Thanks, tglx