From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Gerhard Engleder <gerhard@engleder-embedded.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	 Kevin Yang <yyd@google.com>,  Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org,  Willem de Bruijn <willemb@google.com>,
	 Harshitha Ramamurthy <hramamurthy@google.com>,
	 Andrew Lunn <andrew+netdev@lunn.ch>,
	 David Miller <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Joshua Washington <joshwash@google.com>,
	 Richard Cochran <richardcochran@gmail.com>
Subject: Re: [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types
Date: Sun, 25 Jan 2026 16:41:03 -0500
Message-ID: <willemdebruijn.kernel.976a69fefaaa@gmail.com>
In-Reply-To: <9d21ddb4-5e7d-4dfa-9ee4-ecdf73499f5b@engleder-embedded.com>

Gerhard Engleder wrote:
> On 22.01.26 23:28, Willem de Bruijn wrote:
> > Gerhard Engleder wrote:
> >> On 21.01.26 17:04, Kevin Yang wrote:
> >>> Network device hardware timestamps (hwtstamps) and the system's
> >>> clock (ktime) often originate from different clock domains.
> >>> This makes it hard to directly calculate the duration between
> >>> a hardware-timestamped event and a system-time event by simple
> >>> subtraction.
> >>>
> >>> This patch extends ndo_get_tstamp to allow a netdev to provide
> >>> a hwtstamp into the system's CLOCK_REALTIME domain. This allows a
> >>> driver to either perform a conversion by estimating or, if the
> >>> clocks are kept synchronized, return the original timestamp directly.
> >>> Other clock domains, e.g. CLOCK_MONOTONIC_RAW can also be added when
> >>> a use surfaces.
> >>>
> >>> This is useful for features that need to measure the delay between
> >>> a packet's hardware arrival/departure and a later software event.
> >>> For example, the TCP stack can use this to measure precise
> >>> packet receive delays, which is a requirement for the upcoming
> >>> TCP Swift [1] congestion control algorithm.
> >>>
> >>> [1] Kumar, Gautam, et al. "Swift: Delay is simple and effective
> >>> for congestion control in the datacenter." Proceedings of the
> >>> Annual conference of the ACM Special Interest Group on Data
> >>> Communication on the applications, technologies, architectures,
> >>> and protocols for computer communication. 2020.
> >>>
> >>> Signed-off-by: Kevin Yang <yyd@google.com>
> >>> Reviewed-by: Willem de Bruijn <willemb@google.com>
> >>
> >> Like Jakub in his reply
> >> https://lore.kernel.org/netdev/20260119115710.6fdde8c0@kernel.org/
> >> I also wondered why this is implemented in the driver.
> >>
> >> With vclocks it is already possible to get timestamps for arbitrary
> >> clock domains in parallel. So it is already possible to synchronize
> >> the hwtstamp to CLOCK_REALTIME, CLOCK_MONOTONIC, ... in parallel.
> >> Therefore, user space synchronisation is needed, but e.g. ptp4l does
> >> a much better synchronisation job than your solution.
> >>
> >> Maybe CLOCK_REALTIME is not supported by ptp4l, because this clock
> >> jumps due to daylight saving. IMO these jumps will also be a problem
> >> for your solution, as they will lead to wrong delays twice a year.
> >> So usually CLOCK_TAI or CLOCK_MONOTONIC would be a better choice.
> >>
> >> To sum up: IMO you suggest a driver specific in-kernel solution where
> >> already a driver independent user space solution with higher accuracy
> >> exists.
> > 
> > Definitely a promising alternative.
> > 
> > With multiple netdevices, a TCP listener socket may receive packets
> > from all devices. This would need new infrastructure to look up the
> > correct vclock for a given net_device. We cannot hardcode a choice
> > with SOF_TIMESTAMPING_BIND_PHC.
> > 
> > And this needs to happen for every packet, so with minimal overhead.
> > 
> > Though for established connections the expectation will be that
> > packets generally arrive on the same netdevice. Bar infrequent path
> > changes such as from sk_rethink_txhash on the peer. So there this
> > value can perhaps be cached.
> > 
> > It would still have to be learned by the kernel, no explicit
> > setsockopt.
> 
> Maybe it would also be an option, that the kernel learns with which
> clock domain the timestamps of the PHC and vclocks correlate. Then
> the TCP stack could calculate the delay if it finds a valid e.g.
> CLOCK_MONOTONIC timestamp in the packet. This would make the
> TCP listener socket independent from the devices. Just an idea, without
> thinking about implementation details.

I think we're on the same page.

- use the existing vclocks
- look up the right vclock based on the original incoming iface
- cache this known clock with an established socket

But I also have not looked at how/whether the lookup infra can be
implemented to find a vclock automatically, i.e., without userspace
admin.
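
To make the three steps above a bit more concrete, here is a minimal
user-space sketch of the lookup plus per-socket cache. Everything in it
is hypothetical: vclock_map_entry, sock_model, vclock_lookup() and
sock_get_phc() are names invented for illustration, not existing kernel
infrastructure, and the linear table stands in for whatever lookup
structure would really be used.

```c
#include <stddef.h>

/* Hypothetical mapping: incoming interface -> vclock (PHC) index. */
struct vclock_map_entry {
	int ifindex;
	int phc_index;
};

/* Hypothetical per-socket cache of the last resolved clock. */
struct sock_model {
	int cached_ifindex;	/* 0 means "nothing cached"; real ifindexes start at 1 */
	int cached_phc;
	unsigned int slow_lookups;	/* counts cache misses, for illustration only */
};

/* Slow path: find the vclock for an interface, or -1 if none is known. */
static int vclock_lookup(const struct vclock_map_entry *tbl, size_t n,
			 int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].ifindex == ifindex)
			return tbl[i].phc_index;
	return -1;
}

/* Fast path: reuse the cached clock while packets keep arriving on the
 * same interface; fall back to the lookup on a path change.
 */
static int sock_get_phc(struct sock_model *sk,
			const struct vclock_map_entry *tbl, size_t n,
			int ifindex)
{
	if (sk->cached_ifindex == ifindex)
		return sk->cached_phc;

	sk->slow_lookups++;
	sk->cached_phc = vclock_lookup(tbl, n, ifindex);
	sk->cached_ifindex = ifindex;
	return sk->cached_phc;
}
```

For an established connection the slow path should then only run on the
first packet and after an infrequent path change.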

In some cases the raw hwtstamp in shinfo may already be in the
CLOCK_REALTIME domain that TCP requires. But if the raw clock is not
realtime, we'll have to convert it based on timecounter/cyclecounter.
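
As a rough illustration of that adjustment, here is a user-space sketch
modeled loosely on what the kernel's timecounter_cyc2time() does. The
struct and function names (cc_model, tc_model, cyc_to_ns,
tc_model_cyc2time) are made up for this example. It ignores the
fractional-nanosecond remainder, and it assumes the (cycle_last, nsec)
sync point is already expressed in CLOCK_REALTIME.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a free-running hardware counter. */
struct cc_model {
	uint64_t mask;	/* valid bits of the counter, e.g. 0xFFFFFFFF */
	uint32_t mult;	/* ns = (cycles * mult) >> shift */
	uint32_t shift;
};

/* Hypothetical model of a counter anchored to a wall-clock sync point. */
struct tc_model {
	struct cc_model cc;
	uint64_t cycle_last;	/* counter value at the last sync point */
	uint64_t nsec;		/* assumed CLOCK_REALTIME ns at that point */
};

static uint64_t cyc_to_ns(const struct cc_model *cc, uint64_t cycles)
{
	assert(cc->shift < 64);
	return (cycles * cc->mult) >> cc->shift;
}

/* Convert a raw cycle timestamp to nanoseconds in the anchored domain.
 * Stamps may be slightly newer or slightly older than the sync point.
 */
static uint64_t tc_model_cyc2time(const struct tc_model *tc,
				  uint64_t cycle_tstamp)
{
	uint64_t delta = (cycle_tstamp - tc->cycle_last) & tc->cc.mask;

	/* More than half the counter range ahead: treat it as a stamp
	 * taken before the sync point and subtract instead.
	 */
	if (delta > tc->cc.mask / 2) {
		delta = (tc->cycle_last - cycle_tstamp) & tc->cc.mask;
		return tc->nsec - cyc_to_ns(&tc->cc, delta);
	}
	return tc->nsec + cyc_to_ns(&tc->cc, delta);
}
```

With a 125 MHz counter (mult = 8, shift = 0), a stamp 10 cycles after
the sync point comes out 80 ns later than tc->nsec, and the masking
keeps this correct across a 32-bit counter wrap.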

Thread overview: 8+ messages
2026-01-21 16:04 [PATCH net-next v2 0/2] net: extend ndo_get_tstamp and implement in gve Kevin Yang
2026-01-21 16:04 ` [PATCH net-next v2 1/2] net: extend ndo_get_tstamp for other timestamp types Kevin Yang
2026-01-22 20:04   ` Gerhard Engleder
2026-01-22 22:28     ` Willem de Bruijn
2026-01-25 19:45       ` Gerhard Engleder
2026-01-25 21:41         ` Willem de Bruijn [this message]
2026-01-27 23:13           ` Kevin Yang
2026-01-21 16:04 ` [PATCH net-next v2 2/2] gve: implement ndo_get_tstamp Kevin Yang
