Netdev List
* [PATCH net-next v2 0/3] SO_TXTIME improvements
@ 2026-06-04 19:41 Willem de Bruijn
  2026-06-04 19:41 ` [PATCH net-next v2 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot Willem de Bruijn
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Willem de Bruijn @ 2026-06-04 19:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

FQ targets monotonic timestamps as generated by the TCP stack.

But SO_TXTIME was later added, which can send skbs with timestamps
against other clocks. It is now possible to detect these through skb
tstamp_type.

Make FQ robust by converting these timestamps for use in FQ (patch 2).

This also requires testing against out-of-bounds values. Prefer to do
this at the source, when parsing SCM_TXTIME (patch 1). But, tests in
the hot path are still needed, to handle BPF sources.

Extend the so_txtime selftest to handle this new case (patch 3).

Changes: see individual patches
Previous versions:
v1: https://lore.kernel.org/netdev/20260603190243.2789335-1-willemdebruijn.kernel@gmail.com/

Willem de Bruijn (3):
  net: ensure SCM_TXTIME delivery time is no older than system boot
  net_sched: sch_fq: convert skb->tstamp if not monotonic
  selftests: drv-net: extend so_txtime with FQ with other clocks

 net/core/sock.c                               | 32 +++++++++++++-
 net/sched/sch_fq.c                            | 43 ++++++++++++++++---
 .../selftests/drivers/net/so_txtime.py        | 18 ++++++--
 3 files changed, 83 insertions(+), 10 deletions(-)

-- 
2.54.0.1032.g2f8565e1d1-goog



* [PATCH net-next v2 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot
  2026-06-04 19:41 [PATCH net-next v2 0/3] SO_TXTIME improvements Willem de Bruijn
@ 2026-06-04 19:41 ` Willem de Bruijn
  2026-06-04 19:41 ` [PATCH net-next v2 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic Willem de Bruijn
  2026-06-04 19:41 ` [PATCH net-next v2 3/3] selftests: drv-net: extend so_txtime with FQ with other clocks Willem de Bruijn
  2 siblings, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2026-06-04 19:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Limit input to sane values to avoid having to add tests later in the
kernel hot path, e.g., in FQ.

SCM_TXTIME timestamps are converted to signed ktime_t when assigned to
skb->tstamp. Avoid having negative values overflow into large positive
ones when again used as u64, e.g., in FQ time_to_send.
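The overflow described above can be reproduced outside the kernel. The
following is a minimal sketch, not kernel code: it models 64-bit two's
complement reinterpretation in Python, and the helper names s64_to_u64
and u64_to_s64 are hypothetical, chosen only for this illustration.

```python
MASK64 = (1 << 64) - 1


def s64_to_u64(value: int) -> int:
    """Reinterpret a signed 64-bit value as unsigned, like a C cast to u64."""
    return value & MASK64


def u64_to_s64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed, like a C cast to s64."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


# A delivery time of -1 ns, stored as signed ktime_t, turns into the
# largest possible u64 once it is read back as unsigned.
negative_tstamp = -1
print(s64_to_u64(negative_tstamp))  # 18446744073709551615
```

A consumer that compares such a value as u64 would see a delivery time
roughly 584 years in the future instead of one in the past.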

For CLOCK_MONOTONIC, only allow positive values.

For CLOCK_REALTIME and CLOCK_TAI, allow equivalent values, i.e., no
older than the boot of the machine.

skb->tstamp zero is a special case signaling feature off. This is not
converted between clockids.

Handle the special case where the realtime clock is set so small that
real - mono is negative, however unlikely in practice.

Ideally we would also set a sane upper bound, but that would require
reading the clock, which is an expensive operation. Continue to defer
that validation to users of the data. FQ already does this.

Bound rather than return error on older timestamps. This is the
existing policy e.g., in FQ.
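As an illustration of the bounding policy described above, here is a
minimal userspace sketch in Python. It is a model under stated
assumptions, not the kernel implementation: clamp_txtime and its
parameters are hypothetical names, the two *_at_boot_ns arguments stand
in for ktime_mono_to_real(0) and ktime_mono_to_any(0, TK_OFFS_TAI), and
the clock id constants are the Linux UAPI values.

```python
# Linux clock ids (values from the UAPI headers).
CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_TAI = 0, 1, 11


def u64_to_s64(value):
    """Reinterpret an unsigned 64-bit value as signed, like a C cast."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def clamp_txtime(txtime_u64, clockid, real_at_boot_ns, tai_at_boot_ns):
    """Return the delivery time to store, bounded below per clock id."""
    if txtime_u64 == 0:
        return 0  # zero means "no delivery time" and is never converted
    if clockid == CLOCK_REALTIME:
        tmin = max(real_at_boot_ns, 1)  # guards a negative real - mono
    elif clockid == CLOCK_TAI:
        tmin = max(tai_at_boot_ns, 1)
    else:
        tmin = 1  # CLOCK_MONOTONIC: only positive values
    # Compare as signed, so huge u64 inputs count as negative.
    return max(u64_to_s64(txtime_u64), tmin)


# Hypothetical offsets: machine booted at realtime 1_700_000_000 s,
# with a 37 s TAI offset configured.
BOOT_REAL = 1_700_000_000 * 10**9
BOOT_TAI = BOOT_REAL + 37 * 10**9

print(clamp_txtime(5, CLOCK_REALTIME, BOOT_REAL, BOOT_TAI) == BOOT_REAL)
print(clamp_txtime((1 << 64) - 1, CLOCK_MONOTONIC, BOOT_REAL, BOOT_TAI))
```

Stale or negative inputs are raised to the lower bound rather than
rejected, matching the "bound rather than return error" policy.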

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    - remove spurious semicolon at end of switch
    - remove Fixes tag
---
 net/core/sock.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index f362e3ce1efb..635d8f2f7e2b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3041,12 +3041,42 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 		sockc->tsflags |= tsflags;
 		break;
 	case SCM_TXTIME:
+	{
+		ktime_t tmin;
+		u64 txtime;
+
 		if (!sock_flag(sk, SOCK_TXTIME))
 			return -EINVAL;
 		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64)))
 			return -EINVAL;
-		sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
+
+		txtime = get_unaligned((u64 *)CMSG_DATA(cmsg));
+
+		/* Allow sending without a delivery time: zero special case */
+		if (!txtime) {
+			sockc->transmit_time = 0;
+			break;
+		}
+
+		switch (sk->sk_clockid) {
+		case CLOCK_MONOTONIC:
+			tmin = 1;
+			break;
+		case CLOCK_REALTIME:
+			tmin = max(ktime_mono_to_real(0), 1);
+			break;
+		case CLOCK_TAI:
+			tmin = max(ktime_mono_to_any(0, TK_OFFS_TAI), 1);
+			break;
+		default:
+			tmin = 1;
+			WARN_ON_ONCE(1);
+			break;
+		}
+
+		sockc->transmit_time = max_t(ktime_t, txtime, tmin);
 		break;
+	}
 	case SCM_TS_OPT_ID:
 		if (sk_is_tcp(sk))
 			return -EINVAL;
-- 
2.54.0.1032.g2f8565e1d1-goog



* [PATCH net-next v2 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic
  2026-06-04 19:41 [PATCH net-next v2 0/3] SO_TXTIME improvements Willem de Bruijn
  2026-06-04 19:41 ` [PATCH net-next v2 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot Willem de Bruijn
@ 2026-06-04 19:41 ` Willem de Bruijn
  2026-06-04 19:41 ` [PATCH net-next v2 3/3] selftests: drv-net: extend so_txtime with FQ with other clocks Willem de Bruijn
  2 siblings, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2026-06-04 19:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

FQ currently assumes skb->tstamp holds monotonic time, as used by TCP.

Users with ns_capable CAP_NET_ADMIN can transmit skbs using SO_TXTIME
with CLOCK_MONOTONIC, CLOCK_REALTIME or CLOCK_TAI clockids as of
commit 80b14dee2bea ("net: Add a new socket option for a future
transmit time.")

More recently, skbs also gained tstamp_type to explicitly communicate
the clockid of skb->tstamp, with commit 4d25ca2d6801 ("net: Rename
mono_delivery_time to tstamp_type for scalabilty"), commit
1693c5db6ab8 ("net: Add additional bit to support clockid_t timestamp
type") and a few others.

Detect other clocks and convert to monotonic for use in FQ. That is,
convert fq_skb_cb(skb)->time_to_send. Do not convert skb->tstamp
itself. Network device clocks are more commonly synchronized to TAI.

Conversion may be imprecise due to clock adjustment (e.g., adjfreq)
between when SCM_TXTIME is set and when it is converted in fq_enqueue.
The common codepath is short, so skew will be well below common pacing
operation. Even in edge cases, bursts (too soon) or beyond horizon
(too late) are indistinguishable from network conditions, to which
senders must be robust as long as they are infrequent.

Avoid overflow due to negative offsets becoming huge when converting
from signed ktime_t to u64 time_to_send. Bound lower to mono 1 and
upper to now + q->horizon. This protects against bad input, e.g.,
from BPF programs.

Detect legacy BPF programs that program skb->tstamp without setting
skb->tstamp_type. Here tstamp_type is zero (SKB_CLOCK_REALTIME), but
the value will be unrealistic for realtime in the 21st century. Follow
existing TIME_UPTIME_SEC_MAX as bound between mono and realtime.
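The conversion, bounding and legacy detection described above can be
sketched as a small Python model. This is an illustration only, not the
FQ code: the function names and offset parameters are hypothetical, the
SKB_CLOCK_* values and the 30-year TIME_UPTIME_SEC_MAX are assumptions
taken from current kernel headers, and only the horizon cap path (not
horizon_drop) is modeled.

```python
NSEC_PER_SEC = 1_000_000_000
TIME_UPTIME_SEC_MAX = 30 * 365 * 24 * 3600  # assumed kernel value
SKB_CLOCK_REALTIME, SKB_CLOCK_MONOTONIC, SKB_CLOCK_TAI = 0, 1, 2


def tstamp_to_mono(tstamp, tstamp_type, real_off, tai_off):
    """Return (monotonic time, effective tstamp_type), bounded below by 1."""
    mono_max = NSEC_PER_SEC * TIME_UPTIME_SEC_MAX
    if tstamp_type == SKB_CLOCK_MONOTONIC:
        return max(tstamp, 1), tstamp_type
    if tstamp_type == SKB_CLOCK_TAI:
        return max(tstamp - tai_off, 1), tstamp_type
    if tstamp > mono_max:
        return max(tstamp - real_off, 1), tstamp_type
    # Type 0 but too small for realtime: likely a legacy BPF program
    # that wrote a monotonic value without setting tstamp_type.
    return max(tstamp, 1), SKB_CLOCK_MONOTONIC


def fq_time_to_send(tstamp, tstamp_type, now, horizon, real_off, tai_off):
    """Model of the time_to_send choice with the horizon cap applied."""
    if tstamp == 0:
        return now
    mono, _ = tstamp_to_mono(tstamp, tstamp_type, real_off, tai_off)
    return min(mono, now + horizon)


# Hypothetical clock state: uptime 100 s, horizon 10 s.
REAL_OFF = 1_700_000_000 * NSEC_PER_SEC
TAI_OFF = REAL_OFF + 37 * NSEC_PER_SEC
NOW, HORIZON = 100 * NSEC_PER_SEC, 10 * NSEC_PER_SEC

tai_tstamp = TAI_OFF + NOW + 1_000_000  # 1 ms in the future, in TAI
print(fq_time_to_send(tai_tstamp, SKB_CLOCK_TAI, NOW, HORIZON,
                      REAL_OFF, TAI_OFF) - NOW)  # 1000000
```

The lower bound of 1 keeps the result positive before it is stored as
u64, and the upper bound of now + horizon covers inputs that were not
validated at the socket layer, such as timestamps set by BPF programs.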

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    - replace Fixes tag with references inside the commit message
---
 net/sched/sch_fq.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 33783c9f8e16..7cae082a9847 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -537,10 +537,10 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
 	rb_insert_color(&skb->rbnode, &flow->t_root);
 }
 
-static bool fq_packet_beyond_horizon(const struct sk_buff *skb,
+static bool fq_packet_beyond_horizon(ktime_t time_to_send,
 				     const struct fq_sched_data *q, u64 now)
 {
-	return unlikely((s64)skb->tstamp > (s64)(now + q->horizon));
+	return unlikely((s64)time_to_send > (s64)(now + q->horizon));
 }
 
 static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
@@ -561,6 +561,36 @@ static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
 	}
 }
 
+static ktime_t fq_skb_tstamp_to_mono(struct sk_buff *skb)
+{
+	const ktime_t mono_max = NSEC_PER_SEC * TIME_UPTIME_SEC_MAX;
+
+	if (likely(skb->tstamp_type == SKB_CLOCK_MONOTONIC))
+		return max(skb->tstamp, 1);
+
+	if (skb->tstamp_type == SKB_CLOCK_TAI)
+		return max(ktime_sub(skb->tstamp, ktime_mono_to_any(0, TK_OFFS_TAI)), 1);
+
+	if (likely(skb->tstamp > mono_max))
+		return max(ktime_sub(skb->tstamp, ktime_mono_to_real(0)), 1);
+
+	/* Handle BPF programs setting skb->tstamp but not tstamp_type */
+	net_warn_ratelimited("fq: likely mono tstamp with tstamp_type 0\n");
+
+	skb->tstamp_type = SKB_CLOCK_MONOTONIC;
+	return max(skb->tstamp, 1);
+}
+
+static void fq_mono_to_skb_tstamp(struct sk_buff *skb, ktime_t time_to_send)
+{
+	if (skb->tstamp_type == SKB_CLOCK_MONOTONIC)
+		skb->tstamp = time_to_send;
+	else if (skb->tstamp_type == SKB_CLOCK_REALTIME)
+		skb->tstamp = ktime_mono_to_real(time_to_send);
+	else
+		skb->tstamp = ktime_mono_to_any(time_to_send, TK_OFFS_TAI);
+}
+
 static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		      struct sk_buff **to_free)
 {
@@ -579,17 +609,20 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (!skb->tstamp) {
 		fq_skb_cb(skb)->time_to_send = now;
 	} else {
+		ktime_t time_to_send = fq_skb_tstamp_to_mono(skb);
+
 		/* Check if packet timestamp is too far in the future. */
-		if (fq_packet_beyond_horizon(skb, q, now)) {
+		if (fq_packet_beyond_horizon(time_to_send, q, now)) {
 			if (q->horizon_drop) {
 				q->stat_horizon_drops++;
 				return qdisc_drop_reason(skb, sch, to_free,
 							 QDISC_DROP_HORIZON_LIMIT);
 			}
 			q->stat_horizon_caps++;
-			skb->tstamp = now + q->horizon;
+			time_to_send = now + q->horizon;
+			fq_mono_to_skb_tstamp(skb, time_to_send);
 		}
-		fq_skb_cb(skb)->time_to_send = skb->tstamp;
+		fq_skb_cb(skb)->time_to_send = (u64)time_to_send;
 	}
 
 	f = fq_classify(sch, skb, now);
-- 
2.54.0.1032.g2f8565e1d1-goog



* [PATCH net-next v2 3/3] selftests: drv-net: extend so_txtime with FQ with other clocks
  2026-06-04 19:41 [PATCH net-next v2 0/3] SO_TXTIME improvements Willem de Bruijn
  2026-06-04 19:41 ` [PATCH net-next v2 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot Willem de Bruijn
  2026-06-04 19:41 ` [PATCH net-next v2 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic Willem de Bruijn
@ 2026-06-04 19:41 ` Willem de Bruijn
  2 siblings, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2026-06-04 19:41 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add a variant of the existing FQ tests, but pass CLOCK_TAI rather than
the native CLOCK_MONOTONIC clock id.

FQ used to imply monotonic. This is no longer the case, and the
inverse need not hold either. Rename $PREFIX_mono to $PREFIX_fq.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 .../testing/selftests/drivers/net/so_txtime.py | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/so_txtime.py b/tools/testing/selftests/drivers/net/so_txtime.py
index 5d4388bfc6dd..b7be4cabbec2 100755
--- a/tools/testing/selftests/drivers/net/so_txtime.py
+++ b/tools/testing/selftests/drivers/net/so_txtime.py
@@ -46,7 +46,7 @@ def _qdisc_setup(ifname, qdisc, optargs=""):
     tc(f"qdisc replace dev {ifname} root {qdisc} {optargs}")
 
 
-def _test_variants_mono():
+def _test_variants_fq():
     for ipver in ["4", "6"]:
         for testcase in [
             ["no_delay", "a,-1", "a,-1"],
@@ -59,13 +59,20 @@ def _test_variants_mono():
             yield KsftNamedVariant(name, ipver, testcase[1], testcase[2])
 
 
-@ksft_variants(_test_variants_mono())
-def test_so_txtime_mono(cfg, ipver, args_tx, args_rx):
+@ksft_variants(_test_variants_fq())
+def test_so_txtime_fq_mono(cfg, ipver, args_tx, args_rx):
     """Run all variants of monotonic (fq) tests."""
     _qdisc_setup(cfg.ifname, "fq")
     test_so_txtime(cfg, "mono", ipver, args_tx, args_rx, True)
 
 
+@ksft_variants(_test_variants_fq())
+def test_so_txtime_fq_tai(cfg, ipver, args_tx, args_rx):
+    """Run all variants of fq tests, but pass CLOCK_TAI to test conversion."""
+    _qdisc_setup(cfg.ifname, "fq")
+    test_so_txtime(cfg, "tai", ipver, args_tx, args_rx, True)
+
+
 def _test_variants_etf():
     for ipver in ["4", "6"]:
         for testcase in [
@@ -95,7 +102,10 @@ def test_so_txtime_etf(cfg, ipver, args_tx, args_rx, expect_fail):
 def main() -> None:
     """Boilerplate ksft main."""
     with NetDrvEpEnv(__file__) as cfg:
-        ksft_run([test_so_txtime_mono, test_so_txtime_etf], args=(cfg,))
+        ksft_run(
+            [test_so_txtime_fq_mono, test_so_txtime_fq_tai, test_so_txtime_etf],
+            args=(cfg,),
+        )
     ksft_exit()
 
 
-- 
2.54.0.1032.g2f8565e1d1-goog


