[PATCH net-next 0/3] SO_TXTIME improvements

* [PATCH net-next 0/3] SO_TXTIME improvements
@ 2026-06-03 19:01 Willem de Bruijn
  2026-06-03 19:01 ` [PATCH net-next 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot Willem de Bruijn
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Willem de Bruijn @ 2026-06-03 19:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

FQ targets monotonic timestamps as generated by the TCP stack.

But SO_TXTIME was later added, which can send skbs with timestamps
against other clocks. It is now possible to detect these through skb
tstamp_type.

Make FQ robust by converting these timestamps for use in FQ (patch 2).

This also requires testing against out-of-bounds values. Prefer to do
this at the source, when parsing SCM_TXTIME (patch 1). But tests in
the hot path are still needed to handle BPF sources.

Extend the so_txtime selftest to handle this new case (patch 3).

The last patch would have a conflict in net. This is not stable
material, fine to go to net-next only.

Willem de Bruijn (3):
  net: ensure SCM_TXTIME delivery time is no older than system boot
  net_sched: sch_fq: convert skb->tstamp if not monotonic
  selftests: drv-net: extend so_txtime with FQ with other clocks

 net/core/sock.c                               | 32 +++++++++++++-
 net/sched/sch_fq.c                            | 43 ++++++++++++++++---
 .../selftests/drivers/net/so_txtime.py        | 18 ++++++--
 3 files changed, 83 insertions(+), 10 deletions(-)

-- 
2.54.0.1032.g2f8565e1d1-goog

* [PATCH net-next 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot
  2026-06-03 19:01 [PATCH net-next 0/3] SO_TXTIME improvements Willem de Bruijn
@ 2026-06-03 19:01 ` Willem de Bruijn
  2026-06-03 22:11   ` Jakub Kicinski
  2026-06-03 19:01 ` [PATCH net-next 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic Willem de Bruijn
  2026-06-03 19:01 ` [PATCH net-next 3/3] selftests: drv-net: extend so_txtime with FQ with other clocks Willem de Bruijn
  2 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2026-06-03 19:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Limit input to sane values to avoid having to add tests later in the
kernel hot path, e.g., in FQ.

SCM_TXTIME timestamps are converted to signed ktime_t when assigned to
skb->tstamp. Avoid having negative values overflow into large positive
ones when again used as u64, e.g., in FQ time_to_send.

For CLOCK_MONOTONIC, only allow positive values.

For CLOCK_REALTIME and CLOCK_TAI, allow equivalent values, i.e., no
older than the boot of the machine.

skb->tstamp zero is a special case signaling feature off. This is not
converted between clockids.

Handle the special case where the realtime clock is set so small that
real - mono is negative, however unlikely in practice.

Ideally we would also set a sane upper bound, but that would require
reading the clock, which is an expensive operation. Continue to defer
that validation to users of the data. FQ already does this.

Bound rather than return an error on older timestamps. This is the
existing policy, e.g., in FQ.

Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/core/sock.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index f362e3ce1efb..dff48ef49a8c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3041,12 +3041,42 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 		sockc->tsflags |= tsflags;
 		break;
 	case SCM_TXTIME:
+	{
+		ktime_t tmin;
+		u64 txtime;
+
 		if (!sock_flag(sk, SOCK_TXTIME))
 			return -EINVAL;
 		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64)))
 			return -EINVAL;
-		sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
+
+		txtime = get_unaligned((u64 *)CMSG_DATA(cmsg));
+
+		/* Allow sending without a delivery time: zero special case */
+		if (!txtime) {
+			sockc->transmit_time = 0;
+			break;
+		}
+
+		switch (sk->sk_clockid) {
+		case CLOCK_MONOTONIC:
+			tmin = 1;
+			break;
+		case CLOCK_REALTIME:
+			tmin = max(ktime_mono_to_real(0), 1);
+			break;
+		case CLOCK_TAI:
+			tmin = max(ktime_mono_to_any(0, TK_OFFS_TAI), 1);
+			break;
+		default:
+			tmin = 1;
+			WARN_ON_ONCE(1);
+			break;
+		};
+
+		sockc->transmit_time = max_t(ktime_t, txtime, tmin);
 		break;
+	}
 	case SCM_TS_OPT_ID:
 		if (sk_is_tcp(sk))
 			return -EINVAL;
-- 
2.54.0.1032.g2f8565e1d1-goog
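
For illustration only, not part of the patch: a minimal userspace Python
model of the bound described in the commit message above, assuming plain
64-bit two's-complement conversion as happens when a u64 is stored in a
signed ktime_t. The helper names (to_u64, to_s64, clamp_txtime) and the
clock_at_boot_ns parameter, which stands in for ktime_mono_to_real(0) or
ktime_mono_to_any(0, TK_OFFS_TAI), are hypothetical.

```python
U64_MASK = (1 << 64) - 1


def to_u64(value):
    """Reinterpret an integer as unsigned 64-bit (wraps like a C cast)."""
    return value & U64_MASK


def to_s64(value):
    """Reinterpret an integer as signed 64-bit, like a cast to ktime_t."""
    value &= U64_MASK
    return value - (1 << 64) if value >> 63 else value


def clamp_txtime(txtime_u64, clock_at_boot_ns=0):
    """Model of the SCM_TXTIME bound: zero passes through, else clamp to tmin.

    clock_at_boot_ns is a hypothetical stand-in for the socket clock's value
    at boot: 0 for CLOCK_MONOTONIC, (real - mono) for CLOCK_REALTIME.
    """
    if txtime_u64 == 0:
        return 0  # special case: no delivery time requested
    # A negative offset (realtime set very small) still yields tmin == 1.
    tmin = max(clock_at_boot_ns, 1)
    return max(to_s64(txtime_u64), tmin)
```

In this model, an input of to_u64(-5) is bounded to 1 instead of later
reappearing as a value near 2^64 when read back as u64.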

* [PATCH net-next 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic
  2026-06-03 19:01 [PATCH net-next 0/3] SO_TXTIME improvements Willem de Bruijn
  2026-06-03 19:01 ` [PATCH net-next 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot Willem de Bruijn
@ 2026-06-03 19:01 ` Willem de Bruijn
  2026-06-03 22:22   ` Jakub Kicinski
  2026-06-03 19:01 ` [PATCH net-next 3/3] selftests: drv-net: extend so_txtime with FQ with other clocks Willem de Bruijn
  2 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2026-06-03 19:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

FQ currently assumes skb->tstamp holds monotonic time, as used by TCP.

Users with ns_capable CAP_NET_ADMIN can transmit skbs using SO_TXTIME
with CLOCK_MONOTONIC, CLOCK_REALTIME or CLOCK_TAI clockids as of the
below commit.

More recently, skbs also gained tstamp_type to explicitly communicate
the clockid of skb->tstamp.

Detect other clocks and convert to monotonic for use in FQ. That is,
convert fq_skb_cb(skb)->time_to_send. Do not convert skb->tstamp
itself. Network device clocks are more commonly synchronized to TAI.

Conversion may be imprecise due to clock adjustment (e.g., adjfreq)
between when SCM_TXTIME is set and when it is converted in fq_enqueue.
The common codepath is short, so skew will be well below common pacing
operation. Even in edge cases, bursts (too soon) or beyond horizon
(too late) are indistinguishable from network conditions, to which
senders must be robust, as long as they are infrequent.

Avoid overflow due to negative offsets becoming huge when converting
from signed ktime_t to u64 time_to_send. Bound lower to mono 1 and
upper to now + q->horizon. This protects against bad input, e.g.,
from BPF programs.

Detect legacy BPF programs that program skb->tstamp without setting
skb->tstamp_type. Here tstamp_type is zero (SKB_CLOCK_REALTIME), but
the value will be unrealistic for realtime in the 21st century. Follow
existing TIME_UPTIME_SEC_MAX as bound between mono and realtime.

Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/sched/sch_fq.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 33783c9f8e16..7cae082a9847 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -537,10 +537,10 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
 	rb_insert_color(&skb->rbnode, &flow->t_root);
 }
 
-static bool fq_packet_beyond_horizon(const struct sk_buff *skb,
+static bool fq_packet_beyond_horizon(ktime_t time_to_send,
 				     const struct fq_sched_data *q, u64 now)
 {
-	return unlikely((s64)skb->tstamp > (s64)(now + q->horizon));
+	return unlikely((s64)time_to_send > (s64)(now + q->horizon));
 }
 
 static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
@@ -561,6 +561,36 @@ static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
 	}
 }
 
+static ktime_t fq_skb_tstamp_to_mono(struct sk_buff *skb)
+{
+	const ktime_t mono_max = NSEC_PER_SEC * TIME_UPTIME_SEC_MAX;
+
+	if (likely(skb->tstamp_type == SKB_CLOCK_MONOTONIC))
+		return max(skb->tstamp, 1);
+
+	if (skb->tstamp_type == SKB_CLOCK_TAI)
+		return max(ktime_sub(skb->tstamp, ktime_mono_to_any(0, TK_OFFS_TAI)), 1);
+
+	if (likely(skb->tstamp > mono_max))
+		return max(ktime_sub(skb->tstamp, ktime_mono_to_real(0)), 1);
+
+	/* Handle BPF programs setting skb->tstamp but not tstamp_type */
+	net_warn_ratelimited("fq: likely mono tstamp with tstamp_type 0\n");
+
+	skb->tstamp_type = SKB_CLOCK_MONOTONIC;
+	return max(skb->tstamp, 1);
+}
+
+static void fq_mono_to_skb_tstamp(struct sk_buff *skb, ktime_t time_to_send)
+{
+	if (skb->tstamp_type == SKB_CLOCK_MONOTONIC)
+		skb->tstamp = time_to_send;
+	else if (skb->tstamp_type == SKB_CLOCK_REALTIME)
+		skb->tstamp = ktime_mono_to_real(time_to_send);
+	else
+		skb->tstamp = ktime_mono_to_any(time_to_send, TK_OFFS_TAI);
+}
+
 static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		      struct sk_buff **to_free)
 {
@@ -579,17 +609,20 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (!skb->tstamp) {
 		fq_skb_cb(skb)->time_to_send = now;
 	} else {
+		ktime_t time_to_send = fq_skb_tstamp_to_mono(skb);
+
 		/* Check if packet timestamp is too far in the future. */
-		if (fq_packet_beyond_horizon(skb, q, now)) {
+		if (fq_packet_beyond_horizon(time_to_send, q, now)) {
 			if (q->horizon_drop) {
 				q->stat_horizon_drops++;
 				return qdisc_drop_reason(skb, sch, to_free,
 							 QDISC_DROP_HORIZON_LIMIT);
 			}
 			q->stat_horizon_caps++;
-			skb->tstamp = now + q->horizon;
+			time_to_send = now + q->horizon;
+			fq_mono_to_skb_tstamp(skb, time_to_send);
 		}
-		fq_skb_cb(skb)->time_to_send = skb->tstamp;
+		fq_skb_cb(skb)->time_to_send = (u64)time_to_send;
 	}
 
 	f = fq_classify(sch, skb, now);
-- 
2.54.0.1032.g2f8565e1d1-goog
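
For illustration only, not part of the patch: a rough userspace Python
model of the decision order in fq_skb_tstamp_to_mono() as posted above.
The function name tstamp_to_mono and the real_at_boot / tai_at_boot
parameters (standing in for ktime_mono_to_real(0) and
ktime_mono_to_any(0, TK_OFFS_TAI)) are hypothetical. The numeric enum
values and the 30-year TIME_UPTIME_SEC_MAX are assumptions; the thread
itself only confirms that SKB_CLOCK_REALTIME is zero.

```python
NSEC_PER_SEC = 1_000_000_000
# Assumption: 30 years, mirroring the kernel's TIME_UPTIME_SEC_MAX.
TIME_UPTIME_SEC_MAX = 30 * 365 * 24 * 3600
# Assumption: enum values; only REALTIME == 0 is stated in the thread.
SKB_CLOCK_REALTIME, SKB_CLOCK_MONOTONIC, SKB_CLOCK_TAI = 0, 1, 2


def tstamp_to_mono(tstamp, tstamp_type, real_at_boot, tai_at_boot):
    """Return (monotonic time_to_send, possibly corrected tstamp_type)."""
    mono_max = NSEC_PER_SEC * TIME_UPTIME_SEC_MAX

    if tstamp_type == SKB_CLOCK_MONOTONIC:
        return max(tstamp, 1), tstamp_type

    if tstamp_type == SKB_CLOCK_TAI:
        return max(tstamp - tai_at_boot, 1), tstamp_type

    # tstamp_type is REALTIME (zero) from here on.
    if tstamp > mono_max:
        return max(tstamp - real_at_boot, 1), tstamp_type

    # Too small to be realtime: treat as a legacy BPF monotonic timestamp.
    return max(tstamp, 1), SKB_CLOCK_MONOTONIC
```

The lower bound of 1 keeps a timestamp older than boot from going negative
and then wrapping when assigned to the u64 time_to_send.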


* [PATCH net-next 3/3] selftests: drv-net: extend so_txtime with FQ with other clocks
  2026-06-03 19:01 [PATCH net-next 0/3] SO_TXTIME improvements Willem de Bruijn
  2026-06-03 19:01 ` [PATCH net-next 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot Willem de Bruijn
  2026-06-03 19:01 ` [PATCH net-next 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic Willem de Bruijn
@ 2026-06-03 19:01 ` Willem de Bruijn
  2 siblings, 0 replies; 8+ messages in thread
From: Willem de Bruijn @ 2026-06-03 19:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, edumazet, pabeni, horms, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add a variant of the existing FQ tests, but pass CLOCK_TAI rather than
the native CLOCK_MONOTONIC clock id.

FQ used to imply monotonic. This is no longer the case, and the
inverse need not hold either. Rename $PREFIX_mono to $PREFIX_fq.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 .../testing/selftests/drivers/net/so_txtime.py | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/so_txtime.py b/tools/testing/selftests/drivers/net/so_txtime.py
index 5d4388bfc6dd..b7be4cabbec2 100755
--- a/tools/testing/selftests/drivers/net/so_txtime.py
+++ b/tools/testing/selftests/drivers/net/so_txtime.py
@@ -46,7 +46,7 @@ def _qdisc_setup(ifname, qdisc, optargs=""):
     tc(f"qdisc replace dev {ifname} root {qdisc} {optargs}")
 
 
-def _test_variants_mono():
+def _test_variants_fq():
     for ipver in ["4", "6"]:
         for testcase in [
             ["no_delay", "a,-1", "a,-1"],
@@ -59,13 +59,20 @@ def _test_variants_mono():
             yield KsftNamedVariant(name, ipver, testcase[1], testcase[2])
 
 
-@ksft_variants(_test_variants_mono())
-def test_so_txtime_mono(cfg, ipver, args_tx, args_rx):
+@ksft_variants(_test_variants_fq())
+def test_so_txtime_fq_mono(cfg, ipver, args_tx, args_rx):
     """Run all variants of monotonic (fq) tests."""
     _qdisc_setup(cfg.ifname, "fq")
     test_so_txtime(cfg, "mono", ipver, args_tx, args_rx, True)
 
 
+@ksft_variants(_test_variants_fq())
+def test_so_txtime_fq_tai(cfg, ipver, args_tx, args_rx):
+    """Run all variants of fq tests, but pass CLOCK_TAI to test conversion."""
+    _qdisc_setup(cfg.ifname, "fq")
+    test_so_txtime(cfg, "tai", ipver, args_tx, args_rx, True)
+
+
 def _test_variants_etf():
     for ipver in ["4", "6"]:
         for testcase in [
@@ -95,7 +102,10 @@ def test_so_txtime_etf(cfg, ipver, args_tx, args_rx, expect_fail):
 def main() -> None:
     """Boilerplate ksft main."""
     with NetDrvEpEnv(__file__) as cfg:
-        ksft_run([test_so_txtime_mono, test_so_txtime_etf], args=(cfg,))
+        ksft_run(
+            [test_so_txtime_fq_mono, test_so_txtime_fq_tai, test_so_txtime_etf],
+            args=(cfg,),
+        )
     ksft_exit()
 
 
-- 
2.54.0.1032.g2f8565e1d1-goog
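
For illustration only, not part of the patch or the selftest: a hedged
Python sketch of the userspace side that the new CLOCK_TAI variant
exercises, i.e., building the SO_TXTIME option and the SCM_TXTIME control
message. The helper names are hypothetical. The numeric values of
SO_TXTIME (61) and CLOCK_TAI (11) are assumptions taken from the generic
Linux uapi headers; some architectures use different socket option
numbers, and Python's socket module may not export these constants.

```python
import socket
import struct

SO_TXTIME = 61          # assumption: asm-generic value, not universal
SCM_TXTIME = SO_TXTIME  # the cmsg type shares the option number
CLOCK_TAI = 11          # assumption: Linux clockid for CLOCK_TAI


def sock_txtime_opt(clockid, flags=0):
    """Pack struct sock_txtime { clockid; flags } for setsockopt(SO_TXTIME)."""
    return struct.pack("iI", clockid, flags)


def txtime_cmsg(txtime_ns):
    """Build the ancillary-data tuple carrying a u64 delivery time in ns."""
    return (socket.SOL_SOCKET, SCM_TXTIME, struct.pack("Q", txtime_ns))
```

A sender would pass sock_txtime_opt(CLOCK_TAI) to
sock.setsockopt(socket.SOL_SOCKET, SO_TXTIME, ...) and then supply
[txtime_cmsg(t)] as the ancillary data argument of sock.sendmsg(), which,
per the description in patch 2, presumes CAP_NET_ADMIN.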


* Re: [PATCH net-next 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot
  2026-06-03 19:01 ` [PATCH net-next 1/3] net: ensure SCM_TXTIME delivery time is no older than system boot Willem de Bruijn
@ 2026-06-03 22:11   ` Jakub Kicinski
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-06-03 22:11 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, edumazet, pabeni, horms, Willem de Bruijn

On Wed,  3 Jun 2026 15:01:28 -0400 Willem de Bruijn wrote:
> +		switch (sk->sk_clockid) {
> +		case CLOCK_MONOTONIC:
> +			tmin = 1;
> +			break;
> +		case CLOCK_REALTIME:
> +			tmin = max(ktime_mono_to_real(0), 1);
> +			break;
> +		case CLOCK_TAI:
> +			tmin = max(ktime_mono_to_any(0, TK_OFFS_TAI), 1);
> +			break;
> +		default:
> +			tmin = 1;
> +			WARN_ON_ONCE(1);
> +			break;
> +		};

net/core/sock.c:3079:3-4: Unneeded semicolon
-- 
pw-bot: cr

* Re: [PATCH net-next 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic
  2026-06-03 19:01 ` [PATCH net-next 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic Willem de Bruijn
@ 2026-06-03 22:22   ` Jakub Kicinski
  2026-06-03 22:59     ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2026-06-03 22:22 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, edumazet, pabeni, horms, Willem de Bruijn

On Wed,  3 Jun 2026 15:01:29 -0400 Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> FQ currently assumes skb->tstamp holds monotonic time, as used by TCP.
> 
> Users with ns_capable CAP_NET_ADMIN can transmit skbs using SO_TXTIME
> with CLOCK_MONOTONIC, CLOCK_REALTIME or CLOCK_TAI clockids as of the
> below commit.
> 
> More recently, skbs also gained tstamp_type to explicitly communicate
> the clockid of skb->tstamp.
> 
> Detect other clocks and convert to monotonic for use in FQ. That is,
> convert fq_skb_cb(skb)->time_to_send. Do not convert skb->tstamp
> itself. Network device clocks are more commonly synchronized to TAI.
> 
> Conversion may be imprecise due to clock adjustment (e.g., adjfreq)
> between when SCM_TXTIME is set and when it is converted in fq_enqueue.
> The common codepath is short, so skew will be well below common pacing
> operation. Even in edge cases, bursts (too soon) or beyond horizon
> (too late) are indistinguishable from network conditions, to which
> senders must be robust, as long as they are infrequent.
> 
> Avoid overflow due to negative offsets becoming huge when converting
> from signed ktime_t to u64 time_to_send. Bound lower to mono 1 and
> upper to now + q->horizon. This protects against bad input, e.g.,
> from BPF programs.
> 
> Detect legacy BPF programs that program skb->tstamp without setting
> skb->tstamp_type. Here tstamp_type is zero (SKB_CLOCK_REALTIME), but
> the value will be unrealistic for realtime in the 21st century. Follow
> existing TIME_UPTIME_SEC_MAX as bound between mono and realtime.
> 
> Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")

net-next + Fixes tag is not a thing! :(
What is it saying? "This is not urgent but I would like it to be
backported to stable"? Stable is for urgent fixes. If you want things
that never worked to work you must update your kernel. Such is life.

You can quote the commit which added the feature in the body of the
message.

> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/sched/sch_fq.c | 43 ++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> index 33783c9f8e16..7cae082a9847 100644
> --- a/net/sched/sch_fq.c
> +++ b/net/sched/sch_fq.c
> @@ -537,10 +537,10 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
>  	rb_insert_color(&skb->rbnode, &flow->t_root);
>  }
>  
> -static bool fq_packet_beyond_horizon(const struct sk_buff *skb,
> +static bool fq_packet_beyond_horizon(ktime_t time_to_send,
>  				     const struct fq_sched_data *q, u64 now)
>  {
> -	return unlikely((s64)skb->tstamp > (s64)(now + q->horizon));
> +	return unlikely((s64)time_to_send > (s64)(now + q->horizon));
>  }
>  
>  static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
> @@ -561,6 +561,36 @@ static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
>  	}
>  }
>  
> +static ktime_t fq_skb_tstamp_to_mono(struct sk_buff *skb)
> +{
> +	const ktime_t mono_max = NSEC_PER_SEC * TIME_UPTIME_SEC_MAX;
> +
> +	if (likely(skb->tstamp_type == SKB_CLOCK_MONOTONIC))
> +		return max(skb->tstamp, 1);
> +
> +	if (skb->tstamp_type == SKB_CLOCK_TAI)
> +		return max(ktime_sub(skb->tstamp, ktime_mono_to_any(0, TK_OFFS_TAI)), 1);
> +
> +	if (likely(skb->tstamp > mono_max))
> +		return max(ktime_sub(skb->tstamp, ktime_mono_to_real(0)), 1);
> +
> +	/* Handle BPF programs setting skb->tstamp but not tstamp_type */
> +	net_warn_ratelimited("fq: likely mono tstamp with tstamp_type 0\n");
> +
> +	skb->tstamp_type = SKB_CLOCK_MONOTONIC;
> +	return max(skb->tstamp, 1);
> +}
> +
> +static void fq_mono_to_skb_tstamp(struct sk_buff *skb, ktime_t time_to_send)
> +{
> +	if (skb->tstamp_type == SKB_CLOCK_MONOTONIC)

fq_skb_tstamp_to_mono() has a likely() around monotonic, this one does not?
Is there a reason?

> +		skb->tstamp = time_to_send;
> +	else if (skb->tstamp_type == SKB_CLOCK_REALTIME)
> +		skb->tstamp = ktime_mono_to_real(time_to_send);
> +	else
> +		skb->tstamp = ktime_mono_to_any(time_to_send, TK_OFFS_TAI);
> +}
> +
>  static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  		      struct sk_buff **to_free)
>  {
> @@ -579,17 +609,20 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	if (!skb->tstamp) {
>  		fq_skb_cb(skb)->time_to_send = now;
>  	} else {
> +		ktime_t time_to_send = fq_skb_tstamp_to_mono(skb);
> +
>  		/* Check if packet timestamp is too far in the future. */
> -		if (fq_packet_beyond_horizon(skb, q, now)) {
> +		if (fq_packet_beyond_horizon(time_to_send, q, now)) {
>  			if (q->horizon_drop) {
>  				q->stat_horizon_drops++;
>  				return qdisc_drop_reason(skb, sch, to_free,
>  							 QDISC_DROP_HORIZON_LIMIT);
>  			}
>  			q->stat_horizon_caps++;
> -			skb->tstamp = now + q->horizon;
> +			time_to_send = now + q->horizon;
> +			fq_mono_to_skb_tstamp(skb, time_to_send);
>  		}
> -		fq_skb_cb(skb)->time_to_send = skb->tstamp;
> +		fq_skb_cb(skb)->time_to_send = (u64)time_to_send;
>  	}
>  
>  	f = fq_classify(sch, skb, now);


* Re: [PATCH net-next 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic
  2026-06-03 22:22   ` Jakub Kicinski
@ 2026-06-03 22:59     ` Willem de Bruijn
  2026-06-03 23:27       ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2026-06-03 22:59 UTC (permalink / raw)
  To: Jakub Kicinski, Willem de Bruijn
  Cc: netdev, davem, edumazet, pabeni, horms, Willem de Bruijn

Jakub Kicinski wrote:
> On Wed,  3 Jun 2026 15:01:29 -0400 Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> > 
> > FQ currently assumes skb->tstamp holds monotonic time, as used by TCP.
> > 
> > Users with ns_capable CAP_NET_ADMIN can transmit skbs using SO_TXTIME
> > with CLOCK_MONOTONIC, CLOCK_REALTIME or CLOCK_TAI clockids as of the
> > below commit.
> > 
> > More recently, skbs also gained tstamp_type to explicitly communicate
> > the clockid of skb->tstamp.
> > 
> > Detect other clocks and convert to monotonic for use in FQ. That is,
> > convert fq_skb_cb(skb)->time_to_send. Do not convert skb->tstamp
> > itself. Network device clocks are more commonly synchronized to TAI.
> > 
> > Conversion may be imprecise due to clock adjustment (e.g., adjfreq)
> > between when SCM_TXTIME is set and when it is converted in fq_enqueue.
> > The common codepath is short, so skew will be well below common pacing
> > operation. Even in edge cases, bursts (too soon) or beyond horizon
> > (too late) are indistinguishable from network conditions, to which
> > senders must be robust, as long as they are infrequent.
> > 
> > Avoid overflow due to negative offsets becoming huge when converting
> > from signed ktime_t to u64 time_to_send. Bound lower to mono 1 and
> > upper to now + q->horizon. This protects against bad input, e.g.,
> > from BPF programs.
> > 
> > Detect legacy BPF programs that program skb->tstamp without setting
> > skb->tstamp_type. Here tstamp_type is zero (SKB_CLOCK_REALTIME), but
> > the value will be unrealistic for realtime in the 21st century. Follow
> > existing TIME_UPTIME_SEC_MAX as bound between mono and realtime.
> > 
> > Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
> 
> net-next + Fixes tag is not a thing! :(
> What is it saying? "This is not urgent but I would like it to be
> backported to stable"? Stable is for urgent fixes. If you want things
> that never worked to work you must update your kernel. Such is life.

Sorry, I did not know that Fixes implies stable.

No, I don't think this is stable material.

Will remove the tag and reference in the text.
 
> You can quote the commit which added the feature in the body of the
> message.
> 
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> >  net/sched/sch_fq.c | 43 ++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 38 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> > index 33783c9f8e16..7cae082a9847 100644
> > --- a/net/sched/sch_fq.c
> > +++ b/net/sched/sch_fq.c
> > @@ -537,10 +537,10 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
> >  	rb_insert_color(&skb->rbnode, &flow->t_root);
> >  }
> >  
> > -static bool fq_packet_beyond_horizon(const struct sk_buff *skb,
> > +static bool fq_packet_beyond_horizon(ktime_t time_to_send,
> >  				     const struct fq_sched_data *q, u64 now)
> >  {
> > -	return unlikely((s64)skb->tstamp > (s64)(now + q->horizon));
> > +	return unlikely((s64)time_to_send > (s64)(now + q->horizon));
> >  }
> >  
> >  static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
> > @@ -561,6 +561,36 @@ static void fq_flow_adjust_timer(struct fq_sched_data *q, struct fq_flow *flow,
> >  	}
> >  }
> >  
> > +static ktime_t fq_skb_tstamp_to_mono(struct sk_buff *skb)
> > +{
> > +	const ktime_t mono_max = NSEC_PER_SEC * TIME_UPTIME_SEC_MAX;
> > +
> > +	if (likely(skb->tstamp_type == SKB_CLOCK_MONOTONIC))
> > +		return max(skb->tstamp, 1);
> > +
> > +	if (skb->tstamp_type == SKB_CLOCK_TAI)
> > +		return max(ktime_sub(skb->tstamp, ktime_mono_to_any(0, TK_OFFS_TAI)), 1);
> > +
> > +	if (likely(skb->tstamp > mono_max))
> > +		return max(ktime_sub(skb->tstamp, ktime_mono_to_real(0)), 1);
> > +
> > +	/* Handle BPF programs setting skb->tstamp but not tstamp_type */
> > +	net_warn_ratelimited("fq: likely mono tstamp with tstamp_type 0\n");
> > +
> > +	skb->tstamp_type = SKB_CLOCK_MONOTONIC;
> > +	return max(skb->tstamp, 1);
> > +}
> > +
> > +static void fq_mono_to_skb_tstamp(struct sk_buff *skb, ktime_t time_to_send)
> > +{
> > +	if (skb->tstamp_type == SKB_CLOCK_MONOTONIC)
> 
> fq_skb_tstamp_to_mono() has a likely() around monotonic, this one does not?
> Is there a reason?

Thought process is that the first is a standalone if, so predicted
not to be taken. While this is an if/else, where the if is predicted
taken (which is why it's first even if not first in the enum).

Not sure how true these heuristics still are. Probably not very, with
good branch predictors, let alone FDO.

> > +		skb->tstamp = time_to_send;
> > +	else if (skb->tstamp_type == SKB_CLOCK_REALTIME)
> > +		skb->tstamp = ktime_mono_to_real(time_to_send);
> > +	else
> > +		skb->tstamp = ktime_mono_to_any(time_to_send, TK_OFFS_TAI);
> > +}
> > +
> >  static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >  		      struct sk_buff **to_free)
> >  {
> > @@ -579,17 +609,20 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >  	if (!skb->tstamp) {
> >  		fq_skb_cb(skb)->time_to_send = now;
> >  	} else {
> > +		ktime_t time_to_send = fq_skb_tstamp_to_mono(skb);
> > +
> >  		/* Check if packet timestamp is too far in the future. */
> > -		if (fq_packet_beyond_horizon(skb, q, now)) {
> > +		if (fq_packet_beyond_horizon(time_to_send, q, now)) {
> >  			if (q->horizon_drop) {
> >  				q->stat_horizon_drops++;
> >  				return qdisc_drop_reason(skb, sch, to_free,
> >  							 QDISC_DROP_HORIZON_LIMIT);
> >  			}
> >  			q->stat_horizon_caps++;
> > -			skb->tstamp = now + q->horizon;
> > +			time_to_send = now + q->horizon;
> > +			fq_mono_to_skb_tstamp(skb, time_to_send);
> >  		}
> > -		fq_skb_cb(skb)->time_to_send = skb->tstamp;
> > +		fq_skb_cb(skb)->time_to_send = (u64)time_to_send;
> >  	}
> >  
> >  	f = fq_classify(sch, skb, now);
> 



* Re: [PATCH net-next 2/3] net_sched: sch_fq: convert skb->tstamp if not monotonic
  2026-06-03 22:59     ` Willem de Bruijn
@ 2026-06-03 23:27       ` Jakub Kicinski
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-06-03 23:27 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem, edumazet, pabeni, horms, Willem de Bruijn

On Wed, 03 Jun 2026 18:59:31 -0400 Willem de Bruijn wrote:
> > fq_skb_tstamp_to_mono() has a likely() around monotonic, this one does not?
> > Is there a reason?  
> 
> Thought process is that the first is a standalone if, so predicted
> not to be taken. While this is an if/else, where the if is predicted
> taken (which is why it's first even if not first in the enum).

Oh, I see, makes sense. I should probably learn the prediction defaults.
