[PATCH net-next 01/10] ipv6: do not fragment packets into jumbograms

Netdev List

* [PATCH net-next 01/10] ipv6: do not fragment packets into jumbograms
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 02/10] ipv6: allow route exceptions with MTUs above 65535 Mariusz Klimek
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch prevents ip6_fragment from producing jumbograms by capping the
effective MTU at IP6_MAX_MTU, since RFC 2675 prohibits the Jumbo Payload
option in a packet that carries a Fragment header.
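A minimal userspace sketch of the clamped arithmetic, assuming the standard 40-byte IPv6 header and 8-byte Fragment header. frag_data_len() is a hypothetical helper rather than a kernel function, and it rounds every fragment's data down to a multiple of 8 for simplicity (the kernel only needs this for non-final fragments).

```c
#include <stdint.h>

#define IPV6_HDR_LEN	40u	/* sizeof(struct ipv6hdr) */
#define FRAG_HDR_LEN	8u	/* sizeof(struct frag_hdr) */
#define IP6_MAX_MTU	(0xFFFFu + IPV6_HDR_LEN)

/* Hypothetical helper: bytes of fragmentable data that fit in one
 * fragment when the unfragmentable part (IPv6 header plus any
 * unfragmentable extension headers) is hlen bytes long.
 * Returns 0 where the kernel would jump to fail_toobig.
 */
static uint32_t frag_data_len(uint32_t mtu, uint32_t hlen)
{
	if (mtu < hlen + FRAG_HDR_LEN + 8)
		return 0;
	/* The new clamp: never build a fragment whose payload length
	 * would need a Jumbo Payload option.
	 */
	if (mtu > IP6_MAX_MTU)
		mtu = IP6_MAX_MTU;
	/* Fragment offsets are expressed in 8-octet units. */
	return (mtu - hlen - FRAG_HDR_LEN) & ~7u;
}
```

With hlen = 40, an MTU of 100000 yields 65520 bytes of data per fragment, so each fragment's payload (8 + 65520 bytes) still fits in the 16-bit Payload Length field.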

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 net/ipv6/ip6_output.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c14adcdd4396..67c34fb281c6 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -935,6 +935,8 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 	}
 	if (mtu < hlen + sizeof(struct frag_hdr) + 8)
 		goto fail_toobig;
+	else if (unlikely(mtu > IP6_MAX_MTU))
+		mtu = IP6_MAX_MTU;
 	mtu -= hlen + sizeof(struct frag_hdr);
 
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
-- 
2.47.3


* [PATCH net-next 02/10] ipv6: allow route exceptions with MTUs above 65535
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
  2026-06-08 13:07 ` [PATCH net-next 01/10] ipv6: do not fragment packets into jumbograms Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 03/10] ipv6: add jumbo payload option to non-gso jumbograms Mariusz Klimek
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch allows route exceptions to specify an MTU above 65535 so that
PMTU discovery can settle on a PMTU above 65535 (if the MTUs along the
path allow it).

IP6_MAX_JUMBOGRAM_MTU is set to INT_MAX rather than UINT_MAX because MTU
values and packet lengths are sometimes stored as signed integers.
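A small userspace sketch of why INT_MAX is a safe bound. route_mtu_clamp() is a hypothetical stand-in for the min_t() in fib6_mtu() (the lwtunnel headroom adjustment is omitted), and mtu_fits_int() is a hypothetical check that a value survives being stored in a signed int.

```c
#include <limits.h>

#define IP6_MAX_JUMBOGRAM_MTU	INT_MAX

/* Hypothetical stand-in for the clamp in fib6_mtu() after this patch. */
static unsigned int route_mtu_clamp(unsigned int mtu)
{
	const unsigned int max = (unsigned int)IP6_MAX_JUMBOGRAM_MTU;

	return mtu < max ? mtu : max;
}

/* True if the MTU can be stored in a signed int without going negative. */
static int mtu_fits_int(unsigned int mtu)
{
	return mtu <= (unsigned int)INT_MAX;
}
```

Before this patch, the same clamp used IP6_MAX_MTU (0xFFFF + 40 = 65575), so no route exception could record a PMTU above that.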

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 include/linux/ipv6.h    | 6 ++++++
 include/net/ip6_route.h | 5 +++--
 net/ipv6/route.c        | 2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index a7421382a916..201c1615ac8e 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -196,6 +196,12 @@ static inline bool ipv6_l3mdev_skb(__u16 flags)
 #define IP6CB(skb)	((struct inet6_skb_parm*)((skb)->cb))
 #define IP6CBMTU(skb)	((struct ip6_mtuinfo *)((skb)->cb))
 
+/* The IPv6 jumbogram payload length is stored in a 32-bit field, but cap the
+ * maximum MTU at INT_MAX because MTUs and packet lengths are sometimes
+ * represented as signed ints.
+ */
+#define IP6_MAX_JUMBOGRAM_MTU INT_MAX
+
 static inline int inet6_iif(const struct sk_buff *skb)
 {
 	bool l3_slave = ipv6_l3mdev_skb(IP6CB(skb)->flags);
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 09ffe0f13ce7..9c9fbf881235 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -38,8 +38,9 @@ struct route_info {
 #define RT6_LOOKUP_F_IGNORE_LINKSTATE	0x00000040
 #define RT6_LOOKUP_F_DST_NOREF		0x00000080
 
-/* We do not (yet ?) support IPv6 jumbograms (RFC 2675)
- * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header
+/* We do not (yet ?) fully support IPv6 jumbograms (RFC 2675) for all protocols.
+ * Where jumbograms are supported, IP6_MAX_JUMBOGRAM_MTU should be used instead.
+ * Unlike IPv4, hdr->seg_len doesn't include the IPv6 header.
  */
 #define IP6_MAX_MTU (0xFFFF + sizeof(struct ipv6hdr))
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 636f0120d7e3..ac38771c49cd 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1656,7 +1656,7 @@ static unsigned int fib6_mtu(const struct fib6_result *res)
 		rcu_read_unlock();
 	}
 
-	mtu = min_t(unsigned int, mtu, IP6_MAX_MTU);
+	mtu = min_t(unsigned int, mtu, IP6_MAX_JUMBOGRAM_MTU);
 
 	return mtu - lwtunnel_headroom(nh->fib_nh_lws, mtu);
 }
-- 
2.47.3


* [PATCH net-next 03/10] ipv6: add jumbo payload option to non-gso jumbograms
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
  2026-06-08 13:07 ` [PATCH net-next 01/10] ipv6: do not fragment packets into jumbograms Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 02/10] ipv6: allow route exceptions with MTUs above 65535 Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 04/10] tcp: decouple TSO segment length from MSS Mariusz Klimek
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch reintroduces code, recently removed by Alice in commits
741d069aa488 ("net/ipv6: Drop HBH for BIG TCP on TX side") and
35f66ce90037 ("net/ipv6: Remove HBH helpers"), that adds a hop-by-hop header
containing a Jumbo Payload option to packets whose payload exceeds 65535
bytes. The hop-by-hop header is only added to jumbograms that aren't BIG TCP
packets, since BIG TCP packets no longer need to carry such a header.

This fixes sending jumbograms to a loopback device with MTU > 65535.
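For reference, a userspace sketch of the 8-byte Hop-by-Hop header that ip6_xmit() now pushes. The struct mirrors hop_jumbo_hdr from this patch; needs_jumbo_hbh(), build_jumbo_hbh(), HBH_TLV_JUMBO and MAX_PAYLOAD_LEN are hypothetical names, and IPPROTO_HOPOPTS/IPPROTO_TCP come from the libc <netinet/in.h>. As RFC 2675 requires, the Jumbo Payload Length covers everything after the IPv6 header, including the Hop-by-Hop header itself, hence seg_len + 8.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <netinet/in.h>	/* IPPROTO_HOPOPTS, IPPROTO_TCP */
#include <stdbool.h>
#include <stdint.h>

#define HBH_TLV_JUMBO	0xC2	/* same value as the kernel's IPV6_TLV_JUMBO */
#define MAX_PAYLOAD_LEN	65535u	/* same value as the kernel's IPV6_MAXPLEN */

/* Userspace mirror of struct hop_jumbo_hdr. */
struct hop_jumbo_hdr {
	uint8_t  nexthdr;
	uint8_t  hdrlen;		/* 8-octet units beyond the first 8 */
	uint8_t  tlv_type;
	uint8_t  tlv_len;
	uint32_t jumbo_payload_len;	/* network byte order */
};

_Static_assert(sizeof(struct hop_jumbo_hdr) == 8, "HBH header must be 8 bytes");

/* Only non-GSO packets with a payload above 65535 bytes get the option;
 * BIG TCP (GSO) packets rely on skb->len instead.
 */
static bool needs_jumbo_hbh(uint32_t seg_len, bool is_gso)
{
	return seg_len > MAX_PAYLOAD_LEN && !is_gso;
}

/* Fill h for a seg_len-byte payload of upper-layer protocol proto and
 * return the value for the IPv6 header's next-header field.
 */
static uint8_t build_jumbo_hbh(struct hop_jumbo_hdr *h, uint8_t proto,
			       uint32_t seg_len)
{
	h->nexthdr = proto;
	h->hdrlen = 0;
	h->tlv_type = HBH_TLV_JUMBO;
	h->tlv_len = 4;
	h->jumbo_payload_len = htonl(seg_len + (uint32_t)sizeof(*h));
	return IPPROTO_HOPOPTS;
}
```

For a 70000-byte TCP payload, the sketch sets the IPv6 next-header field to IPPROTO_HOPOPTS and the option's length field to 70008.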

Fixes: 741d069aa488 ("net/ipv6: Drop HBH for BIG TCP on TX side")
Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 include/net/ipv6.h    | 11 +++++++++++
 net/ipv6/ip6_output.c | 22 ++++++++++++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 1dec81faff28..d9e1fdca6934 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -152,6 +152,17 @@ struct frag_hdr {
 	__be32	identification;
 };
 
+/*
+ * Jumbo payload option, as described in RFC 2675 2.
+ */
+struct hop_jumbo_hdr {
+	u8	nexthdr;
+	u8	hdrlen;
+	u8	tlv_type;	/* IPV6_TLV_JUMBO, 0xC2 */
+	u8	tlv_len;	/* 4 */
+	__be32	jumbo_payload_len;
+};
+
 #define	IP6_MF		0x0001
 #define	IP6_OFFSET	0xFFF8
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 67c34fb281c6..405a9a0e65a4 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -293,6 +293,8 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	struct in6_addr *first_hop = &fl6->daddr;
 	struct dst_entry *dst = skb_dst(skb);
 	struct inet6_dev *idev = ip6_dst_idev(dst);
+	struct hop_jumbo_hdr *hop_jumbo;
+	int hoplen = sizeof(*hop_jumbo);
 	struct net *net = sock_net(sk);
 	unsigned int head_room;
 	struct net_device *dev;
@@ -305,7 +307,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	rcu_read_lock();
 
 	dev = dst_dev_rcu(dst);
-	head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dev);
+	head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev);
 	if (opt)
 		head_room += opt->opt_nflen + opt->opt_flen;
 
@@ -331,8 +333,24 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 						     &fl6->saddr);
 	}
 
-	if (unlikely(seg_len > IPV6_MAXPLEN))
+	if (unlikely(seg_len > IPV6_MAXPLEN)) {
+		/* Only "real" jumbograms require a jumbo-payload option.
+		 * BIG TCP packets just rely on skb->len.
+		 */
+		if (!skb_is_gso(skb)) {
+			hop_jumbo = __skb_push(skb, hoplen);
+
+			hop_jumbo->nexthdr = proto;
+			hop_jumbo->hdrlen = 0;
+			hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+			hop_jumbo->tlv_len = 4;
+			hop_jumbo->jumbo_payload_len = htonl(seg_len + hoplen);
+
+			proto = IPPROTO_HOPOPTS;
+		}
+
 		seg_len = 0;
+	}
 
 	__skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
-- 
2.47.3


* [PATCH net-next 04/10] tcp: decouple TSO segment length from MSS
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
                   ` (2 preceding siblings ...)
  2026-06-08 13:07 ` [PATCH net-next 03/10] ipv6: add jumbo payload option to non-gso jumbograms Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 05/10] tcp: split jumbograms with urgent pointer correctly Mariusz Klimek
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch decouples the TSO segment length from the MSS to allow an MSS
above 65535 even though the TSO segment length is limited to 16 bits.
Ideally TSO/GSO would support jumbogram segments so that the decoupling
wouldn't be necessary, but that would require a much bigger change.

Add a new helper function tcp_tso_seglen that returns the segment length
for a given MSS, capped at 65535 - MAX_TCP_HEADER, and use it where the MSS
is treated as the segment length. This leaves enough room for TCP/IPv6
headers, including TCP options and extension headers.

Change the signatures of functions that use segs only to compute the
maximum length of a TSO packet so that they accept max_len instead.
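A userspace sketch of the tcp_tso_seglen() arithmetic described above. MAX_TCP_HEADER depends on the kernel configuration, so the value 320 below is only an assumption for illustration; tso_seglen() mirrors tcp_tso_seglen(), and tso_segs() is a hypothetical simplification of tcp_set_skb_tso_segs().

```c
#include <stdint.h>

#define GSO_BY_FRAGS	0xFFFFu
#define MAX_TCP_HEADER	320u	/* assumption: config dependent in the kernel */

/* Mirrors tcp_tso_seglen(): an MSS below the cap is used unchanged,
 * larger MSS values get segments that stay below 64 KiB.
 */
static uint16_t tso_seglen(uint32_t mss)
{
	uint32_t cap = GSO_BY_FRAGS - MAX_TCP_HEADER;

	return (uint16_t)(mss < cap ? mss : cap);
}

/* Simplified tcp_set_skb_tso_segs(): an skb no longer than the MSS is
 * sent as one packet (a jumbogram if its IPv6 payload exceeds 65535
 * bytes); otherwise it is cut into tso_seglen()-sized segments.
 */
static uint32_t tso_segs(uint32_t skb_len, uint32_t mss)
{
	uint32_t seglen;

	if (skb_len <= mss)
		return 1;
	seglen = tso_seglen(mss);
	return (skb_len + seglen - 1) / seglen;	/* DIV_ROUND_UP */
}
```

Under that assumption, with an MSS of 90000 a 70000-byte skb is sent as a single jumbogram, while a 200000-byte skb is split by TSO into four segments of at most 65215 bytes.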

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 include/net/tcp.h     | 12 ++++++--
 net/ipv4/tcp.c        | 10 ++++---
 net/ipv4/tcp_output.c | 67 +++++++++++++++++++++++++------------------
 net/ipv4/tcp_timer.c  |  4 +--
 4 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f063eccbbba3..b3a50f6d3381 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -747,8 +747,8 @@ void tcp_skb_entail(struct sock *sk, struct sk_buff *skb);
 void tcp_mark_push(struct tcp_sock *tp, struct sk_buff *skb);
 void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 			       int nonagle);
-int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
-int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int max_len);
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int max_len);
 void tcp_retransmit_timer(struct sock *sk);
 void tcp_xmit_retransmit_queue(struct sock *);
 void tcp_simple_retransmit(struct sock *);
@@ -1219,6 +1219,14 @@ static inline void tcp_skb_pcount_add(struct sk_buff *skb, int segs)
 	TCP_SKB_CB(skb)->tcp_gso_segs += segs;
 }
 
+/* Return the segment length we want for the given MSS. We cap the segment
+ * length to prevent the segments from becoming jumbograms.
+ */
+static inline u16 tcp_tso_seglen(u32 mss_now)
+{
+	return min_t(u32, GSO_BY_FRAGS - MAX_TCP_HEADER, mss_now);
+}
+
 /* This is valid iff skb is in write queue and tcp_skb_pcount() > 1. */
 static inline int tcp_skb_mss(const struct sk_buff *skb)
 {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 890182a151e1..5ac2befbdc58 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -960,6 +960,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 new_size_goal, size_goal;
+	u16 gso_size;
 
 	if (!large_allowed)
 		return mss_now;
@@ -968,12 +969,13 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 	new_size_goal = tcp_bound_to_half_wnd(tp, sk->sk_gso_max_size);
 
 	/* We try hard to avoid divides here */
-	size_goal = tp->gso_segs * mss_now;
+	gso_size = tcp_tso_seglen(mss_now);
+	size_goal = tp->gso_segs * gso_size;
 	if (unlikely(new_size_goal < size_goal ||
-		     new_size_goal >= size_goal + mss_now)) {
-		tp->gso_segs = min_t(u16, new_size_goal / mss_now,
+		     new_size_goal >= size_goal + gso_size)) {
+		tp->gso_segs = min_t(u16, new_size_goal / gso_size,
 				     sk->sk_gso_max_segs);
-		size_goal = tp->gso_segs * mss_now;
+		size_goal = tp->gso_segs * gso_size;
 	}
 
 	return max(size_goal, mss_now);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d3b8e61d3c5e..a66a3622006d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1748,7 +1748,7 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff *skb)
 /* Initialize TSO segments for a packet. */
 static int tcp_set_skb_tso_segs(struct sk_buff *skb, unsigned int mss_now)
 {
-	int tso_segs;
+	int tso_size, tso_segs;
 
 	if (skb->len <= mss_now) {
 		/* Avoid the costly divide in the normal
@@ -1758,8 +1758,9 @@ static int tcp_set_skb_tso_segs(struct sk_buff *skb, unsigned int mss_now)
 		tcp_skb_pcount_set(skb, 1);
 		return 1;
 	}
-	TCP_SKB_CB(skb)->tcp_gso_size = mss_now;
-	tso_segs = DIV_ROUND_UP(skb->len, mss_now);
+	tso_size = tcp_tso_seglen(mss_now);
+	TCP_SKB_CB(skb)->tcp_gso_size = tso_size;
+	tso_segs = DIV_ROUND_UP(skb->len, tso_size);
 	tcp_skb_pcount_set(skb, tso_segs);
 	return tso_segs;
 }
@@ -2207,12 +2208,14 @@ static bool tcp_minshall_check(const struct tcp_sock *tp)
  * if ((skb->len % mss) != 0)
  *        tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
  * But we can avoid doing the divide again given we already have
- *  skb_pcount = skb->len / mss_now
+ *  skb_pcount = skb->len / tcp_skb_mss(skb)
  */
 static void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss_now,
 				const struct sk_buff *skb)
 {
-	if (skb->len < tcp_skb_pcount(skb) * mss_now)
+	u32 seglen = tcp_skb_pcount(skb) == 1 ? mss_now : tcp_skb_mss(skb);
+
+	if (skb->len < tcp_skb_pcount(skb) * seglen)
 		tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
 }
 
@@ -2245,7 +2248,7 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
  * for every 2^9 usec (aka 512 us) of RTT, so that the RTT-based allowance
  * is below 1500 bytes after 6 * ~500 usec = 3ms.
  */
-static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
+static u32 tcp_tso_autosize(const struct sock *sk, unsigned int tso_size,
 			    int min_tso_segs)
 {
 	unsigned long bytes;
@@ -2259,7 +2262,7 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 
 	bytes = min_t(unsigned long, bytes, sk->sk_gso_max_size);
 
-	return max_t(u32, bytes / mss_now, min_tso_segs);
+	return max_t(u32, bytes / tso_size, min_tso_segs);
 }
 
 /* Return the number of segments we want in the skb we are transmitting.
@@ -2274,14 +2277,14 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
 			ca_ops->min_tso_segs(sk) :
 			READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
 
-	tso_segs = tcp_tso_autosize(sk, mss_now, min_tso);
+	tso_segs = tcp_tso_autosize(sk, tcp_tso_seglen(mss_now), min_tso);
 	return min_t(u32, tso_segs, sk->sk_gso_max_segs);
 }
 
 /* Returns the portion of skb which can be sent right away */
 static unsigned int tcp_mss_split_point(const struct sock *sk,
 					const struct sk_buff *skb,
-					unsigned int mss_now,
+					unsigned int seglen,
 					unsigned int max_segs,
 					int nonagle)
 {
@@ -2289,7 +2292,7 @@ static unsigned int tcp_mss_split_point(const struct sock *sk,
 	u32 partial, needed, window, max_len;
 
 	window = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
-	max_len = mss_now * max_segs;
+	max_len = seglen * max_segs;
 
 	if (likely(max_len <= window && skb != tcp_write_queue_tail(sk)))
 		return max_len;
@@ -2299,7 +2302,7 @@ static unsigned int tcp_mss_split_point(const struct sock *sk,
 	if (max_len <= needed)
 		return max_len;
 
-	partial = needed % mss_now;
+	partial = needed % seglen;
 	/* If last segment is not a full MSS, check if Nagle rules allow us
 	 * to include this last segment in this skb.
 	 * Otherwise, we'll split the skb at last MSS boundary
@@ -2337,7 +2340,8 @@ static int tcp_init_tso_segs(struct sk_buff *skb, unsigned int mss_now)
 {
 	int tso_segs = tcp_skb_pcount(skb);
 
-	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now))
+	if (!tso_segs ||
+	    (tso_segs > 1 && tcp_skb_mss(skb) != tcp_tso_seglen(mss_now)))
 		return tcp_set_skb_tso_segs(skb, mss_now);
 
 	return tso_segs;
@@ -2444,7 +2448,7 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
 static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 				 bool *is_cwnd_limited,
 				 bool *is_rwnd_limited,
-				 u32 max_segs)
+				 u32 max_len)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 send_win, cong_win, limit, in_flight, threshold;
@@ -2479,7 +2483,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 	limit = min(send_win, cong_win);
 
 	/* If a full-sized TSO skb can be sent, do it. */
-	if (limit >= max_segs * tp->mss_cache)
+	if (limit >= max_len)
 		goto send_now;
 
 	/* Middle in queue won't get any more data, full sendable already? */
@@ -2956,10 +2960,10 @@ static void tcp_grow_skb(struct sock *sk, struct sk_buff *skb, int amount)
 static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			   int push_one, gfp_t gfp)
 {
+	u32 cwnd_quota, max_segs, max_len;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	unsigned int tso_segs, sent_pkts;
-	u32 cwnd_quota, max_segs;
 	int result;
 	bool is_cwnd_limited = false, is_rwnd_limited = false;
 
@@ -3007,7 +3011,9 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 				break;
 		}
 		cwnd_quota = min(cwnd_quota, max_segs);
-		missing_bytes = cwnd_quota * mss_now - skb->len;
+
+		max_len = max(mss_now, cwnd_quota * tcp_tso_seglen(mss_now));
+		missing_bytes = max_len - skb->len;
 		if (missing_bytes > 0)
 			tcp_grow_skb(sk, skb, missing_bytes);
 
@@ -3026,13 +3032,13 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		} else {
 			if (!push_one &&
 			    tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
-						 &is_rwnd_limited, max_segs))
+						 &is_rwnd_limited, max_len))
 				break;
 		}
 
 		limit = mss_now;
 		if (tso_segs > 1 && !tcp_urg_mode(tp))
-			limit = tcp_mss_split_point(sk, skb, mss_now,
+			limit = tcp_mss_split_point(sk, skb, tcp_tso_seglen(mss_now),
 						    cwnd_quota,
 						    nonagle);
 
@@ -3193,10 +3199,10 @@ void tcp_send_loss_probe(struct sock *sk)
 	if (WARN_ON(!pcount))
 		goto rearm_timer;
 
-	if ((pcount > 1) && (skb->len > (pcount - 1) * mss)) {
+	if ((pcount > 1) && (skb->len > (pcount - 1) * tcp_tso_seglen(mss))) {
 		if (unlikely(tcp_fragment(sk, TCP_FRAG_IN_RTX_QUEUE, skb,
-					  (pcount - 1) * mss, mss,
-					  GFP_ATOMIC)))
+					  (pcount - 1) * tcp_tso_seglen(mss),
+					  mss, GFP_ATOMIC)))
 			goto rearm_timer;
 		skb = skb_rb_next(skb);
 	}
@@ -3204,7 +3210,7 @@ void tcp_send_loss_probe(struct sock *sk)
 	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
 		goto rearm_timer;
 
-	if (__tcp_retransmit_skb(sk, skb, 1))
+	if (__tcp_retransmit_skb(sk, skb, mss))
 		goto rearm_timer;
 
 	tp->tlp_retrans = 1;
@@ -3539,13 +3545,14 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
  * state updates are done by the caller.  Returns non-zero if an
  * error occurred which prevented the send.
  */
-int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int max_len)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned int cur_mss;
 	int diff, len, err;
 	int avail_wnd;
+	int segs;
 
 	/* Inconclusive MTU probe */
 	if (icsk->icsk_mtup.probe_size)
@@ -3595,7 +3602,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 		avail_wnd = cur_mss;
 	}
 
-	len = cur_mss * segs;
+	len = max_len;
 	if (len > avail_wnd) {
 		len = rounddown(avail_wnd, cur_mss);
 		if (!len)
@@ -3684,10 +3691,10 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 	return err;
 }
 
-int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int max_len)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	int err = __tcp_retransmit_skb(sk, skb, segs);
+	int err = __tcp_retransmit_skb(sk, skb, max_len);
 
 	if (err == 0) {
 #if FASTRETRANS_DEBUG > 0
@@ -3721,6 +3728,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	bool rearm_timer = false;
 	u32 max_segs;
+	u32 mss_now;
 	int mib_idx;
 
 	if (!tp->packets_out)
@@ -3728,9 +3736,11 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 	rtx_head = tcp_rtx_queue_head(sk);
 	skb = tp->retransmit_skb_hint ?: rtx_head;
-	max_segs = tcp_tso_segs(sk, tcp_current_mss(sk));
+	mss_now = tcp_current_mss(sk);
+	max_segs = tcp_tso_segs(sk, mss_now);
 	skb_rbtree_walk_from(skb) {
 		__u8 sacked;
+		u32 max_len;
 		int segs;
 
 		if (tcp_pacing_check(sk))
@@ -3748,6 +3758,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		 * we need to make sure not sending too bigs TSO packets
 		 */
 		segs = min_t(int, segs, max_segs);
+		max_len = max_t(u32, mss_now, segs * tcp_tso_seglen(mss_now));
 
 		if (tp->retrans_out >= tp->lost_out) {
 			break;
@@ -3769,7 +3780,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		if (tcp_small_queue_check(sk, skb, 1))
 			break;
 
-		if (tcp_retransmit_skb(sk, skb, segs))
+		if (tcp_retransmit_skb(sk, skb, max_len))
 			break;
 
 		NET_ADD_STATS(sock_net(sk), mib_idx, tcp_skb_pcount(skb));
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 322db13333c7..2e5331441469 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -595,7 +595,7 @@ void tcp_retransmit_timer(struct sock *sk)
 			goto out;
 		}
 		tcp_enter_loss(sk);
-		tcp_retransmit_skb(sk, skb, 1);
+		tcp_retransmit_skb(sk, skb, tcp_current_mss(sk));
 		__sk_dst_reset(sk);
 		goto out_reset_timer;
 	}
@@ -628,7 +628,7 @@ void tcp_retransmit_timer(struct sock *sk)
 	tcp_enter_loss(sk);
 
 	tcp_update_rto_stats(sk);
-	if (tcp_retransmit_skb(sk, tcp_rtx_queue_head(sk), 1) > 0) {
+	if (tcp_retransmit_skb(sk, tcp_rtx_queue_head(sk), tcp_current_mss(sk)) > 0) {
 		/* Retransmission failed because of local congestion,
 		 * Let senders fight for local resources conservatively.
 		 */
-- 
2.47.3


* [PATCH net-next 05/10] tcp: split jumbograms with urgent pointer correctly
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
                   ` (3 preceding siblings ...)
  2026-06-08 13:07 ` [PATCH net-next 04/10] tcp: decouple TSO segment length from MSS Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 06/10] tcp: set MSS correctly for PMTU above 65535 Mariusz Klimek
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch fixes urgent-pointer handling for TCP jumbograms. According to
RFC 2675, if the offset of the urgent pointer from the sequence number is
greater than 65535 but less than the length of the TCP data, the packet
should be split into at least two pieces, with a new packet starting at the
urgent offset.

Although the optimal solution is to split the packet into exactly two
packets, not sending jumbograms in the first place is much simpler to
implement: it only requires capping the size returned from
tcp_xmit_size_goal when large_allowed is false. Since the urgent pointer is
rarely used with jumbograms, this solution should suffice.
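A sketch of the split condition above and of why capping the size goal sidesteps it. Both helpers are hypothetical userspace simplifications: urg_needs_split() encodes the condition as stated in this message, and small_size_goal() models only the !large_allowed branch of tcp_xmit_size_goal().

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV6_MAXPLEN	65535u

/* The urgent offset does not fit in the 16-bit urgent field but still
 * points inside this packet's data, so the packet would need splitting.
 */
static bool urg_needs_split(uint32_t urg_offset, uint32_t data_len)
{
	return urg_offset > IPV6_MAXPLEN && urg_offset < data_len;
}

/* !large_allowed path of tcp_xmit_size_goal() after this patch
 * (e.g. for MSG_OOB sends).
 */
static uint32_t small_size_goal(uint32_t mss_now)
{
	return mss_now < IPV6_MAXPLEN ? mss_now : IPV6_MAXPLEN;
}
```

Since small_size_goal() never exceeds 65535, an urgent offset above 65535 cannot fall inside the data of a packet sized by it.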

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5ac2befbdc58..8a2e256913c8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -963,7 +963,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 	u16 gso_size;

 	if (!large_allowed)
-		return mss_now;
+		return min(IPV6_MAXPLEN, mss_now);

 	/* Note : tcp_tso_autosize() will eventually split this later */
 	new_size_goal = tcp_bound_to_half_wnd(tp, sk->sk_gso_max_size);
-- 
2.47.3

* [PATCH net-next 06/10] tcp: set MSS correctly for PMTU above 65535
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
                   ` (4 preceding siblings ...)
  2026-06-08 13:07 ` [PATCH net-next 05/10] tcp: split jumbograms with urgent pointer correctly Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 07/10] veth: raise the max MTU " Mariusz Klimek
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch fixes __tcp_mtu_to_mss to support MTUs above 65535. According to
RFC 2675 and RFC 6691, an advertised MSS of 65535 means "any MSS, rely only
on PMTU discovery". ip6_default_advmss already adheres to this behavior,
but __tcp_mtu_to_mss instead clamps the MSS to 65535. The patch also
subtracts the 8-byte hop-by-hop header carrying the Jumbo Payload option
when the PMTU allows an IPv6 payload above 65535 bytes.

Note that MTU probing also doesn't currently support PMTU > 65535 because
icsk_mtup.search_high is set to 65535. This patch doesn't add support for
PMTU > 65535 when MTU probing is enabled because it is not obvious what
icsk_mtup.search_high should be. A possible solution is to set only
icsk_mtup.search_low and to raise it exponentially until a probe becomes
too big (and then set icsk_mtup.search_high), but since this is a
non-trivial change, it is left for a separate patch series.
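A simplified userspace model of the patched __tcp_mtu_to_mss(), assuming an IPv6 socket (40-byte network header, 20-byte TCP header). mtu_to_mss() is a hypothetical name, and the final step that reserves room for TCP options and 8 bytes of data is omitted.

```c
#define IPV6_HDR_LEN		40	/* net_header_len for IPv6 */
#define TCP_HDR_LEN		20	/* sizeof(struct tcphdr) */
#define HOP_JUMBO_HDR_LEN	8	/* sizeof(struct hop_jumbo_hdr) */
#define IPV6_MAXPLEN		65535

static int mtu_to_mss(int pmtu, int mss_clamp, int ext_hdr_len)
{
	int mss;

	/* A payload above 65535 bytes needs the jumbo HBH header. */
	if (pmtu - IPV6_HDR_LEN > IPV6_MAXPLEN)
		ext_hdr_len += HOP_JUMBO_HDR_LEN;

	mss = pmtu - IPV6_HDR_LEN - TCP_HDR_LEN;

	/* A clamp of 65535 means "rely on PMTU discovery". */
	if (mss > mss_clamp && mss_clamp < IPV6_MAXPLEN)
		mss = mss_clamp;

	return mss - ext_hdr_len;
}
```

For example, mtu_to_mss(100000, 65535, 0) gives 99932 in this model; before this patch the same inputs would have been clamped to 65535.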

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 net/ipv4/tcp_output.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a66a3622006d..2d87b9cacd12 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1990,19 +1990,30 @@ static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
+	unsigned int ext_hdr_len;
 	int mss_now;
 
+	ext_hdr_len = icsk->icsk_ext_hdr_len;
+
+	/* Take into account added jumbogram HBH header. */
+	if (unlikely(pmtu - sizeof(struct ipv6hdr) > IPV6_MAXPLEN))
+		ext_hdr_len += sizeof(struct hop_jumbo_hdr);
+
 	/* Calculate base mss without TCP options:
 	   It is MMS_S - sizeof(tcphdr) of rfc1122
 	 */
 	mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr);
 
-	/* Clamp it (mss_clamp does not include tcp options) */
-	if (mss_now > tp->rx_opt.mss_clamp)
+	/* Clamp it (mss_clamp does not include tcp options).
+	 * An mss of 65535 means we should rely entirely on PMTU discovery
+	 * (RFC2675, RFC6691).
+	 */
+	if (mss_now > tp->rx_opt.mss_clamp &&
+	    likely(tp->rx_opt.mss_clamp < IPV6_MAXPLEN))
 		mss_now = tp->rx_opt.mss_clamp;
 
 	/* Now subtract optional transport overhead */
-	mss_now -= icsk->icsk_ext_hdr_len;
+	mss_now -= ext_hdr_len;
 
 	/* Then reserve room for full set of TCP options and 8 bytes of data */
 	mss_now = max(mss_now,
-- 
2.47.3


* [PATCH net-next 07/10] veth: raise the max MTU above 65535
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
                   ` (5 preceding siblings ...)
  2026-06-08 13:07 ` [PATCH net-next 06/10] tcp: set MSS correctly for PMTU above 65535 Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 08/10] selftests/net: test sending TCP jumbograms over veth Mariusz Klimek
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch raises the maximum veth MTU from ETH_MAX_MTU (65535) to
IP6_MAX_JUMBOGRAM_MTU so that IPv6 jumbograms can pass through veth pairs.
An MTU above 65535 can significantly improve throughput between connected
namespaces, which is particularly useful for Docker containers connected to
the host through veth pairs. It also provides a way to test jumbogram
handling in the kernel.

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 drivers/net/veth.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0cfb19b760dd..e34ffbc1d651 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1425,9 +1425,11 @@ static int veth_close(struct net_device *dev)
 	return 0;
 }
 
+#define MAX_MTU IP6_MAX_JUMBOGRAM_MTU
+
 static int is_valid_veth_mtu(int mtu)
 {
-	return mtu >= ETH_MIN_MTU && mtu <= ETH_MAX_MTU;
+	return mtu >= ETH_MIN_MTU && mtu <= MAX_MTU;
 }
 
 static int veth_alloc_queues(struct net_device *dev)
@@ -1628,7 +1630,7 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 
 			if (peer) {
 				peer->hw_features |= NETIF_F_GSO_SOFTWARE;
-				peer->max_mtu = ETH_MAX_MTU;
+				peer->max_mtu = MAX_MTU;
 			}
 		}
 		bpf_prog_put(old_prog);
@@ -1754,7 +1756,7 @@ static void veth_setup(struct net_device *dev)
 	dev->needs_free_netdev = true;
 	dev->priv_destructor = veth_dev_free;
 	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
-	dev->max_mtu = ETH_MAX_MTU;
+	dev->max_mtu = MAX_MTU;
 
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
-- 
2.47.3


* [PATCH net-next 08/10] selftests/net: test sending TCP jumbograms over veth
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
                   ` (6 preceding siblings ...)
  2026-06-08 13:07 ` [PATCH net-next 07/10] veth: raise the max MTU " Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 09/10] selftests/net: add test cases with MTU above 65535 to big_tcp.sh Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 10/10] selftests/net: add jumbogram test case to msg_zerocopy.sh Mariusz Klimek
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch adds a selftest for TCP connections over large MTUs. The test
uses three network namespaces, NS_CLIENT, NS_ROUTER and NS_SERVER, which are
connected through veth pairs. Messages are sent from NS_CLIENT to NS_SERVER
via NS_ROUTER. The test verifies that the messages are sent and received
correctly and that jumbograms are received when the MTUs allow it.

The jumbogram_tx and jumbogram_rx helper programs are used to send and
receive large TCP messages. The programs are loosely based on
udpgso_bench_tx and udpgso_bench_rx.

The jumbogram.bpf.c BPF program tracks the largest packet size received
during the connection. If jumbograms are expected to be delivered, the test
checks that the largest observed packet size exceeds the expected size. Only
one packet needs to be a jumbogram, because packets received early in the
connection may be smaller due to the small initial window.
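The check described above amounts to keeping a running maximum and comparing it with a threshold after the transfer. The userspace sketch below only illustrates that logic and is not the BPF program; track_max(), saw_jumbogram() and the 65535-byte threshold used in the example are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Running maximum, updated once per received packet. */
static void track_max(uint32_t *max_size, uint32_t pkt_len)
{
	if (pkt_len > *max_size)
		*max_size = pkt_len;
}

/* Small packets early in the connection are fine; a single packet
 * above the threshold is enough to pass.
 */
static bool saw_jumbogram(const uint32_t *lens, size_t n, uint32_t threshold)
{
	uint32_t max_size = 0;

	for (size_t i = 0; i < n; i++)
		track_max(&max_size, lens[i]);
	return max_size > threshold;
}
```

For instance, a trace of {1500, 9000, 70000, 3000} passes a 65535-byte threshold even though most of its packets are small.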

This test is loosely based on big_tcp.sh. It is kept separate from
big_tcp.sh because BIG TCP and jumbograms are technically different
features, and big_tcp.sh would require heavy modification anyway to properly
test non-GSO jumbograms.

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 tools/testing/selftests/net/Makefile        |   3 +
 tools/testing/selftests/net/jumbogram.bpf.c |  36 ++
 tools/testing/selftests/net/jumbogram.sh    | 380 ++++++++++++++++++++
 tools/testing/selftests/net/jumbogram_rx.c  | 199 ++++++++++
 tools/testing/selftests/net/jumbogram_tx.c  | 139 +++++++
 5 files changed, 757 insertions(+)
 create mode 100644 tools/testing/selftests/net/jumbogram.bpf.c
 create mode 100755 tools/testing/selftests/net/jumbogram.sh
 create mode 100644 tools/testing/selftests/net/jumbogram_rx.c
 create mode 100644 tools/testing/selftests/net/jumbogram_tx.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 5ca6c557fc3f..0e08ca29bcea 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -53,6 +53,7 @@ TEST_PROGS := \
 	ipv6_force_forwarding.sh \
 	ipv6_route_update_soft_lockup.sh \
 	ipvtap_test.sh \
+	jumbogram.sh \
 	l2_tos_ttl_inherit.sh \
 	l2tp.sh \
 	link_netns.py \
@@ -146,6 +147,8 @@ TEST_GEN_FILES := \
 	ipsec \
 	ipv6_flowlabel \
 	ipv6_flowlabel_mgr \
+	jumbogram_rx \
+	jumbogram_tx \
 	msg_zerocopy \
 	nettest \
 	psock_fanout \
diff --git a/tools/testing/selftests/net/jumbogram.bpf.c b/tools/testing/selftests/net/jumbogram.bpf.c
new file mode 100644
index 000000000000..2bef831bc90f
--- /dev/null
+++ b/tools/testing/selftests/net/jumbogram.bpf.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u32);
+} max_packet_size SEC(".maps");
+
+SEC("ingress")
+int track_max_size(struct __sk_buff *skb)
+{
+	__u32 *max_size_ptr;
+	__u32 max_size;
+	__u32 key = 0;
+
+	max_size_ptr = bpf_map_lookup_elem(&max_packet_size, &key);
+	if (max_size_ptr)
+		max_size = *max_size_ptr;
+	else
+		max_size = 0;
+
+	if (skb->len >= max_size) {
+		max_size = skb->len;
+		bpf_map_update_elem(&max_packet_size, &key, &max_size,
+				    BPF_ANY);
+	}
+
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/net/jumbogram.sh b/tools/testing/selftests/net/jumbogram.sh
new file mode 100755
index 000000000000..bec5197c0a12
--- /dev/null
+++ b/tools/testing/selftests/net/jumbogram.sh
@@ -0,0 +1,380 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# This test checks IPv6 jumbogram passthrough over links with high MTUs.
+#
+# The test uses three namespaces: A client namespace, a server namespace and a
+# router namespace that forwards packets between the client and the server.
+#
+# +------------------------------------+
+# | NS_CLIENT                          |
+# |            veth$CLIENT             |
+# |                 +                  |
+# +-----------------|------------------+
+#                   |
+# +-----------------|------------------+
+# | NS_ROUTER       +                  |
+# |         veth$ROUTER_CLIENT         |
+# |                                    |
+# |         veth$ROUTER_SERVER         |
+# |                 +                  |
+# +-----------------|------------------+
+#                   |
+# +-----------------|------------------+
+# | NS_SERVER       +                  |
+# |            veth$SERVER             |
+# |                                    |
+# +------------------------------------+
+
+source lib.sh
+
+# All the tests in this script. Can be overridden with -t option.
+TESTS="
+	test_mtu_low_high
+	test_mtu_high_low
+	test_mtu_high_medium
+	test_mtu_high_high
+	test_mtu_probe
+	test_gso
+	test_fastopen
+"
+VERBOSE=0
+
+declare NS_CLIENT
+declare NS_ROUTER
+declare NS_SERVER
+
+readonly CLIENT=1
+readonly ROUTER_CLIENT=2
+readonly ROUTER_SERVER=3
+readonly SERVER=4
+
+readonly CLIENT_ADDR="2001:db8:$CLIENT::$CLIENT"
+readonly ROUTER_CLIENT_ADDR="2001:db8:$CLIENT::$ROUTER_CLIENT"
+readonly ROUTER_SERVER_ADDR="2001:db8:$SERVER::$ROUTER_SERVER"
+readonly SERVER_ADDR="2001:db8:$SERVER::$SERVER"
+
+readonly PORT=8000
+
+readonly MTU=500000
+# Leave enough space for headers.
+readonly PACKET_SIZE=$((MTU - 100))
+readonly PACKET_COUNT=30
+
+################################################################################
+# Utilities
+
+run_cmd()
+{
+	local out
+	if ((VERBOSE)); then
+		echo "COMMAND: $*"
+		out="$("$@")"
+	else
+		out="$("$@" 2>/dev/null)"
+	fi
+
+	local rc="$?"
+	if ((VERBOSE)) && [[ -n "$out" ]]; then
+		echo "    $out"
+	fi
+
+	return "$rc"
+}
+
+################################################################################
+# Setup
+
+setup()
+{
+	setup_ns NS_CLIENT NS_ROUTER NS_SERVER
+
+	# Connect the namespaces with veth pairs.
+	run_cmd ip link add \
+		name "veth$CLIENT" netns "$NS_CLIENT" type veth peer \
+		name "veth$ROUTER_CLIENT" netns "$NS_ROUTER"
+
+	run_cmd ip link add \
+		name "veth$SERVER" netns "$NS_SERVER" type veth peer \
+		name "veth$ROUTER_SERVER" netns "$NS_ROUTER"
+
+	run_cmd ip -n "$NS_CLIENT" link set dev "veth$CLIENT" up
+	run_cmd ip -n "$NS_ROUTER" link set dev "veth$ROUTER_CLIENT" up
+	run_cmd ip -n "$NS_ROUTER" link set dev "veth$ROUTER_SERVER" up
+	run_cmd ip -n "$NS_SERVER" link set dev "veth$SERVER" up
+
+	run_cmd ip -n "$NS_CLIENT" addr add dev \
+		"veth$CLIENT" "$CLIENT_ADDR/64" nodad
+	run_cmd ip -n "$NS_ROUTER" addr add dev \
+		"veth$ROUTER_CLIENT" "$ROUTER_CLIENT_ADDR/64" nodad
+	run_cmd ip -n "$NS_ROUTER" addr add dev \
+		"veth$ROUTER_SERVER" "$ROUTER_SERVER_ADDR/64" nodad
+	run_cmd ip -n "$NS_SERVER" addr add dev \
+		"veth$SERVER" "$SERVER_ADDR/64" nodad
+
+	# Set up forwarding through NS_ROUTER.
+	run_cmd ip netns exec "$NS_ROUTER" \
+		sysctl -wq "net.ipv6.conf.all.forwarding=1"
+	run_cmd ip netns exec "$NS_ROUTER" \
+		sysctl -wq "net.ipv6.conf.veth$ROUTER_CLIENT.forwarding=1"
+	run_cmd ip netns exec "$NS_ROUTER" \
+		sysctl -wq "net.ipv6.conf.veth$ROUTER_SERVER.forwarding=1"
+
+	run_cmd ip -n "$NS_CLIENT" -6 route add "$SERVER_ADDR" \
+		via "$ROUTER_CLIENT_ADDR" dev "veth$CLIENT"
+	run_cmd ip -n "$NS_SERVER" -6 route add "$CLIENT_ADDR" \
+		via "$ROUTER_SERVER_ADDR" dev "veth$SERVER"
+
+	# Disable GSO and GRO.
+	run_cmd ip netns exec "$NS_CLIENT" ethtool -K "veth$CLIENT" gso off
+	run_cmd ip netns exec "$NS_CLIENT" ethtool -K "veth$CLIENT" tso off
+
+	run_cmd ip netns exec "$NS_ROUTER" ethtool -K "veth$ROUTER_SERVER" gso off
+	run_cmd ip netns exec "$NS_ROUTER" ethtool -K "veth$ROUTER_SERVER" tso off
+
+	run_cmd ip netns exec "$NS_ROUTER" ethtool -K "veth$ROUTER_CLIENT" gro off
+	run_cmd ip netns exec "$NS_SERVER" ethtool -K "veth$SERVER" gro off
+}
+
+cleanup()
+{
+	cleanup_all_ns
+}
+
+set_mtus()
+{
+	local client_mtu="$1"
+	local server_mtu="$2"
+
+	run_cmd ip -n "$NS_CLIENT" link set "veth$CLIENT" mtu "$client_mtu"
+	run_cmd ip -n "$NS_ROUTER" link set "veth$ROUTER_CLIENT" mtu "$client_mtu"
+
+	run_cmd ip -n "$NS_SERVER" link set "veth$SERVER" mtu "$server_mtu"
+	run_cmd ip -n "$NS_ROUTER" link set "veth$ROUTER_SERVER" mtu "$server_mtu"
+}
+
+set_gso_max_size()
+{
+	local gso_max_size="$1"
+
+	run_cmd ip -n "$NS_CLIENT" link set dev "veth$CLIENT" gso_max_size "$gso_max_size"
+	run_cmd ip netns exec "$NS_CLIENT" ethtool -K "veth$CLIENT" gso on
+	run_cmd ip netns exec "$NS_CLIENT" ethtool -K "veth$CLIENT" tso on
+
+	run_cmd ip -n "$NS_ROUTER" link set dev "veth$ROUTER_CLIENT" gso_max_size "$gso_max_size"
+	run_cmd ip netns exec "$NS_ROUTER" ethtool -K "veth$ROUTER_CLIENT" gso on
+	run_cmd ip netns exec "$NS_ROUTER" ethtool -K "veth$ROUTER_CLIENT" tso on
+
+	run_cmd ip -n "$NS_ROUTER" link set dev "veth$ROUTER_SERVER" gso_max_size "$gso_max_size"
+	run_cmd ip netns exec "$NS_ROUTER" ethtool -K "veth$ROUTER_SERVER" gso on
+	run_cmd ip netns exec "$NS_ROUTER" ethtool -K "veth$ROUTER_SERVER" tso on
+
+	run_cmd ip -n "$NS_SERVER" link set dev "veth$SERVER" gso_max_size "$gso_max_size"
+	run_cmd ip netns exec "$NS_SERVER" ethtool -K "veth$SERVER" gso on
+	run_cmd ip netns exec "$NS_SERVER" ethtool -K "veth$SERVER" tso on
+}
+
+################################################################################
+# Tests
+
+# Attach a BPF program that keeps track of the maximum packet size it observes.
+observe_packets_start() {
+	run_cmd tc -n "$NS_SERVER" qdisc add dev "veth$SERVER" clsact
+	run_cmd tc -n "$NS_SERVER" filter add dev "veth$SERVER" ingress \
+		bpf object-file jumbogram.bpf.o section ingress
+}
+
+max_observed_packet_size() {
+	bpftool map lookup name max_packet_size key 0 0 0 0 | jq ".value"
+}
+
+observe_packets_stop() {
+	run_cmd tc -n "$NS_SERVER" filter del dev "veth$SERVER" ingress
+	run_cmd tc -n "$NS_SERVER" qdisc del dev "veth$SERVER" clsact
+}
+
+# Check jumbograms are received by the server.
+check_jumbogram_passthrough()
+{
+	local success="$1"
+	local expected_size="${2-$PACKET_SIZE}"
+	local args=("${@:3}")
+
+	# Start the server.
+	local server_stdout="$(mktemp server-stdout-XXXXXX)"
+	local server_stderr="$(mktemp server-stderr-XXXXXX)"
+	ip netns exec "$NS_SERVER" \
+		./jumbogram_rx -p "$PORT" -l $((PACKET_SIZE * PACKET_COUNT)) \
+		-C 4000 -R 20 "${args[@]}" >"$server_stdout" 2>"$server_stderr" &
+	local server_pid="$!"
+
+	# Wait for the server to start listening.
+	for i in {1..4}; do
+		if grep -q "listening" "$server_stdout"; then
+			break
+		fi
+		sleep 1
+	done
+
+	if ! grep -q "listening" "$server_stdout"; then
+		check_err 1 "failed to start server"
+		kill "$server_pid"
+		wait "$server_pid"
+		return
+	fi
+
+	observe_packets_start
+
+	# Start the client.
+	local client_out="$(mktemp client-out-XXXXXX)"
+	ip netns exec "$NS_CLIENT" \
+		./jumbogram_tx -D "$SERVER_ADDR" -p "$PORT" -M "$PACKET_COUNT" \
+		-s "$PACKET_SIZE" "${args[@]}" >"$client_out" 2>&1
+
+	check_err "$?" "$(cat "$client_out")"
+	rm "$client_out"
+
+	# Make sure the server received the correct amount of data.
+	wait "$server_pid"
+	check_err "$?" "$(cat "$server_stderr")"
+	rm "$server_stderr" "$server_stdout"
+
+	# Check if at least one packet was not segmented by checking the maximum
+	# observed packet size. The first packets are always segmented due to
+	# the small initial window size.
+	local max_size="$(max_observed_packet_size)"
+	observe_packets_stop
+
+	((max_size >= expected_size))
+	check_err_fail $((success ^ 1)) "$?" \
+		"expected >=$expected_size received $max_size; jumbogram passthrough"
+}
+
+# If one side has MTU < 65536, the negotiated MSS should be < 65536.
+test_mtu_low_high()
+{
+	set_mtus 1500 "$MTU"
+	check_jumbogram_passthrough 0
+	log_test "TCP jumbograms over veth" "client:   1500, server: $MTU"
+}
+
+test_mtu_high_low()
+{
+	set_mtus "$MTU" 1500
+	check_jumbogram_passthrough 0
+	log_test "TCP jumbograms over veth" "client: $MTU, server:   1500"
+}
+
+# If both sides have MTU > 65535 but the message size is above the server MTU,
+# smaller jumbograms should still be delivered.
+test_mtu_high_medium()
+{
+	local server_mtu=$((MTU / 2))
+	set_mtus "$MTU" "$server_mtu"
+	check_jumbogram_passthrough 0 "$PACKET_SIZE"
+	check_jumbogram_passthrough 1 $((server_mtu - 100))
+	log_test "TCP jumbograms over veth" "client: $MTU, server: $server_mtu"
+}
+
+# If both ends have MTU > 65535, message-sized jumbograms should pass through.
+test_mtu_high_high()
+{
+	set_mtus "$MTU" "$MTU"
+	check_jumbogram_passthrough 1
+	log_test "TCP jumbograms over veth" "client: $MTU, server: $MTU"
+}
+
+# MTU probing can't currently settle on MSS > 65535 even if the MTUs allow for
+# it due to the MTU-search-range upper bound of 65535. At least make sure that
+# the MSS reaches close to 65535.
+test_mtu_probe()
+{
+	run_cmd ip netns exec "$NS_CLIENT" \
+		sysctl -wq "net.ipv4.tcp_mtu_probing=2"
+
+	set_mtus "$MTU" "$MTU"
+	check_jumbogram_passthrough 0 "$PACKET_SIZE"
+	check_jumbogram_passthrough 1 49000
+	log_test "High MTUs with MTU probing"
+}
+
+# If gso_max_size < MTU then GSO shouldn't be used. gso_max_size > MTU is tested
+# in big_tcp.sh.
+test_gso()
+{
+	local gso_max_size=$((MTU - 100))
+
+	set_mtus "$MTU" "$MTU"
+	set_gso_max_size "$gso_max_size"
+	check_jumbogram_passthrough 1
+	log_test "High MTUs with lower gso_max_size"
+}
+
+# Make sure TCP fastopen works with MTU > 65535.
+test_fastopen()
+{
+	run_cmd ip netns exec "$NS_CLIENT" \
+		sysctl -wq "net.ipv4.tcp_fastopen=3"
+	run_cmd ip netns exec "$NS_SERVER" \
+		sysctl -wq "net.ipv4.tcp_fastopen=3"
+
+	set_mtus "$MTU" "$MTU"
+	check_jumbogram_passthrough 1 "$PACKET_SIZE" -f
+	# The same cookie as the previous connection should be used.
+	check_jumbogram_passthrough 1 "$PACKET_SIZE" -f
+	log_test "High MTUs with TCP fastopen"
+}
+
+################################################################################
+# Usage
+
+usage()
+{
+	cat <<EOF
+usage: ${0##*/} OPTS
+
+        -t <test>   Test(s) to run (default: all)
+                    (options: $TESTS)
+        -p          Pause on fail
+        -v          Verbose mode (show commands and output)
+        -h          Show this help message
+EOF
+}
+
+################################################################################
+# Main
+
+while getopts :t:phv o
+do
+	case "$o" in
+		t) TESTS="$OPTARG";;
+		p) PAUSE_ON_FAIL=yes;;
+		v) VERBOSE=$((VERBOSE + 1));;
+		h) usage; exit 0;;
+		*) usage; exit 1;;
+	esac
+done
+
+if [[ "$(id -u)" -ne 0 ]]; then
+	echo "SKIP: Need root privileges"
+	exit "$ksft_skip";
+fi
+
+require_command bpftool
+require_command ethtool
+require_command ip
+require_command iptables
+require_command jq
+require_command tc
+
+# Start clean.
+cleanup
+
+trap cleanup EXIT
+
+for t in $TESTS
+do
+	setup; "$t"; cleanup;
+done
+
+exit "$EXIT_STATUS"
diff --git a/tools/testing/selftests/net/jumbogram_rx.c b/tools/testing/selftests/net/jumbogram_rx.c
new file mode 100644
index 000000000000..5880704114c4
--- /dev/null
+++ b/tools/testing/selftests/net/jumbogram_rx.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <error.h>
+#include <errno.h>
+#include <netinet/tcp.h>
+#include <poll.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/wait.h>
+
+#define POLL_TIMEOUT 10
+#define RCVBUF_SIZE (1 << 21)
+
+static const char *cfg_bind_addr = "::";
+static int cfg_connect_timeout_ms;
+static int cfg_rcv_timeout_ms;
+static int cfg_port = 8000;
+static int cfg_total_len;
+static int cfg_fastopen;
+
+static unsigned long bytes;
+static bool interrupted;
+
+static void sigint_handler(int signum)
+{
+	if (signum == SIGINT)
+		interrupted = true;
+}
+
+static void wait_for_data(int fd, int timeout_ms)
+{
+	struct pollfd pfd;
+
+	pfd.events = POLLIN;
+	pfd.revents = 0;
+	pfd.fd = fd;
+
+	while (true) {
+		int ret = poll(&pfd, 1, POLL_TIMEOUT);
+
+		if (interrupted || (ret > 0 && pfd.revents == POLLIN))
+			break;
+		else if (ret > 0)
+			error(1, 0, "poll: 0x%x expected 0x%x",
+			      pfd.revents, POLLIN);
+		else if (ret == -1)
+			error(1, errno, "poll");
+
+		if (!timeout_ms)
+			continue;
+
+		timeout_ms -= POLL_TIMEOUT;
+		if (timeout_ms <= 0) {
+			interrupted = true;
+			break;
+		}
+
+		/* no events and more time to wait, do poll again */
+	}
+}
+
+static int accept_connection(void)
+{
+	static struct sockaddr_in6 addr;
+	int server_fd, fd;
+	int val;
+
+	server_fd = socket(PF_INET6, SOCK_STREAM, 0);
+	if (server_fd == -1)
+		error(1, errno, "socket");
+
+	val = RCVBUF_SIZE;
+	if (setsockopt(server_fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val)))
+		error(1, errno, "setsockopt rcvbuf");
+
+	val = 1;
+	if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEPORT, &val, sizeof(val)))
+		error(1, errno, "setsockopt reuseport");
+
+	if (cfg_fastopen &&
+	    setsockopt(server_fd, SOL_TCP, TCP_FASTOPEN, &val, sizeof(val)))
+		error(1, errno, "setsockopt fastopen");
+
+	addr.sin6_family = AF_INET6;
+	addr.sin6_port = htons(cfg_port);
+	if (inet_pton(AF_INET6, cfg_bind_addr, &addr.sin6_addr) != 1)
+		error(1, 0, "ipv6 parse error: %s", cfg_bind_addr);
+
+	if (bind(server_fd, &addr, sizeof(addr)))
+		error(1, errno, "bind");
+
+	if (listen(server_fd, 1))
+		error(1, errno, "listen");
+
+	puts("listening for connection");
+
+	wait_for_data(server_fd, cfg_connect_timeout_ms);
+	if (interrupted)
+		exit(0);
+
+	fd = accept(server_fd, NULL, NULL);
+	if (fd == -1)
+		error(1, errno, "accept");
+
+	if (close(server_fd))
+		error(1, errno, "close accept fd");
+
+	return fd;
+}
+
+static bool receive_packets(int fd)
+{
+	int ret;
+
+	while (true) {
+		ret = recv(fd, NULL, RCVBUF_SIZE, MSG_TRUNC | MSG_DONTWAIT);
+		if (ret > 0)
+			bytes += ret;
+		else if (ret == 0)
+			return true;
+		else if (errno == EAGAIN)
+			return false;
+		else
+			error(1, errno, "recv");
+	}
+
+}
+
+
+static void usage(const char *filepath)
+{
+	error(1, 0, "Usage: %s [-C connect_timeout] [-b addr] [-f] [-p port]"
+	      " [-l total_len] [-R rcv_timeout]",
+	      filepath);
+}
+
+static void parse_opts(int argc, char **argv)
+{
+	int c;
+
+	while ((c = getopt(argc, argv, "b:C:fhl:p:R:")) != -1) {
+		switch (c) {
+		case 'b':
+			cfg_bind_addr = optarg;
+			break;
+		case 'C':
+			cfg_connect_timeout_ms = strtoul(optarg, NULL, 0);
+			break;
+		case 'f':
+			cfg_fastopen = true;
+			break;
+		case 'h':
+			usage(argv[0]);
+			break;
+		case 'l':
+			cfg_total_len = strtoul(optarg, NULL, 0);
+			break;
+		case 'p':
+			cfg_port = strtoul(optarg, NULL, 0);
+			break;
+		case 'R':
+			cfg_rcv_timeout_ms = strtoul(optarg, NULL, 0);
+			break;
+		default:
+			exit(1);
+		}
+	}
+
+	if (optind != argc)
+		usage(argv[0]);
+}
+
+int main(int argc, char **argv)
+{
+	parse_opts(argc, argv);
+
+	signal(SIGINT, sigint_handler);
+
+	int fd = accept_connection();
+	bool stop = false;
+
+	while (!interrupted && !stop) {
+		wait_for_data(fd, cfg_rcv_timeout_ms);
+		stop = receive_packets(fd);
+	}
+
+	if (cfg_total_len && (bytes != cfg_total_len))
+		error(1, 0, "wrong data length! got %lu, expected %d",
+		      bytes, cfg_total_len);
+
+	if (close(fd))
+		error(1, errno, "close");
+
+	return 0;
+}
diff --git a/tools/testing/selftests/net/jumbogram_tx.c b/tools/testing/selftests/net/jumbogram_tx.c
new file mode 100644
index 000000000000..53d1ea4aec44
--- /dev/null
+++ b/tools/testing/selftests/net/jumbogram_tx.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <error.h>
+#include <errno.h>
+#include <netinet/tcp.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define RCVBUF_SIZE (1 << 21)
+
+static const char *cfg_bind_addr = "::";
+static int cfg_payload_len = 499700;
+static int cfg_port = 8000;
+static int cfg_msg_nr = 1;
+static bool cfg_fastopen;
+
+static char buf[RCVBUF_SIZE];
+static bool interrupted;
+
+static void sigint_handler(int signum)
+{
+	if (signum == SIGINT)
+		interrupted = true;
+}
+
+static void send_message(int fd, const char *buf, size_t len)
+{
+	int done = 0;
+	int ret;
+
+	while (done < len) {
+		ret = send(fd, &buf[done], len - done, 0);
+		if (ret < 0)
+			error(1, errno, "send");
+
+		done += ret;
+	}
+}
+
+static int connect_to_server(void)
+{
+	struct sockaddr_in6 addr = {0};
+	int ret;
+	int fd;
+
+	fd = socket(PF_INET6, SOCK_STREAM, 0);
+	if (fd == -1)
+		error(1, errno, "socket");
+
+	addr.sin6_family = AF_INET6;
+	addr.sin6_port = htons(cfg_port);
+	if (inet_pton(AF_INET6, cfg_bind_addr, &addr.sin6_addr) != 1)
+		error(1, 0, "ipv6 parse error: %s", cfg_bind_addr);
+
+	if (cfg_fastopen) {
+		ret = sendto(fd, buf, cfg_payload_len, MSG_FASTOPEN, &addr,
+			     sizeof(addr));
+		if (ret < 0)
+			error(1, errno, "sendto");
+
+		send_message(fd, &buf[ret], cfg_payload_len - ret);
+		cfg_msg_nr--;
+	} else if (connect(fd, &addr, sizeof(addr))) {
+		error(1, errno, "connect");
+	}
+
+	return fd;
+}
+
+static void usage(const char *filepath)
+{
+	error(1, 0, "Usage: %s [-D dst ip] [-f] [-M messagenr] [-p port]"
+		    " [-s sendsize]",
+		    filepath);
+}
+
+static void parse_opts(int argc, char **argv)
+{
+	int c;
+
+	while ((c = getopt(argc, argv, "D:fhM:p:s:")) != -1) {
+		switch (c) {
+		case 'D':
+			cfg_bind_addr = optarg;
+			break;
+		case 'f':
+			cfg_fastopen = true;
+			break;
+		case 'h':
+			usage(argv[0]);
+			break;
+		case 'M':
+			cfg_msg_nr = strtoul(optarg, NULL, 10);
+			break;
+		case 'p':
+			cfg_port = strtoul(optarg, NULL, 0);
+			break;
+		case 's':
+			cfg_payload_len = strtoul(optarg, NULL, 0);
+			break;
+		default:
+			exit(1);
+		}
+	}
+
+	if (optind != argc)
+		usage(argv[0]);
+
+	if (cfg_payload_len > RCVBUF_SIZE)
+		error(1, 0, "payload length %d exceeds max %d",
+		      cfg_payload_len, RCVBUF_SIZE);
+}
+
+int main(int argc, char **argv)
+{
+	int fd;
+	int i;
+
+	parse_opts(argc, argv);
+
+	memset(buf, 'A', sizeof(buf));
+
+	signal(SIGINT, sigint_handler);
+
+	fd = connect_to_server();
+	for (i = 0; i < cfg_msg_nr && !interrupted; i++)
+		send_message(fd, buf, cfg_payload_len);
+
+	if (close(fd))
+		error(1, errno, "close");
+
+	return 0;
+}
-- 
2.47.3


* [PATCH net-next 09/10] selftests/net: add test cases with MTU above 65535 to big_tcp.sh
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
                   ` (7 preceding siblings ...)
  2026-06-08 13:07 ` [PATCH net-next 08/10] selftests/net: test sending TCP jumbograms over veth Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  2026-06-08 13:07 ` [PATCH net-next 10/10] selftests/net: add jumbogram test case to msg_zerocopy.sh Mariusz Klimek
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

This patch adds two test cases to big_tcp.sh that test BIG TCP over MTUs
above 65535: one where gso_max_size > MTU > 65535 and one where
MTU > gso_max_size > 65535.

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 tools/testing/selftests/net/big_tcp.sh | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
index 2db9d15cd45f..cb0ffccda3ff 100755
--- a/tools/testing/selftests/net/big_tcp.sh
+++ b/tools/testing/selftests/net/big_tcp.sh
@@ -132,8 +132,14 @@ do_test() {
 	local gw_gro=$2
 	local gw_tso=$3
 	local ser_gro=$4
+	local mtu=$5
 	local ret="PASS"
 
+	ip -n $CLIENT_NS link set link0 mtu "$mtu"
+	ip -n $ROUTER_NS link set link1 mtu "$mtu"
+	ip -n $ROUTER_NS link set link2 mtu "$mtu"
+	ip -n $SERVER_NS link set link3 mtu "$mtu"
+
 	ip net exec $CLIENT_NS ethtool -K link0 tso $cli_tso
 	ip net exec $ROUTER_NS ethtool -K link1 gro $gw_gro
 	ip net exec $ROUTER_NS ethtool -K link2 tso $gw_tso
@@ -151,18 +157,20 @@ do_test() {
 
 	stop_counter link1 $ROUTER_NS
 	stop_counter link3 $SERVER_NS
-	printf "%-9s %-8s %-8s %-8s: [%s]\n" \
-		$cli_tso $gw_gro $gw_tso $ser_gro $ret
+	printf "%-9s %-8s %-8s %-9s %-7s: [%s]\n" \
+		$cli_tso $gw_gro $gw_tso $ser_gro $mtu $ret
 	test $ret = "PASS"
 }
 
 testup() {
-	echo "CLI GSO | GW GRO | GW GSO | SER GRO" && \
-	do_test "on"  "on"  "on"  "on"  && \
-	do_test "on"  "off" "on"  "off" && \
-	do_test "off" "on"  "on"  "on"  && \
-	do_test "on"  "on"  "off" "on"  && \
-	do_test "off" "on"  "off" "on"
+	echo "CLI GSO | GW GRO | GW GSO | SER GRO | MTU" && \
+	do_test "on"  "on"  "on"  "on"    1500 && \
+	do_test "on"  "off" "on"  "off"   1500 && \
+	do_test "off" "on"  "on"  "on"    1500 && \
+	do_test "on"  "on"  "off" "on"    1500 && \
+	do_test "off" "on"  "off" "on"    1500 && \
+	do_test "on"  "off" "on"  "off"  66000 && \
+	do_test "on"  "off" "on"  "off" 200000
 }
 
 if ! netperf -V &> /dev/null; then
-- 
2.47.3


* [PATCH net-next 10/10] selftests/net: add jumbogram test case to msg_zerocopy.sh
       [not found] <20260608130755.5626-1-maklimek97@gmail.com>
                   ` (8 preceding siblings ...)
  2026-06-08 13:07 ` [PATCH net-next 09/10] selftests/net: add test cases with MTU above 65535 to big_tcp.sh Mariusz Klimek
@ 2026-06-08 13:07 ` Mariusz Klimek
  9 siblings, 0 replies; 10+ messages in thread
From: Mariusz Klimek @ 2026-06-08 13:07 UTC (permalink / raw)
  To: netdev
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, dsahern, idosch,
	ncardwell, shuah, kuniyu, alice, Mariusz Klimek

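This patch adds a test case to msg_zerocopy.sh that tests whether zerocopy
works with TCP jumbograms. The veth MTU is raised to 200000, and
msg_zerocopy no longer rejects payloads above max_payload_len for TCP over
IPv6, so the new case can send 100000-byte messages.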

Signed-off-by: Mariusz Klimek <maklimek97@gmail.com>
---
 tools/testing/selftests/net/msg_zerocopy.c  | 3 ++-
 tools/testing/selftests/net/msg_zerocopy.sh | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
index 1d5d3c4e7e87..b5df360c7811 100644
--- a/tools/testing/selftests/net/msg_zerocopy.c
+++ b/tools/testing/selftests/net/msg_zerocopy.c
@@ -789,7 +789,8 @@ static void parse_opts(int argc, char **argv)
 	setup_sockaddr(cfg_family, daddr, &cfg_dst_addr);
 	setup_sockaddr(cfg_family, saddr, &cfg_src_addr);
 
-	if (cfg_payload_len > max_payload_len)
+	if (cfg_payload_len > max_payload_len &&
+	    (strcmp(cfg_test, "tcp") != 0 || cfg_family != PF_INET6))
 		error(1, 0, "-s: payload exceeds max (%d)", max_payload_len);
 	if (cfg_cork_mixed && (!cfg_zerocopy || !cfg_cork))
 		error(1, 0, "-m: cork_mixed requires corking and zerocopy");
diff --git a/tools/testing/selftests/net/msg_zerocopy.sh b/tools/testing/selftests/net/msg_zerocopy.sh
index 28178a38a4e7..ab498c6ea210 100755
--- a/tools/testing/selftests/net/msg_zerocopy.sh
+++ b/tools/testing/selftests/net/msg_zerocopy.sh
@@ -7,7 +7,7 @@ set -e
 
 readonly DEV="veth0"
 readonly DUMMY_DEV="dummy0"
-readonly DEV_MTU=65535
+readonly DEV_MTU=200000
 readonly BIN="./msg_zerocopy"
 
 readonly RAND="$(mktemp -u XXXXXX)"
@@ -29,6 +29,7 @@ if [[ "$#" -eq "0" ]]; then
 
 	$0 4 tcp -t 1 || ret=1
 	$0 6 tcp -t 1 || ret=1
+	$0 6 tcp -t 1 -s 100000 || ret=1
 	$0 4 udp -t 1 || ret=1
 	$0 6 udp -t 1 || ret=1
 
-- 
2.47.3


Thread overview: 10+ messages
     [not found] <20260608130755.5626-1-maklimek97@gmail.com>
2026-06-08 13:07 ` [PATCH net-next 01/10] ipv6: do not fragment packets into jumbograms Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 02/10] ipv6: allow route exceptions with MTUs above 65535 Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 03/10] ipv6: add jumbo payload option to non-gso jumbograms Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 04/10] tcp: decouple TSO segment length from MSS Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 05/10] tcp: split jumbograms with urgent pointer correctly Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 06/10] tcp: set MSS correctly for PMTU above 65535 Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 07/10] veth: raise the max MTU " Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 08/10] selftests/net: test sending TCP jumbograms over veth Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 09/10] selftests/net: add test cases with MTU above 65535 to big_tcp.sh Mariusz Klimek
2026-06-08 13:07 ` [PATCH net-next 10/10] selftests/net: add jumbogram test case to msg_zerocopy.sh Mariusz Klimek
