public inbox for netdev@vger.kernel.org
* [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete}
@ 2026-01-18 17:52 Eric Dumazet
  2026-01-18 17:52 ` [PATCH v2 net-next 1/3] net: always inline __skb_incr_checksum_unnecessary() Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-01-18 17:52 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet

On some platforms, GRO stack is too deep and causes cpu stalls.

Decreasing call depths by one shows a 1.5 % gain on Zen2 cpus.
(32 RX queues, 100Gbit NIC, RFS enabled, tcp_rr with 128 threads and 10,000 flows)

We can go further by inlining ipv6_gro_{receive,complete}
and take care of IPv4 if there is interest.

Note: two temporary __always_inline will be replaced with
      inline_for_performance when available.

v2: dealt with udp6_gro_receive()/udp6_gro_complete()
    missing declarations (kernel test robot <lkp@intel.com>)
    for CONFIG_MITIGATION_RETPOLINE=n

Cumulative size increase for this series (of 3):

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.3
add/remove: 2/2 grow/shrink: 5/1 up/down: 1572/-471 (1101)
Function                                     old     new   delta
ipv6_gro_receive                            1069    1846    +777
ipv6_gro_complete                            433     733    +300
tcp6_check_fraglist_gro                        -     272    +272
tcp6_gro_complete                            227     306     +79
tcp4_gro_complete                            325     397     +72
ipv6_offload_init                            218     274     +56
__pfx_tcp6_check_fraglist_gro                  -      16     +16
__pfx___skb_incr_checksum_unnecessary         32       -     -32
__skb_incr_checksum_unnecessary              186       -    -186
tcp6_gro_receive                             959     706    -253
Total: Before=22592724, After=22593825, chg +0.00%

Eric Dumazet (3):
  net: always inline __skb_incr_checksum_unnecessary()
  gro: inline tcp6_gro_receive()
  gro: inline tcp6_gro_complete()

 include/linux/skbuff.h   |  2 +-
 include/net/gro.h        |  5 ++---
 include/net/tcp.h        |  2 --
 net/ipv6/Makefile        |  2 +-
 net/ipv6/ip6_offload.c   | 43 ++++++++++++++++++++--------------------
 net/ipv6/tcpv6_offload.c | 12 +++++------
 6 files changed, 31 insertions(+), 35 deletions(-)

-- 
2.52.0.457.g6b5491de43-goog
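
The size figures in this cover letter can be cross-checked against the per-patch bloat-o-meter runs: the net deltas reported in patches 1/3, 2/3 and 3/3 (-62, +870, +293) should add up to the cumulative 1101 bytes, which in turn should equal After minus Before. A minimal C sketch of that arithmetic follows; the helper and variable names are hypothetical, and only the numbers come from the thread.

```c
#include <assert.h>

/* Net size deltas (bytes) reported by bloat-o-meter for patches 1/3, 2/3, 3/3. */
static const long per_patch_delta[3] = { -62, 870, 293 };

/* vmlinux totals quoted in the cover letter (vmlinux.0 and vmlinux.3). */
static const long total_before = 22592724;
static const long total_after = 22593825;

/* Sum of the per-patch deltas; the helper name is illustrative only. */
static long series_delta(void)
{
	long sum = 0;

	for (int i = 0; i < 3; i++)
		sum += per_patch_delta[i];
	return sum;
}
```

Calling series_delta() and comparing it with total_after - total_before is enough to confirm that the three patches account for the whole change.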



* [PATCH v2 net-next 1/3] net: always inline __skb_incr_checksum_unnecessary()
  2026-01-18 17:52 [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete} Eric Dumazet
@ 2026-01-18 17:52 ` Eric Dumazet
  2026-01-18 17:52 ` [PATCH v2 net-next 2/3] gro: inline tcp6_gro_receive() Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-01-18 17:52 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet

clang does not inline this helper in GRO fast path.

We can save space and cpu cycles.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/2 grow/shrink: 2/0 up/down: 156/-218 (-62)
Function                                     old     new   delta
tcp6_gro_complete                            227     311     +84
tcp4_gro_complete                            325     397     +72
__pfx___skb_incr_checksum_unnecessary         32       -     -32
__skb_incr_checksum_unnecessary              186       -    -186
Total: Before=22592724, After=22592662, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 86737076101d4a8452e90fe78adcdcfdefb79169..e6bfe5d0c5252b2e7540e1fef9317aab83feced2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4763,7 +4763,7 @@ static inline void __skb_decr_checksum_unnecessary(struct sk_buff *skb)
 	}
 }
 
-static inline void __skb_incr_checksum_unnecessary(struct sk_buff *skb)
+static __always_inline void __skb_incr_checksum_unnecessary(struct sk_buff *skb)
 {
 	if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
 		if (skb->csum_level < SKB_MAX_CSUM_LEVEL)
-- 
2.52.0.457.g6b5491de43-goog
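
For readers outside the kernel tree: plain inline is only a hint, whereas the kernel's __always_inline asks the compiler to inline at every call site. The sketch below mimics the helper in user space with the GCC/clang attribute. Everything prefixed mock_ or MOCK_ is hypothetical, the constant values are placeholders, and the second branch is an assumption because the hunk above shows only the first lines of the function.

```c
#include <assert.h>

/* User-space stand-in for the kernel's __always_inline (GCC/clang attribute). */
#define mock_always_inline inline __attribute__((__always_inline__))

/* Placeholder values, chosen for this sketch only. */
enum { MOCK_CHECKSUM_NONE = 0, MOCK_CHECKSUM_UNNECESSARY = 1 };
#define MOCK_MAX_CSUM_LEVEL 3

struct mock_skb {
	unsigned int ip_summed;
	unsigned int csum_level;
};

static mock_always_inline void mock_incr_checksum_unnecessary(struct mock_skb *skb)
{
	if (skb->ip_summed == MOCK_CHECKSUM_UNNECESSARY) {
		/* Already validated: bump the level, saturating at the maximum. */
		if (skb->csum_level < MOCK_MAX_CSUM_LEVEL)
			skb->csum_level++;
	} else if (skb->ip_summed == MOCK_CHECKSUM_NONE) {
		/* Assumed branch: not visible in the hunk above. */
		skb->ip_summed = MOCK_CHECKSUM_UNNECESSARY;
		skb->csum_level = 0;
	}
}
```

Because the body is small, forcing it inline removes the out-of-line symbol and its call overhead, which is consistent with the net -62 bytes reported above.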



* [PATCH v2 net-next 2/3] gro: inline tcp6_gro_receive()
  2026-01-18 17:52 [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete} Eric Dumazet
  2026-01-18 17:52 ` [PATCH v2 net-next 1/3] net: always inline __skb_incr_checksum_unnecessary() Eric Dumazet
@ 2026-01-18 17:52 ` Eric Dumazet
  2026-01-18 17:52 ` [PATCH v2 net-next 3/3] gro: inline tcp6_gro_complete() Eric Dumazet
  2026-01-20 15:30 ` [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete} Jakub Kicinski
  3 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-01-18 17:52 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet

FDO/LTO are unable to inline tcp6_gro_receive() from ipv6_gro_receive().

Make sure tcp6_check_fraglist_gro() is called only when needed,
so that the compiler can leave it out-of-line.

$ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2
add/remove: 2/0 grow/shrink: 3/1 up/down: 1123/-253 (870)
Function                                     old     new   delta
ipv6_gro_receive                            1069    1846    +777
tcp6_check_fraglist_gro                        -     272    +272
ipv6_offload_init                            218     274     +56
__pfx_tcp6_check_fraglist_gro                  -      16     +16
ipv6_gro_complete                            433     435      +2
tcp6_gro_receive                             959     706    -253
Total: Before=22592662, After=22593532, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/gro.h        |  3 +--
 include/net/tcp.h        |  1 -
 net/ipv6/Makefile        |  2 +-
 net/ipv6/ip6_offload.c   | 22 +++++++++++++---------
 net/ipv6/tcpv6_offload.c | 10 ++++------
 5 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index b65f631c521d7d9741ef86781add0038c9ce4055..85e5eeed4c90feef9440c57af9382b0e9ead1219 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -405,8 +405,7 @@ INDIRECT_CALLABLE_DECLARE(struct sk_buff *udp4_gro_receive(struct list_head *,
 							   struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(int udp4_gro_complete(struct sk_buff *, int));
 
-INDIRECT_CALLABLE_DECLARE(struct sk_buff *udp6_gro_receive(struct list_head *,
-							   struct sk_buff *));
+struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *);
 INDIRECT_CALLABLE_DECLARE(int udp6_gro_complete(struct sk_buff *, int));
 
 #define indirect_call_gro_receive_inet(cb, f2, f1, head, skb)	\
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 15f9b20f851fe322f4417ff403c3965436aa3f9f..3b94c84888a884d9ca8eb602ad1f7d4f941f3ef9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2327,7 +2327,6 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
 INDIRECT_CALLABLE_DECLARE(int tcp4_gro_complete(struct sk_buff *skb, int thoff));
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb));
 INDIRECT_CALLABLE_DECLARE(int tcp6_gro_complete(struct sk_buff *skb, int thoff));
-INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb));
 #ifdef CONFIG_INET
 void tcp_gro_complete(struct sk_buff *skb);
 #else
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index d283c59df4c1c421bc043056fe11e5437cc4aece..0492f1a0b4918ada8c56cf649fbec04c7114863a 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -45,7 +45,7 @@ obj-$(CONFIG_IPV6_FOU) += fou6.o
 
 obj-y += addrconf_core.o exthdrs_core.o ip6_checksum.o ip6_icmp.o
 obj-$(CONFIG_INET) += output_core.o protocol.o \
-			ip6_offload.o tcpv6_offload.o exthdrs_offload.o
+			ip6_offload.o exthdrs_offload.o
 
 obj-$(subst m,y,$(CONFIG_IPV6)) += inet6_hashtables.o
 
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index fce91183797a60fcbf271c73e086aeb0aa9d40c6..4d96154c0dcd019322908ab6ddaa663a2a565e44 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -19,6 +19,7 @@
 #include <net/gso.h>
 
 #include "ip6_offload.h"
+#include "tcpv6_offload.c"
 
 /* All GRO functions are always builtin, except UDP over ipv6, which lays in
  * ipv6 module, as it depends on UDPv6 lookup function, so we need special care
@@ -30,13 +31,6 @@
 #define INDIRECT_CALL_L4(f, f2, f1, ...) INDIRECT_CALL_1(f, f2, __VA_ARGS__)
 #endif
 
-#define indirect_call_gro_receive_l4(f2, f1, cb, head, skb)	\
-({								\
-	unlikely(gro_recursion_inc_test(skb)) ?			\
-		NAPI_GRO_CB(skb)->flush |= 1, NULL :		\
-		INDIRECT_CALL_L4(cb, f2, f1, head, skb);	\
-})
-
 static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto)
 {
 	const struct net_offload *ops = NULL;
@@ -298,9 +292,19 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 
 	skb_gro_postpull_rcsum(skb, iph, nlen);
 
-	pp = indirect_call_gro_receive_l4(tcp6_gro_receive, udp6_gro_receive,
-					 ops->callbacks.gro_receive, head, skb);
+	if (unlikely(gro_recursion_inc_test(skb))) {
+		flush = 1;
+		goto out;
+	}
 
+	if (likely(proto == IPPROTO_TCP))
+		pp = tcp6_gro_receive(head, skb);
+#if IS_BUILTIN(CONFIG_IPV6)
+	else if (likely(proto == IPPROTO_UDP))
+		pp = udp6_gro_receive(head, skb);
+#endif
+	else
+		pp = ops->callbacks.gro_receive(head, skb);
 out:
 	skb_gro_flush_final(skb, pp, flush);
 
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index effeba58630b5ac2593b824bd8fc10a473954b6c..7f19ce423058870f285b7f8ae2a4d116d783f9fb 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -24,9 +24,6 @@ static void tcp6_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
 	struct net *net;
 	int iif, sdif;
 
-	if (likely(!(skb->dev->features & NETIF_F_GRO_FRAGLIST)))
-		return;
-
 	p = tcp_gro_lookup(head, th);
 	if (p) {
 		NAPI_GRO_CB(skb)->is_flist = NAPI_GRO_CB(p)->is_flist;
@@ -45,8 +42,8 @@ static void tcp6_check_fraglist_gro(struct list_head *head, struct sk_buff *skb,
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 }
 
-INDIRECT_CALLABLE_SCOPE
-struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb)
+static __always_inline struct sk_buff *tcp6_gro_receive(struct list_head *head,
+							struct sk_buff *skb)
 {
 	struct tcphdr *th;
 
@@ -60,7 +57,8 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 	if (!th)
 		goto flush;
 
-	tcp6_check_fraglist_gro(head, skb, th);
+	if (unlikely(skb->dev->features & NETIF_F_GRO_FRAGLIST))
+		tcp6_check_fraglist_gro(head, skb, th);
 
 	return tcp_gro_receive(head, skb, th);
 
-- 
2.52.0.457.g6b5491de43-goog
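
The last hunk of this patch moves the feature test out of tcp6_check_fraglist_gro() and into its caller, so the inlined fast path carries only one unlikely branch while the rare work stays in a separate function. A user-space sketch of that shape follows. All mock_ and MOCK_ names are hypothetical, and the explicit noinline attribute is an assumption made to force the effect here; the patch itself leaves that decision to the compiler.

```c
#include <assert.h>

#define mock_unlikely(x) __builtin_expect(!!(x), 0)
#define MOCK_F_GRO_FRAGLIST (1u << 0)	/* placeholder bit for this sketch */

struct mock_dev {
	unsigned int features;
};

/* Counts slow-path entries so the behaviour can be observed. */
static int slow_path_calls;

/* Rare path: kept out of line so it adds no code to the hot caller. */
static __attribute__((__noinline__)) void mock_check_fraglist(const struct mock_dev *dev)
{
	(void)dev;
	slow_path_calls++;
}

/* Hot path: always inlined, tests the flag before calling the rare path. */
static inline __attribute__((__always_inline__)) int mock_tcp6_receive(const struct mock_dev *dev)
{
	if (mock_unlikely(dev->features & MOCK_F_GRO_FRAGLIST))
		mock_check_fraglist(dev);
	return 0;
}
```

With the test in the caller, a device without the flag never pays for a call into the slow function, which is the property the commit message is after.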



* [PATCH v2 net-next 3/3] gro: inline tcp6_gro_complete()
  2026-01-18 17:52 [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete} Eric Dumazet
  2026-01-18 17:52 ` [PATCH v2 net-next 1/3] net: always inline __skb_incr_checksum_unnecessary() Eric Dumazet
  2026-01-18 17:52 ` [PATCH v2 net-next 2/3] gro: inline tcp6_gro_receive() Eric Dumazet
@ 2026-01-18 17:52 ` Eric Dumazet
  2026-01-20 15:30 ` [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete} Jakub Kicinski
  3 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-01-18 17:52 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet

Remove one function call from GRO stack for native IPv6 + TCP packets.

$ scripts/bloat-o-meter -t vmlinux.2 vmlinux.3
add/remove: 0/0 grow/shrink: 1/1 up/down: 298/-5 (293)
Function                                     old     new   delta
ipv6_gro_complete                            435     733    +298
tcp6_gro_complete                            311     306      -5
Total: Before=22593532, After=22593825, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/gro.h        |  2 +-
 include/net/tcp.h        |  1 -
 net/ipv6/ip6_offload.c   | 21 +++++++++------------
 net/ipv6/tcpv6_offload.c |  2 +-
 4 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index 85e5eeed4c90feef9440c57af9382b0e9ead1219..2300b6da05b2728ec40f42228f8fa9c195d8479c 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -406,7 +406,7 @@ INDIRECT_CALLABLE_DECLARE(struct sk_buff *udp4_gro_receive(struct list_head *,
 INDIRECT_CALLABLE_DECLARE(int udp4_gro_complete(struct sk_buff *, int));
 
 struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *);
-INDIRECT_CALLABLE_DECLARE(int udp6_gro_complete(struct sk_buff *, int));
+int udp6_gro_complete(struct sk_buff *, int);
 
 #define indirect_call_gro_receive_inet(cb, f2, f1, head, skb)	\
 ({								\
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3b94c84888a884d9ca8eb602ad1f7d4f941f3ef9..ebdf59d435b8002ca9b90803f40720a58ce3e809 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2326,7 +2326,6 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
 				struct tcphdr *th);
 INDIRECT_CALLABLE_DECLARE(int tcp4_gro_complete(struct sk_buff *skb, int thoff));
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb));
-INDIRECT_CALLABLE_DECLARE(int tcp6_gro_complete(struct sk_buff *skb, int thoff));
 #ifdef CONFIG_INET
 void tcp_gro_complete(struct sk_buff *skb);
 #else
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 4d96154c0dcd019322908ab6ddaa663a2a565e44..32a104ead8760d33e152e0b0a6a6896d70d155b5 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -21,16 +21,6 @@
 #include "ip6_offload.h"
 #include "tcpv6_offload.c"
 
-/* All GRO functions are always builtin, except UDP over ipv6, which lays in
- * ipv6 module, as it depends on UDPv6 lookup function, so we need special care
- * when ipv6 is built as a module
- */
-#if IS_BUILTIN(CONFIG_IPV6)
-#define INDIRECT_CALL_L4(f, f2, f1, ...) INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__)
-#else
-#define INDIRECT_CALL_L4(f, f2, f1, ...) INDIRECT_CALL_1(f, f2, __VA_ARGS__)
-#endif
-
 static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto)
 {
 	const struct net_offload *ops = NULL;
@@ -383,11 +373,18 @@ INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 	}
 
 	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
+
+	if (likely(ops == &net_hotdata.tcpv6_offload))
+		return tcp6_gro_complete(skb, nhoff);
+#if IS_BUILTIN(CONFIG_IPV6)
+	if (ops == &net_hotdata.udpv6_offload)
+		return udp6_gro_complete(skb, nhoff);
+#endif
+
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
 		goto out;
 
-	err = INDIRECT_CALL_L4(ops->callbacks.gro_complete, tcp6_gro_complete,
-			       udp6_gro_complete, skb, nhoff);
+	err = ops->callbacks.gro_complete(skb, nhoff);
 
 out:
 	return err;
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 7f19ce423058870f285b7f8ae2a4d116d783f9fb..46fa2069d321663ed232e2836db77e3fcb1f4f07 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -67,7 +67,7 @@ static __always_inline struct sk_buff *tcp6_gro_receive(struct list_head *head,
 	return NULL;
 }
 
-INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)
+static __always_inline int tcp6_gro_complete(struct sk_buff *skb, int thoff)
 {
 	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
 	const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset);
-- 
2.52.0.457.g6b5491de43-goog
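
The ipv6_gro_complete() hunk replaces the INDIRECT_CALL_L4() wrapper with a comparison of the ops pointer against a known offload structure, followed by a direct call that the compiler can inline. The sketch below shows the same dispatch pattern in user space. All mock_ names and the return values are hypothetical, and -1 merely stands in for whatever error the real function returns.

```c
#include <assert.h>
#include <stddef.h>

struct mock_offload {
	int (*gro_complete)(int nhoff);
};

/* Common case: small enough to be inlined into the dispatcher. */
static inline __attribute__((__always_inline__)) int mock_tcp6_complete(int nhoff)
{
	return nhoff + 1000;
}

/* Some other protocol handler, reachable only through the pointer. */
static int mock_other_complete(int nhoff)
{
	return nhoff + 2000;
}

static const struct mock_offload mock_tcpv6_offload = {
	.gro_complete = mock_tcp6_complete,
};
static const struct mock_offload mock_other_offload = {
	.gro_complete = mock_other_complete,
};

static int mock_ipv6_complete(const struct mock_offload *ops, int nhoff)
{
	/* Fast path: known ops address, direct (inlinable) call. */
	if (__builtin_expect(ops == &mock_tcpv6_offload, 1))
		return mock_tcp6_complete(nhoff);

	if (!ops || !ops->gro_complete)
		return -1;

	/* Generic path: plain indirect call through the ops table. */
	return ops->gro_complete(nhoff);
}
```

Taking the address of the always-inline function for the ops table still produces an out-of-line copy, which matches tcp6_gro_complete remaining in the bloat-o-meter output above.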



* Re: [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete}
  2026-01-18 17:52 [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete} Eric Dumazet
                   ` (2 preceding siblings ...)
  2026-01-18 17:52 ` [PATCH v2 net-next 3/3] gro: inline tcp6_gro_complete() Eric Dumazet
@ 2026-01-20 15:30 ` Jakub Kicinski
  2026-01-20 15:41   ` Eric Dumazet
  3 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2026-01-20 15:30 UTC
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet

On Sun, 18 Jan 2026 17:52:12 +0000 Eric Dumazet wrote:
> On some platforms, GRO stack is too deep and causes cpu stalls.
> 
> Decreasing call depths by one shows a 1.5 % gain on Zen2 cpus.
> (32 RX queues, 100Gbit NIC, RFS enabled, tcp_rr with 128 threads and 10,000 flows)
> 
> We can go further by inlining ipv6_gro_{receive,complete}
> and take care of IPv4 if there is interest.
> 
> Note: two temporary __always_inline will be replaced with
>       inline_for_performance when available.
> 
> v2: dealt with udp6_gro_receive()/udp6_gro_complete()
>     missing declarations (kernel test robot <lkp@intel.com>)
>     for CONFIG_MITIGATION_RETPOLINE=n

Still not good?

net/ipv6/udp_offload.c:136:17: error: static declaration of ‘udp6_gro_receive’ follows non-static declaration
  136 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
      |                 ^~~~~~~~~~~~~~~~
In file included from net/ipv6/udp_offload.c:16:
./include/net/gro.h:408:17: note: previous declaration of ‘udp6_gro_receive’ with type ‘struct sk_buff *(struct list_head *, struct sk_buff *)’
  408 | struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *);
      |                 ^~~~~~~~~~~~~~~~
net/ipv6/udp_offload.c:168:29: error: static declaration of ‘udp6_gro_complete’ follows non-static declaration
  168 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
      |                             ^~~~~~~~~~~~~~~~~
./include/net/gro.h:409:5: note: previous declaration of ‘udp6_gro_complete’ with type ‘int(struct sk_buff *, int)’
  409 | int udp6_gro_complete(struct sk_buff *, int);
      |     ^~~~~~~~~~~~~~~~~
-- 
pw-bot: cr


* Re: [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete}
  2026-01-20 15:30 ` [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete} Jakub Kicinski
@ 2026-01-20 15:41   ` Eric Dumazet
  2026-01-20 15:44     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2026-01-20 15:41 UTC
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet

On Tue, Jan 20, 2026 at 4:30 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 18 Jan 2026 17:52:12 +0000 Eric Dumazet wrote:
> > On some platforms, GRO stack is too deep and causes cpu stalls.
> >
> > Decreasing call depths by one shows a 1.5 % gain on Zen2 cpus.
> > (32 RX queues, 100Gbit NIC, RFS enabled, tcp_rr with 128 threads and 10,000 flows)
> >
> > We can go further by inlining ipv6_gro_{receive,complete}
> > and take care of IPv4 if there is interest.
> >
> > Note: two temporary __always_inline will be replaced with
> >       inline_for_performance when available.
> >
> > v2: dealt with udp6_gro_receive()/udp6_gro_complete()
> >     missing declarations (kernel test robot <lkp@intel.com>)
> >     for CONFIG_MITIGATION_RETPOLINE=n
>
> Still not good?
>
> net/ipv6/udp_offload.c:136:17: error: static declaration of ‘udp6_gro_receive’ follows non-static declaration
>   136 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
>       |                 ^~~~~~~~~~~~~~~~
> In file included from net/ipv6/udp_offload.c:16:
> ./include/net/gro.h:408:17: note: previous declaration of ‘udp6_gro_receive’ with type ‘struct sk_buff *(struct list_head *, struct sk_buff *)’
>   408 | struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *);
>       |                 ^~~~~~~~~~~~~~~~
> net/ipv6/udp_offload.c:168:29: error: static declaration of ‘udp6_gro_complete’ follows non-static declaration
>   168 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
>       |                             ^~~~~~~~~~~~~~~~~
> ./include/net/gro.h:409:5: note: previous declaration of ‘udp6_gro_complete’ with type ‘int(struct sk_buff *, int)’
>   409 | int udp6_gro_complete(struct sk_buff *, int);
>       |     ^~~~~~~~~~~~~~~~~

Oh well, I thought I tested this stuff.


* Re: [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete}
  2026-01-20 15:41   ` Eric Dumazet
@ 2026-01-20 15:44     ` Eric Dumazet
  2026-01-20 16:29       ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2026-01-20 15:44 UTC
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet

On Tue, Jan 20, 2026 at 4:41 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jan 20, 2026 at 4:30 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sun, 18 Jan 2026 17:52:12 +0000 Eric Dumazet wrote:
> > > On some platforms, GRO stack is too deep and causes cpu stalls.
> > >
> > > Decreasing call depths by one shows a 1.5 % gain on Zen2 cpus.
> > > (32 RX queues, 100Gbit NIC, RFS enabled, tcp_rr with 128 threads and 10,000 flows)
> > >
> > > We can go further by inlining ipv6_gro_{receive,complete}
> > > and take care of IPv4 if there is interest.
> > >
> > > Note: two temporary __always_inline will be replaced with
> > >       inline_for_performance when available.
> > >
> > > v2: dealt with udp6_gro_receive()/udp6_gro_complete()
> > >     missing declarations (kernel test robot <lkp@intel.com>)
> > >     for CONFIG_MITIGATION_RETPOLINE=n
> >
> > Still not good?
> >
> > net/ipv6/udp_offload.c:136:17: error: static declaration of ‘udp6_gro_receive’ follows non-static declaration
> >   136 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
> >       |                 ^~~~~~~~~~~~~~~~
> > In file included from net/ipv6/udp_offload.c:16:
> > ./include/net/gro.h:408:17: note: previous declaration of ‘udp6_gro_receive’ with type ‘struct sk_buff *(struct list_head *, struct sk_buff *)’
> >   408 | struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *);
> >       |                 ^~~~~~~~~~~~~~~~
> > net/ipv6/udp_offload.c:168:29: error: static declaration of ‘udp6_gro_complete’ follows non-static declaration
> >   168 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
> >       |                             ^~~~~~~~~~~~~~~~~
> > ./include/net/gro.h:409:5: note: previous declaration of ‘udp6_gro_complete’ with type ‘int(struct sk_buff *, int)’
> >   409 | int udp6_gro_complete(struct sk_buff *, int);
> >       |     ^~~~~~~~~~~~~~~~~
>
> Oh well, I thought I tested this stuff.

Interesting... clang (our default compiler for kernel) does not complain at all.


* Re: [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete}
  2026-01-20 15:44     ` Eric Dumazet
@ 2026-01-20 16:29       ` Jakub Kicinski
  2026-01-20 16:38         ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2026-01-20 16:29 UTC
  To: Eric Dumazet
  Cc: David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet

On Tue, 20 Jan 2026 16:44:52 +0100 Eric Dumazet wrote:
> On Tue, Jan 20, 2026 at 4:41 PM Eric Dumazet <edumazet@google.com> wrote:
> > > Still not good?
> > >
> > > net/ipv6/udp_offload.c:136:17: error: static declaration of ‘udp6_gro_receive’ follows non-static declaration
> > >   136 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
> > >       |                 ^~~~~~~~~~~~~~~~
> > > In file included from net/ipv6/udp_offload.c:16:
> > > ./include/net/gro.h:408:17: note: previous declaration of ‘udp6_gro_receive’ with type ‘struct sk_buff *(struct list_head *, struct sk_buff *)’
> > >   408 | struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *);
> > >       |                 ^~~~~~~~~~~~~~~~
> > > net/ipv6/udp_offload.c:168:29: error: static declaration of ‘udp6_gro_complete’ follows non-static declaration
> > >   168 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
> > >       |                             ^~~~~~~~~~~~~~~~~
> > > ./include/net/gro.h:409:5: note: previous declaration of ‘udp6_gro_complete’ with type ‘int(struct sk_buff *, int)’
> > >   409 | int udp6_gro_complete(struct sk_buff *, int);
> > >       |     ^~~~~~~~~~~~~~~~~  
> >
> > Oh well, I thought I tested this stuff.  
> 
> Interesting... clang (our default compiler for kernel) does not complain at all.

Well, at least I _think_ it's this series, haven't tested.
It breaks in the kselftests, no allmodconfig, here's the full config:

https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill-dbg/results/482021/config

Also possible that it's a silent conflict with another pending series.


* Re: [PATCH v2 net-next 0/3] gro: inline tcp6_gro_{receive,complete}
  2026-01-20 16:29       ` Jakub Kicinski
@ 2026-01-20 16:38         ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2026-01-20 16:38 UTC
  To: Jakub Kicinski
  Cc: David S . Miller, Paolo Abeni, Simon Horman, netdev, eric.dumazet

On Tue, Jan 20, 2026 at 5:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 20 Jan 2026 16:44:52 +0100 Eric Dumazet wrote:
> > On Tue, Jan 20, 2026 at 4:41 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > Still not good?
> > > >
> > > > net/ipv6/udp_offload.c:136:17: error: static declaration of ‘udp6_gro_receive’ follows non-static declaration
> > > >   136 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
> > > >       |                 ^~~~~~~~~~~~~~~~
> > > > In file included from net/ipv6/udp_offload.c:16:
> > > > ./include/net/gro.h:408:17: note: previous declaration of ‘udp6_gro_receive’ with type ‘struct sk_buff *(struct list_head *, struct sk_buff *)’
> > > >   408 | struct sk_buff *udp6_gro_receive(struct list_head *, struct sk_buff *);
> > > >       |                 ^~~~~~~~~~~~~~~~
> > > > net/ipv6/udp_offload.c:168:29: error: static declaration of ‘udp6_gro_complete’ follows non-static declaration
> > > >   168 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
> > > >       |                             ^~~~~~~~~~~~~~~~~
> > > > ./include/net/gro.h:409:5: note: previous declaration of ‘udp6_gro_complete’ with type ‘int(struct sk_buff *, int)’
> > > >   409 | int udp6_gro_complete(struct sk_buff *, int);
> > > >       |     ^~~~~~~~~~~~~~~~~
> > >
> > > Oh well, I thought I tested this stuff.
> >
> > Interesting... clang (our default compiler for kernel) does not complain at all.
>
> Well, at least I _think_ it's this series, haven't tested.
> It breaks in the kselftests, no allmodconfig, here's the full config:
>
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill-dbg/results/482021/config
>
> Also possible that it's a silent conflict with another pending series.

To clarify: clang does not see an error, gcc does.

I removed the INDIRECT_CALLABLE_SCOPE from both functions for v3.

Thanks.
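
For context on the gcc error quoted above: as far as I understand the indirect-call wrapper macros, INDIRECT_CALLABLE_SCOPE expands to nothing with CONFIG_MITIGATION_RETPOLINE=y and to static otherwise, so a definition that keeps the macro clashes with the plain declaration now in include/net/gro.h. The sketch below reproduces that setup with hypothetical MOCK_ macros and shows the form described for v3, where the definition no longer carries the scope macro. It is an illustration under those assumptions, not the actual v3 patch.

```c
#include <assert.h>

/* Define MOCK_RETPOLINE to mimic CONFIG_MITIGATION_RETPOLINE=y. */
#ifdef MOCK_RETPOLINE
#define MOCK_CALLABLE_DECLARE(f) f
#define MOCK_CALLABLE_SCOPE
#else
#define MOCK_CALLABLE_DECLARE(f)
#define MOCK_CALLABLE_SCOPE static
#endif

/* Header side after v2: a plain declaration with external linkage. */
int mock_udp6_complete(int nhoff);

/*
 * Definition side without the scope macro. Writing
 * "MOCK_CALLABLE_SCOPE int mock_udp6_complete(int nhoff)" instead would
 * expand to "static int ..." when MOCK_RETPOLINE is undefined, and gcc
 * would reject it with "static declaration of 'mock_udp6_complete'
 * follows non-static declaration".
 */
int mock_udp6_complete(int nhoff)
{
	return nhoff;
}
```

Building this file both with and without -DMOCK_RETPOLINE succeeds, because the declaration and the definition now agree on linkage in either configuration.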

