[PATCH] net: Optimize flush calculation in inet_gro

public inbox for netdev@vger.kernel.org

* [PATCH] net: Optimize flush calculation in inet_gro_receive()
@ 2026-04-10 14:43 Helge Deller
  2026-04-11  5:19 ` Kuniyuki Iwashima
  0 siblings, 1 reply; 2+ messages in thread
From: Helge Deller @ 2026-04-10 14:43 UTC
  To: netdev, linux-kernel, David S. Miller, David Ahern; +Cc: linux-parisc

For the calculation of the flush variable, use the get_unaligned_xxx() helpers
to access only relevant bits of the IP header.

Note: Since I don't know the network details, I'm not sure if "& ~IP_DF"
(& ~0x4000) is correct, or if "& IP_OFFSET" (& 0x1FFF) should be used instead
(which I believe would be more correct). Instead of possibly breaking things I
left it as is, but maybe some expert can check?

Signed-off-by: Helge Deller <deller@gmx.de>

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c7731e300a44..58cad2687c2c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1479,7 +1479,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	struct sk_buff *p;
 	unsigned int hlen;
 	unsigned int off;
-	int flush = 1;
+	u16 flush = 1;
 	int proto;
 
 	off = skb_gro_offset(skb);
@@ -1504,7 +1504,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 		goto out;
 
 	NAPI_GRO_CB(skb)->proto = proto;
-	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
+	flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb)) |
+	        (get_unaligned_be16(&iph->frag_off) & ~IP_DF);
 
 	list_for_each_entry(p, head, list) {
 		struct iphdr *iph2;


* Re: [PATCH] net: Optimize flush calculation in inet_gro_receive()
  2026-04-10 14:43 [PATCH] net: Optimize flush calculation in inet_gro_receive() Helge Deller
@ 2026-04-11  5:19 ` Kuniyuki Iwashima
  0 siblings, 0 replies; 2+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-11  5:19 UTC
  To: deller; +Cc: davem, dsahern, linux-kernel, linux-parisc, netdev, edumazet

From: Helge Deller <deller@kernel.org>
Date: Fri, 10 Apr 2026 16:43:54 +0200
> For the calculation of the flush variable, use the get_unaligned_xxx() helpers
> to access only relevant bits of the IP header.
> 
> Note: Since I don't know the network details, I'm not sure if "& ~IP_DF"
> (& ~0x4000) is correct, or if "& IP_OFFSET" (& 0x1FFF) should be used instead

~IP_DF is correct (MF bit needs to be checked), see

commit db8caf3dbc77599dc90f4ea0a803cd1d97116f30
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 31 11:18:10 2013

    gro: should aggregate frames without DF
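
To illustrate the point about the MF bit, here is a minimal userspace sketch of the two candidate masks applied to a host-order frag_off value. The mask values are the ones from include/net/ip.h; the helper names (frag_flush_not_df, frag_flush_offset_only, frag_mask_selfcheck) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Masks for the IPv4 frag_off field in host byte order. */
#define IP_CE     0x8000 /* reserved / "congestion" flag */
#define IP_DF     0x4000 /* Don't Fragment */
#define IP_MF     0x2000 /* More Fragments */
#define IP_OFFSET 0x1FFF /* fragment offset, in 8-byte units */

/* Hypothetical helper: what "& ~IP_DF" keeps. Nonzero means "flush". */
static uint16_t frag_flush_not_df(uint16_t frag_off)
{
	return frag_off & (uint16_t)~IP_DF;
}

/* Hypothetical helper: what "& IP_OFFSET" would keep instead. */
static uint16_t frag_flush_offset_only(uint16_t frag_off)
{
	return frag_off & IP_OFFSET;
}

/* Illustrative checks, not kernel code. */
static void frag_mask_selfcheck(void)
{
	/* Unfragmented packet, with or without DF: both masks yield 0. */
	assert(frag_flush_not_df(0) == 0);
	assert(frag_flush_not_df(IP_DF) == 0);
	assert(frag_flush_offset_only(IP_DF) == 0);

	/* First fragment: MF set, offset 0. Only ~IP_DF notices it. */
	assert(frag_flush_not_df(IP_MF) != 0);
	assert(frag_flush_offset_only(IP_MF) == 0);

	/* Later fragment: nonzero offset, both masks notice it. */
	assert(frag_flush_not_df(0x00b9) != 0);
	assert(frag_flush_offset_only(0x00b9) != 0);
}
```

In this sketch, the first fragment of a fragmented datagram (MF set, offset 0) would pass an "& IP_OFFSET" test unnoticed, whereas "& ~IP_DF" also keeps the MF bit and the 0x8000 bit.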


> (which I believe would be more correct). Instead of possibly breaking things I
> left it as is, but maybe some expert can check?
> 
> Signed-off-by: Helge Deller <deller@gmx.de>
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index c7731e300a44..58cad2687c2c 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1479,7 +1479,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
>  	struct sk_buff *p;
>  	unsigned int hlen;
>  	unsigned int off;
> -	int flush = 1;
> +	u16 flush = 1;
>  	int proto;
>  
>  	off = skb_gro_offset(skb);
> @@ -1504,7 +1504,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
>  		goto out;
>  
>  	NAPI_GRO_CB(skb)->proto = proto;
> -	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
> +	flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb)) |
> +	        (get_unaligned_be16(&iph->frag_off) & ~IP_DF);

I think here we intentionally use 32-bit loads:

commit 1075f3f65d0e0f49351b7d4310e9f94483972a51
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue May 26 18:50:29 2009

    ipv4: Use 32-bit loads for ID and length in GRO


Before your patch, a 32-bit load + bswap is used; after the
change, it becomes a 16-bit load + rol 8.

I feel the 4-byte aligned load + bswap is faster than a
misaligned access + 8-bit rotate (is this internally
optimised like xchg for a single word size?)

Do you have any numbers?
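
For reference, the two forms compute the same 16-bit value, so the question is purely about load cost. A minimal userspace sketch, assuming a raw 20-byte IPv4 header in a byte array: load_be16() and load_be32() are stand-ins for get_unaligned_be16() and ntohl(*(__be32 *)p), and flush_old(), flush_new() and flush_selfcheck() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define IP_DF 0x4000

/* Stand-in for get_unaligned_be16(): build the value byte by byte. */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Stand-in for ntohl(*(__be32 *)p). */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Old form: words 0 and 1 of the header, truncated to 16 bits. */
static uint16_t flush_old(const uint8_t *iph, unsigned int gro_len)
{
	return (uint16_t)((load_be32(iph) ^ gro_len) |
			  (load_be32(iph + 4) & ~IP_DF));
}

/* New form: only tot_len (offset 2) and frag_off (offset 6). */
static uint16_t flush_new(const uint8_t *iph, unsigned int gro_len)
{
	return (uint16_t)((load_be16(iph + 2) ^ gro_len) |
			  (load_be16(iph + 6) & ~IP_DF));
}

/* Illustrative check on one sample header (tot_len 1500, DF set). */
static void flush_selfcheck(void)
{
	uint8_t h[20] = { 0x45, 0x00, 0x05, 0xdc, 0x12, 0x34, 0x40, 0x00 };

	assert(flush_old(h, 1500) == 0);
	assert(flush_new(h, 1500) == 0);

	h[6] = 0x20; /* replace DF with MF */
	assert(flush_old(h, 1500) == flush_new(h, 1500));
	assert(flush_new(h, 1500) == 0x2000);
}
```

The version/IHL, TOS and ID bytes only reach the upper 16 bits of the 32-bit words, so the (u16) cast in the old form discards them.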


Before:
	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb))
mov    edx,DWORD PTR [rcx]
bswap  edx
	return skb->len - NAPI_GRO_CB(skb)->data_offset;
mov    r8d,DWORD PTR [rsi+0x38]
mov    r9d,DWORD PTR [rsi+0x70]
sub    r9d,r8d
xor    r9d,edx
	| (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
mov    ebp,0xffbfffff
and    ebp,DWORD PTR [rcx+0x4]
bswap  ebp
or     ebp,r9d


After:
	flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb))
movzx  edx,WORD PTR [rcx+0x2]
rol    dx,0x8
	return skb->len - NAPI_GRO_CB(skb)->data_offset;
mov    r8d,DWORD PTR [rsi+0x38]
mov    r9d,DWORD PTR [rsi+0x70]
sub    r9d,r8d
xor    r9d,edx
	| (get_unaligned_be16(&iph->frag_off) & ~IP_DF);
movzx  ebp,WORD PTR [rcx+0x6]
and    ebp,0xffffffbf
rol    bp,0x8
or     ebp,r9d
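
One detail in both listings is that the compiler applies the ~IP_DF mask before the byte swap, using a byte-swapped constant (0xffbfffff for the 32-bit load, 0xffffffbf for the 16-bit one). A small sketch of why that is equivalent, with hypothetical helper names and plain C in place of the bswap and rol instructions:

```c
#include <assert.h>
#include <stdint.h>

#define IP_DF 0x4000

/* Plain C equivalent of the bswap instruction. */
static uint32_t bswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Plain C equivalent of "rol r16, 8" (a 16-bit byte swap). */
static uint16_t rol16_by_8(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* "Before" listing: and 0xffbfffff, then bswap. */
static uint32_t mask_then_bswap32(uint32_t raw)
{
	return bswap32(raw & 0xffbfffffu);
}

/* Source form: swap first, then clear IP_DF. */
static uint32_t bswap32_then_mask(uint32_t raw)
{
	return bswap32(raw) & ~(uint32_t)IP_DF;
}

/* "After" listing: and 0xffbf (low 16 bits of 0xffffffbf), then rol 8. */
static uint16_t mask_then_rol16(uint16_t raw)
{
	return rol16_by_8(raw & 0xffbf);
}

/* Source form for the 16-bit case. */
static uint16_t rol16_then_mask(uint16_t raw)
{
	return rol16_by_8(raw) & (uint16_t)~IP_DF;
}

/* Illustrative check: exhaustive for 16 bits, sampled for 32 bits. */
static void mask_order_selfcheck(void)
{
	uint32_t v;

	for (v = 0; v <= 0xffffu; v++) {
		assert(mask_then_rol16((uint16_t)v) ==
		       rol16_then_mask((uint16_t)v));
		assert(mask_then_bswap32(v * 0x00010001u) ==
		       bswap32_then_mask(v * 0x00010001u));
	}
}
```

So the mask placement does not differ between the two versions; the remaining difference is the load width and bswap versus rol.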

