* Re: [PATCH] net: Optimize flush calculation in inet_gro_receive()
2026-04-10 14:43 [PATCH] net: Optimize flush calculation in inet_gro_receive() Helge Deller
@ 2026-04-11 5:19 ` Kuniyuki Iwashima
From: Kuniyuki Iwashima @ 2026-04-11 5:19 UTC
To: deller; +Cc: davem, dsahern, linux-kernel, linux-parisc, netdev, edumazet
From: Helge Deller <deller@kernel.org>
Date: Fri, 10 Apr 2026 16:43:54 +0200
> For the calculation of the flush variable, use the get_unaligned_xxx() helpers
> to access only relevant bits of the IP header.
>
> Note: Since I don't know the network details, I'm not sure if "& ~IP_DF"
> (& ~0x4000) is correct, or if "& IP_OFFSET" (& 0x1FFF) should be used instead
> (which I believe would be more correct). Instead of possibly breaking things I
> left it as is, but maybe some expert can check?

~IP_DF is correct because the MF bit needs to be checked as well: the
first fragment of a datagram has MF set and a zero offset, which
"& IP_OFFSET" would miss. See:

commit db8caf3dbc77599dc90f4ea0a803cd1d97116f30
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 31 11:18:10 2013

    gro: should aggregate frames without DF
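To make the difference between the two masks concrete, here is a minimal userspace sketch (not kernel code). The mask values mirror IP_CE, IP_DF, IP_MF and IP_OFFSET from the kernel's include/net/ip.h; the helper names frag_needs_flush() and frag_needs_flush_offset_only() are hypothetical. The case that matters is the first fragment: MF is set and the offset is 0, so only the ~IP_DF test flags it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-order frag_off bits; same values as the kernel's include/net/ip.h. */
#define IP_CE		0x8000u	/* reserved ("congestion") flag */
#define IP_DF		0x4000u	/* don't fragment */
#define IP_MF		0x2000u	/* more fragments */
#define IP_OFFSET	0x1FFFu	/* fragment offset, in 8-byte units */

/* The three flags and the offset together cover all 16 bits. */
static_assert((IP_CE | IP_DF | IP_MF | IP_OFFSET) == 0xFFFFu,
	      "frag_off layout");

/* Current GRO test: any bit other than DF means "do not aggregate". */
static inline bool frag_needs_flush(uint16_t frag_off)
{
	return (frag_off & ~IP_DF) != 0;
}

/* Suggested alternative (hypothetical): only a nonzero offset flushes. */
static inline bool frag_needs_flush_offset_only(uint16_t frag_off)
{
	return (frag_off & IP_OFFSET) != 0;
}
```

For example, frag_needs_flush(IP_MF) is true while frag_needs_flush_offset_only(IP_MF) is false, so the offset-only variant would let a first fragment be aggregated.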
>
> Signed-off-by: Helge Deller <deller@gmx.de>
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index c7731e300a44..58cad2687c2c 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1479,7 +1479,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
>  	struct sk_buff *p;
>  	unsigned int hlen;
>  	unsigned int off;
> -	int flush = 1;
> +	u16 flush = 1;
>  	int proto;
>  
>  	off = skb_gro_offset(skb);
> @@ -1504,7 +1504,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
>  		goto out;
>  
>  	NAPI_GRO_CB(skb)->proto = proto;
> -	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
> +	flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb)) |
> +		(get_unaligned_be16(&iph->frag_off) & ~IP_DF);
I think we intentionally use 32-bit loads here:

commit 1075f3f65d0e0f49351b7d4310e9f94483972a51
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue May 26 18:50:29 2009

    ipv4: Use 32-bit loads for ID and length in GRO
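For reference, the old and new formulas compute the same 16-bit value: the (u16) truncation in the old code throws away everything above bit 15, i.e. the version/ihl, tos and id bytes that the new code never loads. A userspace sketch, assuming hypothetical load_be16()/load_be32() helpers built on memcpy() plus ntohs()/ntohl() as stand-ins for the kernel accessors (so it reproduces the arithmetic only, not the kernel's load widths or alignment):

```c
#include <arpa/inet.h>	/* ntohl(), ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IP_DF 0x4000u	/* same value as the kernel's IP_DF */

/* Hypothetical stand-in for get_unaligned_be16(). */
static inline uint16_t load_be16(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));	/* no alignment assumption */
	return ntohs(v);
}

/* Hypothetical stand-in for ntohl(*(__be32 *)p). */
static inline uint32_t load_be32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return ntohl(v);
}

/* Old formula: two 32-bit loads (bytes 0-3 and 4-7), truncated to u16. */
static inline uint16_t flush_32bit(const uint8_t *iph, uint32_t gro_len)
{
	return (uint16_t)((load_be32(iph) ^ gro_len) |
			  (load_be32(iph + 4) & ~IP_DF));
}

/* New formula: 16-bit loads of tot_len (offset 2) and frag_off (offset 6). */
static inline uint16_t flush_16bit(const uint8_t *iph, uint32_t gro_len)
{
	return (uint16_t)((load_be16(iph + 2) ^ gro_len) |
			  (load_be16(iph + 6) & ~IP_DF));
}

/* Returns the flush value after asserting that both formulas agree. */
static inline uint16_t flush_checked(const uint8_t *iph, uint32_t gro_len)
{
	uint16_t f16 = flush_16bit(iph, gro_len);

	assert(f16 == flush_32bit(iph, gro_len));
	return f16;
}
```

For a header with tot_len 1500 and only DF set, flush_checked(hdr, 1500) returns 0; with a GRO length of 1400 it returns 0x00a4 (1500 ^ 1400), and a first fragment (frag_off 0x2000) yields 0x2000 even when the lengths match.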
Before your patch, the compiler emits a 32-bit load + bswap; after it,
a 16-bit load + rol 8.

I feel the 4-byte aligned load + bswap is faster than a misaligned
16-bit load + a rotate by 8 (or is a 16-bit rol by 8 optimised
internally, like an xchg of the two bytes?).

Do you have any numbers?
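The "rol 8" in the listing below is a single rotate instruction, and rotating a 16-bit value left by 8 bits is exactly a two-byte swap. A small sketch that checks this exhaustively against the GCC/Clang builtin __builtin_bswap16(); rol16_8() and check_rol_is_bswap16() are hypothetical helpers, and which instruction a compiler actually emits for them is up to the compiler:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 16-bit value left by 8 bits (what x86 "rol r16, 8" does). */
static inline uint16_t rol16_8(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Exhaustively confirm that the rotate equals a 16-bit byte swap. */
static inline void check_rol_is_bswap16(void)
{
	for (uint32_t x = 0; x <= 0xFFFFu; x++)
		assert(rol16_8((uint16_t)x) == __builtin_bswap16((uint16_t)x));
}
```

For instance, rol16_8(0x1234) is 0x3412. This only shows that the two are the same operation; it says nothing about their relative speed.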
Before:

flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb))
	mov    edx,DWORD PTR [rcx]
	bswap  edx
return skb->len - NAPI_GRO_CB(skb)->data_offset;
	mov    r8d,DWORD PTR [rsi+0x38]
	mov    r9d,DWORD PTR [rsi+0x70]
	sub    r9d,r8d
	xor    r9d,edx
| (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
	mov    ebp,0xffbfffff
	and    ebp,DWORD PTR [rcx+0x4]
	bswap  ebp
	or     ebp,r9d

After:

flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb))
	movzx  edx,WORD PTR [rcx+0x2]
	rol    dx,0x8
return skb->len - NAPI_GRO_CB(skb)->data_offset;
	mov    r8d,DWORD PTR [rsi+0x38]
	mov    r9d,DWORD PTR [rsi+0x70]
	sub    r9d,r8d
	xor    r9d,edx
| (get_unaligned_be16(&iph->frag_off) & ~IP_DF);
	movzx  ebp,WORD PTR [rcx+0x6]
	and    ebp,0xffffffbf
	rol    bp,0x8
	or     ebp,r9d
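In both listings the compiler applies the ~IP_DF mask to the raw little-endian value before byte swapping, which explains the unusual immediates: 0xffbfffff byte-swapped is 0xffffbfff (~IP_DF as 32 bits), and the low half of 0xffffffbf, 0xffbf, byte-swapped is 0xbfff (~IP_DF as 16 bits). Masking and byte swapping commute as long as the mask is swapped too. A sketch checking this with the GCC/Clang __builtin_bswap32()/__builtin_bswap16() builtins; the helper names are hypothetical and IP_DF uses the kernel's value 0x4000:

```c
#include <assert.h>
#include <stdint.h>

#define IP_DF 0x4000u	/* same value as the kernel's IP_DF */

/* "Before" listing: mask the raw 32-bit load, then bswap. */
static inline uint32_t mask_then_swap32(uint32_t raw)
{
	return __builtin_bswap32(raw & 0xffbfffffu);
}

/* What the C source says: convert to host order, then mask. */
static inline uint32_t swap_then_mask32(uint32_t raw)
{
	return __builtin_bswap32(raw) & ~IP_DF;
}

/* "After" listing: mask the raw 16-bit load, then rol 8 (a bswap16). */
static inline uint16_t mask_then_swap16(uint16_t raw)
{
	return __builtin_bswap16((uint16_t)(raw & 0xffbfu));
}

static inline uint16_t swap_then_mask16(uint16_t raw)
{
	return (uint16_t)(__builtin_bswap16(raw) & ~IP_DF);
}

/* Both orderings must agree for any raw value. */
static inline void check_mask_order(uint32_t raw32, uint16_t raw16)
{
	assert(mask_then_swap32(raw32) == swap_then_mask32(raw32));
	assert(mask_then_swap16(raw16) == swap_then_mask16(raw16));
}
```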