Re: [PATCH] net: Optimize flush calculation in inet_gro_receive()

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

From: David Laight <david.laight.linux@gmail.com>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: deller@kernel.org, davem@davemloft.net, dsahern@kernel.org,
	linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org,
	netdev@vger.kernel.org, edumazet@google.com
Subject: Re: [PATCH] net: Optimize flush calculation in inet_gro_receive()
Date: Sat, 11 Apr 2026 13:09:58 +0100	[thread overview]
Message-ID: <20260411130958.70202bab@pumpkin> (raw)
In-Reply-To: <20260411052037.2013228-1-kuniyu@google.com>

On Sat, 11 Apr 2026 05:19:35 +0000
Kuniyuki Iwashima <kuniyu@google.com> wrote:

> From: Helge Deller <deller@kernel.org>
> Date: Fri, 10 Apr 2026 16:43:54 +0200
> > For the calculation of the flush variable, use the get_unaligned_xxx() helpers
> > to access only relevant bits of the IP header.
> > 
> > Note: Since I don't know the network details, I'm not sure if "& ~IP_DF"
> > (& ~0x4000) is correct, or if "& IP_OFFSET" (& 0x1FFF) should be used instead  
> 
> ~IP_DF is correct (MF bit needs to be checked), see
> 
> commit db8caf3dbc77599dc90f4ea0a803cd1d97116f30
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri May 31 11:18:10 2013
> 
>     gro: should aggregate frames without DF
> 
> 
> > (which I believe would be more correct). Instead of possibly breaking things I
> > left it as is, but maybe some expert can check?
> > 
> > Signed-off-by: Helge Deller <deller@gmx.de>
> > 
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index c7731e300a44..58cad2687c2c 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -1479,7 +1479,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
> >  	struct sk_buff *p;
> >  	unsigned int hlen;
> >  	unsigned int off;
> > -	int flush = 1;
> > +	u16 flush = 1;
> >  	int proto;
> >  
> >  	off = skb_gro_offset(skb);
> > @@ -1504,7 +1504,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
> >  		goto out;
> >  
> >  	NAPI_GRO_CB(skb)->proto = proto;
> > -	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
> > +	flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb)) |
> > +	        (get_unaligned_be16(&iph->frag_off) & ~IP_DF);  
> 
> I think here we intentionally use 32-bit loads:
> 
> commit 1075f3f65d0e0f49351b7d4310e9f94483972a51
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Tue May 26 18:50:29 2009
> 
>     ipv4: Use 32-bit loads for ID and length in GRO
> 
> 
> Before your patch, 32-bit load + bswap are used while
> 16-bit load + rol 8 after the change.
> 
> I feel the 4-byte aligned load + bswap is faster than
> misaligned access + 8 times shift (Is this internally
> optimised like xchg for a single word size ?)
> 
> Do you have some numbers ?

Check on some architecture that doesn't support misaligned loads.
Actually, aren't the accesses aligned??
Also on ones without 32bit byteswap (some do have byteswapping
memory reads).

Also you may not want to change 'flush' to u16.
On non-x86 it may force the compiler add extra masking instructions.

	David
 
> 
> 
> Before:
> 	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb))
> mov    edx,DWORD PTR [rcx]
> bswap  edx
> 	return skb->len - NAPI_GRO_CB(skb)->data_offset;
> mov    r8d,DWORD PTR [rsi+0x38]
> mov    r9d,DWORD PTR [rsi+0x70]
> sub    r9d,r8d
> xor    r9d,edx
> 	| (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
> mov    ebp,0xffbfffff
> and    ebp,DWORD PTR [rcx+0x4]
> bswap  ebp
> or     ebp,r9d
> 
> 
> After:
> 	flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb))
> movzx  edx,WORD PTR [rcx+0x2]
> rol    dx,0x8
> 	return skb->len - NAPI_GRO_CB(skb)->data_offset;
> mov    r8d,DWORD PTR [rsi+0x38]
> mov    r9d,DWORD PTR [rsi+0x70]
> sub    r9d,r8d
> xor    r9d,edx
> 	| (get_unaligned_be16(&iph->frag_off) & ~IP_DF);
> movzx  ebp,WORD PTR [rcx+0x6]
> and    ebp,0xffffffbf
> rol    bp,0x8
> or     ebp,r9d
>

     prev parent reply	other threads:[~2026-04-11 12:10 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-10 14:43 [PATCH] net: Optimize flush calculation in inet_gro_receive() Helge Deller
2026-04-11  5:19 ` Kuniyuki Iwashima
2026-04-11 12:09   ` David Laight [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260411130958.70202bab@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=davem@davemloft.net \
    --cc=deller@kernel.org \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox