netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Richard Gobert <richardbgobert@gmail.com>
To: Eric Dumazet <edumazet@google.com>, pabeni@redhat.com
Cc: Paolo Abeni <pabeni@redhat.com>,
	davem@davemloft.net, kuba@kernel.org, dsahern@kernel.org,
	alexanderduyck@fb.com, lucien.xin@gmail.com,
	lixiaoyan@google.com, iwienand@redhat.com, leon@kernel.org,
	ye.xingchen@zte.com.cn, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 2/2] gro: optimise redundant parsing of packets
Date: Wed, 22 Mar 2023 20:33:11 +0100	[thread overview]
Message-ID: <20230322193309.GA32681@debian> (raw)
In-Reply-To: <CANn89iK_bsPoaRVe+FNZ7LF_eLbz2Af6kju4j9TVHtbgkpcn5g@mail.gmail.com>

> On Wed, Mar 22, 2023 at 2:59 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Mon, 2023-03-20 at 18:00 +0100, Richard Gobert wrote:
> > > Currently the IPv6 extension headers are parsed twice: first in
> > > ipv6_gro_receive, and then again in ipv6_gro_complete.
> > >
> > > By using the new ->transport_proto field, and also storing the size of the
> > > network header, we can avoid parsing extension headers a second time in
> > > ipv6_gro_complete (which saves multiple memory dereferences and conditional
> > > checks inside ipv6_exthdrs_len for a varying amount of extension headers in
> > > IPv6 packets).
> > >
> > > The implementation had to handle both inner and outer layers in case of
> > > encapsulation (as they can't use the same field). I've applied a similar
> > > optimisation to Ethernet.
> > >
> > > Performance tests for TCP stream over IPv6 with a varying amount of
> > > extension headers demonstrate throughput improvement of ~0.7%.
> >
> > I'm surprised that the improvement is measurable: for large aggregate
> > packets a single ipv6_exthdrs_len() call is avoided out of tens calls
> > for the individual pkts. Additionally such figure is comparable to
> > noise level in my tests.

It's not simple but I made an effort to make a quiet environment.
Correct configuration allows for this kind of measurements to be made
as the test is CPU bound and noise is a variance that can be reduced with 
enough samples.

Environment example: (100Gbit NIC (mlx5), physical machine, i9 12th
gen)

    # power-management and hyperthreading disabled in BIOS
    # sysctl preallocate net mem
    echo 0 > /sys/devices/system/cpu/cpufreq/boost # disable turboboost
    ethtool -A enp1s0f0np0 rx off tx off autoneg off # no PAUSE frames

    # Single core performance
    for x in /sys/devices/system/cpu/cpu[1-9]*/online; do echo 0 >"$x"; done

    ./network-testing-master/bin/netfilter_unload_modules.sh 2>/dev/null # unload netfilter
    tuned-adm profile latency-performance
    cpupower frequency-set -f 2200MHz # Set core to specific frequency
    systemctl isolate rescue-ssh.target
    # and kill all processes besides init

> > This adds a couple of additional branches for the common (no extensions
> > header) case.

The additional branch in ipv6_gro_receive would be negligible or even
non-existent for a branch predictor in the common case
(non-encapsulated packets).
I could wrap it with a likely macro if you wish.
Inside ipv6_gro_complete a couple of branches are saved for the common
case as demonstrated below.

original code ipv6_gro_complete (ipv6_exthdrs_len is inlined):

    // if (skb->encapsulation)

    ffffffff81c4962b:	f6 87 81 00 00 00 20 	testb  $0x20,0x81(%rdi)
    ffffffff81c49632:	74 2a                	je     ffffffff81c4965e <ipv6_gro_complete+0x3e>

    ...

    // nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);

    ffffffff81c4969c:	eb 1b                	jmp    ffffffff81c496b9 <ipv6_gro_complete+0x99>    <-- jump to beginning of for loop
    ffffffff81c4968e:   b8 28 00 00 00          mov    $0x28,%eax
    ffffffff81c49693:   31 f6                   xor    %esi,%esi
    ffffffff81c49695:   48 c7 c7 c0 28 aa 82    mov    $0xffffffff82aa28c0,%rdi
    ffffffff81c4969c:   eb 1b                   jmp    ffffffff81c496b9 <ipv6_gro_complete+0x99>
    ffffffff81c4969e:   f6 41 18 01             testb  $0x1,0x18(%rcx)
    ffffffff81c496a2:   74 34                   je     ffffffff81c496d8 <ipv6_gro_complete+0xb8>    <--- 3rd conditional check: !((*opps)->flags & INET6_PROTO_GSO_EXTHDR)
    ffffffff81c496a4:   48 98                   cltq  
    ffffffff81c496a6:   48 01 c2                add    %rax,%rdx
    ffffffff81c496a9:   0f b6 42 01             movzbl 0x1(%rdx),%eax
    ffffffff81c496ad:   0f b6 0a                movzbl (%rdx),%ecx
    ffffffff81c496b0:   8d 04 c5 08 00 00 00    lea    0x8(,%rax,8),%eax
    ffffffff81c496b7:   01 c6                   add    %eax,%esi
    ffffffff81c496b9:   85 c9                   test   %ecx,%ecx                                    <--- for loop starts here
    ffffffff81c496bb:   74 e7                   je     ffffffff81c496a4 <ipv6_gro_complete+0x84>    <--- 1st conditional check: proto != NEXTHDR_HOP
    ffffffff81c496bd:   48 8b 0c cf             mov    (%rdi,%rcx,8),%rcx
    ffffffff81c496c1:   48 85 c9                test   %rcx,%rcx
    ffffffff81c496c4:   75 d8                   jne    ffffffff81c4969e <ipv6_gro_complete+0x7e>    <--- 2nd conditional check: unlikely(!(*opps))
    
    ... (indirect call ops->callbacks.gro_complete)

ipv6_exthdrs_len contains a loop which has 3 conditional checks.
For the common (no extensions header) case, in the new code, *all 3
branches are completely avoided*

patched code ipv6_gro_complete:

    // if (skb->encapsulation)
    ffffffff81befe58:   f6 83 81 00 00 00 20    testb  $0x20,0x81(%rbx)
    ffffffff81befe5f:   74 78                   je     ffffffff81befed9 <ipv6_gro_complete+0xb9>
    
    ...
    
    // else
    ffffffff81befed9:	0f b6 43 50          	movzbl 0x50(%rbx),%eax
    ffffffff81befedd:	0f b7 73 4c          	movzwl 0x4c(%rbx),%esi
    ffffffff81befee1:	48 8b 0c c5 c0 3f a9 	mov    -0x7d56c040(,%rax,8),%rcx
    
    ... (indirect call ops->callbacks.gro_complete)

Thus, the patch is beneficial for both the common case and the ext hdr
case. I would appreciate a second consideration :)

> > while patch 1/2 could be useful, patch 2/2 overall looks not worthy to
> > me.
> >
> > I suggest to re-post for inclusion only patch 1, unless others have
> > strong different opinions.
> >
> 
> +2
> 
> I have the same feeling/opinion.
>
> > Cheers,
> >
> > Paolo
> >

  reply	other threads:[~2023-03-22 19:33 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-20 16:37 [PATCH v4 0/2] gro: optimise redundant parsing of packets Richard Gobert
2023-03-20 16:46 ` [PATCH v4 1/2] gro: decrease size of CB Richard Gobert
2023-03-20 17:00 ` [PATCH v4 2/2] gro: optimise redundant parsing of packets Richard Gobert
2023-03-22  9:59   ` Paolo Abeni
2023-03-22 10:11     ` Eric Dumazet
2023-03-22 19:33       ` Richard Gobert [this message]
2023-04-03 11:41         ` Paolo Abeni
2023-04-20 17:23           ` Richard Gobert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230322193309.GA32681@debian \
    --to=richardbgobert@gmail.com \
    --cc=alexanderduyck@fb.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=iwienand@redhat.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lixiaoyan@google.com \
    --cc=lucien.xin@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=ye.xingchen@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).