From: Jakub Sitnicki <jakub@cloudflare.com>
To: Martin KaFai Lau <kafai@fb.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
dccp@vger.kernel.org, kernel-team@cloudflare.com,
Alexei Starovoitov <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Gerrit Renker <gerrit@erg.abdn.ac.uk>,
Jakub Kicinski <kuba@kernel.org>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>
Subject: Re: [PATCH bpf-next v2 00/17] Run a BPF program on socket lookup
Date: Wed, 13 May 2020 19:54:09 +0200 [thread overview]
Message-ID: <87sgg3u3em.fsf@cloudflare.com> (raw)
In-Reply-To: <20200512163411.4ae2faoa6hwnrv3k@kafai-mbp.dhcp.thefacebook.com>
On Tue, May 12, 2020 at 06:34 PM CEST, Martin KaFai Lau wrote:
> On Tue, May 12, 2020 at 01:57:45PM +0200, Jakub Sitnicki wrote:
>> On Mon, May 11, 2020 at 09:45 PM CEST, Martin KaFai Lau wrote:
>> > On Mon, May 11, 2020 at 08:52:01PM +0200, Jakub Sitnicki wrote:
[...]
>> >> RX pps measured with `ifpps -d <dev> -t 1000 --csv --loop` for 60 seconds.
>> >>
>> >> | tcp4 SYN flood | rx pps (mean ± sstdev) | Δ rx pps |
>> >> |------------------------------+------------------------+----------|
>> >> | 5.6.3 vanilla (baseline) | 939,616 ± 0.5% | - |
>> >> | no SK_LOOKUP prog attached | 929,275 ± 1.2% | -1.1% |
>> >> | with SK_LOOKUP prog attached | 918,582 ± 0.4% | -2.2% |
>> >>
>> >> | tcp6 SYN flood | rx pps (mean ± sstdev) | Δ rx pps |
>> >> |------------------------------+------------------------+----------|
>> >> | 5.6.3 vanilla (baseline) | 875,838 ± 0.5% | - |
>> >> | no SK_LOOKUP prog attached | 872,005 ± 0.3% | -0.4% |
>> >> | with SK_LOOKUP prog attached | 856,250 ± 0.5% | -2.2% |
>> >>
>> >> | udp4 0-len flood | rx pps (mean ± sstdev) | Δ rx pps |
>> >> |------------------------------+------------------------+----------|
>> >> | 5.6.3 vanilla (baseline) | 2,738,662 ± 1.5% | - |
>> >> | no SK_LOOKUP prog attached | 2,576,893 ± 1.0% | -5.9% |
>> >> | with SK_LOOKUP prog attached | 2,530,698 ± 1.0% | -7.6% |
>> >>
>> >> | udp6 0-len flood | rx pps (mean ± sstdev) | Δ rx pps |
>> >> |------------------------------+------------------------+----------|
>> >> | 5.6.3 vanilla (baseline) | 2,867,885 ± 1.4% | - |
>> >> | no SK_LOOKUP prog attached | 2,646,875 ± 1.0% | -7.7% |
>> > What is causing this regression?
>> >
>>
>> I need to go back to archived perf.data and see if perf-annotate or
>> perf-diff provide any clues that will help me tell where CPU cycles are
>> going. Will get back to you on that.
>>
>> Wild guess is that for udp6 we're loading and coping more data to
>> populate v6 addresses in program context. See inet6_lookup_run_bpf
>> (patch 7).
> If that is the case,
> rcu_access_pointer(net->sk_lookup_prog) should be tested first before
> doing ctx initialization.
Coming back after looking for more hints in recorded perf.data.
`perf diff` between baseline and no-prog-attached shows:
# Event 'cycles'
#
# Baseline Delta Symbol
# ........ ....... ......................................
#
4.63% +0.07% [k] udpv6_queue_rcv_one_skb
3.88% -0.38% [k] __netif_receive_skb_core
3.54% +0.21% [k] udp6_lib_lookup2
3.01% -0.42% [k] 0xffffffffc04926cc
2.69% -0.10% [k] mlx5e_skb_from_cqe_linear
2.56% -0.20% [k] ipv6_gro_receive
2.31% -0.15% [k] dev_gro_receive
2.20% -0.13% [k] do_csum
2.02% -0.68% [k] ip6_pol_route
1.94% +0.79% [k] __udp6_lib_lookup
So __udp6_lib_lookup is where to look, as expected.
`perf annotate __udp6_lib_lookup` for no-prog-attached has a hot spot
right were we populate the context object:
: /* Lookup redirect from BPF */
: result = inet6_lookup_run_bpf(net, udptable->protocol,
0.00 : ffffffff818530db: mov 0x18(%r14),%edx
: inet6_lookup_run_bpf():
: const struct in6_addr *saddr,
: __be16 sport,
: const struct in6_addr *daddr,
: unsigned short hnum)
: {
: struct bpf_inet_lookup_kern ctx = {
inet6_hashtables.h:115 1.27 : ffffffff818530df: lea -0x78(%rbp),%r9
0.00 : ffffffff818530e3: xor %eax,%eax
0.00 : ffffffff818530e5: mov $0x8,%ecx
0.00 : ffffffff818530ea: mov %r9,%rdi
inet6_hashtables.h:115 26.09 : ffffffff818530ed: rep stos %rax,%es:(%rdi)
inet6_hashtables.h:115 1.35 : ffffffff818530f0: mov $0xa,%eax
0.00 : ffffffff818530f5: mov %bx,-0x60(%rbp)
0.00 : ffffffff818530f9: mov %ax,-0x78(%rbp)
0.00 : ffffffff818530fd: mov (%r15),%rax
1.42 : ffffffff81853100: mov %dl,-0x76(%rbp)
0.00 : ffffffff81853103: mov 0x8(%r15),%rdx
0.00 : ffffffff81853107: mov %r11w,-0x48(%rbp)
0.00 : ffffffff8185310c: mov %rax,-0x70(%rbp)
1.27 : ffffffff81853110: mov (%r12),%rax
0.02 : ffffffff81853114: mov %rdx,-0x68(%rbp)
0.00 : ffffffff81853118: mov 0x8(%r12),%rdx
0.02 : ffffffff8185311d: mov %rax,-0x58(%rbp)
1.24 : ffffffff81853121: mov %rdx,-0x50(%rbp)
: __read_once_size():
: })
:
: static __always_inline
: void __read_once_size(const volatile void *p, void *res, int size)
: {
: __READ_ONCE_SIZE;
0.05 : ffffffff81853125: mov 0xd28(%r13),%rdx
Note, struct bpf_inet_lookup has been renamed to bpf_sk_lookup since
then.
I'll switch to copying just the pointer to in6_addr{} and push the
context initialization after test for net->sk_lookup_prog, like you
suggested.
I can post full output from perf-diff/annotate to some pastebin if you
would like to take a deeper look.
Thanks,
Jakub
>
>>
>> This makes me realize the copy is unnecessary, I could just store the
>> pointer to in6_addr{}. Will make this change in v3.
>>
>> As to why udp6 is taking a bigger hit than udp4 - comparing top 10 in
>> `perf report --no-children` shows that in our test setup, socket lookup
>> contributes less to CPU cycles on receive for udp4 than for udp6.
>>
>> * udp4 baseline (no children)
>>
>> # Overhead Samples Symbol
>> # ........ ............ ......................................
>> #
>> 8.11% 19429 [k] fib_table_lookup
>> 4.31% 10333 [k] udp_queue_rcv_one_skb
>> 3.75% 8991 [k] fib4_rule_action
>> 3.66% 8763 [k] __netif_receive_skb_core
>> 3.42% 8198 [k] fib_rules_lookup
>> 3.05% 7314 [k] fib4_rule_match
>> 2.71% 6507 [k] mlx5e_skb_from_cqe_linear
>> 2.58% 6192 [k] inet_gro_receive
>> 2.49% 5981 [k] __x86_indirect_thunk_rax
>> 2.36% 5656 [k] udp4_lib_lookup2
>>
>> * udp6 baseline (no children)
>>
>> # Overhead Samples Symbol
>> # ........ ............ ......................................
>> #
>> 4.63% 11100 [k] udpv6_queue_rcv_one_skb
>> 3.88% 9308 [k] __netif_receive_skb_core
>> 3.54% 8480 [k] udp6_lib_lookup2
>> 2.69% 6442 [k] mlx5e_skb_from_cqe_linear
>> 2.56% 6137 [k] ipv6_gro_receive
>> 2.31% 5540 [k] dev_gro_receive
>> 2.20% 5264 [k] do_csum
>> 2.02% 4835 [k] ip6_pol_route
>> 1.94% 4639 [k] __udp6_lib_lookup
>> 1.89% 4540 [k] selinux_socket_sock_rcv_skb
>>
>> Notice that __udp4_lib_lookup didn't even make the cut. That could
>> explain why adding instructions to __udp6_lib_lookup has more effect on
>> RX PPS.
>>
>> Frankly, that is something that suprised us, but we didn't have time to
>> investigate further, yet.
> The perf report should be able to annotate bpf prog also.
> e.g. may be part of it is because the bpf_prog itself is also dealing
> with a longer address?
>
>>
>> >> | with SK_LOOKUP prog attached | 2,520,474 ± 0.7% | -12.1% |
>> > This also looks very different from udp4.
>> >
>>
>> Thanks for the questions,
>> Jakub
prev parent reply other threads:[~2020-05-13 17:54 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-11 18:52 [PATCH bpf-next v2 00/17] Run a BPF program on socket lookup Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 01/17] flow_dissector: Extract attach/detach/query helpers Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
2020-05-11 19:06 ` Jakub Sitnicki
2020-05-13 5:41 ` Martin KaFai Lau
2020-05-13 14:34 ` Jakub Sitnicki
2020-05-13 18:10 ` Martin KaFai Lau
2020-05-11 18:52 ` [PATCH bpf-next v2 03/17] inet: Store layer 4 protocol in inet_hashinfo Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 04/17] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 05/17] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-11 20:44 ` Alexei Starovoitov
2020-05-12 13:52 ` Jakub Sitnicki
2020-05-12 23:58 ` Alexei Starovoitov
2020-05-13 13:55 ` Jakub Sitnicki
2020-05-13 14:21 ` Lorenz Bauer
2020-05-13 14:50 ` Jakub Sitnicki
2020-05-15 12:28 ` Jakub Sitnicki
2020-05-15 15:07 ` Alexei Starovoitov
2020-05-11 18:52 ` [PATCH bpf-next v2 06/17] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 07/17] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 08/17] udp: Store layer 4 protocol in udp_table Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 09/17] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 10/17] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 11/17] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 12/17] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 13/17] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 14/17] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 15/17] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 16/17] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c Jakub Sitnicki
2020-05-11 18:52 ` [PATCH bpf-next v2 17/17] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
2020-05-11 19:45 ` [PATCH bpf-next v2 00/17] Run a BPF program on socket lookup Martin KaFai Lau
2020-05-12 11:57 ` Jakub Sitnicki
2020-05-12 16:34 ` Martin KaFai Lau
2020-05-13 17:54 ` Jakub Sitnicki [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87sgg3u3em.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=andrii.nakryiko@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dccp@vger.kernel.org \
--cc=edumazet@google.com \
--cc=gerrit@erg.abdn.ac.uk \
--cc=kafai@fb.com \
--cc=kernel-team@cloudflare.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).