From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Florian Westphal <fw@strlen.de>,
Eric Dumazet <edumazet@google.com>,
Hannes Frederic Sowa <hannes@stressinduktion.org>
Subject: Re: [RFC PATCH 00/11] udp: full early demux for unconnected sockets
Date: Mon, 25 Sep 2017 22:26:09 +0200
Message-ID: <1506371169.2614.3.camel@redhat.com>
In-Reply-To: <1506117524.29839.176.camel@edumazet-glaptop3.roam.corp.google.com>
On Fri, 2017-09-22 at 14:58 -0700, Eric Dumazet wrote:
> On Fri, 2017-09-22 at 23:06 +0200, Paolo Abeni wrote:
> > This series refactor the UDP early demux code so that:
> >
> > * full socket lookup is performed for unicast packets
> > * a sk is grabbed even for unconnected socket match
> > * a dst cache is used even in such scenario
> >
> > To perform this tasks a couple of facilities are added:
> >
> > * noref socket references, scoped inside the current RCU section, to be
> > explicitly cleared before leaving such section
> > * a dst cache inside the inet and inet6 local addresses tables, caching the
> > related local dst entry
> >
> > The measured performance gain under small packet UDP flood is as follow:
> >
> > ingress NIC vanilla patched delta
> > rx queues (kpps) (kpps) (%)
> > [ipv4]
> > 1 2177 2414 10
> > 2 2527 2892 14
> > 3 3050 3733 22
>
>
> This is a clear sign your program is not using latest SO_REUSEPORT +
> [ec]BPF filter [1]
>
> return socket[RX_QUEUE# | or CPU#];
>
> If udp_sink uses SO_REUSEPORT with no extra hint, socket selection is
> based on a lazy hash, meaning that you do not have proper siloing.
>
> return socket[hash(skb)];
>
> Multiple cpus can then :
> - compete on grabbing same socket refcount
> - compete on grabbing the receive queue lock
> - compete for releasing lock and socket refcount
> - skb freeing done on different cpus than where allocated.
>
> You are adding complexity to the kernel because you are using a
> sub-optimal user space program, favoring false sharing.
>
> First solve the false sharing issue.
>
> Performance with 2 rx queues should be almost twice the performance with
> 1 rx queue.
>
> Then we can see if the gains you claim are still applicable.

Here are the performance results using a BPF filter to steer each
ingress packet to the reuseport socket with the same id as the ingress
CPU, so that we have a 1:1 mapping between the ingress receive queue and
the destination socket:

ingress NIC     vanilla   patched   delta
rx queues       (kpps)    (kpps)    (%)
[ipv4]
2               3020      3663      21
3               4352      5179      19
4               5318      6194      16
5               6258      7583      21
6               7376      8558      16
[ipv6]
2               2446      3949      61
3               3099      5092      64
4               3698      6611      78
5               4382      7852      79
6               5116      8851      73
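
For readers who want to reproduce the 1:1 mapping, here is a minimal sketch of a classic BPF reuseport filter that returns the id of the CPU handling the packet, so that CPU i selects the i-th socket of the group. It is only an illustration and not necessarily what udp_sink does with --use_bpf. The helper names (cpu_steering_filter, attach_cpu_steering, open_sinks) are hypothetical, and the numeric constants are copied from the generic Linux UAPI headers because Python's socket module may not export them; a few architectures use different values.

```python
import ctypes
import socket
import struct

# Constants copied from the Linux UAPI headers (asm-generic/socket.h,
# linux/filter.h, linux/bpf_common.h); some architectures use other values.
SO_ATTACH_REUSEPORT_CBPF = 51
SKF_AD_OFF = -0x1000
SKF_AD_CPU = 36
BPF_LD, BPF_W, BPF_ABS = 0x00, 0x00, 0x20
BPF_RET, BPF_A = 0x06, 0x10


def cpu_steering_filter():
    """Pack a 2-instruction classic BPF program: return the current CPU id."""
    insns = [
        # A = id of the CPU processing the packet
        (BPF_LD | BPF_W | BPF_ABS, 0, 0, (SKF_AD_OFF + SKF_AD_CPU) & 0xFFFFFFFF),
        # return A; the kernel uses it as an index into the reuseport group
        (BPF_RET | BPF_A, 0, 0, 0),
    ]
    return b"".join(struct.pack("HBBI", *insn) for insn in insns)


def attach_cpu_steering(sock):
    """Attach the filter to a bound SO_REUSEPORT socket (hypothetical helper)."""
    prog = cpu_steering_filter()
    buf = ctypes.create_string_buffer(prog, len(prog))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", len(prog) // 8, ctypes.addressof(buf))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF, fprog)


def open_sinks(port, count):
    """Open `count` UDP sockets in one reuseport group, filter on the first."""
    socks = []
    for i in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind(("", port))
        if i == 0:
            # The program is shared by the whole group, one attach is enough.
            attach_cpu_steering(s)
        socks.append(s)
    return socks
```

Sockets are indexed in the order they join the group, so the n-th bound socket serves CPU n. If the filter returns an index beyond the number of sockets in the group, the kernel falls back to hash-based selection.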

Some notes:

- figures obtained with:

    ethtool -L em2 combined $n
    MASK=1
    for I in `seq 0 $((n - 1))`; do
        [ $I -eq 0 ] && USE_BPF="--use_bpf" || USE_BPF=""
        udp_sink --reuseport $USE_BPF --recvfrom --count 10000000 --port 9 &
        taskset -p $((MASK << ($I + $n) )) $!
    done

- in the IPv6 routing code we currently have a significant bottleneck in
  ip6_pol_route(): I see a lot of contention on a dst refcount, so
  without early demux performance does not scale well there.
- For maximum performance the BH and the user space sink need to run on
  different CPUs. Yes, we get some more cacheline misses and a little
  contention on the receive queue spin lock, but also a lot fewer icache
  misses and more CPU cycles available: the overall throughput is a lot
  higher than when binding the sink to the same CPU where the BH is
  running.
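
To make the CPU placement in the notes above concrete, here is a small sketch of the layout the script aims for: with n rx queues, sink I is pinned to CPU I + n. The function names are hypothetical, and the sketch assumes that rx queue i runs its BH on CPU i, which the script itself does not set up. The mask is formatted in hexadecimal because that is how taskset parses a mask argument.

```python
def sink_cpu_masks(n):
    """For n rx queues, return (sink index, CPU, hex affinity mask) tuples."""
    return [(i, i + n, format(1 << (i + n), "x")) for i in range(n)]


def bh_and_sinks_disjoint(n):
    """Check that no sink shares a CPU with a BH."""
    # Assumption: rx queue i raises its BH on CPU i.
    bh_cpus = set(range(n))
    sink_cpus = {cpu for _, cpu, _ in sink_cpu_masks(n)}
    return bh_cpus.isdisjoint(sink_cpus)


if __name__ == "__main__":
    for sink, cpu, mask in sink_cpu_masks(3):
        print(f"sink {sink}: CPU {cpu}, taskset mask 0x{mask}")
```

With n = 3 the BHs stay on CPUs 0-2 and the sinks land on CPUs 3-5, so this layout needs at least 2 * n CPUs.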
> PS: Wei Wan is about to release the IPV6 changes so that the big
> differences you showed are going to disappear soon.
Interesting, looking forward to that!
Cheers,
Paolo