Re: [RFC PATCH bpf-next 00/12] AF_XDP, zero-copy support

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Björn Töpel" <bjorn.topel@gmail.com>
Cc: magnus.karlsson@gmail.com, magnus.karlsson@intel.com,
	alexander.h.duyck@intel.com, alexander.duyck@gmail.com,
	john.fastabend@gmail.com, ast@fb.com,
	willemdebruijn.kernel@gmail.com, daniel@iogearbox.net,
	mst@redhat.com, netdev@vger.kernel.org,
	"Björn Töpel" <bjorn.topel@intel.com>,
	michael.lundkvist@ericsson.com, jesse.brandeburg@intel.com,
	anjali.singhai@intel.com, qi.z.zhang@intel.com,
	intel-wired-lan@lists.osuosl.org, brouer@redhat.com
Subject: Re: [RFC PATCH bpf-next 00/12] AF_XDP, zero-copy support
Date: Wed, 16 May 2018 12:47:07 +0200
Message-ID: <20180516124707.59d60d2c@redhat.com>
In-Reply-To: <20180515190615.23099-1-bjorn.topel@gmail.com>

On Tue, 15 May 2018 21:06:03 +0200
Björn Töpel <bjorn.topel@gmail.com> wrote:

> We have run some benchmarks on a dual socket system with two Broadwell
> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> cores which gives a total of 28, but only two cores are used in these
> experiments. One for TX/RX and one for the user space application. The
> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The
> NIC is Intel I40E 40Gbit/s using the i40e driver.
> 
> Below are the results in Mpps of the I40E NIC benchmark runs for 64
> and 1500 byte packets, generated by a commercial packet generator HW
> outputting packets at full 40 Gbit/s line rate. The results are without
> retpoline so that we can compare against previous numbers.
> 
> AF_XDP performance 64 byte packets. Results from the AF_XDP V3 patch
> set are also reported for ease of reference.
> 
> Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
> rxdrop       2.9*       9.6*       21.5
> txpush       2.6*       -          21.6
> l2fwd        1.9*       2.5*       15.0

These performance numbers are actually amazing.

When reaching these amazing/crazy speeds, where we are approaching the
speed of light (travel 30 cm in 1 nanosec), we have to view these
numbers differently, because we are actually working on a nanosec scale.

21.5 Mpps is 46.5 nanosec per packet.

If we want to optimize for +1 Mpps (1/22.5*10^3 = 44.44 ns), then you
actually only have to optimize the code by 2 nanosec.  With this
2.0 GHz CPU that is in theory only 4 cycles, but the CPU likely
executes more than one instruction per cycle (I see around 2.5 insn
per cycle), so we are looking at (2*2*2.5) needing to remove about 10
instructions for +1 Mpps.
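
A rough sketch of that arithmetic, in case it helps to play with the
numbers.  Python is used purely for illustration; the helper name
ns_per_packet is made up here, and the 2.5 instructions per cycle is
only the figure I observe, not a constant of the CPU.

```python
def ns_per_packet(mpps):
    """Per-packet time budget in nanoseconds for a rate given in Mpps."""
    return 1e3 / mpps

CPU_GHZ = 2.0   # clock of the test machine, from the cover letter
IPC = 2.5       # assumed instructions per cycle (observed, workload dependent)

base_ns = ns_per_packet(21.5)      # current ZC rxdrop rate
target_ns = ns_per_packet(22.5)    # the same rate plus 1 Mpps

saving_ns = base_ns - target_ns            # time to shave off per packet
saving_cycles = saving_ns * CPU_GHZ        # GHz == cycles per nanosecond
saving_insns = saving_cycles * IPC         # rough instruction count

print(f"{base_ns:.2f} ns -> {target_ns:.2f} ns per packet")
print(f"need to save {saving_ns:.2f} ns, ~{saving_cycles:.1f} cycles, "
      f"~{saving_insns:.1f} instructions")
```

The point is only the order of magnitude: a couple of nanosec, a
handful of cycles, roughly ten instructions.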

Comparing XDP_DROP at 32.3Mpps with ZC-rxdrop at 21.5Mpps, this is
actually only a "slowdown" of 15.55 ns for having the frame travel
through xdp_do_redirect, do the map lookup etc, queue into userspace,
and return frames back to the kernel.  That is rather amazingly fast.

  (1/21.5*10^3)-(1/32.3*10^3) = 15.55 ns

Another performance number which is amazing is your l2fwd number of
15Mpps, because it is faster than xdp_redirect_map on i40e NICs on my
system, which runs at 12.2 Mpps (2.8Mpps slower).  Again looking at the
nanosec scale instead, this corresponds to 15.3 ns.
  I expect this improvement comes from avoiding page_frag_free, and
avoiding the TX dma_map call (as you premap pages for DMA on TX). Reverse
calculating based on perf percentage, I find that these should only
cost 7.18 ns.  Maybe the rest is because you are running TX and TX-dma
completion on another CPU.
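
To make the split explicit, a small sketch (Python for illustration
only; ns_per_packet is a made-up helper, and the 7.18 ns is my
perf-based estimate from above, not a measured constant):

```python
def ns_per_packet(mpps):
    """Per-packet time in nanoseconds for a rate given in Mpps."""
    return 1e3 / mpps

redirect_map_ns = ns_per_packet(12.2)   # xdp_redirect_map on i40e
zc_l2fwd_ns = ns_per_packet(15.0)       # AF_XDP zero-copy l2fwd

gain_ns = redirect_map_ns - zc_l2fwd_ns

# Estimated cost of page_frag_free plus the TX dma_map call,
# reverse calculated from perf percentages (an estimate).
explained_ns = 7.18
unexplained_ns = gain_ns - explained_ns

print(f"total gain:  {gain_ns:.2f} ns per packet")
print(f"explained:   {explained_ns:.2f} ns")
print(f"unexplained: {unexplained_ns:.2f} ns")
```

So roughly half of the gain is not covered by those two calls.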

I notice you are also using the XDP return-API, which still does a
rhashtable_lookup per frame.  I plan to optimize this to do bulking, to
get away from per frame lookup.  Thus, this should get even faster.

> * From AF_XDP V3 patch set and cover letter.
> 
> AF_XDP performance 1500 byte packets:
> Benchmark   XDP_SKB   XDP_DRV     XDP_DRV with zerocopy
> rxdrop       2.1        3.3       3.3
> l2fwd        1.4        1.8       3.1
> 
> So why do we not get higher values for RX similar to the 34 Mpps we
> had in AF_PACKET V4? We made an experiment running the rxdrop
> benchmark without using the xdp_do_redirect/flush infrastructure nor
> using an XDP program (all traffic on a queue goes to one
> socket). Instead the driver acts directly on the AF_XDP socket. With
> this we got 36.9 Mpps, a significant improvement without any change to
> the uapi. So not forcing users to have an XDP program if they do not
> need it, might be a good idea. This measurement is actually higher
> than what we got with AF_PACKET V4.

So, what are you telling me with your number 36.9 Mpps for
direct-socket-rxdrop...

Compared to XDP_DROP at 32.3Mpps, are you saying that it only costs
3.86 nanosec to call the XDP bpf_prog which returns XDP_DROP?  That is
very impressive actually.

  (1/32.3*10^3)-(1/36.9*10^3) = 3.86 ns

Compared to ZC-AF_XDP rxdrop at 21.5Mpps, are you saying that the XDP
redirect infrastructure, map lookups etc (incl. return-API per frame)
cost 19.41 nanosec (1/21.5*10^3)-(1/36.9*10^3)?  That is approx 40
clock-cycles or 100 (speculative) instructions.  That is not too bad,
and we are still optimizing this stuff.
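
The two comparisons against the direct-socket number can be sketched
like this (Python for illustration; the helper name overhead_vs_direct
is hypothetical, and the 2.5 instructions per cycle is an assumption
based on what I see on my system):

```python
def ns_per_packet(mpps):
    """Per-packet time in nanoseconds for a rate given in Mpps."""
    return 1e3 / mpps

CPU_GHZ = 2.0   # clock of the test machine
IPC = 2.5       # assumed instructions per cycle

DIRECT_MPPS = 36.9   # direct-socket rxdrop, no XDP program

def overhead_vs_direct(mpps):
    """Return (ns, cycles, instructions) spent on top of direct-socket rxdrop."""
    ns = ns_per_packet(mpps) - ns_per_packet(DIRECT_MPPS)
    cycles = ns * CPU_GHZ
    return ns, cycles, cycles * IPC

xdp_drop = overhead_vs_direct(32.3)    # cost of calling the bpf_prog
zc_rxdrop = overhead_vs_direct(21.5)   # cost of redirect infra, map lookup etc

for name, (ns, cycles, insns) in (("XDP_DROP", xdp_drop),
                                  ("ZC rxdrop", zc_rxdrop)):
    print(f"{name:10s} +{ns:5.2f} ns  ~{cycles:4.1f} cycles  ~{insns:5.1f} insns")
```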

> XDP performance on our system as a base line:
> 
> 64 byte packets:
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      16      32.3M       0
> 
> 1500 byte packets:
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      16      3.3M        0

Overall I'm *very* impressed by the performance of ZC AF_XDP.
Just remember that measuring improvement in +N Mpps is actually
misleading when operating at these (light) speeds.
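
A quick illustration of why (Python sketch; the helper names are made
up, the rates are taken from the tables and text above): the same
+1 Mpps is worth very different amounts of time depending on where
you start.

```python
def ns_per_packet(mpps):
    """Per-packet time in nanoseconds for a rate given in Mpps."""
    return 1e3 / mpps

def ns_saved_by_plus_one_mpps(mpps):
    """Nanoseconds per packet that must be saved to gain 1 Mpps."""
    return ns_per_packet(mpps) - ns_per_packet(mpps + 1.0)

# Starting rates in Mpps, all for 64 byte packets.
rates = {
    "XDP_SKB rxdrop": 2.9,
    "ZC rxdrop": 21.5,
    "direct-socket rxdrop": 36.9,
}

savings = {name: ns_saved_by_plus_one_mpps(r) for name, r in rates.items()}

for name, ns in savings.items():
    print(f"{name:22s} {rates[name]:5.1f} Mpps: +1 Mpps = {ns:6.2f} ns per packet")
```

At 2.9 Mpps a +1 Mpps gain means saving roughly 88 ns per packet; at
36.9 Mpps it is well under 1 ns.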

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
