netdev.vger.kernel.org archive mirror
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Tom Barbette <barbette@kth.se>
Cc: xdp-newbies@vger.kernel.org,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Saeed Mahameed" <saeedm@mellanox.com>,
	"Leon Romanovsky" <leonro@mellanox.com>,
	"Tariq Toukan" <tariqt@mellanox.com>,
	brouer@redhat.com,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: Bad XDP performance with mlx5
Date: Fri, 31 May 2019 18:18:17 +0200
Message-ID: <20190531181817.34039c9f@carbon>
In-Reply-To: <2218141a-7026-1cb8-c594-37e38eef7b15@kth.se>


On Fri, 31 May 2019 08:51:43 +0200 Tom Barbette <barbette@kth.se> wrote:

> CCing mlx5 maintainers and committers of bce2b2b. TL;DR: there is a huge
> CPU increase on CX5 when introducing an XDP program.
>
> See https://www.youtube.com/watch?v=o5hlJZbN4Tk&feature=youtu.be
> around 0:40. We're talking something like 15% while it's near 0 for
> other drivers. The machine is a recent Skylake. For us it makes XDP
> unusable. Is that a known problem?

I have a similar test setup, and I can reproduce this. I have found the
root cause; see below. On my system it was even worse: with an XDP_PASS
program loaded and iperf (6 parallel TCP flows), I saw 100% CPU usage and
a total of 83.3 Gbits/sec. In the non-XDP case, I saw 58% CPU (43% idle)
and a total of 89.7 Gbits/sec.

 
> I wonder if it doesn't simply come from mlx5/en_main.c:
> rq->buff.map_dir = rq->xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
> 

Nope, that is not the problem.

> Which would be in line with my observation that memory access seems
> heavier. I guess this is for the XDP_TX case.
> 
> If this is indeed the problem, is there any chance we can:
> a) detect automatically that a program will not return XDP_TX (I'm not
> quite sure what the BPF limitations allow us to guess in advance), or
> b) add a flag such as XDP_FLAGS_NO_TX to avoid such a performance hit
> when it is not needed?

This was kind of hard to root-cause, but I solved it by increasing the TCP
window (socket buffer) size used by the iperf tool, like this (please reproduce):

$ iperf -s --window 4M
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  416 KByte (WARNING: requested 4.00 MByte)
------------------------------------------------------------

Given I could reproduce it, I took a closer look at the perf record/report
stats, and it was actually quite clear that this was related to stalling on
getting pages from the page allocator (get_page_from_freelist and
free_pcppages_bulk were among the top 6 function calls).

Using my tool: ethtool_stats.pl
 https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

It was clear that the mlx5 driver page-cache was not working (XDP case):
 Ethtool(mlx5p1  ) stat:     6653761 (   6,653,761) <= rx_cache_busy /sec
 Ethtool(mlx5p1  ) stat:     6653732 (   6,653,732) <= rx_cache_full /sec
 Ethtool(mlx5p1  ) stat:      669481 (     669,481) <= rx_cache_reuse /sec
 Ethtool(mlx5p1  ) stat:           1 (           1) <= rx_congst_umr /sec
 Ethtool(mlx5p1  ) stat:     7323230 (   7,323,230) <= rx_csum_unnecessary /sec
 Ethtool(mlx5p1  ) stat:        1034 (       1,034) <= rx_discards_phy /sec
 Ethtool(mlx5p1  ) stat:     7323230 (   7,323,230) <= rx_packets /sec
 Ethtool(mlx5p1  ) stat:     7324244 (   7,324,244) <= rx_packets_phy /sec

While the non-XDP case looked like this:
 Ethtool(mlx5p1  ) stat:      298929 (     298,929) <= rx_cache_busy /sec
 Ethtool(mlx5p1  ) stat:      298971 (     298,971) <= rx_cache_full /sec
 Ethtool(mlx5p1  ) stat:     3548789 (   3,548,789) <= rx_cache_reuse /sec
 Ethtool(mlx5p1  ) stat:     7695476 (   7,695,476) <= rx_csum_complete /sec
 Ethtool(mlx5p1  ) stat:     7695476 (   7,695,476) <= rx_packets /sec
 Ethtool(mlx5p1  ) stat:     7695169 (   7,695,169) <= rx_packets_phy /sec
Manual consistency calc: 7695476-((3548789*2)+(298971*2)) = -44

With the increased TCP window size, the mlx5 driver cache is working better,
but not optimally; see below. I'm getting 88.0 Gbits/sec with 68% CPU usage.
 Ethtool(mlx5p1  ) stat:      894438 (     894,438) <= rx_cache_busy /sec
 Ethtool(mlx5p1  ) stat:      894453 (     894,453) <= rx_cache_full /sec
 Ethtool(mlx5p1  ) stat:     6638518 (   6,638,518) <= rx_cache_reuse /sec
 Ethtool(mlx5p1  ) stat:           6 (           6) <= rx_congst_umr /sec
 Ethtool(mlx5p1  ) stat:     7532983 (   7,532,983) <= rx_csum_unnecessary /sec
 Ethtool(mlx5p1  ) stat:         164 (         164) <= rx_discards_phy /sec
 Ethtool(mlx5p1  ) stat:     7532983 (   7,532,983) <= rx_packets /sec
 Ethtool(mlx5p1  ) stat:     7533193 (   7,533,193) <= rx_packets_phy /sec
Manual consistency calc: 7532983-(6638518+894453) = 12

To understand why this is happening, you first have to know that mlx5 uses
two different RX-memory modes for non-XDP vs. XDP. With non-XDP, two frames
are stored per memory page, while for XDP only a single frame per page is
used. The number of packets the RX-rings can hold is actually the same, as
the ring sizes are non-XDP=512 vs. XDP=1024 (512*2 == 1024*1).

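I believe the real issue is that TCP uses skb->truesize (which is based on
the frame size) for various memory-pressure and window calculations. With
only one frame per page in XDP mode, each SKB carries a larger truesize, so
these limits are reached with less data queued. That is why manually
increasing the window size solved the issue.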

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
