From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Björn Töpel" <bjorn.topel@gmail.com>
Cc: magnus.karlsson@intel.com, alexander.h.duyck@intel.com,
	alexander.duyck@gmail.com, john.fastabend@gmail.com, ast@fb.com,
	willemdebruijn.kernel@gmail.com, daniel@iogearbox.net,
	netdev@vger.kernel.org, "Björn Töpel" <bjorn.topel@intel.com>,
	michael.lundkvist@ericsson.com, jesse.brandeburg@intel.com,
	anjali.singhai@intel.com, jeffrey.b.shaw@intel.com,
	ferruh.yigit@intel.com, qi.z.zhang@intel.com, brouer@redhat.com,
	"Saeed Mahameed" <saeedm@mellanox.com>
Subject: Re: [RFC PATCH 00/24] Introducing AF_XDP support
Date: Thu, 1 Feb 2018 17:42:40 +0100
Message-ID: <20180201174240.6368bc66@redhat.com>
In-Reply-To: <20180131135356.19134-1-bjorn.topel@gmail.com>



On Wed, 31 Jan 2018 14:53:32 +0100 Björn Töpel <bjorn.topel@gmail.com> wrote:

> * In this RFC, do not use an XDP_REDIRECT action other than
>   bpf_xdpsk_redirect for XDP_DRV_ZC. This is because a zero-copy
>   allocated buffer will then be sent to a cpu id / queue_pair through
>   ndo_xdp_xmit that does not know this has been ZC allocated. It will
>   then do a page_free on it and you will get a crash. How to extend
>   ndo_xdp_xmit with some free/completion function that could be called
>   instead of page_free?  Hopefully, the same solution can be used here
>   as in the first problem item in this section.

I'm prototyping an extension to ndo_xdp_xmit with a free/completion
function call, which looks at the xdp_rxq_info to determine which
allocator type the RX-NIC used (the info is kept per RX queue), and
invokes the appropriate callback.
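
Roughly what I have in mind (the enum, the mem-info member, and the
function body below are placeholders from my prototype, not a final
API):

/* The RX-NIC records, per RX queue, which allocator its pages came
 * from; the TX side's DMA completion then calls back into that
 * allocator instead of doing a plain put_page().
 */
enum xdp_mem_alloc_type {
        XDP_MEM_TYPE_PAGE,              /* ordinary pages, freed via put_page() */
        XDP_MEM_TYPE_PAGE_POOL,         /* pages owned by a page_pool */
};

struct xdp_mem_info {
        enum xdp_mem_alloc_type type;
        void *allocator;                /* e.g. the struct page_pool pointer */
};

/* Called from the TX driver's DMA-TX completion instead of put_page() */
void xdp_return_frame(struct page *page, struct xdp_mem_info *mem)
{
        switch (mem->type) {
        case XDP_MEM_TYPE_PAGE_POOL:
                page_pool_put_page(mem->allocator, page);
                break;
        case XDP_MEM_TYPE_PAGE:
        default:
                put_page(page);
                break;
        }
}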

I dusted off my old page_pool implementation (modifying it to run
outside the page allocator), implemented XDP_REDIRECT for mlx5,
extended xdp_rxq_info, and stored the needed info in ixgbe for the
DMA-TX completion.  I disabled the mlx5 page cache and used the
page_pool instead.
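
On the RX side, the per-queue setup then looks roughly like this (the
params members and helpers below are from my prototype page_pool, so
names and details will certainly change):

/* Per RX-queue pool, replacing the mlx5 internal page cache.
 * 'rxq_size', 'numa_node' and 'dev' stand in for the real queue
 * size, node and DMA-mapping device.
 */
struct page_pool_params pp_params = {
        .order          = 0,            /* order-0 pages, as before */
        .pool_size      = rxq_size,     /* sizes the recycle ring */
        .nid            = numa_node,    /* NUMA node to allocate on */
        .dev            = dev,          /* device used for DMA mapping */
        .dma_dir        = DMA_FROM_DEVICE,
};
struct page_pool *pool = page_pool_create(&pp_params);

/* RX refill path: ask the pool instead of the driver page cache */
struct page *page = page_pool_alloc_pages(pool, GFP_ATOMIC);

/* ...and pages not consumed/redirected go back via: */
page_pool_put_page(pool, page);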

It worked surprisingly well... The test is: pktgen on a mlx5
100Gbit/s NIC, and XDP_REDIRECT via the xdp_redirect_map sample, out
a 10G ixgbe NIC.

Performance is surprisingly good... This exercises the DMA-TX
completion on ixgbe, which calls "xdp_return_frame", mapped to
page_pool_put_page(pool, page).  Here the DMA-TX completion runs on
CPU#3 while mlx5 RX runs on CPU#0.  (Internally, page_pool uses a
ptr_ring, which is what gives the good cross-CPU performance.)
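
Heavily simplified, the cross-CPU recycling comes down to this (the
real pool also keeps a small in-softirq cache in front of the ring
and handles the DMA mapping; function names are just for
illustration):

#include <linux/ptr_ring.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* ixgbe DMA-TX completion side (CPU#3): hand the page back to the
 * pool's ptr_ring instead of freeing it. */
static void recycle_page(struct ptr_ring *ring, struct page *page)
{
        if (ptr_ring_produce(ring, page))       /* non-zero => ring full */
                put_page(page);                 /* fall back to page allocator */
}

/* mlx5 RX refill side (CPU#0): reuse a recycled page if one is there */
static struct page *get_rx_page(struct ptr_ring *ring)
{
        struct page *page = ptr_ring_consume(ring);

        return page ? page : alloc_page(GFP_ATOMIC);
}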

Show adapter(s) (ixgbe2 mlx5p2) statistics (ONLY that changed!)
Ethtool(ixgbe2  ) stat:    810562253 (    810,562,253) <= tx_bytes /sec
Ethtool(ixgbe2  ) stat:    864600261 (    864,600,261) <= tx_bytes_nic /sec
Ethtool(ixgbe2  ) stat:     13509371 (     13,509,371) <= tx_packets /sec
Ethtool(ixgbe2  ) stat:     13509380 (     13,509,380) <= tx_pkts_nic /sec
Ethtool(mlx5p2  ) stat:     36827369 (     36,827,369) <= rx_64_bytes_phy /sec
Ethtool(mlx5p2  ) stat:   2356953271 (  2,356,953,271) <= rx_bytes_phy /sec
Ethtool(mlx5p2  ) stat:     23313782 (     23,313,782) <= rx_discards_phy /sec
Ethtool(mlx5p2  ) stat:         3019 (          3,019) <= rx_out_of_buffer /sec
Ethtool(mlx5p2  ) stat:     36827395 (     36,827,395) <= rx_packets_phy /sec
Ethtool(mlx5p2  ) stat:   2356924099 (  2,356,924,099) <= rx_prio0_bytes /sec
Ethtool(mlx5p2  ) stat:     13513560 (     13,513,560) <= rx_prio0_packets /sec
Ethtool(mlx5p2  ) stat:    810820253 (    810,820,253) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p2  ) stat:     13513672 (     13,513,672) <= rx_vport_unicast_packets /sec

If I only disabled the mlx5 page cache (no page_pool), then
single-flow performance was 6 Mpps, and when I started two flows the
collective performance dropped to 4 Mpps, because we hit the page
allocator lock (with further negative scaling from there).

If I keep the mlx5 cache, I see between 7 and 11 Mpps, varying with
the ixgbe TX-ring size and DMA-completion interrupt levels.


For AF_XDP, we just register another free/completion callback function.
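
That is, the per-RXq registration just carries a callback plus an
allocator cookie, roughly (names below are placeholders, not the
RFC's actual API):

struct xdp_mem_ops {
        /* invoked from DMA-TX completion instead of put_page() */
        void (*complete)(void *allocator, struct page *page);
};

/* One registration per RX queue: page_pool passes its pool here,
 * AF_XDP zero-copy would pass the umem plus a callback that posts
 * the buffer back on its completion ring. */
int xdp_rxq_info_reg_mem_ops(struct xdp_rxq_info *xdp_rxq,
                             const struct xdp_mem_ops *ops,
                             void *allocator);
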
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
