From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Tariq Toukan <tariqt@mellanox.com>
Cc: netdev@vger.kernel.org, "Björn Töpel" <bjorn.topel@intel.com>,
magnus.karlsson@intel.com, eugenia@mellanox.com,
"Jason Wang" <jasowang@redhat.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Eran Ben Elisha" <eranbe@mellanox.com>,
"Saeed Mahameed" <saeedm@mellanox.com>,
galp@mellanox.com, "Daniel Borkmann" <borkmann@iogearbox.net>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
brouer@redhat.com
Subject: Re: [bpf-next V1 PATCH 13/15] mlx5: use page_pool for xdp_return_frame call
Date: Thu, 8 Mar 2018 13:44:40 +0100 [thread overview]
Message-ID: <20180308134440.56317e31@redhat.com> (raw)
In-Reply-To: <1d59e91c-1c92-ccfd-652e-860c8bcd794e@mellanox.com>
On Wed, 7 Mar 2018 14:11:36 +0200
Tariq Toukan <tariqt@mellanox.com> wrote:
> On 06/03/2018 11:48 PM, Jesper Dangaard Brouer wrote:
> > This patch shows how it is possible to have both the driver-local page
> > cache, which uses an elevated refcnt for "catching"/avoiding SKB
> > put_page, and at the same time have pages getting returned to the
> > page_pool from ndo_xdp_xmit DMA completion.
> >
> > Performance is surprisingly good. Tested DMA-TX completion on ixgbe,
> > which calls "xdp_return_frame", which in turn calls page_pool_put_page().
> > Stats show DMA-TX-completion runs on CPU#9 and mlx5 RX runs on CPU#5.
> > (Internally page_pool uses ptr_ring, which is what gives the good
> > cross CPU performance).
> >
> > Show adapter(s) (ixgbe2 mlx5p2) statistics (ONLY that changed!)
> > Ethtool(ixgbe2 ) stat: 732863573 ( 732,863,573) <= tx_bytes /sec
> > Ethtool(ixgbe2 ) stat: 781724427 ( 781,724,427) <= tx_bytes_nic /sec
> > Ethtool(ixgbe2 ) stat: 12214393 ( 12,214,393) <= tx_packets /sec
> > Ethtool(ixgbe2 ) stat: 12214435 ( 12,214,435) <= tx_pkts_nic /sec
> > Ethtool(mlx5p2 ) stat: 12211786 ( 12,211,786) <= rx3_cache_empty /sec
> > Ethtool(mlx5p2 ) stat: 36506736 ( 36,506,736) <= rx_64_bytes_phy /sec
> > Ethtool(mlx5p2 ) stat: 2336430575 ( 2,336,430,575) <= rx_bytes_phy /sec
> > Ethtool(mlx5p2 ) stat: 12211786 ( 12,211,786) <= rx_cache_empty /sec
> > Ethtool(mlx5p2 ) stat: 22823073 ( 22,823,073) <= rx_discards_phy /sec
> > Ethtool(mlx5p2 ) stat: 1471860 ( 1,471,860) <= rx_out_of_buffer /sec
> > Ethtool(mlx5p2 ) stat: 36506715 ( 36,506,715) <= rx_packets_phy /sec
> > Ethtool(mlx5p2 ) stat: 2336542282 ( 2,336,542,282) <= rx_prio0_bytes /sec
> > Ethtool(mlx5p2 ) stat: 13683921 ( 13,683,921) <= rx_prio0_packets /sec
> > Ethtool(mlx5p2 ) stat: 821015537 ( 821,015,537) <= rx_vport_unicast_bytes /sec
> > Ethtool(mlx5p2 ) stat: 13683608 ( 13,683,608) <= rx_vport_unicast_packets /sec
> >
> > Before this patch: single-flow performance was 6Mpps, and if I started
> > two flows the collective performance dropped to 4Mpps, because we hit
> > the page allocator lock (further negative scaling occurs).
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 ++
> > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 39 ++++++++++++++++++---
> > drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 10 ++++-
> > 3 files changed, 45 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > index 28cc26debeda..ab91166f7c5a 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > @@ -53,6 +53,8 @@
> > #include "mlx5_core.h"
> > #include "en_stats.h"
> >
> > +struct page_pool;
> > +
>
> you can have the include here instead:
> #include <net/page_pool.h>
>
> and remove it from the .c files.
Hmm... I'm not too happy with this change request. I did this to avoid
having to recompile all the driver files whenever I change page_pool.h
(yes, the kernel makefile system detects the dependencies automatically).
I'll keep this; let me know if you feel strongly about it...
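
To make the trade-off concrete, this is the pattern (a minimal sketch;
the struct name below is illustrative, not the real mlx5e code):

  /* en.h: only a pointer is stored, so a forward declaration
   * suffices; the full struct definition is not needed here.
   */
  struct page_pool;

  struct example_rq {                  /* stand-in for struct mlx5e_rq */
          struct page_pool *page_pool; /* pointer, size not required */
  };

  /* en_main.c and en_rx.c actually call the page_pool API, so they
   * include <net/page_pool.h> themselves. Changing page_pool.h then
   * rebuilds only those files, not every user of en.h.
   */
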
> > #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
> >
> > #define MLX5E_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)
> > @@ -535,6 +537,7 @@ struct mlx5e_rq {
> > /* XDP */
> > struct bpf_prog *xdp_prog;
> > struct mlx5e_xdpsq xdpsq;
> > + struct page_pool *page_pool;
> >
> > /* control */
> > struct mlx5_wq_ctrl wq_ctrl;
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 49732c8c27c1..fbe27110ff02 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -44,6 +44,8 @@
> > #include "accel/ipsec.h"
> > #include "vxlan.h"
> >
> > +#include <net/page_pool.h>
> > +
> > struct mlx5e_rq_param {
> > u32 rqc[MLX5_ST_SZ_DW(rqc)];
> > struct mlx5_wq_param wq;
> > @@ -396,6 +398,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> > int err;
> > int i;
> >
> > + struct page_pool_params pp_params = { 0 };
> > +
> > rqp->wq.db_numa_node = cpu_to_node(c->cpu);
> >
> > err = mlx5_wq_ll_create(mdev, &rqp->wq, rqc_wq, &rq->wq,
> > @@ -506,12 +510,33 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> > rq->mkey_be = c->mkey_be;
> > }
> >
> > - /* This must only be activate for order-0 pages */
> > - if (rq->xdp_prog)
> > - err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
> > - MEM_TYPE_PAGE_ORDER0, NULL);
> > - if (err)
> > + /* Create a page_pool and register it with rxq */
> > + pp_params.size = PAGE_POOL_PARAMS_SIZE;
> > + pp_params.order = rq->buff.page_order;
> > + pp_params.dev = c->pdev;
> > + pp_params.nid = cpu_to_node(c->cpu);
> > + pp_params.dma_dir = rq->buff.map_dir;
> > + pp_params.pool_size = 1 << params->log_rq_size;
>
> if rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ, need to multiply
> by MLX5_MPWRQ_PAGES_PER_WQE.
Okay, doing so.
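Roughly what I'll do for V2 (a sketch based on your suggestion; the
exact form in the final patch may differ):

  pp_params.pool_size = 1 << params->log_rq_size;
  /* Striding RQ consumes MLX5_MPWRQ_PAGES_PER_WQE pages per WQE,
   * so scale the pool accordingly.
   */
  if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
          pp_params.pool_size *= MLX5_MPWRQ_PAGES_PER_WQE;
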
> > + pp_params.flags = 0; /* No-internal DMA mapping in page_pool */
> > +
> > + /* page_pool can be used even when there is no rq->xdp_prog;
> > + * since page_pool does not handle DMA mapping, there is no
> > + * required state to clear. And page_pool gracefully handles
> > + * elevated refcnts.
> > + */
> > + rq->page_pool = page_pool_create(&pp_params);
> > + if (IS_ERR_OR_NULL(rq->page_pool)) {
> > + kfree(rq->wqe.frag_info);
> > + rq->page_pool = NULL;
> > + err = -ENOMEM;
>
> better: PTR_ERR(rq->page_pool)
Ok, I changed the page_pool_create() return values to be a little more
sane (no longer returning NULL), so we can do as you suggest. (Otherwise,
using PTR_ERR directly for err could yield zero, which would indicate
success for mlx5e_alloc_rq() even when it failed...)
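
With page_pool_create() returning either a valid pointer or an
ERR_PTR() encoded error (never NULL), the error path can become this
(sketch of the planned V2 change):

  rq->page_pool = page_pool_create(&pp_params);
  if (IS_ERR(rq->page_pool)) {
          err = PTR_ERR(rq->page_pool); /* always a real errno, never 0 */
          rq->page_pool = NULL;
          kfree(rq->wqe.frag_info);
          goto err_rq_wq_destroy;
  }
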
> > goto err_rq_wq_destroy;
> > + }
> > + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
> > + MEM_TYPE_PAGE_POOL, rq->page_pool);
> > + if (err) {
> > + err = -ENOMEM;
>
> why change err value?
Okay.
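I.e. V2 will simply propagate the error from the registration call
(sketch):

  err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
                                   MEM_TYPE_PAGE_POOL, rq->page_pool);
  if (err)
          goto err_rq_wq_destroy; /* keep err from reg_mem_model */
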
> > + goto err_rq_wq_destroy;
> > + }
> >
> > for (i = 0; i < wq_sz; i++) {
> > struct mlx5e_rx_wqe *wqe = mlx5_wq_ll_get_wqe(&rq->wq, i);
> > @@ -549,6 +574,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> > if (rq->xdp_prog)
> > bpf_prog_put(rq->xdp_prog);
> > xdp_rxq_info_unreg(&rq->xdp_rxq);
> > + if (rq->page_pool)
> > + page_pool_destroy_rcu(rq->page_pool);
> > mlx5_wq_destroy(&rq->wq_ctrl);
> >
> > return err;
> > @@ -562,6 +589,8 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
> > bpf_prog_put(rq->xdp_prog);
> >
> > xdp_rxq_info_unreg(&rq->xdp_rxq);
> > + if (rq->page_pool)
> > + page_pool_destroy_rcu(rq->page_pool);
> >
> > switch (rq->wq_type) {
> > case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 6dcc3e8fbd3e..4898239467d9 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -45,6 +45,8 @@
> > #include "en_accel/ipsec_rxtx.h"
> > #include "lib/clock.h"
> >
> > +#include <net/page_pool.h>
> > +
> > static inline bool mlx5e_rx_hw_stamp(struct hwtstamp_config *config)
> > {
> > return config->rx_filter == HWTSTAMP_FILTER_ALL;
> > @@ -221,7 +223,7 @@ static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
> > if (mlx5e_rx_cache_get(rq, dma_info))
> > return 0;
> >
> > - dma_info->page = dev_alloc_pages(rq->buff.page_order);
> > + dma_info->page = page_pool_dev_alloc_pages(rq->page_pool);
> > if (unlikely(!dma_info->page))
> > return -ENOMEM;
> >
> > @@ -250,7 +252,11 @@ void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info,
> > return;
> >
> > mlx5e_page_dma_unmap(rq, dma_info);
> > - put_page(dma_info->page);
> > +
> > + if (likely(recycle))
> > + page_pool_recycle_direct(rq->page_pool, dma_info->page);
> > + else
> > + put_page(dma_info->page);
> > }
> >
>
> we can save a branch, this way:
>
> - if (likely(recycle) && mlx5e_rx_cache_put(rq, dma_info))
> + if (likely(recycle)) {
> + if (mlx5e_rx_cache_put(rq, dma_info))
> + return;
> +
> + mlx5e_page_dma_unmap(rq, dma_info);
> + page_pool_recycle_direct(rq->page_pool, dma_info->page);
> + return;
> + }
Okay, doing this, but you missed that we now need to call
mlx5e_page_dma_unmap(rq, dma_info) in two places.
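
So the release path for V2 will look roughly like this (a sketch, not
the final diff; note the unmap on both exit branches):

  void mlx5e_page_release(struct mlx5e_rq *rq,
                          struct mlx5e_dma_info *dma_info, bool recycle)
  {
          if (likely(recycle)) {
                  if (mlx5e_rx_cache_put(rq, dma_info))
                          return; /* cache keeps the DMA mapping */

                  mlx5e_page_dma_unmap(rq, dma_info);     /* case 1 */
                  page_pool_recycle_direct(rq->page_pool, dma_info->page);
                  return;
          }

          mlx5e_page_dma_unmap(rq, dma_info);             /* case 2 */
          put_page(dma_info->page);
  }
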
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Thread overview:
2018-03-06 21:47 [bpf-next V1 PATCH 00/15] XDP redirect memory return API Jesper Dangaard Brouer
2018-03-06 21:47 ` [bpf-next V1 PATCH 01/15] mlx5: basic XDP_REDIRECT forward support Jesper Dangaard Brouer
2018-03-06 21:47 ` [bpf-next V1 PATCH 02/15] xdp: introduce xdp_return_frame API and use in cpumap Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 03/15] ixgbe: use xdp_return_frame API Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 04/15] xdp: move struct xdp_buff from filter.h to xdp.h Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 05/15] xdp: introduce a new xdp_frame type Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 06/15] tun: convert to use generic xdp_frame and xdp_return_frame API Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 07/15] virtio_net: " Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 08/15] bpf: cpumap convert to use generic xdp_frame Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 09/15] mlx5: register a memory model when XDP is enabled Jesper Dangaard Brouer
2018-03-07 11:50 ` Tariq Toukan
2018-03-08 7:34 ` Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 10/15] xdp: rhashtable with allocator ID to pointer mapping Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 11/15] page_pool: refurbish version of page_pool code Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 12/15] xdp: allow page_pool as an allocator type in xdp_return_frame Jesper Dangaard Brouer
2018-03-06 21:48 ` [bpf-next V1 PATCH 13/15] mlx5: use page_pool for xdp_return_frame call Jesper Dangaard Brouer
2018-03-07 12:11 ` Tariq Toukan
2018-03-08 12:44 ` Jesper Dangaard Brouer [this message]
2018-03-06 21:48 ` [bpf-next V1 PATCH 14/15] xdp: transition into using xdp_frame for return API Jesper Dangaard Brouer
2018-03-06 21:49 ` [bpf-next V1 PATCH 15/15] xdp: transition into using xdp_frame for ndo_xdp_xmit Jesper Dangaard Brouer