From: Jesper Dangaard Brouer <brouer@redhat.com>
To: linux-mm@kvack.org, Alexander Duyck <alexander.duyck@gmail.com>
Cc: willemdebruijn.kernel@gmail.com, netdev@vger.kernel.org,
john.fastabend@gmail.com, Saeed Mahameed <saeedm@mellanox.com>,
Jesper Dangaard Brouer <brouer@redhat.com>,
bjorn.topel@intel.com,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Tariq Toukan <tariqt@mellanox.com>
Subject: [RFC PATCH 3/4] mlx5: use page_pool
Date: Tue, 20 Dec 2016 14:28:22 +0100 [thread overview]
Message-ID: <20161220132822.18788.19768.stgit@firesoul> (raw)
In-Reply-To: <20161220132444.18788.50875.stgit@firesoul>
The mlx5 driver already have a driver local page recycle cache. This
page cache is only efficient when the number of outstanding pages is
small, the queue based cache array size is 128. Further more a single
page with elevated refcnt can block the queue.
Benchmarking on next-next at commit f5f99309fa74 ("sock: do not set
sk_err in sock_dequeue_err_skb"), which include Paolo's UDP
performance optimizations (commit fc13fd398625 ("Merge branch
'udp-fwd-mem-sched-on-dequeue'"). Showed a speedup of 29% for UDP
packets. Detailed ethtool stats showed mlx5 page recycler didn't
"work" in that benchmark. The XDP_DROP use-case, showed a small perf
regression +2.7ns using page_pool. This correspons well to the 28%
gain reported in commit 1bfecfca565c ("net/mlx5e: Build RX SKB on
demand").
UPDATE: On newer kernels, net-next at commit 52f40e9d65. The mlx5 page
recycle cache works again, and performance gain is gone. Detailed
benchmarking show, RX-ksoftirq side is approx 10% faster, while UDP
socket delivery is same performance.
For TC early ingress drop there is a small performance regression of
approx +4 ns. There are pending page_pool optimization that will
close that gap.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 28 +++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 47 ++++++++++++++-------
3 files changed, 60 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 951dbd58594d..b30d5b08d6a6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -361,6 +361,7 @@ struct mlx5e_rq {
struct mlx5e_tstamp *tstamp;
struct mlx5e_rq_stats stats;
struct mlx5e_cq cq;
+ struct page_pool *page_pool;
struct mlx5e_page_cache page_cache;
mlx5e_fp_handle_rx_cqe handle_rx_cqe;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index cbfa38fc72c0..cd71e5764ec1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -34,6 +34,7 @@
#include <net/pkt_cls.h>
#include <linux/mlx5/fs.h>
#include <net/vxlan.h>
+#include <linux/page_pool.h>
#include <linux/bpf.h>
#include "en.h"
#include "en_tc.h"
@@ -521,6 +522,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
struct mlx5e_rq_param *param,
struct mlx5e_rq *rq)
{
+ struct page_pool_params pp_params = { 0 };
struct mlx5e_priv *priv = c->priv;
struct mlx5_core_dev *mdev = priv->mdev;
void *rqc = param->rqc;
@@ -591,6 +593,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
default: /* MLX5_WQ_TYPE_LINKED_LIST */
rq->dma_info = kzalloc_node(wq_sz * sizeof(*rq->dma_info),
GFP_KERNEL, cpu_to_node(c->cpu));
+// rq->dma_info = NULL; //HACK ALWAYS FAIL TEST
if (!rq->dma_info) {
err = -ENOMEM;
goto err_rq_wq_destroy;
@@ -618,6 +621,24 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
npages = DIV_ROUND_UP(frag_sz, PAGE_SIZE);
rq->buff.page_order = order_base_2(npages);
+ pp_params.size = PAGE_POOL_PARAMS_SIZE;
+ pp_params.order = rq->buff.page_order;
+ pp_params.dev = c->pdev;
+ pp_params.nid = cpu_to_node(c->cpu);
+ pp_params.dma_dir = DMA_BIDIRECTIONAL;
+ pp_params.pool_size = 2000;
+ pr_info("XXX: %s() pp_params.size=%d end=%lu\n",
+ __func__, pp_params.size,
+ offsetof(struct page_pool_params, end_marker));
+
+ rq->page_pool = page_pool_create(&pp_params);
+ if (IS_ERR_OR_NULL(rq->page_pool)) {
+ rq->page_pool = NULL;
+ kfree(rq->dma_info);
+ err = -ENOMEM;
+ goto err_rq_wq_destroy;
+ }
+
byte_count |= MLX5_HW_START_PADDING;
rq->mkey_be = c->mkey_be;
}
@@ -662,6 +683,13 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
break;
default: /* MLX5_WQ_TYPE_LINKED_LIST */
kfree(rq->dma_info);
+ if (rq->page_pool)
+ page_pool_destroy(rq->page_pool);
+ else
+ // Can happen because mlx5 have some extra
+ // rq's for some other purposes... (explain?)
+ pr_err("XXX: %s() NULL pointer at rq->page_pool\n",
+ __func__);
}
for (i = rq->page_cache.head; i != rq->page_cache.tail;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 0e2fb3ed1790..0512632b30fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -182,6 +182,7 @@ void mlx5e_modify_rx_cqe_compression(struct mlx5e_priv *priv, bool val)
#define RQ_PAGE_SIZE(rq) ((1 << rq->buff.page_order) << PAGE_SHIFT)
+// TODO: Remove mlx5-page-cache
static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq,
struct mlx5e_dma_info *dma_info)
{
@@ -198,6 +199,7 @@ static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq,
return true;
}
+// TODO: Remove mlx5-page-cache
static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq,
struct mlx5e_dma_info *dma_info)
{
@@ -228,20 +230,27 @@ static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
{
struct page *page;
- if (mlx5e_rx_cache_get(rq, dma_info))
- return 0;
+// if (mlx5e_rx_cache_get(rq, dma_info))
+// return 0;
- page = dev_alloc_pages(rq->buff.page_order);
+ //page = dev_alloc_pages(rq->buff.page_order);
+ page = page_pool_dev_alloc_pages(rq->page_pool);
if (unlikely(!page))
return -ENOMEM;
dma_info->page = page;
- dma_info->addr = dma_map_page(rq->pdev, page, 0,
- RQ_PAGE_SIZE(rq), rq->buff.map_dir);
- if (unlikely(dma_mapping_error(rq->pdev, dma_info->addr))) {
- put_page(page);
- return -ENOMEM;
- }
+ dma_info->addr = page->dma_addr;
+// dma_info->addr = dma_map_page(rq->pdev, page, 0,
+// RQ_PAGE_SIZE(rq), rq->buff.map_dir);
+
+ /* DISCUSS: should this be moved into page_pool API? Here we
+ * sync entire page, but some drivers might want have more
+ * control? Like using the dma_sync_single_range_for_device()
+ * like Alex is doing in the Intel drivers...
+ */
+ dma_sync_single_for_device(rq->pdev, dma_info->addr,
+ RQ_PAGE_SIZE(rq),
+ DMA_FROM_DEVICE);
return 0;
}
@@ -249,11 +258,21 @@ static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info,
bool recycle)
{
- if (likely(recycle) && mlx5e_rx_cache_put(rq, dma_info))
+// if (likely(recycle) && mlx5e_rx_cache_put(rq, dma_info))
+// return;
+ // TODO: use page_pool_recycle_direct(dma_info->page);
+ if (recycle) {
+ page_pool_recycle_direct(dma_info->page);
return;
+ }
+
+// page_pool take over dma_unmap
+// dma_unmap_page(rq->pdev, dma_info->addr, RQ_PAGE_SIZE(rq),
+// rq->buff.map_dir);
+ // XXX: do we need to call dma_sync_single_range_for_cpu here???
+ // dma_sync_single_range_for_cpu(rq->pdev, dma_info->addr,
+ // RQ_PAGE_SIZE(rq), rq->buff.map_dir);
- dma_unmap_page(rq->pdev, dma_info->addr, RQ_PAGE_SIZE(rq),
- rq->buff.map_dir);
put_page(dma_info->page);
}
@@ -773,10 +792,6 @@ struct sk_buff *skb_from_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
return NULL;
}
- /* queue up for recycling ..*/
- page_ref_inc(di->page);
- mlx5e_page_release(rq, di, true);
-
skb_reserve(skb, MLX5_RX_HEADROOM);
skb_put(skb, cqe_bcnt);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-12-20 13:28 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-20 13:28 [RFC PATCH 0/4] page_pool proof-of-concept early code Jesper Dangaard Brouer
2016-12-20 13:28 ` [RFC PATCH 1/4] doc: page_pool introduction documentation Jesper Dangaard Brouer
2016-12-20 13:28 ` [RFC PATCH 2/4] page_pool: basic implementation of page_pool Jesper Dangaard Brouer
2017-01-03 16:07 ` Vlastimil Babka
2017-01-04 11:00 ` Jesper Dangaard Brouer
2017-01-09 10:43 ` Vlastimil Babka
2017-01-09 20:45 ` Jesper Dangaard Brouer
2017-01-09 21:58 ` Mel Gorman
2017-01-11 7:10 ` Jesper Dangaard Brouer
2017-01-06 5:08 ` [lkp-developer] [page_pool] 50a8fe7622: kernel_BUG_at_mm/slub.c kernel test robot
2017-01-06 7:27 ` Jesper Dangaard Brouer
2016-12-20 13:28 ` Jesper Dangaard Brouer [this message]
2016-12-20 13:28 ` [RFC PATCH 4/4] page_pool: change refcnt model Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161220132822.18788.19768.stgit@firesoul \
--to=brouer@redhat.com \
--cc=alexander.duyck@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=bjorn.topel@intel.com \
--cc=john.fastabend@gmail.com \
--cc=linux-mm@kvack.org \
--cc=netdev@vger.kernel.org \
--cc=saeedm@mellanox.com \
--cc=tariqt@mellanox.com \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).