* [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers
@ 2025-02-13 1:06 Jakub Kicinski
2025-02-13 1:06 ` [PATCH net-next v3 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Jakub Kicinski @ 2025-02-13 1:06 UTC (permalink / raw)
To: davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms, Jakub Kicinski
Convert mlx4 to page pool. I've been sitting on these patches for
over a year, and Jonathan Lemon had a similar series years before.
We never deployed it or sent upstream because it didn't really show
much perf win under normal load (admittedly I think the real testing
was done before Ilias's work on recycling).
During the v6.9 kernel rollout Meta's CDN team noticed that machines
with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
spent mapping buffers in the IOMMU). The problem does not occur with
modern NICs, so I dusted off this series and reportedly it still works.
And it makes the problem go away, no overloads, perf back in line with
older kernels. Something must have changed in IOMMU code, I guess.
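Presumably the win comes from the page pool owning the DMA mapping
(PP_FLAG_DMA_MAP): recycled pages stay mapped, so the per-packet IOMMU
map/unmap that was eating the CPU largely disappears once recycling
kicks in.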
This series is very simple, and can very likely be optimized further.
Thing is, I don't have access to any CX3 Pro NICs. They only exist
in CDN locations which haven't had a HW refresh for a while. So I can
say this series survives a week under traffic w/ XDP enabled, but
my ability to iterate and improve is a bit limited.
v3:
- use priv->rx_skb_size for buffer size calculation
- use priv->dma_dir for DMA mapping direction, instead of always BIDIR
v2: https://lore.kernel.org/20250211192141.619024-1-kuba@kernel.org
- remove unnecessary .max_len setting (nit by Ido)
- change pool size
- fix xdp xmit support description.
v1: https://lore.kernel.org/20250205031213.358973-1-kuba@kernel.org
Jakub Kicinski (4):
eth: mlx4: create a page pool for Rx
eth: mlx4: don't try to complete XDP frames in netpoll
eth: mlx4: remove the local XDP fast-recycling ring
eth: mlx4: use the page pool for Rx buffers
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 15 +--
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 119 +++++++------------
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 17 ++-
3 files changed, 52 insertions(+), 99 deletions(-)
--
2.48.1
* [PATCH net-next v3 1/4] eth: mlx4: create a page pool for Rx
2025-02-13 1:06 [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
@ 2025-02-13 1:06 ` Jakub Kicinski
2025-02-18 12:10 ` Tariq Toukan
2025-02-13 1:06 ` [PATCH net-next v3 2/4] eth: mlx4: don't try to complete XDP frames in netpoll Jakub Kicinski
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-02-13 1:06 UTC (permalink / raw)
To: davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms, Jakub Kicinski
Create a pool per rx queue. Subsequent patches will make use of it.
Move fcs_del to a hole to make space for the pointer.
Per common "wisdom" base the page pool size on the ring size.
Note that the page pool cache size is in full pages, so just
round up the effective buffer size to pages.
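For illustration (hypothetical numbers, not taken from the patch): with a
1024-entry ring and a ~1536 byte rx_skb_size on 4 KiB pages, the sizing
below works out to

    pp.pool_size = size * DIV_ROUND_UP(priv->rx_skb_size, PAGE_SIZE)
                 = 1024 * DIV_ROUND_UP(1536, 4096)
                 = 1024

i.e. one pool page per descriptor; a jumbo-frame rx_skb_size spanning
three pages would instead give a pool of 3072.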
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v3:
- use priv->rx_skb_size for effective buffer size
- use priv->dma_dir for DMA mapping direction, instead of always BIDIR
v2: https://lore.kernel.org/20250211192141.619024-2-kuba@kernel.org
- update pp.pool_size
v1: https://lore.kernel.org/20250205031213.358973-2-kuba@kernel.org
---
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 ++-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 24 +++++++++++++++++++-
2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 28b70dcc652e..29f48e63081b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -335,10 +335,11 @@ struct mlx4_en_rx_ring {
u16 stride;
u16 log_stride;
u16 cqn; /* index of port CQ associated with this ring */
+ u8 fcs_del;
u32 prod;
u32 cons;
u32 buf_size;
- u8 fcs_del;
+ struct page_pool *pp;
void *buf;
void *rx_info;
struct bpf_prog __rcu *xdp_prog;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 15c57e9517e9..a8c0cf5d0d08 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -48,6 +48,7 @@
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ip6_checksum.h>
#endif
+#include <net/page_pool/helpers.h>
#include "mlx4_en.h"
@@ -268,6 +269,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
u32 size, u16 stride, int node, int queue_index)
{
struct mlx4_en_dev *mdev = priv->mdev;
+ struct page_pool_params pp = {};
struct mlx4_en_rx_ring *ring;
int err = -ENOMEM;
int tmp;
@@ -286,9 +288,26 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->log_stride = ffs(ring->stride) - 1;
ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
- if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
+ pp.flags = PP_FLAG_DMA_MAP;
+ pp.pool_size = size * DIV_ROUND_UP(priv->rx_skb_size, PAGE_SIZE);
+ pp.nid = node;
+ pp.napi = &priv->rx_cq[queue_index]->napi;
+ pp.netdev = priv->dev;
+ pp.dev = &mdev->dev->persist->pdev->dev;
+ pp.dma_dir = priv->dma_dir;
+
+ ring->pp = page_pool_create(&pp);
+ if (!ring->pp)
goto err_ring;
+ if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
+ goto err_pp;
+
+ err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_POOL,
+ ring->pp);
+ if (err)
+ goto err_xdp_info;
+
tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
sizeof(struct mlx4_en_rx_alloc));
ring->rx_info = kvzalloc_node(tmp, GFP_KERNEL, node);
@@ -319,6 +338,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->rx_info = NULL;
err_xdp_info:
xdp_rxq_info_unreg(&ring->xdp_rxq);
+err_pp:
+ page_pool_destroy(ring->pp);
err_ring:
kfree(ring);
*pring = NULL;
@@ -445,6 +466,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
xdp_rxq_info_unreg(&ring->xdp_rxq);
mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
kvfree(ring->rx_info);
+ page_pool_destroy(ring->pp);
ring->rx_info = NULL;
kfree(ring);
*pring = NULL;
--
2.48.1
* [PATCH net-next v3 2/4] eth: mlx4: don't try to complete XDP frames in netpoll
2025-02-13 1:06 [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-13 1:06 ` [PATCH net-next v3 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
@ 2025-02-13 1:06 ` Jakub Kicinski
2025-02-18 12:11 ` Tariq Toukan
2025-02-13 1:06 ` [PATCH net-next v3 3/4] eth: mlx4: remove the local XDP fast-recycling ring Jakub Kicinski
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-02-13 1:06 UTC (permalink / raw)
To: davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms, Jakub Kicinski
mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
using page pool until now, so it could run XDP completions
in netpoll (NAPI budget == 0) just fine. Page pool has calling
context requirements, make sure we don't try to call it from
what is potentially HW IRQ context.
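For context: netpoll invokes the NAPI poll handlers with a budget of 0,
potentially from hard IRQ context, while page pool's recycling paths
expect to run from softirq / NAPI context. The check added below simply
skips XDP Tx completion in that case and leaves it to the next regular
NAPI poll:

    if (unlikely(!napi_budget) && cq->type == TX_XDP)
        return 0;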
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- clarify the commit msg
v1: https://lore.kernel.org/20250205031213.358973-3-kuba@kernel.org
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 1ddb11cb25f9..6e077d202827 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -450,6 +450,8 @@ int mlx4_en_process_tx_cq(struct net_device *dev,
if (unlikely(!priv->port_up))
return 0;
+ if (unlikely(!napi_budget) && cq->type == TX_XDP)
+ return 0;
netdev_txq_bql_complete_prefetchw(ring->tx_queue);
--
2.48.1
* [PATCH net-next v3 3/4] eth: mlx4: remove the local XDP fast-recycling ring
2025-02-13 1:06 [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-13 1:06 ` [PATCH net-next v3 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
2025-02-13 1:06 ` [PATCH net-next v3 2/4] eth: mlx4: don't try to complete XDP frames in netpoll Jakub Kicinski
@ 2025-02-13 1:06 ` Jakub Kicinski
2025-02-18 12:12 ` Tariq Toukan
2025-02-13 1:06 ` [PATCH net-next v3 4/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-18 23:40 ` [PATCH net-next v3 0/4] " patchwork-bot+netdevbpf
4 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-02-13 1:06 UTC (permalink / raw)
To: davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms, Jakub Kicinski
It will be replaced with page pool's built-in recycling.
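Note that until the Rx path is converted in the next patch, pages
recycled from XDP Tx completions temporarily fall back to a plain
dma_unmap_page() + put_page(), as seen in the en_tx.c hunk below;
page pool recycling takes over in the final patch of the series.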
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2: no change
---
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 11 ------
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 40 --------------------
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 11 +-----
3 files changed, 2 insertions(+), 60 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 29f48e63081b..97311c98569f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -253,14 +253,6 @@ struct mlx4_en_rx_alloc {
#define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
-struct mlx4_en_page_cache {
- u32 index;
- struct {
- struct page *page;
- dma_addr_t dma;
- } buf[MLX4_EN_CACHE_SIZE];
-};
-
enum {
MLX4_EN_TX_RING_STATE_RECOVERING,
};
@@ -343,7 +335,6 @@ struct mlx4_en_rx_ring {
void *buf;
void *rx_info;
struct bpf_prog __rcu *xdp_prog;
- struct mlx4_en_page_cache page_cache;
unsigned long bytes;
unsigned long packets;
unsigned long csum_ok;
@@ -708,8 +699,6 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
struct mlx4_en_priv *priv, unsigned int length,
int tx_ind, bool *doorbell_pending);
void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring);
-bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
- struct mlx4_en_rx_alloc *frame);
int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_tx_ring **pring,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index a8c0cf5d0d08..d2cfbf2e38d9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -142,18 +142,6 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
(index << ring->log_stride);
struct mlx4_en_rx_alloc *frags = ring->rx_info +
(index << priv->log_rx_info);
- if (likely(ring->page_cache.index > 0)) {
- /* XDP uses a single page per frame */
- if (!frags->page) {
- ring->page_cache.index--;
- frags->page = ring->page_cache.buf[ring->page_cache.index].page;
- frags->dma = ring->page_cache.buf[ring->page_cache.index].dma;
- }
- frags->page_offset = XDP_PACKET_HEADROOM;
- rx_desc->data[0].addr = cpu_to_be64(frags->dma +
- XDP_PACKET_HEADROOM);
- return 0;
- }
return mlx4_en_alloc_frags(priv, ring, rx_desc, frags, gfp);
}
@@ -430,26 +418,6 @@ void mlx4_en_recover_from_oom(struct mlx4_en_priv *priv)
}
}
-/* When the rx ring is running in page-per-packet mode, a released frame can go
- * directly into a small cache, to avoid unmapping or touching the page
- * allocator. In bpf prog performance scenarios, buffers are either forwarded
- * or dropped, never converted to skbs, so every page can come directly from
- * this cache when it is sized to be a multiple of the napi budget.
- */
-bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
- struct mlx4_en_rx_alloc *frame)
-{
- struct mlx4_en_page_cache *cache = &ring->page_cache;
-
- if (cache->index >= MLX4_EN_CACHE_SIZE)
- return false;
-
- cache->buf[cache->index].page = frame->page;
- cache->buf[cache->index].dma = frame->dma;
- cache->index++;
- return true;
-}
-
void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride)
@@ -475,14 +443,6 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring *ring)
{
- int i;
-
- for (i = 0; i < ring->page_cache.index; i++) {
- dma_unmap_page(priv->ddev, ring->page_cache.buf[i].dma,
- PAGE_SIZE, priv->dma_dir);
- put_page(ring->page_cache.buf[i].page);
- }
- ring->page_cache.index = 0;
mlx4_en_free_rx_buf(priv, ring);
if (ring->stride <= TXBB_SIZE)
ring->buf -= TXBB_SIZE;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6e077d202827..fe1378a689a1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -350,16 +350,9 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
int napi_mode)
{
struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
- struct mlx4_en_rx_alloc frame = {
- .page = tx_info->page,
- .dma = tx_info->map0_dma,
- };
- if (!napi_mode || !mlx4_en_rx_recycle(ring->recycle_ring, &frame)) {
- dma_unmap_page(priv->ddev, tx_info->map0_dma,
- PAGE_SIZE, priv->dma_dir);
- put_page(tx_info->page);
- }
+ dma_unmap_page(priv->ddev, tx_info->map0_dma, PAGE_SIZE, priv->dma_dir);
+ put_page(tx_info->page);
return tx_info->nr_txbb;
}
--
2.48.1
* [PATCH net-next v3 4/4] eth: mlx4: use the page pool for Rx buffers
2025-02-13 1:06 [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
` (2 preceding siblings ...)
2025-02-13 1:06 ` [PATCH net-next v3 3/4] eth: mlx4: remove the local XDP fast-recycling ring Jakub Kicinski
@ 2025-02-13 1:06 ` Jakub Kicinski
2025-02-18 12:13 ` Tariq Toukan
2025-02-18 23:40 ` [PATCH net-next v3 0/4] " patchwork-bot+netdevbpf
4 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-02-13 1:06 UTC (permalink / raw)
To: davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms, Jakub Kicinski
Simple conversion to page pool. Preserve the current fragmentation
logic / page splitting. Each page starts with a single frag reference,
and then we bump that when attaching to skbs. This can likely be
optimized further.
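Sketched out, the per-page reference flow (for the half-page split case)
becomes:

    page = page_pool_alloc_pages(ring->pp, gfp);
    page_pool_fragment_page(page, 1);   /* page starts with one frag ref */
    ...
    page_pool_ref_page(page);           /* bump when a frag is attached to
                                         * an skb and the page is kept */
    skb_mark_for_recycle(skb);          /* frags return to the pool when
                                         * the skb drops its refs */

so a page only comes back to the pool once every reference handed out to
skbs has been released.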
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- remove unnecessary .max_len setting
v1: https://lore.kernel.org/20250205031213.358973-5-kuba@kernel.org
---
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 -
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 55 +++++++-------------
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +--
3 files changed, 25 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 97311c98569f..ad0d91a75184 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -247,7 +247,6 @@ struct mlx4_en_tx_desc {
struct mlx4_en_rx_alloc {
struct page *page;
- dma_addr_t dma;
u32 page_offset;
};
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index d2cfbf2e38d9..b33285d755b9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -52,57 +52,39 @@
#include "mlx4_en.h"
-static int mlx4_alloc_page(struct mlx4_en_priv *priv,
- struct mlx4_en_rx_alloc *frag,
- gfp_t gfp)
-{
- struct page *page;
- dma_addr_t dma;
-
- page = alloc_page(gfp);
- if (unlikely(!page))
- return -ENOMEM;
- dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE, priv->dma_dir);
- if (unlikely(dma_mapping_error(priv->ddev, dma))) {
- __free_page(page);
- return -ENOMEM;
- }
- frag->page = page;
- frag->dma = dma;
- frag->page_offset = priv->rx_headroom;
- return 0;
-}
-
static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring *ring,
struct mlx4_en_rx_desc *rx_desc,
struct mlx4_en_rx_alloc *frags,
gfp_t gfp)
{
+ dma_addr_t dma;
int i;
for (i = 0; i < priv->num_frags; i++, frags++) {
if (!frags->page) {
- if (mlx4_alloc_page(priv, frags, gfp)) {
+ frags->page = page_pool_alloc_pages(ring->pp, gfp);
+ if (!frags->page) {
ring->alloc_fail++;
return -ENOMEM;
}
+ page_pool_fragment_page(frags->page, 1);
+ frags->page_offset = priv->rx_headroom;
+
ring->rx_alloc_pages++;
}
- rx_desc->data[i].addr = cpu_to_be64(frags->dma +
- frags->page_offset);
+ dma = page_pool_get_dma_addr(frags->page);
+ rx_desc->data[i].addr = cpu_to_be64(dma + frags->page_offset);
}
return 0;
}
static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
+ struct mlx4_en_rx_ring *ring,
struct mlx4_en_rx_alloc *frag)
{
- if (frag->page) {
- dma_unmap_page(priv->ddev, frag->dma,
- PAGE_SIZE, priv->dma_dir);
- __free_page(frag->page);
- }
+ if (frag->page)
+ page_pool_put_full_page(ring->pp, frag->page, false);
/* We need to clear all fields, otherwise a change of priv->log_rx_info
* could lead to see garbage later in frag->page.
*/
@@ -167,7 +149,7 @@ static void mlx4_en_free_rx_desc(const struct mlx4_en_priv *priv,
frags = ring->rx_info + (index << priv->log_rx_info);
for (nr = 0; nr < priv->num_frags; nr++) {
en_dbg(DRV, priv, "Freeing fragment:%d\n", nr);
- mlx4_en_free_frag(priv, frags + nr);
+ mlx4_en_free_frag(priv, ring, frags + nr);
}
}
@@ -469,7 +451,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
if (unlikely(!page))
goto fail;
- dma = frags->dma;
+ dma = page_pool_get_dma_addr(page);
dma_sync_single_range_for_cpu(priv->ddev, dma, frags->page_offset,
frag_size, priv->dma_dir);
@@ -480,6 +462,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
if (frag_info->frag_stride == PAGE_SIZE / 2) {
frags->page_offset ^= PAGE_SIZE / 2;
release = page_count(page) != 1 ||
+ atomic_long_read(&page->pp_ref_count) != 1 ||
page_is_pfmemalloc(page) ||
page_to_nid(page) != numa_mem_id();
} else if (!priv->rx_headroom) {
@@ -493,10 +476,9 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
}
if (release) {
- dma_unmap_page(priv->ddev, dma, PAGE_SIZE, priv->dma_dir);
frags->page = NULL;
} else {
- page_ref_inc(page);
+ page_pool_ref_page(page);
}
nr++;
@@ -766,7 +748,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
/* Get pointer to first fragment since we haven't
* skb yet and cast it to ethhdr struct
*/
- dma = frags[0].dma + frags[0].page_offset;
+ dma = page_pool_get_dma_addr(frags[0].page);
+ dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
DMA_FROM_DEVICE);
@@ -805,7 +788,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
void *orig_data;
u32 act;
- dma = frags[0].dma + frags[0].page_offset;
+ dma = page_pool_get_dma_addr(frags[0].page);
+ dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma,
priv->frag_info[0].frag_size,
DMA_FROM_DEVICE);
@@ -868,6 +852,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
skb = napi_get_frags(&cq->napi);
if (unlikely(!skb))
goto next;
+ skb_mark_for_recycle(skb);
if (unlikely(ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL)) {
u64 timestamp = mlx4_en_get_cqe_ts(cqe);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index fe1378a689a1..87f35bcbeff8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -44,6 +44,7 @@
#include <linux/ipv6.h>
#include <linux/indirect_call_wrapper.h>
#include <net/ipv6.h>
+#include <net/page_pool/helpers.h>
#include "mlx4_en.h"
@@ -350,9 +351,10 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
int napi_mode)
{
struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
+ struct page_pool *pool = ring->recycle_ring->pp;
- dma_unmap_page(priv->ddev, tx_info->map0_dma, PAGE_SIZE, priv->dma_dir);
- put_page(tx_info->page);
+ /* Note that napi_mode = 0 means ndo_close() path, not budget = 0 */
+ page_pool_put_full_page(pool, tx_info->page, !!napi_mode);
return tx_info->nr_txbb;
}
@@ -1189,7 +1191,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
tx_desc = ring->buf + (index << LOG_TXBB_SIZE);
data = &tx_desc->data;
- dma = frame->dma;
+ dma = page_pool_get_dma_addr(frame->page);
tx_info->page = frame->page;
frame->page = NULL;
--
2.48.1
* Re: [PATCH net-next v3 1/4] eth: mlx4: create a page pool for Rx
2025-02-13 1:06 ` [PATCH net-next v3 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
@ 2025-02-18 12:10 ` Tariq Toukan
0 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2025-02-18 12:10 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms
On 13/02/2025 3:06, Jakub Kicinski wrote:
> Create a pool per rx queue. Subsequent patches will make use of it.
>
> Move fcs_del to a hole to make space for the pointer.
>
> Per common "wisdom" base the page pool size on the ring size.
> Note that the page pool cache size is in full pages, so just
> round up the effective buffer size to pages.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v3:
> - use priv->rx_skb_size for effective buffer size
> - use priv->dma_dir for DMA mapping direction, instead of always BIDIR
> v2: https://lore.kernel.org/20250211192141.619024-2-kuba@kernel.org
> - update pp.pool_size
> v1: https://lore.kernel.org/20250205031213.358973-2-kuba@kernel.org
> ---
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 ++-
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 24 +++++++++++++++++++-
> 2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 28b70dcc652e..29f48e63081b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -335,10 +335,11 @@ struct mlx4_en_rx_ring {
> u16 stride;
> u16 log_stride;
> u16 cqn; /* index of port CQ associated with this ring */
> + u8 fcs_del;
> u32 prod;
> u32 cons;
> u32 buf_size;
> - u8 fcs_del;
> + struct page_pool *pp;
> void *buf;
> void *rx_info;
> struct bpf_prog __rcu *xdp_prog;
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 15c57e9517e9..a8c0cf5d0d08 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -48,6 +48,7 @@
> #if IS_ENABLED(CONFIG_IPV6)
> #include <net/ip6_checksum.h>
> #endif
> +#include <net/page_pool/helpers.h>
>
> #include "mlx4_en.h"
>
> @@ -268,6 +269,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> u32 size, u16 stride, int node, int queue_index)
> {
> struct mlx4_en_dev *mdev = priv->mdev;
> + struct page_pool_params pp = {};
> struct mlx4_en_rx_ring *ring;
> int err = -ENOMEM;
> int tmp;
> @@ -286,9 +288,26 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> ring->log_stride = ffs(ring->stride) - 1;
> ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
>
> - if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> + pp.flags = PP_FLAG_DMA_MAP;
> + pp.pool_size = size * DIV_ROUND_UP(priv->rx_skb_size, PAGE_SIZE);
> + pp.nid = node;
> + pp.napi = &priv->rx_cq[queue_index]->napi;
> + pp.netdev = priv->dev;
> + pp.dev = &mdev->dev->persist->pdev->dev;
> + pp.dma_dir = priv->dma_dir;
> +
> + ring->pp = page_pool_create(&pp);
> + if (!ring->pp)
> goto err_ring;
>
> + if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> + goto err_pp;
> +
> + err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_POOL,
> + ring->pp);
> + if (err)
> + goto err_xdp_info;
> +
> tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
> sizeof(struct mlx4_en_rx_alloc));
> ring->rx_info = kvzalloc_node(tmp, GFP_KERNEL, node);
> @@ -319,6 +338,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> ring->rx_info = NULL;
> err_xdp_info:
> xdp_rxq_info_unreg(&ring->xdp_rxq);
> +err_pp:
> + page_pool_destroy(ring->pp);
> err_ring:
> kfree(ring);
> *pring = NULL;
> @@ -445,6 +466,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
> xdp_rxq_info_unreg(&ring->xdp_rxq);
> mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
> kvfree(ring->rx_info);
> + page_pool_destroy(ring->pp);
> ring->rx_info = NULL;
> kfree(ring);
> *pring = NULL;
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Thanks for your patches,
Tariq
* Re: [PATCH net-next v3 2/4] eth: mlx4: don't try to complete XDP frames in netpoll
2025-02-13 1:06 ` [PATCH net-next v3 2/4] eth: mlx4: don't try to complete XDP frames in netpoll Jakub Kicinski
@ 2025-02-18 12:11 ` Tariq Toukan
0 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2025-02-18 12:11 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms
On 13/02/2025 3:06, Jakub Kicinski wrote:
> mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
> using page pool until now, so it could run XDP completions
> in netpoll (NAPI budget == 0) just fine. Page pool has calling
> context requirements, make sure we don't try to call it from
> what is potentially HW IRQ context.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
> - clarify the commit msg
> v1: https://lore.kernel.org/20250205031213.358973-3-kuba@kernel.org
> ---
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 1ddb11cb25f9..6e077d202827 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -450,6 +450,8 @@ int mlx4_en_process_tx_cq(struct net_device *dev,
>
> if (unlikely(!priv->port_up))
> return 0;
> + if (unlikely(!napi_budget) && cq->type == TX_XDP)
> + return 0;
>
> netdev_txq_bql_complete_prefetchw(ring->tx_queue);
>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
* Re: [PATCH net-next v3 3/4] eth: mlx4: remove the local XDP fast-recycling ring
2025-02-13 1:06 ` [PATCH net-next v3 3/4] eth: mlx4: remove the local XDP fast-recycling ring Jakub Kicinski
@ 2025-02-18 12:12 ` Tariq Toukan
0 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2025-02-18 12:12 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms
On 13/02/2025 3:06, Jakub Kicinski wrote:
> It will be replaced with page pool's built-in recycling.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2: no change
> ---
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 11 ------
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 40 --------------------
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 11 +-----
> 3 files changed, 2 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 29f48e63081b..97311c98569f 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -253,14 +253,6 @@ struct mlx4_en_rx_alloc {
>
> #define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
>
> -struct mlx4_en_page_cache {
> - u32 index;
> - struct {
> - struct page *page;
> - dma_addr_t dma;
> - } buf[MLX4_EN_CACHE_SIZE];
> -};
> -
> enum {
> MLX4_EN_TX_RING_STATE_RECOVERING,
> };
> @@ -343,7 +335,6 @@ struct mlx4_en_rx_ring {
> void *buf;
> void *rx_info;
> struct bpf_prog __rcu *xdp_prog;
> - struct mlx4_en_page_cache page_cache;
> unsigned long bytes;
> unsigned long packets;
> unsigned long csum_ok;
> @@ -708,8 +699,6 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
> struct mlx4_en_priv *priv, unsigned int length,
> int tx_ind, bool *doorbell_pending);
> void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring);
> -bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
> - struct mlx4_en_rx_alloc *frame);
>
> int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> struct mlx4_en_tx_ring **pring,
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index a8c0cf5d0d08..d2cfbf2e38d9 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -142,18 +142,6 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
> (index << ring->log_stride);
> struct mlx4_en_rx_alloc *frags = ring->rx_info +
> (index << priv->log_rx_info);
> - if (likely(ring->page_cache.index > 0)) {
> - /* XDP uses a single page per frame */
> - if (!frags->page) {
> - ring->page_cache.index--;
> - frags->page = ring->page_cache.buf[ring->page_cache.index].page;
> - frags->dma = ring->page_cache.buf[ring->page_cache.index].dma;
> - }
> - frags->page_offset = XDP_PACKET_HEADROOM;
> - rx_desc->data[0].addr = cpu_to_be64(frags->dma +
> - XDP_PACKET_HEADROOM);
> - return 0;
> - }
>
> return mlx4_en_alloc_frags(priv, ring, rx_desc, frags, gfp);
> }
> @@ -430,26 +418,6 @@ void mlx4_en_recover_from_oom(struct mlx4_en_priv *priv)
> }
> }
>
> -/* When the rx ring is running in page-per-packet mode, a released frame can go
> - * directly into a small cache, to avoid unmapping or touching the page
> - * allocator. In bpf prog performance scenarios, buffers are either forwarded
> - * or dropped, never converted to skbs, so every page can come directly from
> - * this cache when it is sized to be a multiple of the napi budget.
> - */
> -bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
> - struct mlx4_en_rx_alloc *frame)
> -{
> - struct mlx4_en_page_cache *cache = &ring->page_cache;
> -
> - if (cache->index >= MLX4_EN_CACHE_SIZE)
> - return false;
> -
> - cache->buf[cache->index].page = frame->page;
> - cache->buf[cache->index].dma = frame->dma;
> - cache->index++;
> - return true;
> -}
> -
> void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
> struct mlx4_en_rx_ring **pring,
> u32 size, u16 stride)
> @@ -475,14 +443,6 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
> void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
> struct mlx4_en_rx_ring *ring)
> {
> - int i;
> -
> - for (i = 0; i < ring->page_cache.index; i++) {
> - dma_unmap_page(priv->ddev, ring->page_cache.buf[i].dma,
> - PAGE_SIZE, priv->dma_dir);
> - put_page(ring->page_cache.buf[i].page);
> - }
> - ring->page_cache.index = 0;
> mlx4_en_free_rx_buf(priv, ring);
> if (ring->stride <= TXBB_SIZE)
> ring->buf -= TXBB_SIZE;
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 6e077d202827..fe1378a689a1 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -350,16 +350,9 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
> int napi_mode)
> {
> struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
> - struct mlx4_en_rx_alloc frame = {
> - .page = tx_info->page,
> - .dma = tx_info->map0_dma,
> - };
>
> - if (!napi_mode || !mlx4_en_rx_recycle(ring->recycle_ring, &frame)) {
> - dma_unmap_page(priv->ddev, tx_info->map0_dma,
> - PAGE_SIZE, priv->dma_dir);
> - put_page(tx_info->page);
> - }
> + dma_unmap_page(priv->ddev, tx_info->map0_dma, PAGE_SIZE, priv->dma_dir);
> + put_page(tx_info->page);
>
> return tx_info->nr_txbb;
> }
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
* Re: [PATCH net-next v3 4/4] eth: mlx4: use the page pool for Rx buffers
2025-02-13 1:06 ` [PATCH net-next v3 4/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
@ 2025-02-18 12:13 ` Tariq Toukan
0 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2025-02-18 12:13 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: tariqt, idosch, hawk, netdev, edumazet, pabeni, andrew+netdev,
horms
On 13/02/2025 3:06, Jakub Kicinski wrote:
> Simple conversion to page pool. Preserve the current fragmentation
> logic / page splitting. Each page starts with a single frag reference,
> and then we bump that when attaching to skbs. This can likely be
> optimized further.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
> - remove unnecessary .max_len setting
> v1: https://lore.kernel.org/20250205031213.358973-5-kuba@kernel.org
> ---
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 -
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 55 +++++++-------------
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +--
> 3 files changed, 25 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 97311c98569f..ad0d91a75184 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -247,7 +247,6 @@ struct mlx4_en_tx_desc {
>
> struct mlx4_en_rx_alloc {
> struct page *page;
> - dma_addr_t dma;
> u32 page_offset;
> };
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index d2cfbf2e38d9..b33285d755b9 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -52,57 +52,39 @@
>
> #include "mlx4_en.h"
>
> -static int mlx4_alloc_page(struct mlx4_en_priv *priv,
> - struct mlx4_en_rx_alloc *frag,
> - gfp_t gfp)
> -{
> - struct page *page;
> - dma_addr_t dma;
> -
> - page = alloc_page(gfp);
> - if (unlikely(!page))
> - return -ENOMEM;
> - dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE, priv->dma_dir);
> - if (unlikely(dma_mapping_error(priv->ddev, dma))) {
> - __free_page(page);
> - return -ENOMEM;
> - }
> - frag->page = page;
> - frag->dma = dma;
> - frag->page_offset = priv->rx_headroom;
> - return 0;
> -}
> -
> static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
> struct mlx4_en_rx_ring *ring,
> struct mlx4_en_rx_desc *rx_desc,
> struct mlx4_en_rx_alloc *frags,
> gfp_t gfp)
> {
> + dma_addr_t dma;
> int i;
>
> for (i = 0; i < priv->num_frags; i++, frags++) {
> if (!frags->page) {
> - if (mlx4_alloc_page(priv, frags, gfp)) {
> + frags->page = page_pool_alloc_pages(ring->pp, gfp);
> + if (!frags->page) {
> ring->alloc_fail++;
> return -ENOMEM;
> }
> + page_pool_fragment_page(frags->page, 1);
> + frags->page_offset = priv->rx_headroom;
> +
> ring->rx_alloc_pages++;
> }
> - rx_desc->data[i].addr = cpu_to_be64(frags->dma +
> - frags->page_offset);
> + dma = page_pool_get_dma_addr(frags->page);
> + rx_desc->data[i].addr = cpu_to_be64(dma + frags->page_offset);
> }
> return 0;
> }
>
> static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
> + struct mlx4_en_rx_ring *ring,
> struct mlx4_en_rx_alloc *frag)
> {
> - if (frag->page) {
> - dma_unmap_page(priv->ddev, frag->dma,
> - PAGE_SIZE, priv->dma_dir);
> - __free_page(frag->page);
> - }
> + if (frag->page)
> + page_pool_put_full_page(ring->pp, frag->page, false);
> /* We need to clear all fields, otherwise a change of priv->log_rx_info
> * could lead to see garbage later in frag->page.
> */
> @@ -167,7 +149,7 @@ static void mlx4_en_free_rx_desc(const struct mlx4_en_priv *priv,
> frags = ring->rx_info + (index << priv->log_rx_info);
> for (nr = 0; nr < priv->num_frags; nr++) {
> en_dbg(DRV, priv, "Freeing fragment:%d\n", nr);
> - mlx4_en_free_frag(priv, frags + nr);
> + mlx4_en_free_frag(priv, ring, frags + nr);
> }
> }
>
> @@ -469,7 +451,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> if (unlikely(!page))
> goto fail;
>
> - dma = frags->dma;
> + dma = page_pool_get_dma_addr(page);
> dma_sync_single_range_for_cpu(priv->ddev, dma, frags->page_offset,
> frag_size, priv->dma_dir);
>
> @@ -480,6 +462,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> if (frag_info->frag_stride == PAGE_SIZE / 2) {
> frags->page_offset ^= PAGE_SIZE / 2;
> release = page_count(page) != 1 ||
> + atomic_long_read(&page->pp_ref_count) != 1 ||
> page_is_pfmemalloc(page) ||
> page_to_nid(page) != numa_mem_id();
> } else if (!priv->rx_headroom) {
> @@ -493,10 +476,9 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> }
> if (release) {
> - dma_unmap_page(priv->ddev, dma, PAGE_SIZE, priv->dma_dir);
> frags->page = NULL;
> } else {
> - page_ref_inc(page);
> + page_pool_ref_page(page);
> }
>
> nr++;
> @@ -766,7 +748,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> /* Get pointer to first fragment since we haven't
> * skb yet and cast it to ethhdr struct
> */
> - dma = frags[0].dma + frags[0].page_offset;
> + dma = page_pool_get_dma_addr(frags[0].page);
> + dma += frags[0].page_offset;
> dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
> DMA_FROM_DEVICE);
>
> @@ -805,7 +788,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> void *orig_data;
> u32 act;
>
> - dma = frags[0].dma + frags[0].page_offset;
> + dma = page_pool_get_dma_addr(frags[0].page);
> + dma += frags[0].page_offset;
> dma_sync_single_for_cpu(priv->ddev, dma,
> priv->frag_info[0].frag_size,
> DMA_FROM_DEVICE);
> @@ -868,6 +852,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> skb = napi_get_frags(&cq->napi);
> if (unlikely(!skb))
> goto next;
> + skb_mark_for_recycle(skb);
>
> if (unlikely(ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL)) {
> u64 timestamp = mlx4_en_get_cqe_ts(cqe);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index fe1378a689a1..87f35bcbeff8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -44,6 +44,7 @@
> #include <linux/ipv6.h>
> #include <linux/indirect_call_wrapper.h>
> #include <net/ipv6.h>
> +#include <net/page_pool/helpers.h>
>
> #include "mlx4_en.h"
>
> @@ -350,9 +351,10 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
> int napi_mode)
> {
> struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
> + struct page_pool *pool = ring->recycle_ring->pp;
>
> - dma_unmap_page(priv->ddev, tx_info->map0_dma, PAGE_SIZE, priv->dma_dir);
> - put_page(tx_info->page);
> + /* Note that napi_mode = 0 means ndo_close() path, not budget = 0 */
> + page_pool_put_full_page(pool, tx_info->page, !!napi_mode);
>
> return tx_info->nr_txbb;
> }
> @@ -1189,7 +1191,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
> tx_desc = ring->buf + (index << LOG_TXBB_SIZE);
> data = &tx_desc->data;
>
> - dma = frame->dma;
> + dma = page_pool_get_dma_addr(frame->page);
>
> tx_info->page = frame->page;
> frame->page = NULL;
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
* Re: [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers
2025-02-13 1:06 [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
` (3 preceding siblings ...)
2025-02-13 1:06 ` [PATCH net-next v3 4/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
@ 2025-02-18 23:40 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-02-18 23:40 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, tariqt, idosch, hawk, netdev, edumazet, pabeni,
andrew+netdev, horms
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 12 Feb 2025 17:06:31 -0800 you wrote:
> Convert mlx4 to page pool. I've been sitting on these patches for
> over a year, and Jonathan Lemon had a similar series years before.
> We never deployed it or sent upstream because it didn't really show
> much perf win under normal load (admittedly I think the real testing
> was done before Ilias's work on recycling).
>
> During the v6.9 kernel rollout Meta's CDN team noticed that machines
> with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
> spent mapping buffers in the IOMMU). The problem does not occur with
> modern NICs, so I dusted off this series and reportedly it still works.
> And it makes the problem go away, no overloads, perf back in line with
> older kernels. Something must have changed in IOMMU code, I guess.
>
> [...]
Here is the summary with links:
- [net-next,v3,1/4] eth: mlx4: create a page pool for Rx
(no matching commit)
- [net-next,v3,2/4] eth: mlx4: don't try to complete XDP frames in netpoll
https://git.kernel.org/netdev/net-next/c/8fdeafd66eda
- [net-next,v3,3/4] eth: mlx4: remove the local XDP fast-recycling ring
(no matching commit)
- [net-next,v3,4/4] eth: mlx4: use the page pool for Rx buffers
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Thread overview: 10 messages
2025-02-13 1:06 [PATCH net-next v3 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-13 1:06 ` [PATCH net-next v3 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
2025-02-18 12:10 ` Tariq Toukan
2025-02-13 1:06 ` [PATCH net-next v3 2/4] eth: mlx4: don't try to complete XDP frames in netpoll Jakub Kicinski
2025-02-18 12:11 ` Tariq Toukan
2025-02-13 1:06 ` [PATCH net-next v3 3/4] eth: mlx4: remove the local XDP fast-recycling ring Jakub Kicinski
2025-02-18 12:12 ` Tariq Toukan
2025-02-13 1:06 ` [PATCH net-next v3 4/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-18 12:13 ` Tariq Toukan
2025-02-18 23:40 ` [PATCH net-next v3 0/4] " patchwork-bot+netdevbpf