* [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-05 3:12 [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
@ 2025-02-05 3:12 ` Jakub Kicinski
2025-02-06 19:44 ` Tariq Toukan
2025-02-11 19:18 ` Tariq Toukan
2025-02-05 3:12 ` [PATCH net-next 2/4] eth: mlx4: don't try to complete XDP frames in netpoll Jakub Kicinski
` (3 subsequent siblings)
4 siblings, 2 replies; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-05 3:12 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk,
Jakub Kicinski
Create a pool per rx queue. Subsequent patches will make use of it.
Move fcs_del to a hole to make space for the pointer.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 ++-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 24 +++++++++++++++++++-
2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 28b70dcc652e..29f48e63081b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -335,10 +335,11 @@ struct mlx4_en_rx_ring {
u16 stride;
u16 log_stride;
u16 cqn; /* index of port CQ associated with this ring */
+ u8 fcs_del;
u32 prod;
u32 cons;
u32 buf_size;
- u8 fcs_del;
+ struct page_pool *pp;
void *buf;
void *rx_info;
struct bpf_prog __rcu *xdp_prog;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 15c57e9517e9..2c23d75baf14 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -48,6 +48,7 @@
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ip6_checksum.h>
#endif
+#include <net/page_pool/helpers.h>
#include "mlx4_en.h"
@@ -268,6 +269,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
u32 size, u16 stride, int node, int queue_index)
{
struct mlx4_en_dev *mdev = priv->mdev;
+ struct page_pool_params pp = {};
struct mlx4_en_rx_ring *ring;
int err = -ENOMEM;
int tmp;
@@ -286,9 +288,26 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->log_stride = ffs(ring->stride) - 1;
ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
- if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
+ pp.flags = PP_FLAG_DMA_MAP;
+ pp.pool_size = MLX4_EN_MAX_RX_SIZE;
+ pp.nid = node;
+ pp.napi = &priv->rx_cq[queue_index]->napi;
+ pp.netdev = priv->dev;
+ pp.dev = &mdev->dev->persist->pdev->dev;
+ pp.dma_dir = DMA_BIDIRECTIONAL;
+
+ ring->pp = page_pool_create(&pp);
+ if (IS_ERR(ring->pp))
goto err_ring;
+ if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
+ goto err_pp;
+
+ err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_POOL,
+ ring->pp);
+ if (err)
+ goto err_xdp_info;
+
tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
sizeof(struct mlx4_en_rx_alloc));
ring->rx_info = kvzalloc_node(tmp, GFP_KERNEL, node);
@@ -319,6 +338,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->rx_info = NULL;
err_xdp_info:
xdp_rxq_info_unreg(&ring->xdp_rxq);
+err_pp:
+ page_pool_destroy(ring->pp);
err_ring:
kfree(ring);
*pring = NULL;
@@ -445,6 +466,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
xdp_rxq_info_unreg(&ring->xdp_rxq);
mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
kvfree(ring->rx_info);
+ page_pool_destroy(ring->pp);
ring->rx_info = NULL;
kfree(ring);
*pring = NULL;
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-05 3:12 ` [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
@ 2025-02-06 19:44 ` Tariq Toukan
2025-02-06 23:04 ` Jakub Kicinski
2025-02-11 19:18 ` Tariq Toukan
1 sibling, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-06 19:44 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk
On 05/02/2025 5:12, Jakub Kicinski wrote:
> Create a pool per rx queue. Subsequent patches will make use of it.
>
> Move fcs_del to a hole to make space for the pointer.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 ++-
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 24 +++++++++++++++++++-
> 2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 28b70dcc652e..29f48e63081b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -335,10 +335,11 @@ struct mlx4_en_rx_ring {
> u16 stride;
> u16 log_stride;
> u16 cqn; /* index of port CQ associated with this ring */
> + u8 fcs_del;
> u32 prod;
> u32 cons;
> u32 buf_size;
> - u8 fcs_del;
> + struct page_pool *pp;
> void *buf;
> void *rx_info;
> struct bpf_prog __rcu *xdp_prog;
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 15c57e9517e9..2c23d75baf14 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -48,6 +48,7 @@
> #if IS_ENABLED(CONFIG_IPV6)
> #include <net/ip6_checksum.h>
> #endif
> +#include <net/page_pool/helpers.h>
>
> #include "mlx4_en.h"
>
> @@ -268,6 +269,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> u32 size, u16 stride, int node, int queue_index)
> {
> struct mlx4_en_dev *mdev = priv->mdev;
> + struct page_pool_params pp = {};
> struct mlx4_en_rx_ring *ring;
> int err = -ENOMEM;
> int tmp;
> @@ -286,9 +288,26 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> ring->log_stride = ffs(ring->stride) - 1;
> ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
>
> - if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> + pp.flags = PP_FLAG_DMA_MAP;
> + pp.pool_size = MLX4_EN_MAX_RX_SIZE;
The pool size is not accurate.
On one hand, MLX4_EN_MAX_RX_SIZE might be too big compared to the
actual ring size.
More importantly, though, it can be too small when working with a
large MTU (large MTU is mutually exclusive with XDP in mlx4).
Rx ring entries consist of 'frags'; each entry needs between 1 and 4
(MLX4_EN_MAX_RX_FRAGS) frags. With the default MTU, each page is
shared between two entries.
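For example, roughly, assuming 4K pages: with the default MTU each
entry uses a single frag with frag_stride == PAGE_SIZE / 2, so a
1024-entry ring needs about 512 pages, while with MTU 9000 each entry
needs 3 frags, i.e. a few pages per entry.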
> + pp.nid = node;
> + pp.napi = &priv->rx_cq[queue_index]->napi;
> + pp.netdev = priv->dev;
> + pp.dev = &mdev->dev->persist->pdev->dev;
> + pp.dma_dir = DMA_BIDIRECTIONAL;
> +
> + ring->pp = page_pool_create(&pp);
> + if (IS_ERR(ring->pp))
> goto err_ring;
>
> + if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> + goto err_pp;
> +
> + err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_POOL,
> + ring->pp);
> + if (err)
> + goto err_xdp_info;
> +
> tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
> sizeof(struct mlx4_en_rx_alloc));
> ring->rx_info = kvzalloc_node(tmp, GFP_KERNEL, node);
> @@ -319,6 +338,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> ring->rx_info = NULL;
> err_xdp_info:
> xdp_rxq_info_unreg(&ring->xdp_rxq);
> +err_pp:
> + page_pool_destroy(ring->pp);
> err_ring:
> kfree(ring);
> *pring = NULL;
> @@ -445,6 +466,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
> xdp_rxq_info_unreg(&ring->xdp_rxq);
> mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
> kvfree(ring->rx_info);
> + page_pool_destroy(ring->pp);
> ring->rx_info = NULL;
> kfree(ring);
> *pring = NULL;
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-06 19:44 ` Tariq Toukan
@ 2025-02-06 23:04 ` Jakub Kicinski
2025-02-11 18:01 ` Tariq Toukan
0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-06 23:04 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Thu, 6 Feb 2025 21:44:38 +0200 Tariq Toukan wrote:
> > - if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> > + pp.flags = PP_FLAG_DMA_MAP;
> > + pp.pool_size = MLX4_EN_MAX_RX_SIZE;
>
> The pool size is not accurate.
> On one hand, MLX4_EN_MAX_RX_SIZE might be too big compared to the
> actual ring size.
>
> More importantly, though, it can be too small when working with a
> large MTU (large MTU is mutually exclusive with XDP in mlx4).
>
> Rx ring entries consist of 'frags'; each entry needs between 1 and 4
> (MLX4_EN_MAX_RX_FRAGS) frags. With the default MTU, each page is
> shared between two entries.
The pool_size is just the size of the cache, how many unallocated
DMA mapped pages we can keep around before freeing them to system
memory. It has no implications for correctness.
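(Internally it's just the size of the page pool's ptr_ring; pages
that don't fit there on recycle go back to the page allocator.)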
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-06 23:04 ` Jakub Kicinski
@ 2025-02-11 18:01 ` Tariq Toukan
2025-02-11 19:06 ` Jakub Kicinski
0 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-11 18:01 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On 07/02/2025 1:04, Jakub Kicinski wrote:
> On Thu, 6 Feb 2025 21:44:38 +0200 Tariq Toukan wrote:
>>> - if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
>>> + pp.flags = PP_FLAG_DMA_MAP;
>>> + pp.pool_size = MLX4_EN_MAX_RX_SIZE;
>>
>> The pool size is not accurate.
>> On one hand, MLX4_EN_MAX_RX_SIZE might be too big compared to the
>> actual ring size.
>>
>> More importantly, though, it can be too small when working with a
>> large MTU (large MTU is mutually exclusive with XDP in mlx4).
>>
>> Rx ring entries consist of 'frags'; each entry needs between 1 and 4
>> (MLX4_EN_MAX_RX_FRAGS) frags. With the default MTU, each page is
>> shared between two entries.
>
> The pool_size is just the size of the cache, how many unallocated
> DMA mapped pages we can keep around before freeing them to system
> memory. It has no implications for correctness.
Right, it doesn't hurt correctness.
But we'd better derive the cache size from the overall ring buffer
size, so that the memory consumption/footprint reflects the user
configuration.
Something like:
ring->size * sum(priv->frag_info[i].frag_stride for i < num_frags),
or roughly ring->size * MLX4_EN_EFF_MTU(dev->mtu).
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-11 18:01 ` Tariq Toukan
@ 2025-02-11 19:06 ` Jakub Kicinski
2025-02-11 19:11 ` Tariq Toukan
0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-11 19:06 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Tue, 11 Feb 2025 20:01:08 +0200 Tariq Toukan wrote:
> > The pool_size is just the size of the cache, how many unallocated
> > DMA mapped pages we can keep around before freeing them to system
> > memory. It has no implications for correctness.
>
> Right, it doesn't hurt correctness.
> But we'd better derive the cache size from the overall ring buffer
> size, so that the memory consumption/footprint reflects the user
> configuration.
>
> Something like:
>
> ring->size * sum(priv->frag_info[i].frag_stride for i < num_frags),
>
> or roughly ring->size * MLX4_EN_EFF_MTU(dev->mtu).
These calculations appear to produce a byte count?
The ring size is in *pages*. Frag is also somewhat irrelevant, given
that we're talking about full pages here, not 2k frags. So I think
I'll go with:
pp.pool_size =
size * DIV_ROUND_UP(MLX4_EN_EFF_MTU(dev->mtu), PAGE_SIZE);
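(Assuming 4K pages: with the default MTU the eff mtu is ~1.5K, so
DIV_ROUND_UP() gives 1 page per entry and pool_size == size; with
MTU 9000 it gives 3 pages per entry.)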
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-11 19:06 ` Jakub Kicinski
@ 2025-02-11 19:11 ` Tariq Toukan
2025-02-11 19:21 ` Tariq Toukan
0 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-11 19:11 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On 11/02/2025 21:06, Jakub Kicinski wrote:
> On Tue, 11 Feb 2025 20:01:08 +0200 Tariq Toukan wrote:
>>> The pool_size is just the size of the cache, how many unallocated
>>> DMA mapped pages we can keep around before freeing them to system
>>> memory. It has no implications for correctness.
>>
>> Right, it doesn't hurt correctness.
>> But we'd better derive the cache size from the overall ring buffer
>> size, so that the memory consumption/footprint reflects the user
>> configuration.
>>
>> Something like:
>>
>> ring->size * sum(priv->frag_info[i].frag_stride for i < num_frags),
>>
>> or roughly ring->size * MLX4_EN_EFF_MTU(dev->mtu).
>
> These calculations appear to produce a byte count?
Yes.
Of course, it needs to be aligned and converted to a page count.
> The ring size is in *pages*. Frag is also somewhat irrelevant, given
> that we're talking about full pages here, not 2k frags. So I think
> I'll go with:
>
> pp.pool_size =
> size * DIV_ROUND_UP(MLX4_EN_EFF_MTU(dev->mtu), PAGE_SIZE);
LGTM.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-11 19:11 ` Tariq Toukan
@ 2025-02-11 19:21 ` Tariq Toukan
2025-02-11 19:23 ` Jakub Kicinski
0 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-11 19:21 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On 11/02/2025 21:11, Tariq Toukan wrote:
>
>
> On 11/02/2025 21:06, Jakub Kicinski wrote:
>> On Tue, 11 Feb 2025 20:01:08 +0200 Tariq Toukan wrote:
>>>> The pool_size is just the size of the cache, how many unallocated
>>>> DMA mapped pages we can keep around before freeing them to system
>>>> memory. It has no implications for correctness.
>>>
>>> Right, it doesn't hurt correctness.
>>> But we'd better derive the cache size from the overall ring buffer
>>> size, so that the memory consumption/footprint reflects the user
>>> configuration.
>>>
>>> Something like:
>>>
>>> ring->size * sum(priv->frag_info[i].frag_stride for i < num_frags),
>>>
>>> or roughly ring->size * MLX4_EN_EFF_MTU(dev->mtu).
>>
>> These calculations appear to produce a byte count?
>
> Yes.
> Of course, it needs to be aligned and converted to a page count.
>
>> The ring size is in *pages*. Frag is also somewhat irrelevant, given
>> that we're talking about full pages here, not 2k frags. So I think
>> I'll go with:
>>
>> pp.pool_size =
>> size * DIV_ROUND_UP(MLX4_EN_EFF_MTU(dev->mtu), PAGE_SIZE);
>
You can use priv->rx_skb_size as well; it holds the eff mtu value.
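That is, something like:

pp.pool_size = size * DIV_ROUND_UP(priv->rx_skb_size, PAGE_SIZE);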
>
> LGTM.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-11 19:21 ` Tariq Toukan
@ 2025-02-11 19:23 ` Jakub Kicinski
2025-02-11 19:38 ` Tariq Toukan
0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-11 19:23 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Tue, 11 Feb 2025 21:21:13 +0200 Tariq Toukan wrote:
> >> The ring size is in *pages*. Frag is also somewhat irrelevant, given
> >> that we're talking about full pages here, not 2k frags. So I think
> >> I'll go with:
> >>
> >> pp.pool_size =
> >> size * DIV_ROUND_UP(MLX4_EN_EFF_MTU(dev->mtu), PAGE_SIZE);
> >
>
> You can use priv->rx_skb_size as well; it holds the eff mtu value.
Missed this before hitting send, I'll repost tomorrow, I guess.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-11 19:23 ` Jakub Kicinski
@ 2025-02-11 19:38 ` Tariq Toukan
0 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-11 19:38 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On 11/02/2025 21:23, Jakub Kicinski wrote:
> On Tue, 11 Feb 2025 21:21:13 +0200 Tariq Toukan wrote:
>>>> The ring size is in *pages*. Frag is also somewhat irrelevant, given
>>>> that we're talking about full pages here, not 2k frags. So I think
>>>> I'll go with:
>>>>
>>>> pp.pool_size =
>>>> size * DIV_ROUND_UP(MLX4_EN_EFF_MTU(dev->mtu), PAGE_SIZE);
>>>
>>
>> You can use priv->rx_skb_size as well; it holds the eff mtu value.
>
> Missed this before hitting send, I'll repost tomorrow, I guess.
Sorry for posting these comments a bit late.
This one is a nit; I want to make sure you noticed my other comment
regarding pp.dma_dir.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-05 3:12 ` [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
2025-02-06 19:44 ` Tariq Toukan
@ 2025-02-11 19:18 ` Tariq Toukan
2025-02-11 19:44 ` Jakub Kicinski
1 sibling, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-11 19:18 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk
On 05/02/2025 5:12, Jakub Kicinski wrote:
> Create a pool per rx queue. Subsequent patches will make use of it.
>
> Move fcs_del to a hole to make space for the pointer.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3 ++-
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 24 +++++++++++++++++++-
> 2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 28b70dcc652e..29f48e63081b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
..
> @@ -286,9 +288,26 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> ring->log_stride = ffs(ring->stride) - 1;
> ring->buf_size = ring->size * ring->stride + TXBB_SIZE;
>
> - if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> + pp.flags = PP_FLAG_DMA_MAP;
> + pp.pool_size = MLX4_EN_MAX_RX_SIZE;
> + pp.nid = node;
> + pp.napi = &priv->rx_cq[queue_index]->napi;
> + pp.netdev = priv->dev;
> + pp.dev = &mdev->dev->persist->pdev->dev;
> + pp.dma_dir = DMA_BIDIRECTIONAL;
I just noticed one more thing: here we'd better take the value from
priv->dma_dir, as it could be DMA_FROM_DEVICE or DMA_BIDIRECTIONAL
depending on XDP program presence.
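I.e. simply:

pp.dma_dir = priv->dma_dir;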
> +
> + ring->pp = page_pool_create(&pp);
> + if (IS_ERR(ring->pp))
> goto err_ring;
>
> + if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> + goto err_pp;
> +
> + err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_POOL,
> + ring->pp);
> + if (err)
> + goto err_xdp_info;
> +
> tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
> sizeof(struct mlx4_en_rx_alloc));
> ring->rx_info = kvzalloc_node(tmp, GFP_KERNEL, node);
> @@ -319,6 +338,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> ring->rx_info = NULL;
> err_xdp_info:
> xdp_rxq_info_unreg(&ring->xdp_rxq);
> +err_pp:
> + page_pool_destroy(ring->pp);
> err_ring:
> kfree(ring);
> *pring = NULL;
> @@ -445,6 +466,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
> xdp_rxq_info_unreg(&ring->xdp_rxq);
> mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
> kvfree(ring->rx_info);
> + page_pool_destroy(ring->pp);
> ring->rx_info = NULL;
> kfree(ring);
> *pring = NULL;
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx
2025-02-11 19:18 ` Tariq Toukan
@ 2025-02-11 19:44 ` Jakub Kicinski
0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-11 19:44 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Tue, 11 Feb 2025 21:18:46 +0200 Tariq Toukan wrote:
> > - if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index, 0) < 0)
> > + pp.flags = PP_FLAG_DMA_MAP;
> > + pp.pool_size = MLX4_EN_MAX_RX_SIZE;
> > + pp.nid = node;
> > + pp.napi = &priv->rx_cq[queue_index]->napi;
> > + pp.netdev = priv->dev;
> > + pp.dev = &mdev->dev->persist->pdev->dev;
> > + pp.dma_dir = DMA_BIDIRECTIONAL;
>
> I just noticed one more thing: here we'd better take the value from
> priv->dma_dir, as it could be DMA_FROM_DEVICE or DMA_BIDIRECTIONAL
> depending on XDP program presence.
ack!
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH net-next 2/4] eth: mlx4: don't try to complete XDP frames in netpoll
2025-02-05 3:12 [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-05 3:12 ` [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
@ 2025-02-05 3:12 ` Jakub Kicinski
2025-02-06 19:50 ` Tariq Toukan
2025-02-05 3:12 ` [PATCH net-next 3/4] eth: mlx4: remove the local XDP fast-recycling ring Jakub Kicinski
` (2 subsequent siblings)
4 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-05 3:12 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk,
Jakub Kicinski
mlx4 doesn't support xdp xmit and wasn't using page pool
until now, so it could run XDP completions in netpoll
(NAPI budget == 0) just fine. Page pool has calling
context requirements, make sure we don't try to call
it from what is potentially HW IRQ context.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 1ddb11cb25f9..6e077d202827 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -450,6 +450,8 @@ int mlx4_en_process_tx_cq(struct net_device *dev,
if (unlikely(!priv->port_up))
return 0;
+ if (unlikely(!napi_budget) && cq->type == TX_XDP)
+ return 0;
netdev_txq_bql_complete_prefetchw(ring->tx_queue);
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 2/4] eth: mlx4: don't try to complete XDP frames in netpoll
2025-02-05 3:12 ` [PATCH net-next 2/4] eth: mlx4: don't try to complete XDP frames in netpoll Jakub Kicinski
@ 2025-02-06 19:50 ` Tariq Toukan
2025-02-06 23:05 ` Jakub Kicinski
0 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-06 19:50 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk
On 05/02/2025 5:12, Jakub Kicinski wrote:
> mlx4 doesn't support xdp xmit
Are you referring to xdp_redirect egress?
mlx4 supports XDP_TX ("bounce"), as well as XDP_DROP, XDP_PASS, and
XDP_REDIRECT.
> and wasn't using page pool
> until now, so it could run XDP completions in netpoll
> (NAPI budget == 0) just fine. Page pool has calling
> context requirements, make sure we don't try to call
> it from what is potentially HW IRQ context.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 1ddb11cb25f9..6e077d202827 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -450,6 +450,8 @@ int mlx4_en_process_tx_cq(struct net_device *dev,
>
> if (unlikely(!priv->port_up))
> return 0;
> + if (unlikely(!napi_budget) && cq->type == TX_XDP)
> + return 0;
>
> netdev_txq_bql_complete_prefetchw(ring->tx_queue);
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 2/4] eth: mlx4: don't try to complete XDP frames in netpoll
2025-02-06 19:50 ` Tariq Toukan
@ 2025-02-06 23:05 ` Jakub Kicinski
0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-06 23:05 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Thu, 6 Feb 2025 21:50:56 +0200 Tariq Toukan wrote:
> On 05/02/2025 5:12, Jakub Kicinski wrote:
> > mlx4 doesn't support xdp xmit
>
> Are you referring to xdp_redirect egress?
>
> mlx4 supports XDP_TX ("bounce"), as well as XDP_DROP, XDP_PASS, and
> XDP_REDIRECT.
Fair point, I probably worded it this way because I was thinking
of .ndo_xdp_xmit
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH net-next 3/4] eth: mlx4: remove the local XDP fast-recycling ring
2025-02-05 3:12 [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-05 3:12 ` [PATCH net-next 1/4] eth: mlx4: create a page pool for Rx Jakub Kicinski
2025-02-05 3:12 ` [PATCH net-next 2/4] eth: mlx4: don't try to complete XDP frames in netpoll Jakub Kicinski
@ 2025-02-05 3:12 ` Jakub Kicinski
2025-02-05 3:12 ` [PATCH net-next 4/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
2025-02-06 12:57 ` [PATCH net-next 0/4] " Tariq Toukan
4 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-05 3:12 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk,
Jakub Kicinski
It will be replaced with page pool's built-in recycling.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 11 ------
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 40 --------------------
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 11 +-----
3 files changed, 2 insertions(+), 60 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 29f48e63081b..97311c98569f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -253,14 +253,6 @@ struct mlx4_en_rx_alloc {
#define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
-struct mlx4_en_page_cache {
- u32 index;
- struct {
- struct page *page;
- dma_addr_t dma;
- } buf[MLX4_EN_CACHE_SIZE];
-};
-
enum {
MLX4_EN_TX_RING_STATE_RECOVERING,
};
@@ -343,7 +335,6 @@ struct mlx4_en_rx_ring {
void *buf;
void *rx_info;
struct bpf_prog __rcu *xdp_prog;
- struct mlx4_en_page_cache page_cache;
unsigned long bytes;
unsigned long packets;
unsigned long csum_ok;
@@ -708,8 +699,6 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
struct mlx4_en_priv *priv, unsigned int length,
int tx_ind, bool *doorbell_pending);
void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring);
-bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
- struct mlx4_en_rx_alloc *frame);
int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_tx_ring **pring,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2c23d75baf14..9de5449667bb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -142,18 +142,6 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
(index << ring->log_stride);
struct mlx4_en_rx_alloc *frags = ring->rx_info +
(index << priv->log_rx_info);
- if (likely(ring->page_cache.index > 0)) {
- /* XDP uses a single page per frame */
- if (!frags->page) {
- ring->page_cache.index--;
- frags->page = ring->page_cache.buf[ring->page_cache.index].page;
- frags->dma = ring->page_cache.buf[ring->page_cache.index].dma;
- }
- frags->page_offset = XDP_PACKET_HEADROOM;
- rx_desc->data[0].addr = cpu_to_be64(frags->dma +
- XDP_PACKET_HEADROOM);
- return 0;
- }
return mlx4_en_alloc_frags(priv, ring, rx_desc, frags, gfp);
}
@@ -430,26 +418,6 @@ void mlx4_en_recover_from_oom(struct mlx4_en_priv *priv)
}
}
-/* When the rx ring is running in page-per-packet mode, a released frame can go
- * directly into a small cache, to avoid unmapping or touching the page
- * allocator. In bpf prog performance scenarios, buffers are either forwarded
- * or dropped, never converted to skbs, so every page can come directly from
- * this cache when it is sized to be a multiple of the napi budget.
- */
-bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
- struct mlx4_en_rx_alloc *frame)
-{
- struct mlx4_en_page_cache *cache = &ring->page_cache;
-
- if (cache->index >= MLX4_EN_CACHE_SIZE)
- return false;
-
- cache->buf[cache->index].page = frame->page;
- cache->buf[cache->index].dma = frame->dma;
- cache->index++;
- return true;
-}
-
void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride)
@@ -475,14 +443,6 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring *ring)
{
- int i;
-
- for (i = 0; i < ring->page_cache.index; i++) {
- dma_unmap_page(priv->ddev, ring->page_cache.buf[i].dma,
- PAGE_SIZE, priv->dma_dir);
- put_page(ring->page_cache.buf[i].page);
- }
- ring->page_cache.index = 0;
mlx4_en_free_rx_buf(priv, ring);
if (ring->stride <= TXBB_SIZE)
ring->buf -= TXBB_SIZE;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6e077d202827..fe1378a689a1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -350,16 +350,9 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
int napi_mode)
{
struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
- struct mlx4_en_rx_alloc frame = {
- .page = tx_info->page,
- .dma = tx_info->map0_dma,
- };
- if (!napi_mode || !mlx4_en_rx_recycle(ring->recycle_ring, &frame)) {
- dma_unmap_page(priv->ddev, tx_info->map0_dma,
- PAGE_SIZE, priv->dma_dir);
- put_page(tx_info->page);
- }
+ dma_unmap_page(priv->ddev, tx_info->map0_dma, PAGE_SIZE, priv->dma_dir);
+ put_page(tx_info->page);
return tx_info->nr_txbb;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH net-next 4/4] eth: mlx4: use the page pool for Rx buffers
2025-02-05 3:12 [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
` (2 preceding siblings ...)
2025-02-05 3:12 ` [PATCH net-next 3/4] eth: mlx4: remove the local XDP fast-recycling ring Jakub Kicinski
@ 2025-02-05 3:12 ` Jakub Kicinski
2025-02-05 18:05 ` Ido Schimmel
2025-02-06 12:57 ` [PATCH net-next 0/4] " Tariq Toukan
4 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-05 3:12 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk,
Jakub Kicinski
Simple conversion to page pool. Preserve the current fragmentation
logic / page splitting. Each page starts with a single frag reference,
and then we bump that when attaching to skbs. This can likely be
optimized further.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 -
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 56 ++++++++------------
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +--
3 files changed, 26 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 97311c98569f..ad0d91a75184 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -247,7 +247,6 @@ struct mlx4_en_tx_desc {
struct mlx4_en_rx_alloc {
struct page *page;
- dma_addr_t dma;
u32 page_offset;
};
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9de5449667bb..0e74d9c75c71 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -52,57 +52,39 @@
#include "mlx4_en.h"
-static int mlx4_alloc_page(struct mlx4_en_priv *priv,
- struct mlx4_en_rx_alloc *frag,
- gfp_t gfp)
-{
- struct page *page;
- dma_addr_t dma;
-
- page = alloc_page(gfp);
- if (unlikely(!page))
- return -ENOMEM;
- dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE, priv->dma_dir);
- if (unlikely(dma_mapping_error(priv->ddev, dma))) {
- __free_page(page);
- return -ENOMEM;
- }
- frag->page = page;
- frag->dma = dma;
- frag->page_offset = priv->rx_headroom;
- return 0;
-}
-
static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring *ring,
struct mlx4_en_rx_desc *rx_desc,
struct mlx4_en_rx_alloc *frags,
gfp_t gfp)
{
+ dma_addr_t dma;
int i;
for (i = 0; i < priv->num_frags; i++, frags++) {
if (!frags->page) {
- if (mlx4_alloc_page(priv, frags, gfp)) {
+ frags->page = page_pool_alloc_pages(ring->pp, gfp);
+ if (!frags->page) {
ring->alloc_fail++;
return -ENOMEM;
}
+ page_pool_fragment_page(frags->page, 1);
+ frags->page_offset = priv->rx_headroom;
+
ring->rx_alloc_pages++;
}
- rx_desc->data[i].addr = cpu_to_be64(frags->dma +
- frags->page_offset);
+ dma = page_pool_get_dma_addr(frags->page);
+ rx_desc->data[i].addr = cpu_to_be64(dma + frags->page_offset);
}
return 0;
}
static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
+ struct mlx4_en_rx_ring *ring,
struct mlx4_en_rx_alloc *frag)
{
- if (frag->page) {
- dma_unmap_page(priv->ddev, frag->dma,
- PAGE_SIZE, priv->dma_dir);
- __free_page(frag->page);
- }
+ if (frag->page)
+ page_pool_put_full_page(ring->pp, frag->page, false);
/* We need to clear all fields, otherwise a change of priv->log_rx_info
* could lead to see garbage later in frag->page.
*/
@@ -167,7 +149,7 @@ static void mlx4_en_free_rx_desc(const struct mlx4_en_priv *priv,
frags = ring->rx_info + (index << priv->log_rx_info);
for (nr = 0; nr < priv->num_frags; nr++) {
en_dbg(DRV, priv, "Freeing fragment:%d\n", nr);
- mlx4_en_free_frag(priv, frags + nr);
+ mlx4_en_free_frag(priv, ring, frags + nr);
}
}
@@ -283,6 +265,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
pp.netdev = priv->dev;
pp.dev = &mdev->dev->persist->pdev->dev;
pp.dma_dir = DMA_BIDIRECTIONAL;
+ pp.max_len = PAGE_SIZE;
ring->pp = page_pool_create(&pp);
if (!ring->pp)
@@ -469,7 +452,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
if (unlikely(!page))
goto fail;
- dma = frags->dma;
+ dma = page_pool_get_dma_addr(page);
dma_sync_single_range_for_cpu(priv->ddev, dma, frags->page_offset,
frag_size, priv->dma_dir);
@@ -480,6 +463,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
if (frag_info->frag_stride == PAGE_SIZE / 2) {
frags->page_offset ^= PAGE_SIZE / 2;
release = page_count(page) != 1 ||
+ atomic_long_read(&page->pp_ref_count) != 1 ||
page_is_pfmemalloc(page) ||
page_to_nid(page) != numa_mem_id();
} else if (!priv->rx_headroom) {
@@ -493,10 +477,9 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
}
if (release) {
- dma_unmap_page(priv->ddev, dma, PAGE_SIZE, priv->dma_dir);
frags->page = NULL;
} else {
- page_ref_inc(page);
+ page_pool_ref_page(page);
}
nr++;
@@ -766,7 +749,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
/* Get pointer to first fragment since we haven't
* skb yet and cast it to ethhdr struct
*/
- dma = frags[0].dma + frags[0].page_offset;
+ dma = page_pool_get_dma_addr(frags[0].page);
+ dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
DMA_FROM_DEVICE);
@@ -805,7 +789,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
void *orig_data;
u32 act;
- dma = frags[0].dma + frags[0].page_offset;
+ dma = page_pool_get_dma_addr(frags[0].page);
+ dma += frags[0].page_offset;
dma_sync_single_for_cpu(priv->ddev, dma,
priv->frag_info[0].frag_size,
DMA_FROM_DEVICE);
@@ -868,6 +853,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
skb = napi_get_frags(&cq->napi);
if (unlikely(!skb))
goto next;
+ skb_mark_for_recycle(skb);
if (unlikely(ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL)) {
u64 timestamp = mlx4_en_get_cqe_ts(cqe);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index fe1378a689a1..87f35bcbeff8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -44,6 +44,7 @@
#include <linux/ipv6.h>
#include <linux/indirect_call_wrapper.h>
#include <net/ipv6.h>
+#include <net/page_pool/helpers.h>
#include "mlx4_en.h"
@@ -350,9 +351,10 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
int napi_mode)
{
struct mlx4_en_tx_info *tx_info = &ring->tx_info[index];
+ struct page_pool *pool = ring->recycle_ring->pp;
- dma_unmap_page(priv->ddev, tx_info->map0_dma, PAGE_SIZE, priv->dma_dir);
- put_page(tx_info->page);
+ /* Note that napi_mode = 0 means ndo_close() path, not budget = 0 */
+ page_pool_put_full_page(pool, tx_info->page, !!napi_mode);
return tx_info->nr_txbb;
}
@@ -1189,7 +1191,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
tx_desc = ring->buf + (index << LOG_TXBB_SIZE);
data = &tx_desc->data;
- dma = frame->dma;
+ dma = page_pool_get_dma_addr(frame->page);
tx_info->page = frame->page;
frame->page = NULL;
--
2.48.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 4/4] eth: mlx4: use the page pool for Rx buffers
2025-02-05 3:12 ` [PATCH net-next 4/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
@ 2025-02-05 18:05 ` Ido Schimmel
2025-02-05 19:11 ` Jakub Kicinski
0 siblings, 1 reply; 23+ messages in thread
From: Ido Schimmel @ 2025-02-05 18:05 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Tue, Feb 04, 2025 at 07:12:13PM -0800, Jakub Kicinski wrote:
> @@ -283,6 +265,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> pp.netdev = priv->dev;
> pp.dev = &mdev->dev->persist->pdev->dev;
> pp.dma_dir = DMA_BIDIRECTIONAL;
> + pp.max_len = PAGE_SIZE;
Possibly a stupid question, but can you explain this hunk given patch #1
does not set 'PP_FLAG_DMA_SYNC_DEV' ?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 4/4] eth: mlx4: use the page pool for Rx buffers
2025-02-05 18:05 ` Ido Schimmel
@ 2025-02-05 19:11 ` Jakub Kicinski
0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-05 19:11 UTC (permalink / raw)
To: Ido Schimmel
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Wed, 5 Feb 2025 20:05:27 +0200 Ido Schimmel wrote:
> On Tue, Feb 04, 2025 at 07:12:13PM -0800, Jakub Kicinski wrote:
> > @@ -283,6 +265,7 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
> > pp.netdev = priv->dev;
> > pp.dev = &mdev->dev->persist->pdev->dev;
> > pp.dma_dir = DMA_BIDIRECTIONAL;
> > + pp.max_len = PAGE_SIZE;
>
> Possibly a stupid question, but can you explain this hunk given patch #1
> does not set 'PP_FLAG_DMA_SYNC_DEV' ?
Not sure, I wrote this a while back. I probably left it here "for
future self", for when sync support is added: a reminder that since
we still do the page flipping thing we must sync the entire page,
even if the MTU is smaller. It's a common source of bugs.
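If/when someone adds sync support, I'd expect an (untested) sketch
along the lines of:

pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
pp.offset = 0;
pp.max_len = PAGE_SIZE; /* page flipping: must sync the whole page */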
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers
2025-02-05 3:12 [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
` (3 preceding siblings ...)
2025-02-05 3:12 ` [PATCH net-next 4/4] eth: mlx4: use the page pool for Rx buffers Jakub Kicinski
@ 2025-02-06 12:57 ` Tariq Toukan
2025-02-06 15:58 ` Jakub Kicinski
4 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-06 12:57 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, tariqt, hawk
On 05/02/2025 5:12, Jakub Kicinski wrote:
> Convert mlx4 to page pool. I've been sitting on these patches for
> over a year, and Jonathan Lemon had a similar series years before.
> We never deployed it or sent upstream because it didn't really show
> much perf win under normal load (admittedly I think the real testing
> was done before Ilias's work on recycling).
>
> During the v6.9 kernel rollout Meta's CDN team noticed that machines
> with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
> spent mapping buffers in the IOMMU). The problem does not occur with
> modern NICs, so I dusted off this series and reportedly it still works.
> And it makes the problem go away, no overloads, perf back in line with
> older kernels. Something must have changed in IOMMU code, I guess.
>
> This series is very simple, and can very likely be optimized further.
> Thing is, I don't have access to any CX3 Pro NICs. They only exist
> in CDN locations which haven't had a HW refresh for a while. So I can
> say this series survives a week under traffic w/ XDP enabled, but
> my ability to iterate and improve is a bit limited.
Hi Jakub,
Thanks for your patches.
As this series touches a critical data-path area, and you had no real
option of testing it, we are taking it through a regression cycle in
parallel to the code review.
We should have results early next week. We'll update.
Regards,
Tariq
>
> Jakub Kicinski (4):
> eth: mlx4: create a page pool for Rx
> eth: mlx4: don't try to complete XDP frames in netpoll
> eth: mlx4: remove the local XDP fast-recycling ring
> eth: mlx4: use the page pool for Rx buffers
>
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 15 +--
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 120 +++++++------------
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 17 ++-
> 3 files changed, 53 insertions(+), 99 deletions(-)
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers
2025-02-06 12:57 ` [PATCH net-next 0/4] " Tariq Toukan
@ 2025-02-06 15:58 ` Jakub Kicinski
2025-02-11 18:05 ` Tariq Toukan
0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-06 15:58 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Thu, 6 Feb 2025 14:57:59 +0200 Tariq Toukan wrote:
> On 05/02/2025 5:12, Jakub Kicinski wrote:
> > Convert mlx4 to page pool. I've been sitting on these patches for
> > over a year, and Jonathan Lemon had a similar series years before.
> > We never deployed it or sent upstream because it didn't really show
> > much perf win under normal load (admittedly I think the real testing
> > was done before Ilias's work on recycling).
> >
> > During the v6.9 kernel rollout Meta's CDN team noticed that machines
> > with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
> > spent mapping buffers in the IOMMU). The problem does not occur with
> > modern NICs, so I dusted off this series and reportedly it still works.
> > And it makes the problem go away, no overloads, perf back in line with
> > older kernels. Something must have changed in IOMMU code, I guess.
> >
> > This series is very simple, and can very likely be optimized further.
> > Thing is, I don't have access to any CX3 Pro NICs. They only exist
> > in CDN locations which haven't had a HW refresh for a while. So I can
> > say this series survives a week under traffic w/ XDP enabled, but
> > my ability to iterate and improve is a bit limited.
>
> Hi Jakub,
>
> Thanks for your patches.
>
> As this series touches a critical data-path area, and you had no real
> option of testing it, we are taking it through a regression cycle in
> parallel to the code review.
>
> We should have results early next week. We'll update.
Sounds good, could you repost once ready?
I'll mark it as awaiting upstream in patchwork for now.
And feel free to drop the line pointed out by Ido, no real
preference either way there.
--
pw-bot: au
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers
2025-02-06 15:58 ` Jakub Kicinski
@ 2025-02-11 18:05 ` Tariq Toukan
2025-02-11 18:50 ` Jakub Kicinski
0 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-11 18:05 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On 06/02/2025 17:58, Jakub Kicinski wrote:
> On Thu, 6 Feb 2025 14:57:59 +0200 Tariq Toukan wrote:
>> On 05/02/2025 5:12, Jakub Kicinski wrote:
>>> Convert mlx4 to page pool. I've been sitting on these patches for
>>> over a year, and Jonathan Lemon had a similar series years before.
>>> We never deployed it or sent upstream because it didn't really show
>>> much perf win under normal load (admittedly I think the real testing
>>> was done before Ilias's work on recycling).
>>>
>>> During the v6.9 kernel rollout Meta's CDN team noticed that machines
>>> with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
>>> spent mapping buffers in the IOMMU). The problem does not occur with
>>> modern NICs, so I dusted off this series and reportedly it still works.
>>> And it makes the problem go away, no overloads, perf back in line with
>>> older kernels. Something must have changed in IOMMU code, I guess.
>>>
>>> This series is very simple, and can very likely be optimized further.
>>> Thing is, I don't have access to any CX3 Pro NICs. They only exist
>>> in CDN locations which haven't had a HW refresh for a while. So I can
>>> say this series survives a week under traffic w/ XDP enabled, but
>>> my ability to iterate and improve is a bit limited.
>>
>> Hi Jakub,
>>
>> Thanks for your patches.
>>
>> As this series touches a critical data-path area, and you had no real
>> option of testing it, we are taking it through a regression cycle in
>> parallel to the code review.
>>
>> We should have results early next week. We'll update.
>
> Sounds good, could you repost once ready?
> I'll mark it as awaiting upstream in patchwork for now.
> And feel free to drop the line pointed out by Ido, no real
> preference either way there.
Hi,
Patches passed functional tests.
Overall, the patches look good.
Only a few comments:
1. Nit by Ido.
2. pool size.
3. xdp xmit support description.
How do you want to proceed?
Do you want to fix and re-spin?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH net-next 0/4] eth: mlx4: use the page pool for Rx buffers
2025-02-11 18:05 ` Tariq Toukan
@ 2025-02-11 18:50 ` Jakub Kicinski
0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-11 18:50 UTC (permalink / raw)
To: Tariq Toukan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, tariqt,
hawk
On Tue, 11 Feb 2025 20:05:01 +0200 Tariq Toukan wrote:
> On 06/02/2025 17:58, Jakub Kicinski wrote:
> > On Thu, 6 Feb 2025 14:57:59 +0200 Tariq Toukan wrote:
> >> Thanks for your patches.
> >>
> >> As this series touches a critical data-path area, and you had no real
> >> option of testing it, we are taking it through a regression cycle in
> >> parallel to the code review.
> >>
> >> We should have results early next week. We'll update.
> >
> > Sounds good, could you repost once ready?
> > I'll mark it as awaiting upstream in patchwork for now.
> > And feel free to drop the line pointed out by Ido, no real
> > preference either way there.
>
> Hi,
>
> Patches passed functional tests.
>
> Overall, the patches look good.
Thanks!
> Only a few comments:
> 1. Nit by Ido.
> 2. pool size.
> 3. xdp xmit support description.
>
> How do you want to proceed?
> Do you want to fix and re-spin?
Sure thing, will do.
^ permalink raw reply [flat|nested] 23+ messages in thread