* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-14 21:31 [PATCH net] bnxt_en: do not map packet buffers twice Michael Chan
@ 2023-12-14 23:18 ` David Wei
2023-12-15 0:27 ` Michael Chan
2023-12-15 5:54 ` David Wei
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: David Wei @ 2023-12-14 23:18 UTC (permalink / raw)
To: Michael Chan, davem
Cc: netdev, edumazet, kuba, pabeni, bpf, hawk, ast, daniel,
john.fastabend, Andy Gospodarek, Somnath Kotur
On 2023-12-14 13:31, Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
>
> Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
> 1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 96f5ca778c67..8cb9a99154aa 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> for (i = 0; i < num_frags ; i++) {
> skb_frag_t *frag = &sinfo->frags[i];
> struct bnxt_sw_tx_bd *frag_tx_buf;
> - struct pci_dev *pdev = bp->pdev;
> dma_addr_t frag_mapping;
> int frag_len;
>
> @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
>
> frag_len = skb_frag_size(frag);
> - frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> - frag_len, DMA_TO_DEVICE);
> -
> - if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> - return NULL;
> -
> - dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
If this is no longer set, what would happen to dma_unmap_single() in
bnxt_tx_int_xdp() that is then reading `mapping` via dma_unmap_addr()?
> -
> flags = frag_len << TX_BD_LEN_SHIFT;
> txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
> + frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
> + skb_frag_off(frag);
> txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
>
> len = frag_len;
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-14 23:18 ` David Wei
@ 2023-12-15 0:27 ` Michael Chan
0 siblings, 0 replies; 9+ messages in thread
From: Michael Chan @ 2023-12-15 0:27 UTC (permalink / raw)
To: David Wei
Cc: davem, netdev, edumazet, kuba, pabeni, bpf, hawk, ast, daniel,
john.fastabend, Andy Gospodarek, Somnath Kotur
On Thu, Dec 14, 2023 at 3:18 PM David Wei <dw@davidwei.uk> wrote:
>
> On 2023-12-14 13:31, Michael Chan wrote:
> > From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> >
> > Remove double-mapping of DMA buffers as it can prevent page pool entries
> > from being freed. Mapping is managed by page pool infrastructure and
> > was previously managed by the driver in __bnxt_alloc_rx_page before
> > allowing the page pool infrastructure to manage it.
> >
> > Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> > Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> > ---
> > drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
> > 1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > index 96f5ca778c67..8cb9a99154aa 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> > @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> > for (i = 0; i < num_frags ; i++) {
> > skb_frag_t *frag = &sinfo->frags[i];
> > struct bnxt_sw_tx_bd *frag_tx_buf;
> > - struct pci_dev *pdev = bp->pdev;
> > dma_addr_t frag_mapping;
> > int frag_len;
> >
> > @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> > txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
> >
> > frag_len = skb_frag_size(frag);
> > - frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> > - frag_len, DMA_TO_DEVICE);
> > -
> > - if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> > - return NULL;
> > -
> > - dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
>
> If this is no longer set, what would happen to dma_unmap_single() in
> bnxt_tx_int_xdp() that is then reading `mapping` via dma_unmap_addr()?
The dma_unmap_addr() call in bnxt_tx_int_xdp() is for XDP_REDIRECT
packets only. Our current XDP_REDIRECT implementation supports only a
single buffer per packet. Here, this code path is for XDP_TX
multi-buffers. The DMA mapping for the frag pages is always handled
by the page pool. Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-14 21:31 [PATCH net] bnxt_en: do not map packet buffers twice Michael Chan
2023-12-14 23:18 ` David Wei
@ 2023-12-15 5:54 ` David Wei
2023-12-15 16:37 ` Jakub Kicinski
2023-12-15 18:20 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 9+ messages in thread
From: David Wei @ 2023-12-15 5:54 UTC (permalink / raw)
To: Michael Chan, davem
Cc: netdev, edumazet, kuba, pabeni, bpf, hawk, ast, daniel,
john.fastabend, Andy Gospodarek, Somnath Kotur
On 2023-12-14 13:31, Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
>
> Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: David Wei <dw@davidwei.uk>
> ---
> drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 11 ++---------
> 1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index 96f5ca778c67..8cb9a99154aa 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -59,7 +59,6 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> for (i = 0; i < num_frags ; i++) {
> skb_frag_t *frag = &sinfo->frags[i];
> struct bnxt_sw_tx_bd *frag_tx_buf;
> - struct pci_dev *pdev = bp->pdev;
> dma_addr_t frag_mapping;
> int frag_len;
>
> @@ -73,16 +72,10 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> txbd = &txr->tx_desc_ring[TX_RING(prod)][TX_IDX(prod)];
>
> frag_len = skb_frag_size(frag);
> - frag_mapping = skb_frag_dma_map(&pdev->dev, frag, 0,
> - frag_len, DMA_TO_DEVICE);
I checked that skb_frag_dma_map() calls dma_map_page() with page set to
skb_frag_page(frag) and offset set to skb_frag_off(frag) + offset where
offset is 0. This is thus equivalent to the line added below:
page_pool_get_dma_addr(skb_frag_page(frag)) + skb_frag_off(frag)
> -
> - if (unlikely(dma_mapping_error(&pdev->dev, frag_mapping)))
> - return NULL;
I checked that page_pool_get_dma_addr() cannot fail or return an invalid
mapping. The DMA mapping happens when bulk allocating the pp alloc cache
during __page_pool_alloc_pages_slow(). If DMA mapping fails during
page_pool_dma_map() then the page is not stored in the cache. Therefore
any pages allocated from the pp will have a valid DMA addr.
> -
> - dma_unmap_addr_set(frag_tx_buf, mapping, frag_mapping);
As discussed with Michael Chan, only XDP_TX will have multiple page
frags. Presumably only XDP_TX will have num_frags > 0 and enter this for
loop. Even though XDP_REDIRECT also calls bnxt_xmit_bd() from
__bnxt_xmit_xdp_redirect(), I assume xdp_buff_has_frags() returns false.
> -
> flags = frag_len << TX_BD_LEN_SHIFT;
> txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
> + frag_mapping = page_pool_get_dma_addr(skb_frag_page(frag)) +
> + skb_frag_off(frag);
I trust that the page pool DMA mapping management is correct.
Both skb_frag_dma_map() and page_pool_dma_map() call into
dma_map_page_attrs(), but page_pool_dma_map() has flags
DMA_ATTR_SKIP_CPU_SYNC and DMA_ATTR_WEAK_ORDERING set whereas
skb_frag_dma_map() has no flags.
DMA_ATTR_WEAK_ORDERING is optional and is ignored on platforms that do
not support it, so it is safe to use.
DMA_ATTR_SKIP_CPU_SYNC is used since presumably there is no sharing of
pages between multiple devices. IIRC there is a single page pool per Rx
queue/NAPI context.
> txbd->tx_bd_haddr = cpu_to_le64(frag_mapping);
>
> len = frag_len;
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-14 21:31 [PATCH net] bnxt_en: do not map packet buffers twice Michael Chan
2023-12-14 23:18 ` David Wei
2023-12-15 5:54 ` David Wei
@ 2023-12-15 16:37 ` Jakub Kicinski
2023-12-15 16:57 ` Andy Gospodarek
2023-12-15 18:20 ` patchwork-bot+netdevbpf
3 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-12-15 16:37 UTC (permalink / raw)
To: Michael Chan, Andy Gospodarek
Cc: davem, netdev, edumazet, pabeni, bpf, hawk, ast, daniel,
john.fastabend, Somnath Kotur
On Thu, 14 Dec 2023 13:31:38 -0800 Michael Chan wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
This patch is all good, but I'm confused by the handling of head.
Do you recycle it immediately and hope that the Tx happens before
the Rx gets around to using the recycled page again? Am I misreading?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-15 16:37 ` Jakub Kicinski
@ 2023-12-15 16:57 ` Andy Gospodarek
2023-12-15 17:21 ` Jakub Kicinski
0 siblings, 1 reply; 9+ messages in thread
From: Andy Gospodarek @ 2023-12-15 16:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Michael Chan, Andy Gospodarek, davem, netdev, edumazet, pabeni,
bpf, hawk, ast, daniel, john.fastabend, Somnath Kotur
On Fri, Dec 15, 2023 at 08:37:59AM -0800, Jakub Kicinski wrote:
> On Thu, 14 Dec 2023 13:31:38 -0800 Michael Chan wrote:
> > From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> >
> > Remove double-mapping of DMA buffers as it can prevent page pool entries
> > from being freed. Mapping is managed by page pool infrastructure and
> > was previously managed by the driver in __bnxt_alloc_rx_page before
> > allowing the page pool infrastructure to manage it.
>
> This patch is all good, but I'm confused by the handling of head.
> Do you recycle it immediately and hope that the Tx happens before
> the Rx gets around to using the recycled page again? Am I misreading?
Your description is correct, but we use a better strategy than just
hoping it works out. :)
The design is that we do not update the rx ring with the producer value
that was present when the packet was received until after getting the tx
completion for the packet sent via the XDP_TX action.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-15 16:57 ` Andy Gospodarek
@ 2023-12-15 17:21 ` Jakub Kicinski
2023-12-15 20:45 ` Michael Chan
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-12-15 17:21 UTC (permalink / raw)
To: Andy Gospodarek
Cc: Michael Chan, davem, netdev, edumazet, pabeni, bpf, hawk, ast,
daniel, john.fastabend, Somnath Kotur
On Fri, 15 Dec 2023 11:57:14 -0500 Andy Gospodarek wrote:
> > This patch is all good, but I'm confused by the handling of head.
> > Do you recycle it immediately and hope that the Tx happens before
> > the Rx gets around to using the recycled page again? Am I misreading?
>
> Your description is correct, but we use a better strategy than just
> hoping it works out. :)
>
> The design is that we do not update the rx ring with the producer value
> that was present when the packet was received until after getting the tx
> completion for the packet sent via the XDP_TX action.
Ah, I see it, interesting! In that case - next question.. :)
Are the XDP_REDIRECT (target) and XDP_TX going to the same rings?
The locking seems to be missing, and bnxt_tx_int_xdp() does not
seem to be able to handle the optimization you described if
a ring contains a mix of XDP_REDIRECT and XDP_TX.
If I'm reading the assignment in bnxt_alloc_mem() and indexing
right - XDP_REDIRECT and XDP_TX do seem to go to the same rings.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-15 17:21 ` Jakub Kicinski
@ 2023-12-15 20:45 ` Michael Chan
0 siblings, 0 replies; 9+ messages in thread
From: Michael Chan @ 2023-12-15 20:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andy Gospodarek, davem, netdev, edumazet, pabeni, bpf, hawk, ast,
daniel, john.fastabend, Somnath Kotur
On Fri, Dec 15, 2023 at 9:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Are the XDP_REDIRECT (target) and XDP_TX going to the same rings?
> The locking seems to be missing, and bnxt_tx_int_xdp() does not
> seem to be able to handle the optimization you described if
> a ring contains a mix of XDP_REDIRECT and XDP_TX.
XDP_REDIRECT mixed with XDP_TX won't work well currently. It was
briefly mentioned on the list a few months ago:
https://lore.kernel.org/netdev/CACKFLin+1whPs0qeM5xBb1yXx8FkFS_vGrW6PaGy41_XVH=SGg@mail.gmail.com/
Yes, they share the same set of TX rings in the current code. My plan
is to have a set of dedicated TX rings for XDP_REDIRECT. Adding
locking to properly support mixed XDP_REDIRECT and XDP_TX does not
seem ideal for performance.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] bnxt_en: do not map packet buffers twice
2023-12-14 21:31 [PATCH net] bnxt_en: do not map packet buffers twice Michael Chan
` (2 preceding siblings ...)
2023-12-15 16:37 ` Jakub Kicinski
@ 2023-12-15 18:20 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-15 18:20 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, kuba, pabeni, bpf, hawk, ast, daniel,
john.fastabend, andrew.gospodarek, somnath.kotur
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 14 Dec 2023 13:31:38 -0800 you wrote:
> From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>
> Remove double-mapping of DMA buffers as it can prevent page pool entries
> from being freed. Mapping is managed by page pool infrastructure and
> was previously managed by the driver in __bnxt_alloc_rx_page before
> allowing the page pool infrastructure to manage it.
>
> [...]
Here is the summary with links:
- [net] bnxt_en: do not map packet buffers twice
https://git.kernel.org/netdev/net/c/23c93c3b6275
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 9+ messages in thread