* [PATCH net-next 1/3] bnxt_en: Fix page pool logic for page size >= 64K
From: Michael Chan @ 2023-07-28 23:18 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, kuba, pabeni, gospo, bpf, somnath.kotur,
Andy Gospodarek
From: Somnath Kotur <somnath.kotur@broadcom.com>
The RXBD length field on all bnxt chips is 16-bit and so we cannot
support a full page when the native page size is 64K or greater.
The non-XDP (non page pool) code path has logic to handle this but
the XDP page pool code path does not handle this. Add the missing
logic to use page_pool_dev_alloc_frag() to allocate 32K chunks if
the page size is 64K or greater.
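(For context, the 32K cap comes from the driver's RX page size macro, which is
limited roughly along these lines; a sketch from memory, not necessarily the
exact bnxt.h text:)

#if (PAGE_SHIFT > 15)
/* RXBD length is a u16; the largest power-of-2 buffer that fits is 32K */
#define BNXT_RX_PAGE_SHIFT	15
#else
#define BNXT_RX_PAGE_SHIFT	PAGE_SHIFT
#endif
#define BNXT_RX_PAGE_SIZE	(1 << BNXT_RX_PAGE_SHIFT)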
Fixes: 9f4b28301ce6 ("bnxt: XDP multibuffer enablement")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 36 ++++++++++++-------
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 6 ++--
2 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a3bbd13c070f..77ce494643f2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -756,17 +756,24 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts)
static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
struct bnxt_rx_ring_info *rxr,
+ unsigned int *offset,
gfp_t gfp)
{
struct device *dev = &bp->pdev->dev;
struct page *page;
- page = page_pool_dev_alloc_pages(rxr->page_pool);
+ if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) {
+ page = page_pool_dev_alloc_frag(rxr->page_pool, offset,
+ BNXT_RX_PAGE_SIZE);
+ } else {
+ page = page_pool_dev_alloc_pages(rxr->page_pool);
+ *offset = 0;
+ }
if (!page)
return NULL;
- *mapping = dma_map_page_attrs(dev, page, 0, PAGE_SIZE, bp->rx_dir,
- DMA_ATTR_WEAK_ORDERING);
+ *mapping = dma_map_page_attrs(dev, page, *offset, BNXT_RX_PAGE_SIZE,
+ bp->rx_dir, DMA_ATTR_WEAK_ORDERING);
if (dma_mapping_error(dev, *mapping)) {
page_pool_recycle_direct(rxr->page_pool, page);
return NULL;
@@ -806,15 +813,16 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
dma_addr_t mapping;
if (BNXT_RX_PAGE_MODE(bp)) {
+ unsigned int offset;
struct page *page =
- __bnxt_alloc_rx_page(bp, &mapping, rxr, gfp);
+ __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
if (!page)
return -ENOMEM;
mapping += bp->rx_dma_offset;
rx_buf->data = page;
- rx_buf->data_ptr = page_address(page) + bp->rx_offset;
+ rx_buf->data_ptr = page_address(page) + offset + bp->rx_offset;
} else {
u8 *data = __bnxt_alloc_rx_frag(bp, &mapping, gfp);
@@ -874,7 +882,7 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp,
unsigned int offset = 0;
if (BNXT_RX_PAGE_MODE(bp)) {
- page = __bnxt_alloc_rx_page(bp, &mapping, rxr, gfp);
+ page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
if (!page)
return -ENOMEM;
@@ -1021,15 +1029,15 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
return NULL;
}
dma_addr -= bp->rx_dma_offset;
- dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir,
+ dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, bp->rx_dir,
DMA_ATTR_WEAK_ORDERING);
- skb = build_skb(page_address(page), PAGE_SIZE);
+ skb = build_skb(data_ptr - bp->rx_offset, BNXT_RX_PAGE_SIZE);
if (!skb) {
page_pool_recycle_direct(rxr->page_pool, page);
return NULL;
}
skb_mark_for_recycle(skb);
- skb_reserve(skb, bp->rx_dma_offset);
+ skb_reserve(skb, bp->rx_offset);
__skb_put(skb, len);
return skb;
@@ -1055,7 +1063,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
return NULL;
}
dma_addr -= bp->rx_dma_offset;
- dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir,
+ dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, bp->rx_dir,
DMA_ATTR_WEAK_ORDERING);
if (unlikely(!payload))
@@ -1069,7 +1077,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
skb_mark_for_recycle(skb);
off = (void *)data_ptr - page_address(page);
- skb_add_rx_frag(skb, 0, page, off, len, PAGE_SIZE);
+ skb_add_rx_frag(skb, 0, page, off, len, BNXT_RX_PAGE_SIZE);
memcpy(skb->data - NET_IP_ALIGN, data_ptr - NET_IP_ALIGN,
payload + NET_IP_ALIGN);
@@ -1200,7 +1208,7 @@ static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp,
skb->data_len += total_frag_len;
skb->len += total_frag_len;
- skb->truesize += PAGE_SIZE * agg_bufs;
+ skb->truesize += BNXT_RX_PAGE_SIZE * agg_bufs;
return skb;
}
@@ -2969,7 +2977,7 @@ static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr)
rx_buf->data = NULL;
if (BNXT_RX_PAGE_MODE(bp)) {
mapping -= bp->rx_dma_offset;
- dma_unmap_page_attrs(&pdev->dev, mapping, PAGE_SIZE,
+ dma_unmap_page_attrs(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
bp->rx_dir,
DMA_ATTR_WEAK_ORDERING);
page_pool_recycle_direct(rxr->page_pool, data);
@@ -3239,6 +3247,8 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
pp.napi = &rxr->bnapi->napi;
pp.dev = &bp->pdev->dev;
pp.dma_dir = DMA_BIDIRECTIONAL;
+ if (PAGE_SIZE > BNXT_RX_PAGE_SIZE)
+ pp.flags |= PP_FLAG_PAGE_FRAG;
rxr->page_pool = page_pool_create(&pp);
if (IS_ERR(rxr->page_pool)) {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 5b6fbdc4dc40..fab3924d5070 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -185,7 +185,7 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
struct xdp_buff *xdp)
{
struct bnxt_sw_rx_bd *rx_buf;
- u32 buflen = PAGE_SIZE;
+ u32 buflen = BNXT_RX_PAGE_SIZE;
struct pci_dev *pdev;
dma_addr_t mapping;
u32 offset;
@@ -301,7 +301,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
rx_buf = &rxr->rx_buf_ring[cons];
mapping = rx_buf->mapping - bp->rx_dma_offset;
dma_unmap_page_attrs(&pdev->dev, mapping,
- PAGE_SIZE, bp->rx_dir,
+ BNXT_RX_PAGE_SIZE, bp->rx_dir,
DMA_ATTR_WEAK_ORDERING);
/* if we are unable to allocate a new buffer, abort and reuse */
@@ -484,7 +484,7 @@ bnxt_xdp_build_skb(struct bnxt *bp, struct sk_buff *skb, u8 num_frags,
}
xdp_update_skb_shared_info(skb, num_frags,
sinfo->xdp_frags_size,
- PAGE_SIZE * sinfo->nr_frags,
+ BNXT_RX_PAGE_SIZE * sinfo->nr_frags,
xdp_buff_is_frag_pfmemalloc(xdp));
return skb;
}
--
2.30.1
* Re: [PATCH net-next 1/3] bnxt_en: Fix page pool logic for page size >= 64K
From: Jakub Kicinski @ 2023-07-29 0:35 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, pabeni, gospo, bpf, somnath.kotur,
Andy Gospodarek
On Fri, 28 Jul 2023 16:18:27 -0700 Michael Chan wrote:
> From: Somnath Kotur <somnath.kotur@broadcom.com>
>
> The RXBD length field on all bnxt chips is 16-bit and so we cannot
> support a full page when the native page size is 64K or greater.
> The non-XDP (non page pool) code path has logic to handle this but
> the XDP page pool code path does not handle this. Add the missing
> logic to use page_pool_dev_alloc_frag() to allocate 32K chunks if
> the page size is 64K or greater.
>
> Fixes: 9f4b28301ce6 ("bnxt: XDP multibuffer enablement")
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Fix is a fix... Let's get this into net, first.
> - dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir,
> + dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, bp->rx_dir,
> DMA_ATTR_WEAK_ORDERING);
this
> - dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir,
> + dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, bp->rx_dir,
> DMA_ATTR_WEAK_ORDERING);
this
> - dma_unmap_page_attrs(&pdev->dev, mapping, PAGE_SIZE,
> + dma_unmap_page_attrs(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
> bp->rx_dir,
> DMA_ATTR_WEAK_ORDERING);
and this - unnecessarily go over 80 chars when there's already
a continuation line that could take the last argument.
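i.e. something like (indentation approximate):

	dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
			     bp->rx_dir, DMA_ATTR_WEAK_ORDERING);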
> @@ -185,7 +185,7 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
> struct xdp_buff *xdp)
> {
> struct bnxt_sw_rx_bd *rx_buf;
> - u32 buflen = PAGE_SIZE;
> + u32 buflen = BNXT_RX_PAGE_SIZE;
nit: rev xmas tree here
> struct pci_dev *pdev;
> dma_addr_t mapping;
> u32 offset;
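(i.e. longest declaration line first, something like:)

	u32 buflen = BNXT_RX_PAGE_SIZE;
	struct bnxt_sw_rx_bd *rx_buf;
	struct pci_dev *pdev;
	dma_addr_t mapping;
	u32 offset;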
--
pw-bot: cr
* [PATCH net-next 2/3] bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP
From: Michael Chan @ 2023-07-28 23:18 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, kuba, pabeni, gospo, bpf, somnath.kotur
From: Somnath Kotur <somnath.kotur@broadcom.com>
Convert to use the page pool buffers for the aggregation ring when
running in non-XDP mode. This simplifies the driver and we benefit
from the recycling of pages. Adjust the page pool size to account
for the aggregation ring size.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 71 +++++------------------
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 -
2 files changed, 14 insertions(+), 60 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 77ce494643f2..adf785b7aa42 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -875,48 +875,15 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp,
struct rx_bd *rxbd =
&rxr->rx_agg_desc_ring[RX_RING(prod)][RX_IDX(prod)];
struct bnxt_sw_rx_agg_bd *rx_agg_buf;
- struct pci_dev *pdev = bp->pdev;
struct page *page;
dma_addr_t mapping;
u16 sw_prod = rxr->rx_sw_agg_prod;
unsigned int offset = 0;
- if (BNXT_RX_PAGE_MODE(bp)) {
- page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
-
- if (!page)
- return -ENOMEM;
-
- } else {
- if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) {
- page = rxr->rx_page;
- if (!page) {
- page = alloc_page(gfp);
- if (!page)
- return -ENOMEM;
- rxr->rx_page = page;
- rxr->rx_page_offset = 0;
- }
- offset = rxr->rx_page_offset;
- rxr->rx_page_offset += BNXT_RX_PAGE_SIZE;
- if (rxr->rx_page_offset == PAGE_SIZE)
- rxr->rx_page = NULL;
- else
- get_page(page);
- } else {
- page = alloc_page(gfp);
- if (!page)
- return -ENOMEM;
- }
+ page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
- mapping = dma_map_page_attrs(&pdev->dev, page, offset,
- BNXT_RX_PAGE_SIZE, DMA_FROM_DEVICE,
- DMA_ATTR_WEAK_ORDERING);
- if (dma_mapping_error(&pdev->dev, mapping)) {
- __free_page(page);
- return -EIO;
- }
- }
+ if (!page)
+ return -ENOMEM;
if (unlikely(test_bit(sw_prod, rxr->rx_agg_bmap)))
sw_prod = bnxt_find_next_agg_idx(rxr, sw_prod);
@@ -1202,6 +1169,7 @@ static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp,
total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo, idx,
agg_bufs, tpa, NULL);
if (!total_frag_len) {
+ skb_mark_for_recycle(skb);
dev_kfree_skb(skb);
return NULL;
}
@@ -1792,6 +1760,7 @@ static void bnxt_deliver_skb(struct bnxt *bp, struct bnxt_napi *bnapi,
return;
}
skb_record_rx_queue(skb, bnapi->index);
+ skb_mark_for_recycle(skb);
napi_gro_receive(&bnapi->napi, skb);
}
@@ -3000,30 +2969,16 @@ static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr)
if (!page)
continue;
- if (BNXT_RX_PAGE_MODE(bp)) {
- dma_unmap_page_attrs(&pdev->dev, rx_agg_buf->mapping,
- BNXT_RX_PAGE_SIZE, bp->rx_dir,
- DMA_ATTR_WEAK_ORDERING);
- rx_agg_buf->page = NULL;
- __clear_bit(i, rxr->rx_agg_bmap);
-
- page_pool_recycle_direct(rxr->page_pool, page);
- } else {
- dma_unmap_page_attrs(&pdev->dev, rx_agg_buf->mapping,
- BNXT_RX_PAGE_SIZE, DMA_FROM_DEVICE,
- DMA_ATTR_WEAK_ORDERING);
- rx_agg_buf->page = NULL;
- __clear_bit(i, rxr->rx_agg_bmap);
+ dma_unmap_page_attrs(&pdev->dev, rx_agg_buf->mapping,
+ BNXT_RX_PAGE_SIZE, bp->rx_dir,
+ DMA_ATTR_WEAK_ORDERING);
+ rx_agg_buf->page = NULL;
+ __clear_bit(i, rxr->rx_agg_bmap);
- __free_page(page);
- }
+ page_pool_recycle_direct(rxr->page_pool, page);
}
skip_rx_agg_free:
- if (rxr->rx_page) {
- __free_page(rxr->rx_page);
- rxr->rx_page = NULL;
- }
map = rxr->rx_tpa_idx_map;
if (map)
memset(map->agg_idx_bmap, 0, sizeof(map->agg_idx_bmap));
@@ -3242,7 +3197,9 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
{
struct page_pool_params pp = { 0 };
- pp.pool_size = bp->rx_ring_size;
+ pp.pool_size = bp->rx_agg_ring_size;
+ if (BNXT_RX_PAGE_MODE(bp))
+ pp.pool_size += bp->rx_ring_size;
pp.nid = dev_to_node(&bp->pdev->dev);
pp.napi = &rxr->bnapi->napi;
pp.dev = &bp->pdev->dev;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 9d16757e27fe..c446037f6bd9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -919,9 +919,6 @@ struct bnxt_rx_ring_info {
unsigned long *rx_agg_bmap;
u16 rx_agg_bmap_size;
- struct page *rx_page;
- unsigned int rx_page_offset;
-
dma_addr_t rx_desc_mapping[MAX_RX_PAGES];
dma_addr_t rx_agg_desc_mapping[MAX_RX_AGG_PAGES];
--
2.30.1
* [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Michael Chan @ 2023-07-28 23:18 UTC (permalink / raw)
To: davem; +Cc: netdev, edumazet, kuba, pabeni, gospo, bpf, somnath.kotur
From: Somnath Kotur <somnath.kotur@broadcom.com>
Use the page pool's ability to maintain DMA mappings for us.
This avoids re-mapping of the recycled pages.
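(The general pattern, sketched here independently of the hunks below: with
PP_FLAG_DMA_MAP set at pool creation, the pool maps each page once and the
driver reads back the cached address instead of calling dma_map_page_attrs()
per buffer:)

	pp.flags |= PP_FLAG_DMA_MAP;	/* the pool owns the DMA mapping */
	pp.dma_dir = DMA_FROM_DEVICE;
	...
	mapping = page_pool_get_dma_addr(page) + offset;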
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 32 +++++++----------------
1 file changed, 10 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index adf785b7aa42..b35bc92094ce 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -759,7 +759,6 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
unsigned int *offset,
gfp_t gfp)
{
- struct device *dev = &bp->pdev->dev;
struct page *page;
if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) {
@@ -772,12 +771,7 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
if (!page)
return NULL;
- *mapping = dma_map_page_attrs(dev, page, *offset, BNXT_RX_PAGE_SIZE,
- bp->rx_dir, DMA_ATTR_WEAK_ORDERING);
- if (dma_mapping_error(dev, *mapping)) {
- page_pool_recycle_direct(rxr->page_pool, page);
- return NULL;
- }
+ *mapping = page_pool_get_dma_addr(page) + *offset;
return page;
}
@@ -996,8 +990,8 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
return NULL;
}
dma_addr -= bp->rx_dma_offset;
- dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, bp->rx_dir,
- DMA_ATTR_WEAK_ORDERING);
+ dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
+ bp->rx_dir);
skb = build_skb(data_ptr - bp->rx_offset, BNXT_RX_PAGE_SIZE);
if (!skb) {
page_pool_recycle_direct(rxr->page_pool, page);
@@ -1030,8 +1024,8 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
return NULL;
}
dma_addr -= bp->rx_dma_offset;
- dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, bp->rx_dir,
- DMA_ATTR_WEAK_ORDERING);
+ dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
+ bp->rx_dir);
if (unlikely(!payload))
payload = eth_get_headlen(bp->dev, data_ptr, len);
@@ -1147,9 +1141,8 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
return 0;
}
- dma_unmap_page_attrs(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
- bp->rx_dir,
- DMA_ATTR_WEAK_ORDERING);
+ dma_sync_single_for_cpu(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
+ bp->rx_dir);
total_frag_len += frag_len;
prod = NEXT_RX_AGG(prod);
@@ -2945,10 +2938,6 @@ static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr)
rx_buf->data = NULL;
if (BNXT_RX_PAGE_MODE(bp)) {
- mapping -= bp->rx_dma_offset;
- dma_unmap_page_attrs(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
- bp->rx_dir,
- DMA_ATTR_WEAK_ORDERING);
page_pool_recycle_direct(rxr->page_pool, data);
} else {
dma_unmap_single_attrs(&pdev->dev, mapping,
@@ -2969,9 +2958,6 @@ static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr)
if (!page)
continue;
- dma_unmap_page_attrs(&pdev->dev, rx_agg_buf->mapping,
- BNXT_RX_PAGE_SIZE, bp->rx_dir,
- DMA_ATTR_WEAK_ORDERING);
rx_agg_buf->page = NULL;
__clear_bit(i, rxr->rx_agg_bmap);
@@ -3203,7 +3189,9 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
pp.nid = dev_to_node(&bp->pdev->dev);
pp.napi = &rxr->bnapi->napi;
pp.dev = &bp->pdev->dev;
- pp.dma_dir = DMA_BIDIRECTIONAL;
+ pp.dma_dir = bp->rx_dir;
+ pp.max_len = BNXT_RX_PAGE_SIZE;
+ pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
if (PAGE_SIZE > BNXT_RX_PAGE_SIZE)
pp.flags |= PP_FLAG_PAGE_FRAG;
--
2.30.1
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Jakub Kicinski @ 2023-07-29 0:42 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, pabeni, gospo, bpf, somnath.kotur,
Jesper Dangaard Brouer
On Fri, 28 Jul 2023 16:18:29 -0700 Michael Chan wrote:
> + pp.dma_dir = bp->rx_dir;
> + pp.max_len = BNXT_RX_PAGE_SIZE;
I _think_ you need PAGE_SIZE here.
This should be smaller than PAGE_SIZE only if you're wasting the rest
of the buffer, e.g. MTU is 3k so you know last 1k will never get used.
PAGE_SIZE is always a multiple of BNXT_RX_PAGE so you waste nothing.
Adding Jesper to CC to keep me honest.
> + pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Jesper Dangaard Brouer @ 2023-07-31 17:47 UTC (permalink / raw)
To: Jakub Kicinski, Michael Chan
Cc: davem, netdev, edumazet, pabeni, gospo, bpf, somnath.kotur,
Jesper Dangaard Brouer, Ilias Apalodimas
On 29/07/2023 02.42, Jakub Kicinski wrote:
> On Fri, 28 Jul 2023 16:18:29 -0700 Michael Chan wrote:
>> + pp.dma_dir = bp->rx_dir;
>> + pp.max_len = BNXT_RX_PAGE_SIZE;
>
> I _think_ you need PAGE_SIZE here.
>
I actually think pp.max_len = BNXT_RX_PAGE_SIZE is correct here.
(Although it can be optimized, see below)
> This should be smaller than PAGE_SIZE only if you're wasting the rest
> of the buffer, e.g. MTU is 3k so you know last 1k will never get used.
> PAGE_SIZE is always a multiple of BNXT_RX_PAGE so you waste nothing.
>
Remember pp.max_len is used for dma_sync_for_device.
If driver is smart, it can set pp.max_len according to MTU, as the (DMA
sync for) device knows hardware will not go beyond this.
On Intel "dma_sync_for_device" is a no-op, so most drivers done
optimized for this. I remember is had HUGE effects on ARM EspressoBin board.
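(A minimal sketch of that optimization, for a driver using one packet per
page; the names below are generic, not bnxt's:)

	pp.offset  = XDP_PACKET_HEADROOM;
	/* HW never writes past headroom + one MTU-sized frame, so only
	 * that much needs to be synced for the device on recycle.
	 */
	pp.max_len = ETH_HLEN + VLAN_HLEN + dev->mtu;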
> Adding Jesper to CC to keep me honest.
Adding Ilias to keep me honest ;-)
To follow/understand these changes, reviewers need to keep the context
of patch 1/3 in mind [1].
[1]
https://lore.kernel.org/all/20230728231829.235716-2-michael.chan@broadcom.com/
>
>> + pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
>
--Jesper
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Jakub Kicinski @ 2023-07-31 18:00 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Michael Chan, davem, netdev, edumazet, pabeni, gospo, bpf,
somnath.kotur, Ilias Apalodimas
On Mon, 31 Jul 2023 19:47:08 +0200 Jesper Dangaard Brouer wrote:
> > This should be smaller than PAGE_SIZE only if you're wasting the rest
> > of the buffer, e.g. MTU is 3k so you know last 1k will never get used.
> > PAGE_SIZE is always a multiple of BNXT_RX_PAGE so you waste nothing.
>
> Remember pp.max_len is used for dma_sync_for_device.
> If driver is smart, it can set pp.max_len according to MTU, as the (DMA
> sync for) device knows hardware will not go beyond this.
> On Intel "dma_sync_for_device" is a no-op, so most drivers haven't
> optimized for this. I remember it had HUGE effects on the ARM EspressoBin board.
Note that (AFAIU) there is no MTU here, these are pages for LRO/GRO,
they will be filled with TCP payload start to end. page_pool_put_page()
does nothing for non-last frag, so we'll only sync for the last
(BNXT_RX_PAGE-sized) frag released, and we need to sync the entire
host page.
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Michael Chan @ 2023-07-31 18:16 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jesper Dangaard Brouer, davem, netdev, edumazet, pabeni, gospo,
bpf, somnath.kotur, Ilias Apalodimas
On Mon, Jul 31, 2023 at 11:00 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 31 Jul 2023 19:47:08 +0200 Jesper Dangaard Brouer wrote:
> > > This should be smaller than PAGE_SIZE only if you're wasting the rest
> > > of the buffer, e.g. MTU is 3k so you know last 1k will never get used.
> > > PAGE_SIZE is always a multiple of BNXT_RX_PAGE so you waste nothing.
> >
> > Remember pp.max_len is used for dma_sync_for_device.
> > If driver is smart, it can set pp.max_len according to MTU, as the (DMA
> > sync for) device knows hardware will not go beyond this.
> > On Intel "dma_sync_for_device" is a no-op, so most drivers haven't
> > optimized for this. I remember it had HUGE effects on the ARM EspressoBin board.
>
> Note that (AFAIU) there is no MTU here, these are pages for LRO/GRO,
> they will be filled with TCP payload start to end. page_pool_put_page()
> does nothing for non-last frag, so we'll only sync for the last
> (BNXT_RX_PAGE-sized) frag released, and we need to sync the entire
> host page.
Correct, there is no MTU here. Remember this matters only when
PAGE_SIZE > BNXT_RX_PAGE_SIZE (e.g. 64K PAGE_SIZE and 32K
BNXT_RX_PAGE_SIZE). I think we want to dma_sync_for_device for 32K in
this case.
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Jakub Kicinski @ 2023-07-31 18:44 UTC (permalink / raw)
To: Michael Chan
Cc: Jesper Dangaard Brouer, davem, netdev, edumazet, pabeni, gospo,
bpf, somnath.kotur, Ilias Apalodimas
On Mon, 31 Jul 2023 11:16:55 -0700 Michael Chan wrote:
> > > Remember pp.max_len is used for dma_sync_for_device.
> > > If driver is smart, it can set pp.max_len according to MTU, as the (DMA
> > > sync for) device knows hardware will not go beyond this.
> > > On Intel "dma_sync_for_device" is a no-op, so most drivers haven't
> > > optimized for this. I remember it had HUGE effects on the ARM EspressoBin board.
> >
> > Note that (AFAIU) there is no MTU here, these are pages for LRO/GRO,
> > they will be filled with TCP payload start to end. page_pool_put_page()
> > does nothing for non-last frag, so we'll only sync for the last
> > (BNXT_RX_PAGE-sized) frag released, and we need to sync the entire
> > host page.
>
> Correct, there is no MTU here. Remember this matters only when
> PAGE_SIZE > BNXT_RX_PAGE_SIZE (e.g. 64K PAGE_SIZE and 32K
> BNXT_RX_PAGE_SIZE). I think we want to dma_sync_for_device for 32K in
> this case.
Maybe I'm misunderstanding. Let me tell you how I think this works and
perhaps we should update the docs based on this discussion.
Note that the max_len is applied to the full host page when the full
host page is returned. Not to fragments, and not at allocation.
The .max_len is the max offset within the host page that the HW may
access. For page-per-packet, 1500B MTU this could matter quite a bit,
because we only have to sync ~1500B rather than 4096B.
  some wasted headroom/padding, pp.offset can be used to skip
  /     device may touch this section
  /    /                         device will not touch, sync not needed
  /    /                         /
|**| ===== MTU 1500B ====== | - skb_shinfo and unused --- |
     <------ .max_len ------>
For fragmented pages it becomes:
                    middle skb_shinfo
                    /                               remainder
                    /                                   |
|**| == MTU == | - shinfo- |**| == MTU == | - shinfo- |+++|
 <--------------- .max_len --------------->
So max_len will only exclude the _last_ shinfo and the wasted space
(reminder of dividing page by buffer size). We must sync _all_ packet
sections ("== MTU ==") within the packet.
In bnxt's case - the page is fragmented (latter diagram), and there is
no start offset or wasted space. Ergo .max_len = PAGE_SIZE.
Where did I get off the track?
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Michael Chan @ 2023-07-31 20:20 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jesper Dangaard Brouer, davem, netdev, edumazet, pabeni, gospo,
bpf, somnath.kotur, Ilias Apalodimas
On Mon, Jul 31, 2023 at 11:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
> Maybe I'm misunderstanding. Let me tell you how I think this works and
> perhaps we should update the docs based on this discussion.
>
> Note that the max_len is applied to the full host page when the full
> host page is returned. Not to fragments, and not at allocation.
>
I think I am beginning to understand what the confusion is. These 32K
page fragments within the page may not belong to the same (GRO)
packet. So we cannot dma_sync the whole page at the same time.
Without setting PP_FLAG_DMA_SYNC_DEV, the driver code should be
something like this:
mapping = page_pool_get_dma_addr(page) + offset;
dma_sync_single_for_device(dev, mapping, BNXT_RX_PAGE_SIZE, bp->rx_dir);
offset may be 0, 32K, etc.
Since the PP_FLAG_DMA_SYNC_DEV logic is not aware of this offset, we
actually must do our own dma_sync and not use PP_FLAG_DMA_SYNC_DEV in
this case. Does that sound right?
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Jakub Kicinski @ 2023-07-31 20:44 UTC (permalink / raw)
To: Michael Chan
Cc: Jesper Dangaard Brouer, davem, netdev, edumazet, pabeni, gospo,
bpf, somnath.kotur, Ilias Apalodimas
On Mon, 31 Jul 2023 13:20:04 -0700 Michael Chan wrote:
> I think I am beginning to understand what the confusion is. These 32K
> page fragments within the page may not belong to the same (GRO)
> packet.
Right.
> So we cannot dma_sync the whole page at the same time.
I wouldn't phrase it like that.
> Without setting PP_FLAG_DMA_SYNC_DEV, the driver code should be
> something like this:
>
> mapping = page_pool_get_dma_addr(page) + offset;
> dma_sync_single_for_device(dev, mapping, BNXT_RX_PAGE_SIZE, bp->rx_dir);
>
> offset may be 0, 32K, etc.
>
> Since the PP_FLAG_DMA_SYNC_DEV logic is not aware of this offset, we
> actually must do our own dma_sync and not use PP_FLAG_DMA_SYNC_DEV in
> this case. Does that sound right?
No, no, all I'm saying is that with the current code (in page pool)
you can't be very intelligent about the sync'ing. Every time a page
enters the pool - the whole page should be synced. But that's fine,
it's still better to let page pool do the syncing than trying to
do it manually in the driver (since freshly allocated pages do not
have to be synced).
I think the confusion comes partially from the fact that the driver
only ever deals with fragments (32k), but internally page pool does
recycling in full pages (64k). And .max_len is part of the recycling
machinery, so to speak, not part of the allocation machinery.
tl;dr just set .max_len = PAGE_SIZE and all will be right.
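i.e. roughly:

	pp.max_len = PAGE_SIZE;	/* whole host page is synced when it recycles */
	pp.flags   = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
	if (PAGE_SIZE > BNXT_RX_PAGE_SIZE)
		pp.flags |= PP_FLAG_PAGE_FRAG;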
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Michael Chan @ 2023-07-31 21:11 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Jesper Dangaard Brouer, davem, netdev, edumazet, pabeni, gospo,
bpf, somnath.kotur, Ilias Apalodimas
On Mon, Jul 31, 2023 at 1:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> tl;dr just set .max_len = PAGE_SIZE and all will be right.
OK I think I got it now. The page is only recycled when all the
fragments are recycled and so we can let page pool DMA sync the whole
page at that time.
* Re: [PATCH net-next 3/3] bnxt_en: Let the page pool manage the DMA mapping
From: Jesper Dangaard Brouer @ 2023-08-01 17:06 UTC (permalink / raw)
To: Michael Chan, Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, gospo, bpf, somnath.kotur,
Ilias Apalodimas
On 31/07/2023 23.11, Michael Chan wrote:
> On Mon, Jul 31, 2023 at 1:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> tl;dr just set .max_len = PAGE_SIZE and all will be right.
>
> OK I think I got it now. The page is only recycled when all the
> fragments are recycled and so we can let page pool DMA sync the whole
> page at that time.
Yes, Jakub is right, I see that now.
When using the page_pool "frag" API (e.g. page_pool_dev_alloc_frag), the
optimization I talked about isn't valid. We simply have to DMA sync the
entire page when it gets back to the recycle stage.
--Jesper