* [PATCH net] bnxt: fix head underflow on XDP head-grow
@ 2026-06-08 20:31 Joe Damato
2026-06-09 4:22 ` Michael Chan
0 siblings, 1 reply; 3+ messages in thread
From: Joe Damato @ 2026-06-08 20:31 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, Somnath Kotur, Andy Gospodarek
Cc: horms, Joe Damato, linux-kernel, bpf
The xdp.py test test_xdp_native_adjst_head_grow_data crashes when run on
a bnxt machine (and also crashes in NIPA).
It seems that the bug is an underflow in bnxt_rx_multi_page_skb, which
builds the skb head:
napi_build_skb(data_ptr - bp->rx_offset, rxr->rx_page_size);
The problem with this expression is that in page mode rx_offset is:
bp->rx_offset = NET_IP_ALIGN + XDP_PACKET_HEADROOM;
Which evaluates (at least on x86_64) to 258.
The test test_xdp_native_adjst_head_grow_data tests a case where the
head is adjusted by -256.
When this test runs, data_ptr is shifted to frag_start + 2 (where
frag_start = page_address(page) + offset).
Then, bnxt_rx_multi_page_skb is invoked and the napi_build_skb
expression subtracts 258, landing at an address before frag_start. This
could be either the previous fragment or the previous physical page when
the frag_offset is < 256 (e.g. if the fragment started at offset 0).
When the skb is freed, the page pool fragment reference is dropped on
either the wrong page or the wrong frag of the right page. In either
case, the corrupted reference count can lead to the page being
prematurely recycled while still in use. Once (incorrectly) recycled, it
can be handed out again and on driver teardown this would result in a
double free.
The commit under fixes updated this code to handle the case where the
native page size is >= 64k, but it unintentionally broke the head grow
case.
To fix this, we need to do a bit of math to recover the offset if this
is a page fragment since it is not passed into rx_skb_func
(bnxt_rx_multi_page_skb, in this case).
Once the offset is recovered, build the skb at the start of the fragment
and then use skb_reserve to adjust the layout.
There are two cases, the non-adjustment case and the adjustment case.
In both cases, the skb is built at page_address(page) + frag_offset to
account for the case where the native page size >= 64K and skb_reserve
is called with data_ptr - (page_address(page) + frag_offset). That
difference equals bp->rx_offset when data_ptr was not moved, or
bp->rx_offset + xdp_adjust when XDP adjusted the head.
Re-running the failing test with this commit applied causes the test to
run successfully to completion.
The other rx_skb_func implementations don't have this issue.
Fixes: f6974b4c2d8e ("bnxt_en: Fix page pool logic for page size >= 64K")
Signed-off-by: Joe Damato <joe@dama.to>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 35e1f8f663c7..448609cc1617 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1174,7 +1174,9 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
unsigned int len = offset_and_len & 0xffff;
struct page *page = data;
u16 prod = rxr->rx_prod;
+ unsigned int frag_off;
struct sk_buff *skb;
+ void *frag_start;
int err;
err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
@@ -1185,13 +1187,15 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
dma_addr -= bp->rx_dma_offset;
dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, rxr->rx_page_size,
bp->rx_dir);
- skb = napi_build_skb(data_ptr - bp->rx_offset, rxr->rx_page_size);
+ frag_off = dma_addr - page_pool_get_dma_addr(page);
+ frag_start = page_address(page) + frag_off;
+ skb = napi_build_skb(frag_start, rxr->rx_page_size);
if (!skb) {
page_pool_recycle_direct(rxr->page_pool, page);
return NULL;
}
skb_mark_for_recycle(skb);
- skb_reserve(skb, bp->rx_offset);
+ skb_reserve(skb, data_ptr - (u8 *)frag_start);
__skb_put(skb, len);
return skb;
base-commit: 9772589b57e44aedc240211c5c3f7a684a034d3a
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net] bnxt: fix head underflow on XDP head-grow
2026-06-08 20:31 [PATCH net] bnxt: fix head underflow on XDP head-grow Joe Damato
@ 2026-06-09 4:22 ` Michael Chan
2026-06-09 14:50 ` Joe Damato
0 siblings, 1 reply; 3+ messages in thread
From: Michael Chan @ 2026-06-09 4:22 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, Pavan Chebbi, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Somnath Kotur, Andy Gospodarek, horms, linux-kernel, bpf
[-- Attachment #1: Type: text/plain, Size: 1964 bytes --]
On Mon, Jun 8, 2026 at 1:31 PM Joe Damato <joe@dama.to> wrote:
>
> The xdp.py test test_xdp_native_adjst_head_grow_data crashes when run on
> a bnxt machine (and also crashes in NIPA).
>
> It seems that the bug is an underflow in bnxt_rx_multi_page_skb, which
> builds the skb head:
>
> napi_build_skb(data_ptr - bp->rx_offset, rxr->rx_page_size);
>
> The problem with this expression is that in page mode rx_offset is:
>
> bp->rx_offset = NET_IP_ALIGN + XDP_PACKET_HEADROOM;
>
> Which evaluates (at least on x86_64) to 258.
>
> The test test_xdp_native_adjst_head_grow_data tests a case where the
> head is adjusted by -256.
>
> When this test runs, data_ptr is shifted to frag_start + 2 (where
> frag_start = page_address(page) + offset).
>
> Then, bnxt_rx_multi_page_skb is invoked and the napi_build_skb
> expression subtracts 258, landing at an address before frag_start. This
> could be either the previous fragment or the previous physical page when
> the frag_offset is < 256 (e.g. if the fragment started at offset 0).
>
> When the skb is freed, the page pool fragment reference is dropped on
> either the wrong page or the wrong frag of the right page. In either
> case, the corrupted reference count can lead to the page being
> prematurely recycled while still in use. Once (incorrectly) recycled, it
> can be handed out again and on driver teardown this would result in a
> double free.
>
> The commit under fixes updated this code to handle the case where the
> native page size is >= 64k, but it unintentionally broke the head grow
> case.
>
> To fix this, we need to do a bit of math to recover the offset if this
> is a page fragment since it is not passed into rx_skb_func
> (bnxt_rx_multi_page_skb, in this case).
I wonder if we should add an offset field to struct bnxt_sw_rx_bd to
simplify things for page mode. Struct bnxt_sw_rx_agg_bd has the
offset field for a similar purpose. Thanks.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net] bnxt: fix head underflow on XDP head-grow
2026-06-09 4:22 ` Michael Chan
@ 2026-06-09 14:50 ` Joe Damato
0 siblings, 0 replies; 3+ messages in thread
From: Joe Damato @ 2026-06-09 14:50 UTC (permalink / raw)
To: Michael Chan
Cc: netdev, Pavan Chebbi, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Somnath Kotur, Andy Gospodarek, horms, linux-kernel, bpf
On Mon, Jun 08, 2026 at 09:22:07PM -0700, Michael Chan wrote:
> On Mon, Jun 8, 2026 at 1:31 PM Joe Damato <joe@dama.to> wrote:
> >
> > The xdp.py test test_xdp_native_adjst_head_grow_data crashes when run on
> > a bnxt machine (and also crashes in NIPA).
> >
> > It seems that the bug is an underflow in bnxt_rx_multi_page_skb, which
> > builds the skb head:
> >
> > napi_build_skb(data_ptr - bp->rx_offset, rxr->rx_page_size);
> >
> > The problem with this expression is that in page mode rx_offset is:
> >
> > bp->rx_offset = NET_IP_ALIGN + XDP_PACKET_HEADROOM;
> >
> > Which evaluates (at least on x86_64) to 258.
> >
> > The test test_xdp_native_adjst_head_grow_data tests a case where the
> > head is adjusted by -256.
> >
> > When this test runs, data_ptr is shifted to frag_start + 2 (where
> > frag_start = page_address(page) + offset).
> >
> > Then, bnxt_rx_multi_page_skb is invoked and the napi_build_skb
> > expression subtracts 258, landing at an address before frag_start. This
> > could be either the previous fragment or the previous physical page when
> > the frag_offset is < 256 (e.g. if the fragment started at offset 0).
> >
> > When the skb is freed, the page pool fragment reference is dropped on
> > either the wrong page or the wrong frag of the right page. In either
> > case, the corrupted reference count can lead to the page being
> > prematurely recycled while still in use. Once (incorrectly) recycled, it
> > can be handed out again and on driver teardown this would result in a
> > double free.
> >
> > The commit under fixes updated this code to handle the case where the
> > native page size is >= 64k, but it unintentionally broke the head grow
> > case.
> >
> > To fix this, we need to do a bit of math to recover the offset if this
> > is a page fragment since it is not passed into rx_skb_func
> > (bnxt_rx_multi_page_skb, in this case).
>
> I wonder if we should add an offset field to struct bnxt_sw_rx_bd to
> simplify things for page mode. Struct bnxt_sw_rx_agg_bd has the
> offset field for a similar purpose. Thanks.
I don't mind doing that, but I wonder if that's better material for net-next?
In other words, we get the minimal fix in (this patch?) and then do the
cleanup and struct tweaking as a follow-up in net-next?
I could definitely be wrong; I had just assumed patches for net were supposed
to be as minimal as possible.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-09 14:50 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 20:31 [PATCH net] bnxt: fix head underflow on XDP head-grow Joe Damato
2026-06-09 4:22 ` Michael Chan
2026-06-09 14:50 ` Joe Damato
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox