* [PATCH net v1 0/2] Fix generating skb from non-linear xdp_buff for mlx5
From: Amery Hung @ 2025-09-10 3:41 UTC
To: netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, ameryhung,
kernel-team
v1
- Separate the set from [0] (Dragos)
- Split legacy RQ and striding RQ fixes (Dragos)
- Drop conditional truesize and end frag ptr update (Dragos)
- Fix truesize calculation in striding RQ (Dragos)
- Fix the always zero headlen passed to __pskb_pull_tail() that
causes kernel panic (Nimrod)
Hi all,
This patchset, separated from [0], contains fixes to mlx5 when handling
non-linear xdp_buff. The driver currently generates skb based on
information obtained before the XDP program runs, such as the number of
fragments and the size of the linear data. However, the XDP program can
actually change them through bpf_xdp_adjust_{head,tail}(). Fix the bugs
by generating skb according to xdp_buff after the XDP program runs.
[0] https://lore.kernel.org/bpf/20250905173352.3759457-1-ameryhung@gmail.com/
Amery Hung (2):
net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy
RQ
net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for
striding RQ
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 30 +++++++++++++++++--
1 file changed, 27 insertions(+), 3 deletions(-)
--
2.47.3

* [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
From: Amery Hung @ 2025-09-10 3:41 UTC
To: netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, ameryhung,
kernel-team

XDP programs can release xdp_buff fragments when calling
bpf_xdp_adjust_tail(). The driver currently assumes the number of
fragments to be unchanged and may generate skb with wrong truesize or
containing invalid frags. Fix the bug by generating skb according to
xdp_buff after the XDP program runs.

Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b8c609d91d11..1d3eacfd0325 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 	struct mlx5e_wqe_frag_info *head_wi = wi;
 	u16 rx_headroom = rq->buff.headroom;
 	struct mlx5e_frag_page *frag_page;
+	u8 nr_frags_free, old_nr_frags;
 	struct skb_shared_info *sinfo;
 	u32 frag_consumed_bytes;
 	struct bpf_prog *prog;
@@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 		wi++;
 	}
 
+	old_nr_frags = sinfo->nr_frags;
+
 	prog = rcu_dereference(rq->xdp_prog);
 	if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
 		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
 			struct mlx5e_wqe_frag_info *pwi;
 
+			wi -= old_nr_frags - sinfo->nr_frags;
+
 			for (pwi = head_wi; pwi < wi; pwi++)
 				pwi->frag_page->frags++;
 		}
 		return NULL; /* page/packet was consumed by XDP */
 	}
 
+	nr_frags_free = old_nr_frags - sinfo->nr_frags;
+	wi -= nr_frags_free;
+	truesize -= nr_frags_free * frag_info->frag_stride;
+
 	skb = mlx5e_build_linear_skb(
 		rq, mxbuf->xdp.data_hard_start, rq->buff.frame0_sz,
 		mxbuf->xdp.data - mxbuf->xdp.data_hard_start,
-- 
2.47.3
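
The rewind performed by the legacy RQ patch can be modeled outside the driver. What follows is a minimal sketch under stated assumptions, not driver code: struct frag_slot, struct rewind_result and rewind_after_xdp() are hypothetical names standing in for the wi pointer and truesize bookkeeping, and every fragment is assumed to have the same stride.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx5e_wqe_frag_info; only the array
 * position matters in this model.
 */
struct frag_slot {
	int id;
};

/* Hypothetical result type: the rewound end pointer and the new truesize. */
struct rewind_result {
	struct frag_slot *end;	/* one past the last fragment still in use */
	unsigned int truesize;	/* truesize without the released fragments */
};

/* Mirror of the arithmetic in the patch: step the end pointer back by the
 * number of fragments the XDP program released and drop their stride from
 * truesize. Assumes a uniform frag_stride (an assumption of this sketch).
 */
struct rewind_result rewind_after_xdp(struct frag_slot *end,
				      unsigned int truesize,
				      unsigned int frag_stride,
				      unsigned char old_nr_frags,
				      unsigned char new_nr_frags)
{
	struct rewind_result res;
	unsigned char nr_frags_free;

	/* Per the review discussion, XDP programs cannot add fragments. */
	assert(new_nr_frags <= old_nr_frags);

	nr_frags_free = old_nr_frags - new_nr_frags;
	res.end = end - nr_frags_free;
	res.truesize = truesize - nr_frags_free * frag_stride;
	return res;
}
```

With three 2048-byte fragments of which the program releases two, the end pointer moves back two slots and truesize drops from 6144 to 2048.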

* Re: [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
From: Dragos Tatulea @ 2025-09-10 16:23 UTC
To: Amery Hung, netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, saeedm, tariqt, mbloch, cpaasch, kernel-team

On Tue, Sep 09, 2025 at 08:41:02PM -0700, Amery Hung wrote:
> XDP programs can release xdp_buff fragments when calling
> bpf_xdp_adjust_tail(). The driver currently assumes the number of
> fragments to be unchanged and may generate skb with wrong truesize or
> containing invalid frags. Fix the bug by generating skb according to
> xdp_buff after the XDP program runs.
>
> Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index b8c609d91d11..1d3eacfd0325 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
>  	struct mlx5e_wqe_frag_info *head_wi = wi;
>  	u16 rx_headroom = rq->buff.headroom;
>  	struct mlx5e_frag_page *frag_page;
> +	u8 nr_frags_free, old_nr_frags;
>  	struct skb_shared_info *sinfo;
>  	u32 frag_consumed_bytes;
>  	struct bpf_prog *prog;
> @@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
>  		wi++;
>  	}
>
> +	old_nr_frags = sinfo->nr_frags;
> +
>  	prog = rcu_dereference(rq->xdp_prog);
>  	if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
>  		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
>  			struct mlx5e_wqe_frag_info *pwi;
>
> +			wi -= old_nr_frags - sinfo->nr_frags;
> +
>  			for (pwi = head_wi; pwi < wi; pwi++)
>  				pwi->frag_page->frags++;
>  		}
>  		return NULL; /* page/packet was consumed by XDP */
>  	}
>
> +	nr_frags_free = old_nr_frags - sinfo->nr_frags;
Just double checking that my understanding is correct:
bpf_xdp_adjust_tail() can increase the tail only up to fragment limit,
right? So this operation can always be >= 0.

If yes:
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>

Thanks,
Dragos

* Re: [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
From: Amery Hung @ 2025-09-10 16:39 UTC
To: Dragos Tatulea
Cc: netdev, bpf, andrew+netdev, davem, edumazet, pabeni, kuba,
martin.lau, noren, saeedm, tariqt, mbloch, cpaasch, kernel-team

On Wed, Sep 10, 2025 at 12:24 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Tue, Sep 09, 2025 at 08:41:02PM -0700, Amery Hung wrote:
> > XDP programs can release xdp_buff fragments when calling
> > bpf_xdp_adjust_tail(). The driver currently assumes the number of
> > fragments to be unchanged and may generate skb with wrong truesize or
> > containing invalid frags. Fix the bug by generating skb according to
> > xdp_buff after the XDP program runs.
> >
> > Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index b8c609d91d11..1d3eacfd0325 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> >  	struct mlx5e_wqe_frag_info *head_wi = wi;
> >  	u16 rx_headroom = rq->buff.headroom;
> >  	struct mlx5e_frag_page *frag_page;
> > +	u8 nr_frags_free, old_nr_frags;
> >  	struct skb_shared_info *sinfo;
> >  	u32 frag_consumed_bytes;
> >  	struct bpf_prog *prog;
> > @@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> >  		wi++;
> >  	}
> >
> > +	old_nr_frags = sinfo->nr_frags;
> > +
> >  	prog = rcu_dereference(rq->xdp_prog);
> >  	if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
> >  		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> >  			struct mlx5e_wqe_frag_info *pwi;
> >
> > +			wi -= old_nr_frags - sinfo->nr_frags;
> > +
> >  			for (pwi = head_wi; pwi < wi; pwi++)
> >  				pwi->frag_page->frags++;
> >  		}
> >  		return NULL; /* page/packet was consumed by XDP */
> >  	}
> >
> > +	nr_frags_free = old_nr_frags - sinfo->nr_frags;
> Just double checking that my understanding is correct:
> bpf_xdp_adjust_tail() can increase the tail only up to fragment limit,
> right? So this operation can always be >= 0.
>

Right, AFAIK bpf programs cannot add fragments to xdp_buff.

> If yes:
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
>
> Thanks,
> Dragos

* Re: [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
From: Tariq Toukan @ 2025-09-11 5:47 UTC
To: Amery Hung, netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, kernel-team

On 10/09/2025 6:41, Amery Hung wrote:
> XDP programs can release xdp_buff fragments when calling
> bpf_xdp_adjust_tail(). The driver currently assumes the number of
> fragments to be unchanged and may generate skb with wrong truesize or
> containing invalid frags. Fix the bug by generating skb according to
> xdp_buff after the XDP program runs.
>
> Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---

Hi,

Thanks for your patch!

>  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index b8c609d91d11..1d3eacfd0325 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
>  	struct mlx5e_wqe_frag_info *head_wi = wi;
>  	u16 rx_headroom = rq->buff.headroom;
>  	struct mlx5e_frag_page *frag_page;
> +	u8 nr_frags_free, old_nr_frags;
>  	struct skb_shared_info *sinfo;
>  	u32 frag_consumed_bytes;
>  	struct bpf_prog *prog;
> @@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
>  		wi++;
>  	}
>
> +	old_nr_frags = sinfo->nr_frags;
> +
>  	prog = rcu_dereference(rq->xdp_prog);
>  	if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
>  		if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
>  			struct mlx5e_wqe_frag_info *pwi;
>
> +			wi -= old_nr_frags - sinfo->nr_frags;
> +
>  			for (pwi = head_wi; pwi < wi; pwi++)
>  				pwi->frag_page->frags++;
>  		}
>  		return NULL; /* page/packet was consumed by XDP */
>  	}
>
> +	nr_frags_free = old_nr_frags - sinfo->nr_frags;
> +	wi -= nr_frags_free;
> +	truesize -= nr_frags_free * frag_info->frag_stride;
> +

New code section better be under if (prog), rather than running
unconditionally.
Also move all needed new local vars under if (prog) to minimize their
scope.

>  	skb = mlx5e_build_linear_skb(
>  		rq, mxbuf->xdp.data_hard_start, rq->buff.frame0_sz,
>  		mxbuf->xdp.data - mxbuf->xdp.data_hard_start,

* [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ
From: Amery Hung @ 2025-09-10 3:41 UTC
To: netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, ameryhung,
kernel-team

XDP programs can change the layout of an xdp_buff through
bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
cannot assume the size of the linear data area nor fragments. Fix the
bug in mlx5 by generating skb according to xdp_buff after XDP programs
run.

Currently, when handling multi-buf XDP, the mlx5 driver assumes the
layout of an xdp_buff to be unchanged. That is, the linear data area
continues to be empty and fragments remain the same. This may cause
the driver to generate erroneous skb or triggering a kernel
warning. When an XDP program added linear data through
bpf_xdp_adjust_head(), the linear data will be ignored as
mlx5e_build_linear_skb() builds an skb without linear data and then
pull data from fragments to fill the linear data area. When an XDP
program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
the delta passed to __pskb_pull_tail() may exceed the actual nonlinear
data size and trigger the BUG_ON in it.

To fix the issue, first record the original number of fragments. If the
number of fragments changes after the XDP program runs, rewind the end
fragment pointer by the difference and recalculate the truesize. Then,
build the skb with the linear data area matching the xdp_buff. Finally,
only pull data in if there is non-linear data and fill the linear part
up to 256 bytes.

Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c | 21 ++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1d3eacfd0325..fc881d8d2d21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 	u32 byte_cnt = cqe_bcnt;
 	struct skb_shared_info *sinfo;
 	unsigned int truesize = 0;
+	u32 pg_consumed_bytes;
 	struct bpf_prog *prog;
 	struct sk_buff *skb;
 	u32 linear_frame_sz;
@@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 	while (byte_cnt) {
 		/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
-		u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
+		pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
 
 		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
 			truesize += pg_consumed_bytes;
@@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 	}
 
 	if (prog) {
+		u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
+		u32 len;
+
 		if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
 			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
 				struct mlx5e_frag_page *pfp;
 
+				frag_page -= old_nr_frags - sinfo->nr_frags;
+
 				for (pfp = head_page; pfp < frag_page; pfp++)
 					pfp->frags++;
 
@@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
+		nr_frags_free = old_nr_frags - sinfo->nr_frags;
+		frag_page -= nr_frags_free;
+		truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
+			    (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));
+
+		len = mxbuf->xdp.data_end - mxbuf->xdp.data;
+
 		skb = mlx5e_build_linear_skb(
 			rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
-			mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
+			mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
 			mxbuf->xdp.data - mxbuf->xdp.data_meta);
 		if (unlikely(!skb)) {
 			mlx5e_page_release_fragmented(rq->page_pool,
@@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 			do
 				pagep->frags++;
 			while (++pagep < frag_page);
+
+			headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
+			__pskb_pull_tail(skb, headlen);
 		}
-		__pskb_pull_tail(skb, headlen);
 	} else {
 		dma_addr_t addr;
 
-- 
2.47.3
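
The pull-length rule described in the commit message (pull only when non-linear data exists, and fill the linear part up to 256 bytes) can be illustrated in isolation. This is a minimal sketch rather than the driver's code: pull_len() and RX_MAX_HEAD are hypothetical names, RX_MAX_HEAD is assumed to equal the 256-byte limit mentioned above, and the explicit guard for a linear part that already reaches the limit is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the 256-byte linear limit named in the commit message. */
#define RX_MAX_HEAD 256u

/* Number of bytes to pull from the fragments into the linear area.
 * linear_len: bytes already in the linear area after the XDP program ran.
 * data_len:   bytes currently held in fragments.
 */
uint16_t pull_len(uint32_t linear_len, uint32_t data_len)
{
	uint32_t room;
	uint16_t result;

	/* Nothing to pull, or no room left (guard added by this sketch). */
	if (data_len == 0 || linear_len >= RX_MAX_HEAD)
		return 0;

	room = RX_MAX_HEAD - linear_len;
	result = (uint16_t)(room < data_len ? room : data_len);

	/* The linear area never grows past the limit. */
	assert(linear_len + result <= RX_MAX_HEAD);
	return result;
}
```

Bounding the pull by the fragment length is what keeps the requested delta from exceeding the non-linear data after bpf_xdp_adjust_tail() has shrunk it.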

* Re: [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ
From: Tariq Toukan @ 2025-09-11 6:19 UTC
To: Amery Hung, netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, kernel-team

On 10/09/2025 6:41, Amery Hung wrote:
> XDP programs can change the layout of an xdp_buff through
> bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
> cannot assume the size of the linear data area nor fragments. Fix the
> bug in mlx5 by generating skb according to xdp_buff after XDP programs
> run.
>
> Currently, when handling multi-buf XDP, the mlx5 driver assumes the
> layout of an xdp_buff to be unchanged. That is, the linear data area
> continues to be empty and fragments remain the same. This may cause
> the driver to generate erroneous skb or triggering a kernel
> warning. When an XDP program added linear data through
> bpf_xdp_adjust_head(), the linear data will be ignored as
> mlx5e_build_linear_skb() builds an skb without linear data and then
> pull data from fragments to fill the linear data area. When an XDP
> program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
> the delta passed to __pskb_pull_tail() may exceed the actual nonlinear
> data size and trigger the BUG_ON in it.
>
> To fix the issue, first record the original number of fragments. If the
> number of fragments changes after the XDP program runs, rewind the end
> fragment pointer by the difference and recalculate the truesize. Then,
> build the skb with the linear data area matching the xdp_buff. Finally,
> only pull data in if there is non-linear data and fill the linear part
> up to 256 bytes.
>
> Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---

Thanks for your patch!

>  .../net/ethernet/mellanox/mlx5/core/en_rx.c | 21 ++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 1d3eacfd0325..fc881d8d2d21 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>  	u32 byte_cnt = cqe_bcnt;
>  	struct skb_shared_info *sinfo;
>  	unsigned int truesize = 0;
> +	u32 pg_consumed_bytes;
>  	struct bpf_prog *prog;
>  	struct sk_buff *skb;
>  	u32 linear_frame_sz;
> @@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>
>  	while (byte_cnt) {
>  		/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
> -		u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> +		pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
>
>  		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
>  			truesize += pg_consumed_bytes;
> @@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>  	}
>
>  	if (prog) {
> +		u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
> +		u32 len;
> +
>  		if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
>  			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
>  				struct mlx5e_frag_page *pfp;
>
> +				frag_page -= old_nr_frags - sinfo->nr_frags;
> +
>  				for (pfp = head_page; pfp < frag_page; pfp++)
>  					pfp->frags++;
>
> @@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>  			return NULL; /* page/packet was consumed by XDP */
>  		}
>
> +		nr_frags_free = old_nr_frags - sinfo->nr_frags;
> +		frag_page -= nr_frags_free;
> +		truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
> +			    (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));

This is a very complicated calculation resulting zero in the common case
nr_frags_free == 0.
Maybe better do it conditionally under if (nr_frags_free), together with
'frag_page -= nr_frags_free;' ?

We never use stride_size > PAGE_SIZE so the second alignment here is
redundant.

Also, what about truesize changes due to adjust header, i.e. when we
extend the header into the linear part.
I think 'len' calculated below is missing from truesize.

> +
> +		len = mxbuf->xdp.data_end - mxbuf->xdp.data;
> +
>  		skb = mlx5e_build_linear_skb(
>  			rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
> -			mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
> +			mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
>  			mxbuf->xdp.data - mxbuf->xdp.data_meta);
>  		if (unlikely(!skb)) {
>  			mlx5e_page_release_fragmented(rq->page_pool,
> @@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>  			do
>  				pagep->frags++;
>  			while (++pagep < frag_page);
> +
> +			headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
> +			__pskb_pull_tail(skb, headlen);
>  		}
> -		__pskb_pull_tail(skb, headlen);
>  	} else {
>  		dma_addr_t addr;
>

* Re: [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ
From: Amery Hung @ 2025-09-15 20:39 UTC
To: Tariq Toukan
Cc: netdev, bpf, andrew+netdev, davem, edumazet, pabeni, kuba,
martin.lau, noren, dtatulea, saeedm, tariqt, mbloch, cpaasch,
kernel-team

On Thu, Sep 11, 2025 at 2:19 AM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
>
> On 10/09/2025 6:41, Amery Hung wrote:
> > XDP programs can change the layout of an xdp_buff through
> > bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
> > cannot assume the size of the linear data area nor fragments. Fix the
> > bug in mlx5 by generating skb according to xdp_buff after XDP programs
> > run.
> >
> > Currently, when handling multi-buf XDP, the mlx5 driver assumes the
> > layout of an xdp_buff to be unchanged. That is, the linear data area
> > continues to be empty and fragments remain the same. This may cause
> > the driver to generate erroneous skb or triggering a kernel
> > warning. When an XDP program added linear data through
> > bpf_xdp_adjust_head(), the linear data will be ignored as
> > mlx5e_build_linear_skb() builds an skb without linear data and then
> > pull data from fragments to fill the linear data area. When an XDP
> > program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
> > the delta passed to __pskb_pull_tail() may exceed the actual nonlinear
> > data size and trigger the BUG_ON in it.
> >
> > To fix the issue, first record the original number of fragments. If the
> > number of fragments changes after the XDP program runs, rewind the end
> > fragment pointer by the difference and recalculate the truesize. Then,
> > build the skb with the linear data area matching the xdp_buff. Finally,
> > only pull data in if there is non-linear data and fill the linear part
> > up to 256 bytes.
> >
> > Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
>
> Thanks for your patch!
>
> >  .../net/ethernet/mellanox/mlx5/core/en_rx.c | 21 ++++++++++++++++---
> >  1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 1d3eacfd0325..fc881d8d2d21 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >  	u32 byte_cnt = cqe_bcnt;
> >  	struct skb_shared_info *sinfo;
> >  	unsigned int truesize = 0;
> > +	u32 pg_consumed_bytes;
> >  	struct bpf_prog *prog;
> >  	struct sk_buff *skb;
> >  	u32 linear_frame_sz;
> > @@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >
> >  	while (byte_cnt) {
> >  		/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
> > -		u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> > +		pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> >
> >  		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
> >  			truesize += pg_consumed_bytes;
> > @@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >  	}
> >
> >  	if (prog) {
> > +		u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
> > +		u32 len;
> > +
> >  		if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
> >  			if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> >  				struct mlx5e_frag_page *pfp;
> >
> > +				frag_page -= old_nr_frags - sinfo->nr_frags;
> > +
> >  				for (pfp = head_page; pfp < frag_page; pfp++)
> >  					pfp->frags++;
> >
> > @@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >  			return NULL; /* page/packet was consumed by XDP */
> >  		}
> >
> > +		nr_frags_free = old_nr_frags - sinfo->nr_frags;
> > +		frag_page -= nr_frags_free;
> > +		truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
> > +			    (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));
>
> This is a very complicated calculation resulting zero in the common case
> nr_frags_free == 0.
> Maybe better do it conditionally under if (nr_frags_free), together with
> 'frag_page -= nr_frags_free;' ?
>

Will change the recalculation back to conditional.

> We never use stride_size > PAGE_SIZE so the second alignment here is
> redundant.

Got it. I will remove the ALIGN for the second part.

>
> Also, what about truesize changes due to adjust header, i.e. when we
> extend the header into the linear part.
> I think 'len' calculated below is missing from truesize.

The linear part will be included later in mlx5e_build_linear_skb() ->
napi_build_skb() -> ... -> __finalize_skb_around().

> > +
> > +		len = mxbuf->xdp.data_end - mxbuf->xdp.data;
> > +
> >  		skb = mlx5e_build_linear_skb(
> >  			rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
> > -			mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
> > +			mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
> >  			mxbuf->xdp.data - mxbuf->xdp.data_meta);
> >  		if (unlikely(!skb)) {
> >  			mlx5e_page_release_fragmented(rq->page_pool,
> > @@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >  			do
> >  				pagep->frags++;
> >  			while (++pagep < frag_page);
> > +
> > +			headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
> > +			__pskb_pull_tail(skb, headlen);
> >  		}
> > -		__pskb_pull_tail(skb, headlen);
> >  	} else {
> >  		dma_addr_t addr;
> >
>
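
The two changes agreed on in this exchange (recalculate truesize only when fragments were released, and drop the ALIGN on the full-page term) could look roughly as follows when modeled outside the driver. This is a sketch under assumptions, not the eventual v2 code: truesize_after_xdp(), align_up() and MODEL_PAGE_SIZE are hypothetical names, the page size is fixed at 4096, and every released fragment other than the last is assumed to span a full page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this model only. */
#define MODEL_PAGE_SIZE 4096u

/* Round x up to a multiple of a, where a is a power of two. */
static inline uint32_t align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* truesize after the XDP program released nr_frags_free trailing fragments.
 * last_pg_consumed_bytes: bytes used in the final fragment before XDP ran.
 * log_stride_sz:          log2 of the stride size (stride <= page size).
 */
uint32_t truesize_after_xdp(uint32_t truesize,
			    uint32_t last_pg_consumed_bytes,
			    uint32_t log_stride_sz,
			    uint8_t nr_frags_free)
{
	uint32_t released;

	/* Common case: nothing was released, so nothing changes. */
	if (!nr_frags_free)
		return truesize;

	/* The last fragment is stride-aligned; the others are whole pages. */
	released = align_up(last_pg_consumed_bytes, 1u << log_stride_sz) +
		   (nr_frags_free - 1u) * MODEL_PAGE_SIZE;

	assert(released <= truesize);
	return truesize - released;
}
```

For a packet spread over fragments of 4096, 4096 and 300 bytes with a 256-byte stride, truesize starts at 8704; releasing the last fragment gives 8192, and releasing the last two gives 4096.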