* [PATCH net v1 0/2] Fix generating skb from non-linear xdp_buff for mlx5
@ 2025-09-10 3:41 Amery Hung
2025-09-10 3:41 ` [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ Amery Hung
2025-09-10 3:41 ` [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ Amery Hung
0 siblings, 2 replies; 8+ messages in thread
From: Amery Hung @ 2025-09-10 3:41 UTC (permalink / raw)
To: netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, ameryhung,
kernel-team
v1
- Separate the set from [0] (Dragos)
- Split legacy RQ and striding RQ fixes (Dragos)
- Drop conditional truesize and end frag ptr update (Dragos)
- Fix truesize calculation in striding RQ (Dragos)
- Fix the always zero headlen passed to __pskb_pull_tail() that
causes kernel panic (Nimrod)
Hi all,
This patchset, separated from [0], contains fixes to mlx5 when handling
non-linear xdp_buff. The driver currently generates skb based on
information obtained before the XDP program runs, such as the number of
fragments and the size of the linear data. However, the XDP program can
actually change them through bpf_xdp_adjust_{head,tail}(). Fix the bugs
by generating skb according to xdp_buff after the XDP program runs.
[0] https://lore.kernel.org/bpf/20250905173352.3759457-1-ameryhung@gmail.com/
Amery Hung (2):
net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy
RQ
net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for
striding RQ
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 30 +++++++++++++++++--
1 file changed, 27 insertions(+), 3 deletions(-)
--
2.47.3
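
The effect described in the cover letter can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_xdp_buff` and `model_shrink_tail()` are hypothetical names, not kernel APIs, and the model only shrinks fragment data (it never touches the linear area). It shows why a fragment count sampled before the XDP program runs can be stale afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_FRAGS 4

/* Hypothetical, simplified stand-in for a multi-buffer xdp_buff. */
struct model_xdp_buff {
	uint32_t linear_len;                /* bytes in the linear area */
	uint8_t  nr_frags;                  /* fragments currently attached */
	uint32_t frag_len[MODEL_MAX_FRAGS]; /* bytes held by each fragment */
};

/*
 * Shrink the packet tail by `shrink` bytes, in the spirit of what an XDP
 * program requests through bpf_xdp_adjust_tail() with a negative delta.
 * A fragment that is emptied completely is released, so nr_frags drops.
 * Returns 0 on success, -1 if the request exceeds the fragment data.
 */
static int model_shrink_tail(struct model_xdp_buff *xdp, uint32_t shrink)
{
	uint32_t frag_bytes = 0;

	assert(xdp->nr_frags <= MODEL_MAX_FRAGS);

	for (size_t i = 0; i < xdp->nr_frags; i++)
		frag_bytes += xdp->frag_len[i];
	if (shrink > frag_bytes)
		return -1;

	while (shrink > 0 && xdp->nr_frags > 0) {
		uint32_t *last = &xdp->frag_len[xdp->nr_frags - 1];

		if (shrink >= *last) {
			/* Whole fragment removed: release it. */
			shrink -= *last;
			*last = 0;
			xdp->nr_frags--;
		} else {
			/* Partial trim: fragment stays attached. */
			*last -= shrink;
			shrink = 0;
		}
	}
	return 0;
}
```

Starting from three fragments of 4096, 4096 and 1000 bytes, shrinking by 1500 bytes releases the last fragment and trims the second, leaving `nr_frags == 2`. A driver that still walks three fragments at that point would reference a released one.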
* [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
2025-09-10 3:41 [PATCH net v1 0/2] Fix generating skb from non-linear xdp_buff for mlx5 Amery Hung
@ 2025-09-10 3:41 ` Amery Hung
2025-09-10 16:23 ` Dragos Tatulea
2025-09-11 5:47 ` Tariq Toukan
2025-09-10 3:41 ` [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ Amery Hung
1 sibling, 2 replies; 8+ messages in thread
From: Amery Hung @ 2025-09-10 3:41 UTC (permalink / raw)
To: netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, ameryhung,
kernel-team
XDP programs can release xdp_buff fragments when calling
bpf_xdp_adjust_tail(). The driver currently assumes the number of
fragments to be unchanged and may generate skb with wrong truesize or
containing invalid frags. Fix the bug by generating skb according to
xdp_buff after the XDP program runs.
Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b8c609d91d11..1d3eacfd0325 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
struct mlx5e_wqe_frag_info *head_wi = wi;
u16 rx_headroom = rq->buff.headroom;
struct mlx5e_frag_page *frag_page;
+ u8 nr_frags_free, old_nr_frags;
struct skb_shared_info *sinfo;
u32 frag_consumed_bytes;
struct bpf_prog *prog;
@@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
wi++;
}
+ old_nr_frags = sinfo->nr_frags;
+
prog = rcu_dereference(rq->xdp_prog);
if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
struct mlx5e_wqe_frag_info *pwi;
+ wi -= old_nr_frags - sinfo->nr_frags;
+
for (pwi = head_wi; pwi < wi; pwi++)
pwi->frag_page->frags++;
}
return NULL; /* page/packet was consumed by XDP */
}
+ nr_frags_free = old_nr_frags - sinfo->nr_frags;
+ wi -= nr_frags_free;
+ truesize -= nr_frags_free * frag_info->frag_stride;
+
skb = mlx5e_build_linear_skb(
rq, mxbuf->xdp.data_hard_start, rq->buff.frame0_sz,
mxbuf->xdp.data - mxbuf->xdp.data_hard_start,
--
2.47.3
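
The arithmetic of the legacy RQ fixup can be sketched in isolation. This is a userspace model, not the driver code: `struct model_rx_state` and `model_legacy_rq_fixup()` are hypothetical names, and the `wi` end pointer is represented by an index. It assumes, as the patch does, that each fragment was charged exactly one `frag_stride` of truesize.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two values the patch adjusts. */
struct model_rx_state {
	unsigned int wi_index; /* stands in for the `wi` end pointer */
	unsigned int truesize; /* buffer bytes charged for the fragments */
};

/*
 * Rewind the end-of-fragments position and the truesize by the number of
 * fragments the XDP program released.
 */
static struct model_rx_state
model_legacy_rq_fixup(struct model_rx_state st, uint8_t old_nr_frags,
		      uint8_t new_nr_frags, unsigned int frag_stride)
{
	uint8_t nr_frags_free;

	/* XDP programs can release fragments but cannot add new ones. */
	assert(new_nr_frags <= old_nr_frags);
	nr_frags_free = old_nr_frags - new_nr_frags;

	assert(st.wi_index >= nr_frags_free);
	assert(st.truesize >= nr_frags_free * frag_stride);

	st.wi_index -= nr_frags_free;
	st.truesize -= nr_frags_free * frag_stride;
	return st;
}
```

With three fragments of stride 2048 (truesize 6144) and two of them released, the model yields a truesize of 2048 and an end index moved back by two. When no fragment is released, both values are unchanged.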
* [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ
2025-09-10 3:41 [PATCH net v1 0/2] Fix generating skb from non-linear xdp_buff for mlx5 Amery Hung
2025-09-10 3:41 ` [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ Amery Hung
@ 2025-09-10 3:41 ` Amery Hung
2025-09-11 6:19 ` Tariq Toukan
1 sibling, 1 reply; 8+ messages in thread
From: Amery Hung @ 2025-09-10 3:41 UTC (permalink / raw)
To: netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, ameryhung,
kernel-team
XDP programs can change the layout of an xdp_buff through
bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
cannot assume the size of the linear data area nor fragments. Fix the
bug in mlx5 by generating skb according to xdp_buff after XDP programs
run.
Currently, when handling multi-buf XDP, the mlx5 driver assumes the
layout of an xdp_buff to be unchanged. That is, the linear data area
continues to be empty and fragments remain the same. This may cause
the driver to generate an erroneous skb or trigger a kernel
warning. When an XDP program adds linear data through
bpf_xdp_adjust_head(), the linear data is ignored, because
mlx5e_build_linear_skb() builds an skb without linear data and then
pulls data from fragments to fill the linear data area. When an XDP
program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
the delta passed to __pskb_pull_tail() may exceed the actual nonlinear
data size and trigger the BUG_ON in it.
To fix the issue, first record the original number of fragments. If the
number of fragments changes after the XDP program runs, rewind the end
fragment pointer by the difference and recalculate the truesize. Then,
build the skb with the linear data area matching the xdp_buff. Finally,
only pull data in if there is non-linear data and fill the linear part
up to 256 bytes.
Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 21 ++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1d3eacfd0325..fc881d8d2d21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
u32 byte_cnt = cqe_bcnt;
struct skb_shared_info *sinfo;
unsigned int truesize = 0;
+ u32 pg_consumed_bytes;
struct bpf_prog *prog;
struct sk_buff *skb;
u32 linear_frame_sz;
@@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
while (byte_cnt) {
/* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
- u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
+ pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
truesize += pg_consumed_bytes;
@@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
}
if (prog) {
+ u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
+ u32 len;
+
if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
struct mlx5e_frag_page *pfp;
+ frag_page -= old_nr_frags - sinfo->nr_frags;
+
for (pfp = head_page; pfp < frag_page; pfp++)
pfp->frags++;
@@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
return NULL; /* page/packet was consumed by XDP */
}
+ nr_frags_free = old_nr_frags - sinfo->nr_frags;
+ frag_page -= nr_frags_free;
+ truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
+ (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));
+
+ len = mxbuf->xdp.data_end - mxbuf->xdp.data;
+
skb = mlx5e_build_linear_skb(
rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
- mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
+ mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
mxbuf->xdp.data - mxbuf->xdp.data_meta);
if (unlikely(!skb)) {
mlx5e_page_release_fragmented(rq->page_pool,
@@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
do
pagep->frags++;
while (++pagep < frag_page);
+
+ headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
+ __pskb_pull_tail(skb, headlen);
}
- __pskb_pull_tail(skb, headlen);
} else {
dma_addr_t addr;
--
2.47.3
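
The pull-length computation at the end of the patch can be modeled in userspace as follows. This is a sketch, not the driver code: `model_pull_len()` and `MODEL_RX_MAX_HEAD` are hypothetical names, with the 256-byte limit taken from the commit message. The guard for a linear part that already meets or exceeds the limit is an assumption added here to keep the subtraction from wrapping; the patch itself computes `MLX5E_RX_MAX_HEAD - len` directly.

```c
#include <assert.h>
#include <stdint.h>

/* The commit message says the linear part is filled up to 256 bytes. */
#define MODEL_RX_MAX_HEAD 256u

/*
 * Number of bytes to pull from the fragments into the linear area.
 * linear_len: bytes already in the linear area after the XDP program ran
 * data_len:   bytes still held in fragments
 */
static uint32_t model_pull_len(uint32_t linear_len, uint32_t data_len)
{
	uint32_t room;

	/* Nothing to pull from, or no room left (assumed guard). */
	if (data_len == 0 || linear_len >= MODEL_RX_MAX_HEAD)
		return 0;

	room = MODEL_RX_MAX_HEAD - linear_len;
	assert(room <= MODEL_RX_MAX_HEAD);

	return room < data_len ? room : data_len;
}
```

With an empty linear area and 1000 bytes in fragments the model pulls 256 bytes; with 100 bytes already linear it pulls 156; with only 64 bytes of fragment data it pulls 64, so the request never exceeds the non-linear data available.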
* Re: [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
2025-09-10 3:41 ` [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ Amery Hung
@ 2025-09-10 16:23 ` Dragos Tatulea
2025-09-10 16:39 ` Amery Hung
2025-09-11 5:47 ` Tariq Toukan
1 sibling, 1 reply; 8+ messages in thread
From: Dragos Tatulea @ 2025-09-10 16:23 UTC (permalink / raw)
To: Amery Hung, netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, saeedm, tariqt, mbloch, cpaasch, kernel-team
On Tue, Sep 09, 2025 at 08:41:02PM -0700, Amery Hung wrote:
> XDP programs can release xdp_buff fragments when calling
> bpf_xdp_adjust_tail(). The driver currently assumes the number of
> fragments to be unchanged and may generate skb with wrong truesize or
> containing invalid frags. Fix the bug by generating skb according to
> xdp_buff after the XDP program runs.
>
> Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index b8c609d91d11..1d3eacfd0325 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> struct mlx5e_wqe_frag_info *head_wi = wi;
> u16 rx_headroom = rq->buff.headroom;
> struct mlx5e_frag_page *frag_page;
> + u8 nr_frags_free, old_nr_frags;
> struct skb_shared_info *sinfo;
> u32 frag_consumed_bytes;
> struct bpf_prog *prog;
> @@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> wi++;
> }
>
> + old_nr_frags = sinfo->nr_frags;
> +
> prog = rcu_dereference(rq->xdp_prog);
> if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
> if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> struct mlx5e_wqe_frag_info *pwi;
>
> + wi -= old_nr_frags - sinfo->nr_frags;
> +
> for (pwi = head_wi; pwi < wi; pwi++)
> pwi->frag_page->frags++;
> }
> return NULL; /* page/packet was consumed by XDP */
> }
>
> + nr_frags_free = old_nr_frags - sinfo->nr_frags;
Just double checking that my understanding is correct:
bpf_xdp_adjust_tail() can increase the tail only up to fragment limit,
right? So this operation can always be >= 0.
If yes:
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Thanks,
Dragos
* Re: [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
2025-09-10 16:23 ` Dragos Tatulea
@ 2025-09-10 16:39 ` Amery Hung
0 siblings, 0 replies; 8+ messages in thread
From: Amery Hung @ 2025-09-10 16:39 UTC (permalink / raw)
To: Dragos Tatulea
Cc: netdev, bpf, andrew+netdev, davem, edumazet, pabeni, kuba,
martin.lau, noren, saeedm, tariqt, mbloch, cpaasch, kernel-team
On Wed, Sep 10, 2025 at 12:24 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Tue, Sep 09, 2025 at 08:41:02PM -0700, Amery Hung wrote:
> > XDP programs can release xdp_buff fragments when calling
> > bpf_xdp_adjust_tail(). The driver currently assumes the number of
> > fragments to be unchanged and may generate skb with wrong truesize or
> > containing invalid frags. Fix the bug by generating skb according to
> > xdp_buff after the XDP program runs.
> >
> > Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
> > drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index b8c609d91d11..1d3eacfd0325 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> > struct mlx5e_wqe_frag_info *head_wi = wi;
> > u16 rx_headroom = rq->buff.headroom;
> > struct mlx5e_frag_page *frag_page;
> > + u8 nr_frags_free, old_nr_frags;
> > struct skb_shared_info *sinfo;
> > u32 frag_consumed_bytes;
> > struct bpf_prog *prog;
> > @@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> > wi++;
> > }
> >
> > + old_nr_frags = sinfo->nr_frags;
> > +
> > prog = rcu_dereference(rq->xdp_prog);
> > if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
> > if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> > struct mlx5e_wqe_frag_info *pwi;
> >
> > + wi -= old_nr_frags - sinfo->nr_frags;
> > +
> > for (pwi = head_wi; pwi < wi; pwi++)
> > pwi->frag_page->frags++;
> > }
> > return NULL; /* page/packet was consumed by XDP */
> > }
> >
> > + nr_frags_free = old_nr_frags - sinfo->nr_frags;
> Just double checking that my understanding is correct:
> bpf_xdp_adjust_tail() can increase the tail only up to fragment limit,
> right? So this operation can always be >= 0.
>
Right, AFAIK bpf programs cannot add fragments to xdp_buff.
> If yes:
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
>
> Thanks,
> Dragos
* Re: [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ
2025-09-10 3:41 ` [PATCH net v1 1/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ Amery Hung
2025-09-10 16:23 ` Dragos Tatulea
@ 2025-09-11 5:47 ` Tariq Toukan
1 sibling, 0 replies; 8+ messages in thread
From: Tariq Toukan @ 2025-09-11 5:47 UTC (permalink / raw)
To: Amery Hung, netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, kernel-team
On 10/09/2025 6:41, Amery Hung wrote:
> XDP programs can release xdp_buff fragments when calling
> bpf_xdp_adjust_tail(). The driver currently assumes the number of
> fragments to be unchanged and may generate skb with wrong truesize or
> containing invalid frags. Fix the bug by generating skb according to
> xdp_buff after the XDP program runs.
>
> Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
Hi,
Thanks for your patch!
> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index b8c609d91d11..1d3eacfd0325 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1729,6 +1729,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> struct mlx5e_wqe_frag_info *head_wi = wi;
> u16 rx_headroom = rq->buff.headroom;
> struct mlx5e_frag_page *frag_page;
> + u8 nr_frags_free, old_nr_frags;
> struct skb_shared_info *sinfo;
> u32 frag_consumed_bytes;
> struct bpf_prog *prog;
> @@ -1772,17 +1773,25 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
> wi++;
> }
>
> + old_nr_frags = sinfo->nr_frags;
> +
> prog = rcu_dereference(rq->xdp_prog);
> if (prog && mlx5e_xdp_handle(rq, prog, mxbuf)) {
> if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> struct mlx5e_wqe_frag_info *pwi;
>
> + wi -= old_nr_frags - sinfo->nr_frags;
> +
> for (pwi = head_wi; pwi < wi; pwi++)
> pwi->frag_page->frags++;
> }
> return NULL; /* page/packet was consumed by XDP */
> }
>
> + nr_frags_free = old_nr_frags - sinfo->nr_frags;
> + wi -= nr_frags_free;
> + truesize -= nr_frags_free * frag_info->frag_stride;
> +
New code section better be under if (prog), rather than running
unconditionally.
Also move all needed new local vars under if (prog) to minimize their scope.
> skb = mlx5e_build_linear_skb(
> rq, mxbuf->xdp.data_hard_start, rq->buff.frame0_sz,
> mxbuf->xdp.data - mxbuf->xdp.data_hard_start,
* Re: [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ
2025-09-10 3:41 ` [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ Amery Hung
@ 2025-09-11 6:19 ` Tariq Toukan
2025-09-15 20:39 ` Amery Hung
0 siblings, 1 reply; 8+ messages in thread
From: Tariq Toukan @ 2025-09-11 6:19 UTC (permalink / raw)
To: Amery Hung, netdev
Cc: bpf, andrew+netdev, davem, edumazet, pabeni, kuba, martin.lau,
noren, dtatulea, saeedm, tariqt, mbloch, cpaasch, kernel-team
On 10/09/2025 6:41, Amery Hung wrote:
> XDP programs can change the layout of an xdp_buff through
> bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
> cannot assume the size of the linear data area nor fragments. Fix the
> bug in mlx5 by generating skb according to xdp_buff after XDP programs
> run.
>
> Currently, when handling multi-buf XDP, the mlx5 driver assumes the
> layout of an xdp_buff to be unchanged. That is, the linear data area
> continues to be empty and fragments remain the same. This may cause
> the driver to generate an erroneous skb or trigger a kernel
> warning. When an XDP program adds linear data through
> bpf_xdp_adjust_head(), the linear data is ignored, because
> mlx5e_build_linear_skb() builds an skb without linear data and then
> pulls data from fragments to fill the linear data area. When an XDP
> program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
> the delta passed to __pskb_pull_tail() may exceed the actual nonlinear
> data size and trigger the BUG_ON in it.
>
> To fix the issue, first record the original number of fragments. If the
> number of fragments changes after the XDP program runs, rewind the end
> fragment pointer by the difference and recalculate the truesize. Then,
> build the skb with the linear data area matching the xdp_buff. Finally,
> only pull data in if there is non-linear data and fill the linear part
> up to 256 bytes.
>
> Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
Thanks for your patch!
> .../net/ethernet/mellanox/mlx5/core/en_rx.c | 21 ++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 1d3eacfd0325..fc881d8d2d21 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> u32 byte_cnt = cqe_bcnt;
> struct skb_shared_info *sinfo;
> unsigned int truesize = 0;
> + u32 pg_consumed_bytes;
> struct bpf_prog *prog;
> struct sk_buff *skb;
> u32 linear_frame_sz;
> @@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>
> while (byte_cnt) {
> /* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
> - u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> + pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
>
> if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
> truesize += pg_consumed_bytes;
> @@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> }
>
> if (prog) {
> + u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
> + u32 len;
> +
> if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
> if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> struct mlx5e_frag_page *pfp;
>
> + frag_page -= old_nr_frags - sinfo->nr_frags;
> +
> for (pfp = head_page; pfp < frag_page; pfp++)
> pfp->frags++;
>
> @@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> return NULL; /* page/packet was consumed by XDP */
> }
>
> + nr_frags_free = old_nr_frags - sinfo->nr_frags;
> + frag_page -= nr_frags_free;
> + truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
> + (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));
This is a very complicated calculation resulting in zero in the common case
nr_frags_free == 0.
Maybe better do it conditionally under if (nr_frags_free), together with
'frag_page -= nr_frags_free;' ?
We never use stride_size > PAGE_SIZE so the second alignment here is
redundant.
Also, what about truesize changes due to adjust header, i.e. when we
extend the header into the linear part.
I think 'len' calculated below is missing from truesize.
> +
> + len = mxbuf->xdp.data_end - mxbuf->xdp.data;
> +
> skb = mlx5e_build_linear_skb(
> rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
> - mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
> + mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
> mxbuf->xdp.data - mxbuf->xdp.data_meta);
> if (unlikely(!skb)) {
> mlx5e_page_release_fragmented(rq->page_pool,
> @@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> do
> pagep->frags++;
> while (++pagep < frag_page);
> +
> + headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
> + __pskb_pull_tail(skb, headlen);
> }
> - __pskb_pull_tail(skb, headlen);
> } else {
> dma_addr_t addr;
>
* Re: [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ
2025-09-11 6:19 ` Tariq Toukan
@ 2025-09-15 20:39 ` Amery Hung
0 siblings, 0 replies; 8+ messages in thread
From: Amery Hung @ 2025-09-15 20:39 UTC (permalink / raw)
To: Tariq Toukan
Cc: netdev, bpf, andrew+netdev, davem, edumazet, pabeni, kuba,
martin.lau, noren, dtatulea, saeedm, tariqt, mbloch, cpaasch,
kernel-team
On Thu, Sep 11, 2025 at 2:19 AM Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
>
> On 10/09/2025 6:41, Amery Hung wrote:
> > XDP programs can change the layout of an xdp_buff through
> > bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
> > cannot assume the size of the linear data area nor fragments. Fix the
> > bug in mlx5 by generating skb according to xdp_buff after XDP programs
> > run.
> >
> > Currently, when handling multi-buf XDP, the mlx5 driver assumes the
> > layout of an xdp_buff to be unchanged. That is, the linear data area
> > continues to be empty and fragments remain the same. This may cause
> > the driver to generate an erroneous skb or trigger a kernel
> > warning. When an XDP program adds linear data through
> > bpf_xdp_adjust_head(), the linear data is ignored, because
> > mlx5e_build_linear_skb() builds an skb without linear data and then
> > pulls data from fragments to fill the linear data area. When an XDP
> > program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
> > the delta passed to __pskb_pull_tail() may exceed the actual nonlinear
> > data size and trigger the BUG_ON in it.
> >
> > To fix the issue, first record the original number of fragments. If the
> > number of fragments changes after the XDP program runs, rewind the end
> > fragment pointer by the difference and recalculate the truesize. Then,
> > build the skb with the linear data area matching the xdp_buff. Finally,
> > only pull data in if there is non-linear data and fill the linear part
> > up to 256 bytes.
> >
> > Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
>
> Thanks for your patch!
>
> > .../net/ethernet/mellanox/mlx5/core/en_rx.c | 21 ++++++++++++++++---
> > 1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 1d3eacfd0325..fc881d8d2d21 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > u32 byte_cnt = cqe_bcnt;
> > struct skb_shared_info *sinfo;
> > unsigned int truesize = 0;
> > + u32 pg_consumed_bytes;
> > struct bpf_prog *prog;
> > struct sk_buff *skb;
> > u32 linear_frame_sz;
> > @@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >
> > while (byte_cnt) {
> > /* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
> > - u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> > + pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> >
> > if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
> > truesize += pg_consumed_bytes;
> > @@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > }
> >
> > if (prog) {
> > + u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
> > + u32 len;
> > +
> > if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
> > if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> > struct mlx5e_frag_page *pfp;
> >
> > + frag_page -= old_nr_frags - sinfo->nr_frags;
> > +
> > for (pfp = head_page; pfp < frag_page; pfp++)
> > pfp->frags++;
> >
> > @@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > return NULL; /* page/packet was consumed by XDP */
> > }
> >
> > + nr_frags_free = old_nr_frags - sinfo->nr_frags;
> > + frag_page -= nr_frags_free;
> > + truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
> > + (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));
>
> This is a very complicated calculation resulting in zero in the common case
> nr_frags_free == 0.
> Maybe better do it conditionally under if (nr_frags_free), together with
> 'frag_page -= nr_frags_free;' ?
>
Will change the recalculation back to conditional.
> We never use stride_size > PAGE_SIZE so the second alignment here is
> redundant.
Got it. I will remove the ALIGN for the second part.
>
> Also, what about truesize changes due to adjust header, i.e. when we
> extend the header into the linear part.
> I think 'len' calculated below is missing from truesize.
The linear part will be included later in mlx5e_build_linear_skb() ->
napi_build_skb() -> ... -> __finalize_skb_around().
> > +
> > + len = mxbuf->xdp.data_end - mxbuf->xdp.data;
> > +
> > skb = mlx5e_build_linear_skb(
> > rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
> > - mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
> > + mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
> > mxbuf->xdp.data - mxbuf->xdp.data_meta);
> > if (unlikely(!skb)) {
> > mlx5e_page_release_fragmented(rq->page_pool,
> > @@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > do
> > pagep->frags++;
> > while (++pagep < frag_page);
> > +
> > + headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
> > + __pskb_pull_tail(skb, headlen);
> > }
> > - __pskb_pull_tail(skb, headlen);
> > } else {
> > dma_addr_t addr;
> >
>
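
The review feedback in this message (make the recalculation conditional and drop the second ALIGN) can be sketched as a userspace model. This is not the follow-up patch: `model_mpwrq_truesize()`, `MODEL_PAGE_SIZE` and `MODEL_ALIGN()` are hypothetical names, a 4096-byte page is assumed, and the model assumes the non-SHAMPO case where each fragment was charged its consumed bytes rounded up to the stride size.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
/* Round x up to a multiple of a; a must be a power of two. */
#define MODEL_ALIGN(x, a) (((x) + (a) - 1u) & ~((a) - 1u))

/*
 * Recompute truesize after the XDP program released nr_frags_free
 * fragments from the tail.
 * last_frag_bytes: bytes consumed in the last original fragment
 * log_stride_sz:   log2 of the stride size
 */
static unsigned int
model_mpwrq_truesize(unsigned int truesize, uint8_t nr_frags_free,
		     uint32_t last_frag_bytes, unsigned int log_stride_sz)
{
	uint32_t stride = 1u << log_stride_sz;

	/* Per the review: stride size never exceeds the page size. */
	assert(stride <= MODEL_PAGE_SIZE);

	/* Common case: nothing was released, leave truesize untouched. */
	if (!nr_frags_free)
		return truesize;

	/*
	 * The last fragment was charged its stride-aligned size; the other
	 * released fragments are treated as full pages, which are already
	 * stride-aligned, so no second ALIGN is needed.
	 */
	return truesize - (MODEL_ALIGN(last_frag_bytes, stride) +
			   (nr_frags_free - 1u) * MODEL_PAGE_SIZE);
}
```

For fragments of 4096, 4096 and 1000 bytes with a 256-byte stride, the initial truesize is 9216. Releasing one fragment gives 8192, releasing two gives 4096, and releasing none returns 9216 unchanged.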