* [PATCH net-next] net/mlx5e: Mask wqe_id when handling rx cqe
From: Leon Hwang @ 2026-01-12 8:03 UTC (permalink / raw)
To: netdev
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Oz Shlomo, Paul Blakey, Khalid Manaa, Achiad Shochat,
Jiayuan Chen, linux-rdma, linux-kernel, Leon Hwang, Leon Huang Fu
The wqe_id from the CQE contains wrap counter bits in addition to the WQE
index. Mask it with sz_m1 to prevent out-of-bounds access to the
rq->mpwqe.info[] array when the wrap counter causes wqe_id to exceed the
RQ size.

Without this fix, the driver crashes with a NULL pointer dereference:

BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:mlx5e_skb_from_cqe_mpwrq_linear+0xb3/0x280 [mlx5_core]
Call Trace:
<IRQ>
mlx5e_handle_rx_cqe_mpwrq+0xe3/0x290 [mlx5_core]
mlx5e_poll_rx_cq+0x97/0x820 [mlx5_core]
mlx5e_napi_poll+0x110/0x820 [mlx5_core]

Fixes: dfd9e7500cd4 ("net/mlx5e: Rx, Split rep rx mpwqe handler from nic")
Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support")
Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Leon Huang Fu <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h | 5 +++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
index 7e191e1569e8..df8e671d5115 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
@@ -583,4 +583,9 @@ static inline struct mlx5e_mpw_info *mlx5e_get_mpw_info(struct mlx5e_rq *rq, int
 
 	return (struct mlx5e_mpw_info *)((char *)rq->mpwqe.info + array_size(i, isz));
 }
+
+static inline u16 mlx5e_rq_cqe_wqe_id(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
+{
+	return be16_to_cpu(cqe->wqe_id) & rq->mpwqe.wq.fbc.sz_m1;
+}
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1f6930c77437..25c04684271c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1957,7 +1957,7 @@ static void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 static void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	u16 cstrides = mpwrq_get_cqe_consumed_strides(cqe);
-	u16 wqe_id = be16_to_cpu(cqe->wqe_id);
+	u16 wqe_id = mlx5e_rq_cqe_wqe_id(rq, cqe);
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, wqe_id);
 	u16 stride_ix = mpwrq_get_cqe_stride_index(cqe);
 	u32 wqe_offset = stride_ix << rq->mpwqe.log_stride_sz;
@@ -2373,7 +2373,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 	u16 cstrides = mpwrq_get_cqe_consumed_strides(cqe);
 	u32 data_offset = wqe_offset & (PAGE_SIZE - 1);
 	u32 cqe_bcnt = mpwrq_get_cqe_byte_cnt(cqe);
-	u16 wqe_id = be16_to_cpu(cqe->wqe_id);
+	u16 wqe_id = mlx5e_rq_cqe_wqe_id(rq, cqe);
 	u32 page_idx = wqe_offset >> PAGE_SHIFT;
 	u16 head_size = cqe->shampo.header_size;
 	struct sk_buff **skb = &rq->hw_gro_data->skb;
@@ -2478,7 +2478,7 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq
 static void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 {
 	u16 cstrides = mpwrq_get_cqe_consumed_strides(cqe);
-	u16 wqe_id = be16_to_cpu(cqe->wqe_id);
+	u16 wqe_id = mlx5e_rq_cqe_wqe_id(rq, cqe);
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, wqe_id);
 	u16 stride_ix = mpwrq_get_cqe_stride_index(cqe);
 	u32 wqe_offset = stride_ix << rq->mpwqe.log_stride_sz;
--
2.52.0
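The helper added by this patch folds the CQE's wqe_id into range with a bitwise AND. The standalone sketch below illustrates that technique on a toy queue, assuming (as the field name suggests) that sz_m1 holds the number of entries minus one and that the entry count is a power of two, in which case `x & sz_m1` equals `x % size`. The names `struct toy_wq`, `toy_wq_init()`, `toy_wq_ctr2ix()` and `toy_wq_ix_in_bounds()` are hypothetical and are not part of mlx5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy work queue, for illustration only (not mlx5 code).
 * sz_m1 holds "number of entries minus one"; the number of entries
 * must be a power of two for the AND below to behave like modulo.
 */
struct toy_wq {
	uint16_t sz_m1;
};

/* Set up a queue of 2^log_sz entries; sz_m1 must fit in 16 bits. */
static bool toy_wq_init(struct toy_wq *wq, unsigned int log_sz)
{
	if (log_sz > 16)
		return false;
	wq->sz_m1 = (uint16_t)((1u << log_sz) - 1);
	return true;
}

/* Fold a free-running 16-bit counter into a slot index (== ctr % size). */
static uint16_t toy_wq_ctr2ix(const struct toy_wq *wq, uint16_t ctr)
{
	/* Debug check: sz_m1 + 1 must be a power of two. */
	assert(((wq->sz_m1 + 1u) & wq->sz_m1) == 0);
	return ctr & wq->sz_m1;
}

/* True if a raw, unmasked value is already a valid slot index. */
static bool toy_wq_ix_in_bounds(const struct toy_wq *wq, uint16_t raw)
{
	return raw <= wq->sz_m1;
}
```

For a 64-entry queue (`log_sz = 6`), `toy_wq_ctr2ix()` maps 4167 to 7 and 64 to 0, while `toy_wq_ix_in_bounds()` reports 4167 as out of range. Masking only guarantees an in-range index; whether the slot it selects is the intended one depends on the hardware really reporting a wrapped counter, which is the question raised in the review that follows.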
* Re: [PATCH net-next] net/mlx5e: Mask wqe_id when handling rx cqe
From: Tariq Toukan @ 2026-01-14 8:23 UTC (permalink / raw)
To: Leon Hwang, netdev
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Oz Shlomo, Paul Blakey, Khalid Manaa, Achiad Shochat,
Jiayuan Chen, linux-rdma, linux-kernel, Leon Huang Fu
On 12/01/2026 10:03, Leon Hwang wrote:
> The wqe_id from the CQE contains wrap counter bits in addition to the WQE
> index. Mask it with sz_m1 to prevent out-of-bounds access to the
> rq->mpwqe.info[] array when the wrap counter causes wqe_id to exceed the
> RQ size.
>
> Without this fix, the driver crashes with a NULL pointer dereference:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000020
> RIP: 0010:mlx5e_skb_from_cqe_mpwrq_linear+0xb3/0x280 [mlx5_core]
> Call Trace:
> <IRQ>
> mlx5e_handle_rx_cqe_mpwrq+0xe3/0x290 [mlx5_core]
> mlx5e_poll_rx_cq+0x97/0x820 [mlx5_core]
> mlx5e_napi_poll+0x110/0x820 [mlx5_core]
>
Hi,

We do not expect out-of-bounds index, fixing it this way is not
necessarily correct.

Can you please elaborate on your test case, setup, and how to repro?
* Re: [PATCH net-next] net/mlx5e: Mask wqe_id when handling rx cqe
From: Leon Hwang @ 2026-01-14 8:53 UTC (permalink / raw)
To: Tariq Toukan, netdev
Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Oz Shlomo, Paul Blakey, Khalid Manaa, Achiad Shochat,
Jiayuan Chen, linux-rdma, linux-kernel, Leon Huang Fu
On 14/1/26 16:23, Tariq Toukan wrote:
>
>
> On 12/01/2026 10:03, Leon Hwang wrote:
>> The wqe_id from the CQE contains wrap counter bits in addition to the WQE
>> index. Mask it with sz_m1 to prevent out-of-bounds access to the
>> rq->mpwqe.info[] array when the wrap counter causes wqe_id to exceed the
>> RQ size.
>>
>> Without this fix, the driver crashes with a NULL pointer dereference:
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000020
>> RIP: 0010:mlx5e_skb_from_cqe_mpwrq_linear+0xb3/0x280 [mlx5_core]
>> Call Trace:
>> <IRQ>
>> mlx5e_handle_rx_cqe_mpwrq+0xe3/0x290 [mlx5_core]
>> mlx5e_poll_rx_cq+0x97/0x820 [mlx5_core]
>> mlx5e_napi_poll+0x110/0x820 [mlx5_core]
>>
>
> Hi,
>
> We do not expect out-of-bounds index, fixing it this way is not
> necessarily correct.
>
> Can you please elaborate on your test case, setup, and how to repro?
Hi,

Thanks for the feedback.

Unfortunately, we cannot reliably reproduce this issue on demand, as it
was triggered on a production server. However, we preserved both the
dmesg output and the coredump.

From analysis of the coredump, the wqe_id value was *4167*, which is
unexpectedly out of bounds for the RQ size and led to the NULL pointer
dereference shown above.
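The reported value also constrains the queue size. Assuming, as the sz_m1 mask in the patch implies, that the WQ has a power-of-two number of entries, 4167 (0x1047) can only be out of bounds if the WQ has at most 2^12 = 4096 entries. The actual number of MPWQEs in this RQ is not stated in the thread and may not equal the ethtool RX ring size. The sketch below works this out and shows what the proposed mask would turn 4167 into for a few candidate sizes; `toy_max_log_sz_oob()` and `toy_fold()` are hypothetical helpers, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest log2 queue size for which the raw index idx would be out of
 * bounds, assuming a power-of-two queue of at most 2^16 entries.
 * Returns -1 when idx is 0, which is in bounds for any non-empty queue.
 */
static int toy_max_log_sz_oob(uint16_t idx)
{
	int log_sz = -1;

	/* Grow while a queue of 2^(log_sz + 1) entries is still too small. */
	while (log_sz < 15 && (1u << (log_sz + 1)) <= idx)
		log_sz++;
	return log_sz;
}

/* What "idx & (size - 1)" yields for a queue of 2^log_sz entries. */
static uint16_t toy_fold(uint16_t idx, unsigned int log_sz)
{
	assert(log_sz <= 16);
	return idx & (uint16_t)((1u << log_sz) - 1);
}
```

With these helpers, `toy_max_log_sz_oob(4167)` is 12, and `toy_fold(4167, 12)` gives 71 (4167 - 4096) while `toy_fold(4167, 6)` gives 7. The masked value is always in range, but the arithmetic alone cannot say whether it names the WQE the hardware actually completed.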
For reference, here is the environment where the issue was observed:

  NIC:      Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
  Firmware: 22.43.2566 (MT_0000000359)
  OFED:     MLNX_OFED 24.10-2.1.8
  Kernel:   5.15.0-189.012-shopee (Ubuntu 24.04 based)

Queue configuration:

  # ethtool -g enp23s0f0np0
  RX: 4096
  TX: 4096

  # ethtool -l enp23s0f0np0
  Combined: 20

Please let me know if additional details would be helpful.

Thanks,
Leon
[...]