* [PATCH v3] net/mlx5: Flag state up only after cmdif is ready
@ 2025-06-03 6:14 Chenguang Zhao
2025-06-03 17:25 ` Moshe Shemesh
2025-06-05 9:19 ` Paolo Abeni
0 siblings, 2 replies; 4+ messages in thread
From: Chenguang Zhao @ 2025-06-03 6:14 UTC
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Moshe Shemesh
Cc: Chenguang Zhao, netdev, linux-rdma
When the driver is reloading during the recovery flow, it cannot accept
new commands until the command interface is up again. Otherwise, we may
hit a NULL pointer dereference when accessing uninitialized command
structures.
The issue can be reproduced using the following scripts:
1) Use the following script to trigger a PCI error.
for((i=1;i<1000;i++));
do
echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
echo "pci reset test $i times"
done
2) Use the following script to read the link speed.
while true; do cat /sys/class/net/eth0/speed &> /dev/null; done
task: ffff885f42820fd0 ti: ffff88603f758000 task.ti: ffff88603f758000
RIP: 0010:[] [] dma_pool_alloc+0x1ab/0x290
RSP: 0018:ffff88603f75baf0 EFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff882f77d90c80 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffff882f77d90d10
RBP: ffff88603f75bb20 R08: 0000000000019ba0 R09: ffff88017fc07c00
R10: ffffffffc0a9c384 R11: 0000000000000246 R12: ffff882f77d90d00
R13: 00000000000080d0 R14: ffff882f77d90d10 R15: ffff88340b6c5ea8
FS: 00007efce8330740(0000) GS:ffff885f4da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000003454fc6000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call trace:
mlx5_alloc_cmd_msg+0xb4/0x2a0 [mlx5_core]
mlx5_alloc_cmd_msg+0xd3/0x2a0 [mlx5_core]
cmd_exec+0xcf/0x8a0 [mlx5_core]
mlx5_cmd_exec+0x33/0x50 [mlx5_core]
mlx5_core_access_reg+0xf1/0x170 [mlx5_core]
mlx5_query_port_ptys+0x64/0x70 [mlx5_core]
mlx5e_get_link_ksettings+0x5c/0x360 [mlx5_core]
__ethtool_get_link_ksettings+0xa6/0x210
speed_show+0x78/0xb0
dev_attr_show+0x23/0x60
sysfs_read_file+0x99/0x190
vfs_read+0x9f/0x170
SyS_read+0x7f/0xe0
tracesys+0xe3/0xe8
Fixes: a80d1b68c8b7a0 ("net/mlx5: Break load_one into three stages")
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
---
v3:
- The recovery process of pci error is mlx5_load_one ->
mlx5_load_one_devl_locked -> mlx5_function_setup ->
mlx5_function_enable -> mlx5_cmd_enable. In the mlx5_cmd_enable
function, cmd->state will be set to MLX5_CMDIF_STATE_DOWN, and when the
pci error recovery fails, it is the recovery of the entire device, so I
prefer to use MLX5_DEVICE_STATE_UP.
v2:
https://lore.kernel.org/all/b8c300f8-bb3b-421f-81c5-f493984f922d@nvidia.com/
v1:
https://lore.kernel.org/all/20250527013723.242599-1-zhaochenguang@kylinos.cn/
---
drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 41e8660c819c..713f1f4f2b42 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1210,6 +1210,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
+ /* remove any previous indication of internal error */
+ dev->state = MLX5_DEVICE_STATE_UP;
+
err = mlx5_core_enable_hca(dev, 0);
if (err) {
mlx5_core_err(dev, "enable hca failed\n");
@@ -1602,8 +1605,6 @@ int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery)
mlx5_core_warn(dev, "interface is up, NOP\n");
goto out;
}
- /* remove any previous indication of internal error */
- dev->state = MLX5_DEVICE_STATE_UP;
if (recovery)
timeout = mlx5_tout_ms(dev, FW_PRE_INIT_ON_RECOVERY_TIMEOUT);
--
2.25.1
* Re: [PATCH v3] net/mlx5: Flag state up only after cmdif is ready
2025-06-03 6:14 [PATCH v3] net/mlx5: Flag state up only after cmdif is ready Chenguang Zhao
@ 2025-06-03 17:25 ` Moshe Shemesh
2025-06-05 9:19 ` Paolo Abeni
1 sibling, 0 replies; 4+ messages in thread
From: Moshe Shemesh @ 2025-06-03 17:25 UTC
To: Chenguang Zhao, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: netdev, linux-rdma
On 6/3/2025 9:14 AM, Chenguang Zhao wrote:
>
> When the driver is reloading during the recovery flow, it cannot accept
> new commands until the command interface is up again. Otherwise, we may
> hit a NULL pointer dereference when accessing uninitialized command
> structures.
>
> The issue can be reproduced using the following scripts:
>
> 1) Use the following script to trigger a PCI error.
>
> for((i=1;i<1000;i++));
> do
> echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
> echo "pci reset test $i times"
> done
>
> 2) Use the following script to read the link speed.
>
> while true; do cat /sys/class/net/eth0/speed &> /dev/null; done
>
> task: ffff885f42820fd0 ti: ffff88603f758000 task.ti: ffff88603f758000
> RIP: 0010:[] [] dma_pool_alloc+0x1ab/0x290
> RSP: 0018:ffff88603f75baf0 EFLAGS: 00010046
> RAX: 0000000000000246 RBX: ffff882f77d90c80 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffff882f77d90d10
> RBP: ffff88603f75bb20 R08: 0000000000019ba0 R09: ffff88017fc07c00
> R10: ffffffffc0a9c384 R11: 0000000000000246 R12: ffff882f77d90d00
> R13: 00000000000080d0 R14: ffff882f77d90d10 R15: ffff88340b6c5ea8
> FS: 00007efce8330740(0000) GS:ffff885f4da00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000003454fc6000 CR4: 00000000003407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call trace:
> mlx5_alloc_cmd_msg+0xb4/0x2a0 [mlx5_core]
> mlx5_alloc_cmd_msg+0xd3/0x2a0 [mlx5_core]
> cmd_exec+0xcf/0x8a0 [mlx5_core]
> mlx5_cmd_exec+0x33/0x50 [mlx5_core]
> mlx5_core_access_reg+0xf1/0x170 [mlx5_core]
> mlx5_query_port_ptys+0x64/0x70 [mlx5_core]
> mlx5e_get_link_ksettings+0x5c/0x360 [mlx5_core]
> __ethtool_get_link_ksettings+0xa6/0x210
> speed_show+0x78/0xb0
> dev_attr_show+0x23/0x60
> sysfs_read_file+0x99/0x190
> vfs_read+0x9f/0x170
> SyS_read+0x7f/0xe0
> tracesys+0xe3/0xe8
>
> Fixes: a80d1b68c8b7a0 ("net/mlx5: Break load_one into three stages")
> Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
> ---
> v3:
> - The recovery process of pci error is mlx5_load_one ->
> mlx5_load_one_devl_locked -> mlx5_function_setup ->
> mlx5_function_enable -> mlx5_cmd_enable. In the mlx5_cmd_enable
> function, cmd->state will be set to MLX5_CMDIF_STATE_DOWN, and when the
Yes, but that is set when cmdif is being re-initialized, while your
change removes the earlier setting of MLX5_DEVICE_STATE_UP.
The trace points to cmdif, which is why we had better handle it there.
I couldn't reproduce it using the scripts above. What is the
reproduction frequency? Can you send me the whole reproduction log?
Thanks.
> pci error recovery fails, it is the recovery of the entire device, so I
> prefer to use MLX5_DEVICE_STATE_UP.
>
> v2:
> https://lore.kernel.org/all/b8c300f8-bb3b-421f-81c5-f493984f922d@nvidia.com/
>
> v1:
> https://lore.kernel.org/all/20250527013723.242599-1-zhaochenguang@kylinos.cn/
> ---
> drivers/net/ethernet/mellanox/mlx5/core/main.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 41e8660c819c..713f1f4f2b42 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1210,6 +1210,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
> dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
> mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
>
> + /* remove any previous indication of internal error */
> + dev->state = MLX5_DEVICE_STATE_UP;
> +
> err = mlx5_core_enable_hca(dev, 0);
> if (err) {
> mlx5_core_err(dev, "enable hca failed\n");
> @@ -1602,8 +1605,6 @@ int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery)
> mlx5_core_warn(dev, "interface is up, NOP\n");
> goto out;
> }
> - /* remove any previous indication of internal error */
> - dev->state = MLX5_DEVICE_STATE_UP;
>
> if (recovery)
> timeout = mlx5_tout_ms(dev, FW_PRE_INIT_ON_RECOVERY_TIMEOUT);
> --
> 2.25.1
>
* Re: [PATCH v3] net/mlx5: Flag state up only after cmdif is ready
2025-06-03 6:14 [PATCH v3] net/mlx5: Flag state up only after cmdif is ready Chenguang Zhao
2025-06-03 17:25 ` Moshe Shemesh
@ 2025-06-05 9:19 ` Paolo Abeni
1 sibling, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2025-06-05 9:19 UTC
To: Chenguang Zhao, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Moshe Shemesh
Cc: netdev, linux-rdma
On 6/3/25 8:14 AM, Chenguang Zhao wrote:
> When the driver is reloading during the recovery flow, it cannot accept
> new commands until the command interface is up again. Otherwise, we may
> hit a NULL pointer dereference when accessing uninitialized command
> structures.
>
> The issue can be reproduced using the following scripts:
>
> 1) Use the following script to trigger a PCI error.
>
> for((i=1;i<1000;i++));
> do
> echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
> echo "pci reset test $i times"
> done
>
> 2) Use the following script to read the link speed.
>
> while true; do cat /sys/class/net/eth0/speed &> /dev/null; done
>
> task: ffff885f42820fd0 ti: ffff88603f758000 task.ti: ffff88603f758000
> RIP: 0010:[] [] dma_pool_alloc+0x1ab/0x290
> RSP: 0018:ffff88603f75baf0 EFLAGS: 00010046
> RAX: 0000000000000246 RBX: ffff882f77d90c80 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffff882f77d90d10
> RBP: ffff88603f75bb20 R08: 0000000000019ba0 R09: ffff88017fc07c00
> R10: ffffffffc0a9c384 R11: 0000000000000246 R12: ffff882f77d90d00
> R13: 00000000000080d0 R14: ffff882f77d90d10 R15: ffff88340b6c5ea8
> FS: 00007efce8330740(0000) GS:ffff885f4da00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000003454fc6000 CR4: 00000000003407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call trace:
> mlx5_alloc_cmd_msg+0xb4/0x2a0 [mlx5_core]
> mlx5_alloc_cmd_msg+0xd3/0x2a0 [mlx5_core]
> cmd_exec+0xcf/0x8a0 [mlx5_core]
> mlx5_cmd_exec+0x33/0x50 [mlx5_core]
> mlx5_core_access_reg+0xf1/0x170 [mlx5_core]
> mlx5_query_port_ptys+0x64/0x70 [mlx5_core]
> mlx5e_get_link_ksettings+0x5c/0x360 [mlx5_core]
> __ethtool_get_link_ksettings+0xa6/0x210
> speed_show+0x78/0xb0
> dev_attr_show+0x23/0x60
> sysfs_read_file+0x99/0x190
> vfs_read+0x9f/0x170
> SyS_read+0x7f/0xe0
> tracesys+0xe3/0xe8
>
> Fixes: a80d1b68c8b7a0 ("net/mlx5: Break load_one into three stages")
> Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Minor nit: the 'net' tag should be in the subject prefix, like:
[PATCH net v<n>] mlx5: #...
More importantly, please address Moshe's feedback.
Thanks,
Paolo
[parent not found: <1l92ogj6wlz-1l96i9zg23c@nsmail7.0.0--kylin--1>]
* Re: [PATCH v3] net/mlx5: Flag state up only after cmdif is ready
[not found] <1l92ogj6wlz-1l96i9zg23c@nsmail7.0.0--kylin--1>
@ 2025-06-05 8:14 ` Moshe Shemesh
0 siblings, 0 replies; 4+ messages in thread
From: Moshe Shemesh @ 2025-06-05 8:14 UTC
To: 赵晨光, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-rdma
On 6/4/2025 6:07 AM, 赵晨光 wrote:
>
> The trace points to cmdif, which is why we had better handle it there.
> I couldn't reproduce it using the scripts above. What is the
> reproduction frequency? Can you send me the whole reproduction log?
> See the attachment for all logs.
> Thanks!
Thanks for the logs and the data.
As far as I can see, this is a reproduction on the old OFED 4.7-3.2.9
driver. In the new upstream driver, mlx5_cmdif_state is checked in
cmd_exec(), which is meant to avoid such a trace: mlx5_cmdif_state is
set down in mlx5_function_teardown() and set up only after
mlx5_cmd_enable(), which calls create_msg_cache(). You can also check on
the latest OFED, and if the issue reproduces there, please take it up
with Nvidia support. If you have a reproduction on the latest upstream
kernel driver, please send the reproduction log and we can continue here
until it is resolved upstream.
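
The gating described above can be sketched as a small userspace C model.
This is only an illustration under stated assumptions, not the mlx5
code: every model_* name is hypothetical, the -ENXIO return value is
chosen for the example, and the sketch is single-threaded, so it says
nothing about the locking or memory ordering a real driver needs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are mlx5 APIs. */
enum model_cmdif_state { MODEL_CMDIF_DOWN, MODEL_CMDIF_UP };

struct model_dev {
	enum model_cmdif_state cmdif_state; /* gate checked before any command */
	void *msg_cache;                    /* stands in for the message cache */
};

static int model_cache_storage; /* dummy object the cache pointer refers to */

/* Teardown closes the gate first, then drops the command structures. */
void model_function_teardown(struct model_dev *dev)
{
	dev->cmdif_state = MODEL_CMDIF_DOWN;
	dev->msg_cache = NULL;
}

/* Enable rebuilds the command structures, and only then opens the gate. */
void model_function_enable(struct model_dev *dev)
{
	dev->msg_cache = &model_cache_storage; /* "create the message cache" */
	dev->cmdif_state = MODEL_CMDIF_UP;     /* open the gate last */
}

/* A command is rejected early while the gate is closed, so the cache
 * pointer is never inspected in the window where it may be NULL. */
int model_cmd_exec(const struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->cmdif_state != MODEL_CMDIF_UP)
		return -ENXIO; /* assumed error code, for illustration only */
	return dev->msg_cache ? 0 : -EFAULT;
}
```

In this model, a reader such as the speed query in the reproduction
script gets an error instead of touching freed or not-yet-created
command structures while a reload is in progress.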