All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] mlxsw: pci: Fix possible crash during initialization
@ 2023-04-17 16:52 Petr Machata
  2023-04-19 12:10 ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 2+ messages in thread
From: Petr Machata @ 2023-04-17 16:52 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Amit Cohen, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

During initialization the driver issues a reset command via its command
interface in order to remove previous configuration from the device.

After issuing the reset, the driver waits for 200ms before polling on
the "system_status" register using memory-mapped IO until the device
reaches a ready state (0x5E). The wait is necessary because the reset
command only triggers the reset, but the reset itself happens
asynchronously. If the driver starts polling too soon, the read of the
"system_status" register will never return and the system will crash
[1].

The issue was discovered when the device was flashed with a development
firmware version where the reset routine took longer to complete. The
issue was fixed in the firmware, but it exposed the fact that the
current wait time is borderline.

Fix by increasing the wait time from 200ms to 400ms. With this patch and
the buggy firmware version, the issue did not reproduce in 10 reboots
whereas without the patch the issue is reproduced quite consistently.

[1]
mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
mce: CPUs not responding to MCE broadcast (may include false positives): 0,4
Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
Shutting down cpus with NMI
Kernel Offset: 0x12000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: ac004e84164e ("mlxsw: pci: Wait longer before accessing the device after reset")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/pci_hw.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
index 48dbfea0a2a1..7cdf0ce24f28 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
@@ -26,7 +26,7 @@
 #define MLXSW_PCI_CIR_TIMEOUT_MSECS		1000
 
 #define MLXSW_PCI_SW_RESET_TIMEOUT_MSECS	900000
-#define MLXSW_PCI_SW_RESET_WAIT_MSECS		200
+#define MLXSW_PCI_SW_RESET_WAIT_MSECS		400
 #define MLXSW_PCI_FW_READY			0xA1844
 #define MLXSW_PCI_FW_READY_MASK			0xFFFF
 #define MLXSW_PCI_FW_READY_MAGIC		0x5E
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net] mlxsw: pci: Fix possible crash during initialization
  2023-04-17 16:52 [PATCH net] mlxsw: pci: Fix possible crash during initialization Petr Machata
@ 2023-04-19 12:10 ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 2+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-04-19 12:10 UTC (permalink / raw)
  To: Petr Machata; +Cc: davem, edumazet, kuba, pabeni, netdev, idosch, amitc, mlxsw

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 17 Apr 2023 18:52:51 +0200 you wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> During initialization the driver issues a reset command via its command
> interface in order to remove previous configuration from the device.
> 
> After issuing the reset, the driver waits for 200ms before polling on
> the "system_status" register using memory-mapped IO until the device
> reaches a ready state (0x5E). The wait is necessary because the reset
> command only triggers the reset, but the reset itself happens
> asynchronously. If the driver starts polling too soon, the read of the
> "system_status" register will never return and the system will crash
> [1].
> 
> [...]

Here is the summary with links:
  - [net] mlxsw: pci: Fix possible crash during initialization
    https://git.kernel.org/netdev/net/c/1f64757ee2bb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-04-19 12:10 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-17 16:52 [PATCH net] mlxsw: pci: Fix possible crash during initialization Petr Machata
2023-04-19 12:10 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.