netdev.vger.kernel.org archive mirror
* [BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards
@ 2024-10-10 15:11 Til Kaiser
  2024-10-15 15:06 ` Jakub Kicinski
  0 siblings, 1 reply; 3+ messages in thread
From: Til Kaiser @ 2024-10-10 15:11 UTC (permalink / raw)
  To: saeedm, leonro, tariqt; +Cc: netdev, linux-rdma

Hello,

I noticed that our dual-port 100G ConnectX-4 cards (MT27700 Family),
running Linux kernel version 6.6.56 and the latest ConnectX-4 firmware
version 12.28.2302, have no sysfs hwmon entry for reading temperature
values.
When running kernel version 6.6.32, the hwmon entry is present, and
I can read the temperature values of those cards.
Strangely, this problem doesn't occur on our ConnectX-4 Lx cards
(MT27710 Family), regardless of which kernel version I use.
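
For anyone who wants to check this on their own machine, here is a
minimal sketch of a userspace check. It assumes the standard hwmon sysfs
layout (a `name` file and `temp*_input` files in millidegrees Celsius
under /sys/class/hwmon/hwmonN) and that the mlx5 driver registers its
hwmon device under the name "mlx5". The function name `read_hwmon_temps`
and its parameters are hypothetical, chosen for illustration only.

```python
from pathlib import Path


def read_hwmon_temps(driver_name="mlx5", root="/sys/class/hwmon"):
    """Return {hwmon dir: {sensor file: degrees Celsius}} for matching devices.

    driver_name is an assumption: it must match the content of the
    hwmon device's `name` file.
    """
    result = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return result
    for entry in sorted(root_path.iterdir()):
        try:
            name = (entry / "name").read_text().strip()
        except OSError:
            # Not a hwmon device directory, or the name file is unreadable.
            continue
        if name != driver_name:
            continue
        temps = {}
        for sensor in sorted(entry.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                temps[sensor.name] = int(sensor.read_text().strip()) / 1000.0
            except (OSError, ValueError):
                continue
        result[entry.name] = temps
    return result


if __name__ == "__main__":
    found = read_hwmon_temps()
    if not found:
        print("no mlx5 hwmon entry found")
    for dev, temps in found.items():
        for sensor, celsius in temps.items():
            print(f"{dev} {sensor}: {celsius:.1f} C")
```

On an affected system this sketch would be expected to report that no
mlx5 entry was found, although I have only described the symptom above
and not run this exact script.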

I looked into the mlx5 core driver and noticed that it is checking the 
MCAM register here: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c?h=v6.6.56#n380.
When I removed that check, the hwmon entry reappeared.

Looking into recent mlx5 commits regarding this MCAM register, I found 
this commit: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.56&id=fb035aa9a3f8fd327ab83b15a94929d2b9045995.
When I reverted this commit, the hwmon entry also reappeared.

I also found a firmware bug fix regarding that register inside the 
ConnectX-4 Lx bug fix history here (Ref. 2339971): 
https://docs.nvidia.com/networking/display/connectx4lxfirmwarev14321900/bug+fixes+history.
I couldn't find such a firmware fix for the non-Lx ConnectX-4 cards, so
I'm unsure whether this is an mlx5 driver issue or a firmware issue.

Kind regards
Til


Thread overview: 3+ messages
2024-10-10 15:11 [BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards Til Kaiser
2024-10-15 15:06 ` Jakub Kicinski
2024-10-16  7:05   ` Tariq Toukan
