public inbox for linux-rdma@vger.kernel.org
From: Til Kaiser <mail@tk154.de>
To: saeedm@nvidia.com, leonro@nvidia.com, tariqt@nvidia.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: [BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards
Date: Thu, 10 Oct 2024 17:11:21 +0200
Message-ID: <bc8ba1b7-e4ad-40b5-b69d-9ea1e7a18a40@tk154.de>

Hello,

I noticed that our dual-port 100G ConnectX-4 cards (MT27700 Family), 
running Linux kernel 6.6.56 with the latest ConnectX-4 firmware 
(12.28.2302), have no sysfs hwmon entry for reading temperature values.
With kernel 6.6.32, the hwmon entry is present, and I can read the 
temperature values of those cards.
Strangely, the problem doesn't occur on our ConnectX-4 Lx cards 
(MT27710 Family), regardless of which kernel version I use.
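
For anyone who wants to check for the entry from user space, here is a 
minimal sketch. It assumes the mlx5 driver registers its hwmon device 
under the name "mlx5" and that temperatures are exposed through the 
standard hwmon temp*_input files (millidegrees Celsius). The function 
name find_hwmon_temps and its parameters are hypothetical, chosen only 
for illustration.

```python
from pathlib import Path


def find_hwmon_temps(root="/sys/class/hwmon", driver_name="mlx5"):
    """Return {hwmon_dir: {sensor_file: millidegrees_C}} for matching devices.

    Assumption: the hwmon "name" attribute equals driver_name ("mlx5").
    An empty dict means no matching hwmon entry was found.
    """
    results = {}
    base = Path(root)
    if not base.is_dir():
        return results

    for hwmon_dir in sorted(base.glob("hwmon*")):
        try:
            name = (hwmon_dir / "name").read_text().strip()
        except OSError:
            continue  # entry without a readable name attribute
        if name != driver_name:
            continue

        temps = {}
        for sensor in sorted(hwmon_dir.glob("temp*_input")):
            try:
                temps[sensor.name] = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor not readable or not numeric
        results[hwmon_dir.name] = temps

    return results


if __name__ == "__main__":
    found = find_hwmon_temps()
    if not found:
        print("no mlx5 hwmon entry found")
    for dev, temps in found.items():
        for sensor, milli_c in temps.items():
            print(f"{dev}/{sensor}: {milli_c / 1000:.1f} C")
```

On an affected system, the script would be expected to report that no 
mlx5 hwmon entry was found.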

I looked into the mlx5 core driver and noticed that it is checking the 
MCAM register here: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c?h=v6.6.56#n380.
When I removed that check, the hwmon entry reappeared.

Looking into recent mlx5 commits regarding this MCAM register, I found 
this commit: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.56&id=fb035aa9a3f8fd327ab83b15a94929d2b9045995.
When I reverted this commit, the hwmon entry also reappeared.

I also found a firmware bug fix regarding that register inside the 
ConnectX-4 Lx bug fix history here (Ref. 2339971): 
https://docs.nvidia.com/networking/display/connectx4lxfirmwarev14321900/bug+fixes+history.
I couldn't find such a firmware fix for the non-Lx ConnectX-4 cards, so 
I'm unsure whether this is an mlx5 driver issue or a firmware issue.

Kind regards
Til

Thread overview: 3+ messages
2024-10-10 15:11 Til Kaiser [this message]
2024-10-15 15:06 ` [BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards Jakub Kicinski
2024-10-16  7:05   ` Tariq Toukan
