All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Will Mortensen <will@extrahop.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	netdev <netdev@vger.kernel.org>,
	Shahar Shitrit <shshitrit@nvidia.com>,
	Carolina Jubran <cjubran@nvidia.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	Jeremy Royal <jeremyr@extrahop.com>
Subject: Re: [PATCH net v3] net/mlx5: don't printk garbage when transceiver overheats
Date: Mon, 18 May 2026 14:25:55 +0300	[thread overview]
Message-ID: <20260518112555.GM33515@unreal> (raw)
In-Reply-To: <20260515-b4-mlx5-sensor-fix-v3-1-f537ce191d6c@extrahop.com>

On Fri, May 15, 2026 at 11:10:36PM -0700, Will Mortensen wrote:
> When the mlx5 driver processes a temperature warning event, in events.c
> and hwmon.c, temp_warn() calls print_sensor_names_in_bit_set(), which
> calls hwmon_get_sensor_name() to get the NUL-terminated name of the
> relevant sensor, and then prints it to dmesg. In particular,
> print_sensor_names_in_bit_set() passes the bit index ("sensor index")
> within the 128-bit vector in the warning event to
> hwmon_get_sensor_name(). But hwmon_get_sensor_name() was expecting the
> index of the hwmon channel, and the driver registers hwmon channels for
> at most only two sensors: the ASIC sensor (sensor index 0) and the
> module sensor (sensor index 64 or 65 on a 2-port NIC). So when the
> warning event concerned a module, hwmon_get_sensor_name() took the 64th
> or 65th element of the likely 2-element temp_channel_desc array and thus
> returned a pointer to some other kernel memory past the end of it, which
> was printed to dmesg up to the first NUL byte.
> 
> A further difficulty is that, at least in testing on our CX-8 C8240 with
> firmware 40.47.1088, the warning event can have bits set for other
> modules, e.g. if this PCI physical function is associated with
> port/module 0, we might expect bit 64 to be set, but bit 65 (for port/
> module 1) can also be set unexpectedly.
> 
> Fix this by clarifying that the argument to hwmon_get_sensor_name() is
> the raw sensor index, and correctly converting it to the hwmon channel
> index. Return NULL if the sensor index doesn't correspond to a hwmon
> channel (e.g. because it's for the other function/port's module).
> 
> Fixes: 46fd50cfcc12 ("net/mlx5: Add sensor name to temperature event message")
> Reviewed-by: Jeremy Royal <jeremyr@extrahop.com>
> Signed-off-by: Will Mortensen <will@extrahop.com>
> ---
> Changes in v3:
> - Fix ordering of Signed-off-by
> - Add target tree name
> - Add CCs
> - Tweak commit message
> - Link to v2: https://lore.kernel.org/r/20260512-b4-mlx5-sensor-fix-v2-1-531fee4fd7fd@extrahop.com
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/events.c |  2 ++
>  drivers/net/ethernet/mellanox/mlx5/core/hwmon.c  | 15 ++++++++++++++-
>  drivers/net/ethernet/mellanox/mlx5/core/hwmon.h  |  2 +-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> index 4d7f35b96876..9372551c7f90 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> @@ -165,6 +165,8 @@ static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5
>  	for_each_set_bit(i, bit_set_ptr, num_bits) {
>  		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
>  
> +		if (!sensor_name)
> +			continue;
>  		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
>  	}
>  }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
> index afcdebac9c4f..747ff30362f1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
> @@ -417,7 +417,20 @@ void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev)
>  	mdev->hwmon = NULL;
>  }
>  
> -const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel)
> +const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int sensor_idx)
>  {
> +	int channel;
> +
> +	if (sensor_idx >= 64) {
> +		if (hwmon->module_scount == 0)
> +			return NULL;
> +		channel = hwmon->asic_platform_scount;
> +		if (sensor_idx != hwmon->temp_channel_desc[channel].sensor_index)
> +			return NULL;
> +	} else {
> +		if (sensor_idx >= hwmon->asic_platform_scount)
> +			return NULL;
> +		channel = sensor_idx;
> +	}
>  	return hwmon->temp_channel_desc[channel].sensor_name;

Honestly, this approach feels overly complex and fragile for something as
simple as printing to dmesg. In my opinion, you should drop
print_sensor_names_in_bit_set().

Thanks

  reply	other threads:[~2026-05-18 11:26 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-16  6:10 [PATCH net v3] net/mlx5: don't printk garbage when transceiver overheats Will Mortensen
2026-05-18 11:25 ` Leon Romanovsky [this message]
2026-05-19  7:35   ` Will Mortensen
2026-05-19 13:59     ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260518112555.GM33515@unreal \
    --to=leon@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=cjubran@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jeremyr@extrahop.com \
    --cc=kuba@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=saeedm@nvidia.com \
    --cc=shshitrit@nvidia.com \
    --cc=tariqt@nvidia.com \
    --cc=will@extrahop.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.