netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] mlx5: Add sensor name in temperature message
@ 2025-02-13  9:46 Tariq Toukan
  2025-02-13  9:46 ` [PATCH net-next 1/4] net/mlx5: Apply rate-limiting to high temperature warning Tariq Toukan
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Tariq Toukan @ 2025-02-13  9:46 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel

Hi,

This small series from Shahar adds the sensors names to the temperature
event messages, in addition to the existing bitmap indicators.
This improves human readability.

Series starts with simple refactoring and modifications. The top patch
adds the sensors names.

Regards,
Tariq

Shahar Shitrit (4):
  net/mlx5: Apply rate-limiting to high temperature warning
  net/mlx5: Prefix temperature event bitmap with '0x' for clarity
  net/mlx5: Modify LSB bitmask in temperature event to include only the
    first bit
  net/mlx5: Add sensor name to temperature event message

 .../net/ethernet/mellanox/mlx5/core/events.c  | 36 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/hwmon.c   |  5 +++
 .../net/ethernet/mellanox/mlx5/core/hwmon.h   |  1 +
 3 files changed, 39 insertions(+), 3 deletions(-)


base-commit: 8dbf0c7556454b52af91bae305ca71500c31495c
-- 
2.45.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next 1/4] net/mlx5: Apply rate-limiting to high temperature warning
  2025-02-13  9:46 [PATCH net-next 0/4] mlx5: Add sensor name in temperature message Tariq Toukan
@ 2025-02-13  9:46 ` Tariq Toukan
  2025-02-13 10:15   ` Mateusz Polchlopek
  2025-02-13  9:46 ` [PATCH net-next 2/4] net/mlx5: Prefix temperature event bitmap with '0x' for clarity Tariq Toukan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Tariq Toukan @ 2025-02-13  9:46 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel

From: Shahar Shitrit <shshitrit@nvidia.com>

Wrap the high temperature warning in a temperature event with
a call to net_ratelimit() to prevent flooding the kernel log
with repeated warning messages when temperature exceeds the
threshold multiple times within a short duration.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/events.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index d91ea53eb394..e8beb6289d01 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -165,9 +165,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 	value_lsb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_lsb);
 	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
 
-	mlx5_core_warn(events->dev,
-		       "High temperature on sensors with bit set %llx %llx",
-		       value_msb, value_lsb);
+	if (net_ratelimit())
+		mlx5_core_warn(events->dev,
+			       "High temperature on sensors with bit set %llx %llx",
+			       value_msb, value_lsb);
 
 	return NOTIFY_OK;
 }
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 2/4] net/mlx5: Prefix temperature event bitmap with '0x' for clarity
  2025-02-13  9:46 [PATCH net-next 0/4] mlx5: Add sensor name in temperature message Tariq Toukan
  2025-02-13  9:46 ` [PATCH net-next 1/4] net/mlx5: Apply rate-limiting to high temperature warning Tariq Toukan
@ 2025-02-13  9:46 ` Tariq Toukan
  2025-02-13 10:33   ` Mateusz Polchlopek
  2025-02-13  9:46 ` [PATCH net-next 3/4] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Tariq Toukan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Tariq Toukan @ 2025-02-13  9:46 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel

From: Shahar Shitrit <shshitrit@nvidia.com>

Prepend '0x' to the sensor bitmap in the warning message to clearly
indicate that the bitmap is in hexadecimal format.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/events.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index e8beb6289d01..a661aa522a9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -167,7 +167,7 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 
 	if (net_ratelimit())
 		mlx5_core_warn(events->dev,
-			       "High temperature on sensors with bit set %llx %llx",
+			       "High temperature on sensors with bit set %#llx %#llx",
 			       value_msb, value_lsb);
 
 	return NOTIFY_OK;
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 3/4] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
  2025-02-13  9:46 [PATCH net-next 0/4] mlx5: Add sensor name in temperature message Tariq Toukan
  2025-02-13  9:46 ` [PATCH net-next 1/4] net/mlx5: Apply rate-limiting to high temperature warning Tariq Toukan
  2025-02-13  9:46 ` [PATCH net-next 2/4] net/mlx5: Prefix temperature event bitmap with '0x' for clarity Tariq Toukan
@ 2025-02-13  9:46 ` Tariq Toukan
  2025-02-13 12:11   ` Mateusz Polchlopek
  2025-02-13  9:46 ` [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message Tariq Toukan
  2025-02-18  0:40 ` [PATCH net-next 0/4] mlx5: Add sensor name in temperature message patchwork-bot+netdevbpf
  4 siblings, 1 reply; 14+ messages in thread
From: Tariq Toukan @ 2025-02-13  9:46 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel

From: Shahar Shitrit <shshitrit@nvidia.com>

In the sensor_count field of the MTEWE register, bits 1-62 are
supported only for unmanaged switches, not for NICs, and bit 63
is reserved for internal use.

To prevent confusing output that may include set bits that are
not relevant to NIC sensors, we update the bitmask to retain only
the first bit, which corresponds to the sensor ASIC.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/events.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index a661aa522a9a..e85a9042e3c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -163,6 +163,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 	u64 value_msb;
 
 	value_lsb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_lsb);
+	/* bit 1-63 are not supported for NICs,
+	 * hence read only bit 0 (asic) from lsb.
+	 */
+	value_lsb &= 0x1;
 	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
 
 	if (net_ratelimit())
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message
  2025-02-13  9:46 [PATCH net-next 0/4] mlx5: Add sensor name in temperature message Tariq Toukan
                   ` (2 preceding siblings ...)
  2025-02-13  9:46 ` [PATCH net-next 3/4] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Tariq Toukan
@ 2025-02-13  9:46 ` Tariq Toukan
  2025-02-15 19:29   ` Simon Horman
  2025-02-18  0:40 ` [PATCH net-next 0/4] mlx5: Add sensor name in temperature message patchwork-bot+netdevbpf
  4 siblings, 1 reply; 14+ messages in thread
From: Tariq Toukan @ 2025-02-13  9:46 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel, Carolina Jubran

From: Shahar Shitrit <shshitrit@nvidia.com>

Previously, a temperature event message included a bitmap indicating
which sensors detect high temperatures.

To enhance clarity, we modify the message format to explicitly list
the names of the overheating sensors, alongside the sensors bitmap.
If HWMON is not configured, the event message remains unchanged.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/events.c  | 31 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/hwmon.c   |  5 +++
 .../net/ethernet/mellanox/mlx5/core/hwmon.h   |  1 +
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index e85a9042e3c2..01c5f5990f9a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -6,6 +6,7 @@
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/events.h"
+#include "hwmon.h"
 
 struct mlx5_event_nb {
 	struct mlx5_nb  nb;
@@ -153,11 +154,28 @@ static int any_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+#if IS_ENABLED(CONFIG_HWMON)
+static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
+					  u64 bit_set, int bit_set_offset)
+{
+	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
+	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
+	int i;
+
+	for_each_set_bit(i, bit_set_ptr, num_bits) {
+		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
+
+		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
+	}
+}
+#endif /* CONFIG_HWMON */
+
 /* type == MLX5_EVENT_TYPE_TEMP_WARN_EVENT */
 static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 {
 	struct mlx5_event_nb *event_nb = mlx5_nb_cof(nb, struct mlx5_event_nb, nb);
 	struct mlx5_events   *events   = event_nb->ctx;
+	struct mlx5_core_dev *dev      = events->dev;
 	struct mlx5_eqe      *eqe      = data;
 	u64 value_lsb;
 	u64 value_msb;
@@ -169,10 +187,17 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
 	value_lsb &= 0x1;
 	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
 
-	if (net_ratelimit())
-		mlx5_core_warn(events->dev,
-			       "High temperature on sensors with bit set %#llx %#llx",
+	if (net_ratelimit()) {
+		mlx5_core_warn(dev, "High temperature on sensors with bit set %#llx %#llx.\n",
 			       value_msb, value_lsb);
+#if IS_ENABLED(CONFIG_HWMON)
+		if (dev->hwmon) {
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_lsb, 0);
+			print_sensor_names_in_bit_set(dev, dev->hwmon, value_msb,
+						      sizeof(value_lsb) * BITS_PER_BYTE);
+		}
+#endif
+	}
 
 	return NOTIFY_OK;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
index 353f81dccd1c..4ba2636d7fb6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c
@@ -416,3 +416,8 @@ void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev)
 	mlx5_hwmon_free(hwmon);
 	mdev->hwmon = NULL;
 }
+
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel)
+{
+	return hwmon->temp_channel_desc[channel].sensor_name;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
index 999654a9b9da..f38271c22c10 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/hwmon.h
@@ -10,6 +10,7 @@
 
 int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev);
 void mlx5_hwmon_dev_unregister(struct mlx5_core_dev *mdev);
+const char *hwmon_get_sensor_name(struct mlx5_hwmon *hwmon, int channel);
 
 #else
 static inline int mlx5_hwmon_dev_register(struct mlx5_core_dev *mdev)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 1/4] net/mlx5: Apply rate-limiting to high temperature warning
  2025-02-13  9:46 ` [PATCH net-next 1/4] net/mlx5: Apply rate-limiting to high temperature warning Tariq Toukan
@ 2025-02-13 10:15   ` Mateusz Polchlopek
  0 siblings, 0 replies; 14+ messages in thread
From: Mateusz Polchlopek @ 2025-02-13 10:15 UTC (permalink / raw)
  To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	netdev, linux-rdma, linux-kernel



On 2/13/2025 10:46 AM, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> Wrap the high temperature warning in a temperature event with
> a call to net_ratelimit() to prevent flooding the kernel log
> with repeated warning messages when temperature exceeds the
> threshold multiple times within a short duration.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/events.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> index d91ea53eb394..e8beb6289d01 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> @@ -165,9 +165,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
>   	value_lsb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_lsb);
>   	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
>   
> -	mlx5_core_warn(events->dev,
> -		       "High temperature on sensors with bit set %llx %llx",
> -		       value_msb, value_lsb);
> +	if (net_ratelimit())
> +		mlx5_core_warn(events->dev,
> +			       "High temperature on sensors with bit set %llx %llx",
> +			       value_msb, value_lsb);
>   
>   	return NOTIFY_OK;
>   }

Nice improvement, thanks
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 2/4] net/mlx5: Prefix temperature event bitmap with '0x' for clarity
  2025-02-13  9:46 ` [PATCH net-next 2/4] net/mlx5: Prefix temperature event bitmap with '0x' for clarity Tariq Toukan
@ 2025-02-13 10:33   ` Mateusz Polchlopek
  0 siblings, 0 replies; 14+ messages in thread
From: Mateusz Polchlopek @ 2025-02-13 10:33 UTC (permalink / raw)
  To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	netdev, linux-rdma, linux-kernel



On 2/13/2025 10:46 AM, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> Prepend '0x' to the sensor bitmap in the warning message to clearly
> indicate that the bitmap is in hexadecimal format.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/events.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> index e8beb6289d01..a661aa522a9a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> @@ -167,7 +167,7 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
>   
>   	if (net_ratelimit())
>   		mlx5_core_warn(events->dev,
> -			       "High temperature on sensors with bit set %llx %llx",
> +			       "High temperature on sensors with bit set %#llx %#llx",
>   			       value_msb, value_lsb);
>   
>   	return NOTIFY_OK;

Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 3/4] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
  2025-02-13  9:46 ` [PATCH net-next 3/4] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Tariq Toukan
@ 2025-02-13 12:11   ` Mateusz Polchlopek
  0 siblings, 0 replies; 14+ messages in thread
From: Mateusz Polchlopek @ 2025-02-13 12:11 UTC (permalink / raw)
  To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn
  Cc: Shahar Shitrit, Gal Pressman, Saeed Mahameed, Leon Romanovsky,
	netdev, linux-rdma, linux-kernel



On 2/13/2025 10:46 AM, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> In the sensor_count field of the MTEWE register, bits 1-62 are
> supported only for unmanaged switches, not for NICs, and bit 63
> is reserved for internal use.
> 
> To prevent confusing output that may include set bits that are
> not relevant to NIC sensors, we update the bitmask to retain only
> the first bit, which corresponds to the sensor ASIC.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/events.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> index a661aa522a9a..e85a9042e3c2 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
> @@ -163,6 +163,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
>   	u64 value_msb;
>   
>   	value_lsb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_lsb);
> +	/* bit 1-63 are not supported for NICs,
> +	 * hence read only bit 0 (asic) from lsb.
> +	 */
> +	value_lsb &= 0x1;
>   	value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
>   
>   	if (net_ratelimit())

Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message
  2025-02-13  9:46 ` [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message Tariq Toukan
@ 2025-02-15 19:29   ` Simon Horman
  2025-02-18  0:27     ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Horman @ 2025-02-15 19:29 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Shahar Shitrit, Gal Pressman, Saeed Mahameed,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel,
	Carolina Jubran

On Thu, Feb 13, 2025 at 11:46:41AM +0200, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> Previously, a temperature event message included a bitmap indicating
> which sensors detect high temperatures.
> 
> To enhance clarity, we modify the message format to explicitly list
> the names of the overheating sensors, alongside the sensors bitmap.
> If HWMON is not configured, the event message remains unchanged.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>

...

> +#if IS_ENABLED(CONFIG_HWMON)
> +static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev, struct mlx5_hwmon *hwmon,
> +					  u64 bit_set, int bit_set_offset)
> +{
> +	unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
> +	int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
> +	int i;
> +
> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> +
> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> +	}
> +}

nit:

If you have to respin for some other reason, please consider limiting lines
to 80 columns wide or less here and elsewhere in this patch where it
doesn't reduce readability (subjective I know).

e.g.:

static void print_sensor_names_in_bit_set(struct mlx5_core_dev *dev,
                                          struct mlx5_hwmon *hwmon,
                                          u64 bit_set, int bit_set_offset)
{
        unsigned long *bit_set_ptr = (unsigned long *)&bit_set;
        int num_bits = sizeof(bit_set) * BITS_PER_BYTE;
        int i;

        for_each_set_bit(i, bit_set_ptr, num_bits) {
                const char *sensor_name;

                sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);

                mlx5_core_warn(dev, "Sensor name[%d]: %s\n",
                               i + bit_set_offset, sensor_name);
        }
}

...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message
  2025-02-15 19:29   ` Simon Horman
@ 2025-02-18  0:27     ` Jakub Kicinski
  2025-02-19 13:00       ` Tariq Toukan
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2025-02-18  0:27 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Simon Horman, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Shahar Shitrit, Gal Pressman, Saeed Mahameed,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel,
	Carolina Jubran

On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
> > +	for_each_set_bit(i, bit_set_ptr, num_bits) {
> > +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
> > +
> > +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
> > +	}
> > +}  
> 
> nit:
> 
> If you have to respin for some other reason, please consider limiting lines
> to 80 columns wide or less here and elsewhere in this patch where it
> doesn't reduce readability (subjective I know).

+1, please try to catch such situations going forward

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 0/4] mlx5: Add sensor name in temperature message
  2025-02-13  9:46 [PATCH net-next 0/4] mlx5: Add sensor name in temperature message Tariq Toukan
                   ` (3 preceding siblings ...)
  2025-02-13  9:46 ` [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message Tariq Toukan
@ 2025-02-18  0:40 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-02-18  0:40 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, shshitrit, gal,
	saeedm, leon, netdev, linux-rdma, linux-kernel

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 13 Feb 2025 11:46:37 +0200 you wrote:
> Hi,
> 
> This small series from Shahar adds the sensors names to the temperature
> event messages, in addition to the existing bitmap indicators.
> This improves human readability.
> 
> Series starts with simple refactoring and modifications. The top patch
> adds the sensors names.
> 
> [...]

Here is the summary with links:
  - [net-next,1/4] net/mlx5: Apply rate-limiting to high temperature warning
    https://git.kernel.org/netdev/net-next/c/9dd3d5d258ac
  - [net-next,2/4] net/mlx5: Prefix temperature event bitmap with '0x' for clarity
    https://git.kernel.org/netdev/net-next/c/b9b72ce0f5f4
  - [net-next,3/4] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
    https://git.kernel.org/netdev/net-next/c/633f16d7e07c
  - [net-next,4/4] net/mlx5: Add sensor name to temperature event message
    https://git.kernel.org/netdev/net-next/c/46fd50cfcc12

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message
  2025-02-18  0:27     ` Jakub Kicinski
@ 2025-02-19 13:00       ` Tariq Toukan
  2025-02-19 15:28         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Tariq Toukan @ 2025-02-19 13:00 UTC (permalink / raw)
  To: Jakub Kicinski, Tariq Toukan
  Cc: Simon Horman, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Shahar Shitrit, Gal Pressman, Saeed Mahameed,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel,
	Carolina Jubran



On 18/02/2025 2:27, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 19:29:35 +0000 Simon Horman wrote:
>>> +	for_each_set_bit(i, bit_set_ptr, num_bits) {
>>> +		const char *sensor_name = hwmon_get_sensor_name(hwmon, i + bit_set_offset);
>>> +
>>> +		mlx5_core_warn(dev, "Sensor name[%d]: %s\n", i + bit_set_offset, sensor_name);
>>> +	}
>>> +}
>>
>> nit:
>>
>> If you have to respin for some other reason, please consider limiting lines
>> to 80 columns wide or less here and elsewhere in this patch where it
>> doesn't reduce readability (subjective I know).
> 
> +1, please try to catch such situations going forward
> 

Hi Jakub,

This was not missed.
This is not a new thing...
We've been enforcing a max line length of 100 chars in mlx5 driver for 
the past few years.
I don't have the full image now, but I'm convinced that this dates back 
to an agreement between the mlx5 and netdev maintainers at that time.

80 chars could be too restrictive, especially with today's large 
monitors, while 100-chars is still highly readable.
This is subjective of course...

If you don't have a strong preference, we'll keep the current 100 chars 
limit. Otherwise, just let me know and we'll start enforcing the 
80-chars limit for future patches.

Regards,
Tariq

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message
  2025-02-19 13:00       ` Tariq Toukan
@ 2025-02-19 15:28         ` Jakub Kicinski
  2025-02-20 22:22           ` Saeed Mahameed
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2025-02-19 15:28 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Tariq Toukan, Simon Horman, David S. Miller, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Shahar Shitrit, Gal Pressman,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma, linux-kernel,
	Carolina Jubran

On Wed, 19 Feb 2025 15:00:57 +0200 Tariq Toukan wrote:
> >> If you have to respin for some other reason, please consider limiting lines
> >> to 80 columns wide or less here and elsewhere in this patch where it
> >> doesn't reduce readability (subjective I know).  
> > 
> > +1, please try to catch such situations going forward
> 
> This was not missed.
> This is not a new thing...
> We've been enforcing a max line length of 100 chars in mlx5 driver for 
> the past few years.
> I don't have the full image now, but I'm convinced that this dates back 
> to an agreement between the mlx5 and netdev maintainers at that time.
> 
> 80 chars could be too restrictive, especially with today's large 
> monitors, while 100-chars is still highly readable.
> This is subjective of course...
> 
> If you don't have a strong preference, we'll keep the current 100 chars 
> limit. Otherwise, just let me know and we'll start enforcing the 
> 80-chars limit for future patches.

Right, I think mlx5 is the only exception to the 80 column guidance.
I don't think it's resulting in more readable code, so yes, my
preference is to end this experiment.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message
  2025-02-19 15:28         ` Jakub Kicinski
@ 2025-02-20 22:22           ` Saeed Mahameed
  0 siblings, 0 replies; 14+ messages in thread
From: Saeed Mahameed @ 2025-02-20 22:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, Tariq Toukan, Simon Horman, David S. Miller,
	Paolo Abeni, Eric Dumazet, Andrew Lunn, Shahar Shitrit,
	Gal Pressman, Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, Carolina Jubran

On 19 Feb 07:28, Jakub Kicinski wrote:
>On Wed, 19 Feb 2025 15:00:57 +0200 Tariq Toukan wrote:
>> >> If you have to respin for some other reason, please consider limiting lines
>> >> to 80 columns wide or less here and elsewhere in this patch where it
>> >> doesn't reduce readability (subjective I know).
>> >
>> > +1, please try to catch such situations going forward
>>
>> This was not missed.
>> This is not a new thing...
>> We've been enforcing a max line length of 100 chars in mlx5 driver for
>> the past few years.
>> I don't have the full image now, but I'm convinced that this dates back
>> to an agreement between the mlx5 and netdev maintainers at that time.
>>
>> 80 chars could be too restrictive, especially with today's large
>> monitors, while 100-chars is still highly readable.
>> This is subjective of course...
>>
>> If you don't have a strong preference, we'll keep the current 100 chars
>> limit. Otherwise, just let me know and we'll start enforcing the
>> 80-chars limit for future patches.
>
>Right, I think mlx5 is the only exception to the 80 column guidance.
>I don't think it's resulting in more readable code, so yes, my
>preference is to end this experiment.
>

The reason in mlx5 was that we wanted to preserve the official HW spec
auto-generated fields names and they are really long.
100 chars worked very well with us for example the following sequence of
code setting up a FW command buffer would have to be broken in every line
if we were to restrict 80 chars per line.

  MLX5_SET(modify_vhca_state_in, in, opcode, MLX5_CMD_OP_MODIFY_VHCA_STATE);
  MLX5_SET(modify_vhca_state_in, in, vhca_state_field_select.sw_function_id, 1);
  MLX5_SET(modify_vhca_state_in, in, vhca_state_context.sw_function_id, sw_fn_id);
  MLX5_SET(modify_vhca_state_in, in, vhca_state_context.arm_change_event, 1);
  MLX5_SET(modify_vhca_state_in, in, vhca_state_field_select.arm_change_event, 1);

But I believe the driver grow larger than caring about those lines too
much, I just did a quick check and it seems less than 2% of the lines are
actually > 80, not sure this is due to being more strict in the past few
years or that we don't really need more than 80 lines.

I also check the interesting cases with macros such
MLX5_SET/MLX5_GET/MLX5_CAP and also the percentile of long lines was very
minor just about 5% in all cases.. 

So I kinda agree mlx5 doesn't need be so special anymore. 
Tariq up to you, you are the main  reviewer now.

Thanks
Saeed.





^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-02-20 22:22 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-13  9:46 [PATCH net-next 0/4] mlx5: Add sensor name in temperature message Tariq Toukan
2025-02-13  9:46 ` [PATCH net-next 1/4] net/mlx5: Apply rate-limiting to high temperature warning Tariq Toukan
2025-02-13 10:15   ` Mateusz Polchlopek
2025-02-13  9:46 ` [PATCH net-next 2/4] net/mlx5: Prefix temperature event bitmap with '0x' for clarity Tariq Toukan
2025-02-13 10:33   ` Mateusz Polchlopek
2025-02-13  9:46 ` [PATCH net-next 3/4] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Tariq Toukan
2025-02-13 12:11   ` Mateusz Polchlopek
2025-02-13  9:46 ` [PATCH net-next 4/4] net/mlx5: Add sensor name to temperature event message Tariq Toukan
2025-02-15 19:29   ` Simon Horman
2025-02-18  0:27     ` Jakub Kicinski
2025-02-19 13:00       ` Tariq Toukan
2025-02-19 15:28         ` Jakub Kicinski
2025-02-20 22:22           ` Saeed Mahameed
2025-02-18  0:40 ` [PATCH net-next 0/4] mlx5: Add sensor name in temperature message patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).