Re: [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm
       [not found] ` <20250916054731.1412031-3-crajank@nvidia.com>
@ 2025-09-16 10:00   ` Ilpo Järvinen
  2025-09-19 12:46     ` Ciju Rajan K
  2025-09-27 15:45     ` Thomas Gleixner
  0 siblings, 2 replies; 4+ messages in thread
From: Ilpo Järvinen @ 2025-09-16 10:00 UTC (permalink / raw)
  To: Ciju Rajan K, Hans de Goede, Thomas Gleixner, LKML,
	Andy Shevchenko
  Cc: christophe.jaillet, platform-driver-x86, vadimp

+ Thomas, lkml

On Tue, 16 Sep 2025, Ciju Rajan K wrote:

> In case of broken hardware, it is possible that broken device will
> flood interrupt handler with false events. For example, if fan or
> power supply has damaged presence pin, it will cause permanent
> generation of plugged in / plugged out events. As a result, interrupt
> handler will consume a lot of CPU resources and will keep raising
> "UDEV" events to the user space.
> 
> This patch provides a mechanism to detect device causing interrupt
> flooding and mask interrupt for this specific device, to isolate
> from interrupt handling flow. Use the following criteria: if the
> specific interrupt was generated 'N' times during 'T' seconds,
> such device is to be considered as broken and will be closed for
> getting interrupts. User will be notified through the log error
> and will be instructed to replace broken device.
> 
> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
> Signed-off-by: Ciju Rajan K <crajank@nvidia.com>

I've a general question on this approach, probably more directed towards 
Hans, Thomas, or others who might have some insight.

Are drivers expected to build their own workarounds for interrupt storms 
due to broken HW such as this? It sounds something that should be at least 
in part handled by something generic, while the lower-most level masking 
and detection might still need to be done by the driver to handle the HW 
specific aspects, there seems to be a generic aspect in all this.

Is there something generic for this already? If not, should there be 
instead of adding this in full into an end-of-the-food-chain driver?


> ---
>  drivers/platform/mellanox/mlxreg-hotplug.c | 35 ++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c b/drivers/platform/mellanox/mlxreg-hotplug.c
> index d246772aafd6..97b84c584d5e 100644
> --- a/drivers/platform/mellanox/mlxreg-hotplug.c
> +++ b/drivers/platform/mellanox/mlxreg-hotplug.c
> @@ -11,6 +11,7 @@
>  #include <linux/hwmon-sysfs.h>
>  #include <linux/i2c.h>
>  #include <linux/interrupt.h>
> +#include <linux/jiffies.h>
>  #include <linux/module.h>
>  #include <linux/platform_data/mlxreg.h>
>  #include <linux/platform_device.h>
> @@ -30,6 +31,11 @@
>  #define MLXREG_HOTPLUG_ATTRS_MAX	128
>  #define MLXREG_HOTPLUG_NOT_ASSERT	3
>  
> +/* Interrupt storm definitions */
> +#define MLXREG_HOTPLUG_WM_COUNTER 100
> +/* Time window in milliseconds */
> +#define MLXREG_HOTPLUG_WM_WINDOW 3000

Define's name should indicate the unit so please postfix it with _MS.

> +
>  /**
>   * struct mlxreg_hotplug_priv_data - platform private data:
>   * @irq: platform device interrupt number;
> @@ -344,7 +350,7 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
>  			   struct mlxreg_core_item *item)
>  {
>  	struct mlxreg_core_data *data;
> -	unsigned long asserted;
> +	unsigned long asserted, wmark_low_ts_window;
>  	u32 regval, bit;
>  	int ret;
>  
> @@ -366,11 +372,34 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
>  	for_each_set_bit(bit, &asserted, 8) {
>  		int pos;
>  
> +		/* Skip already marked storming bit. */
> +		if (item->storming_bits & BIT(bit))
> +			continue;
> +
>  		pos = mlxreg_hotplug_item_label_index_get(item->mask, bit);
>  		if (pos < 0)
>  			goto out;
>  		data = item->data + pos;
> +
> +		/* Interrupt storm handling logic. */
> +		if (data->wmark_low_cntr == 0)
> +			data->wmark_low_ts = jiffies;
> +
> +		if (data->wmark_low_cntr == MLXREG_HOTPLUG_WM_COUNTER - 1) {
> +			data->wmark_high_ts = jiffies;

Why does this timestamp have to be saved?

> +			wmark_low_ts_window = data->wmark_low_ts +
> +					      msecs_to_jiffies(MLXREG_HOTPLUG_WM_WINDOW);

Why not just calculate the ending of the window right at the beginning?
I'd call the member e.g. ->wmark_window (or ->wmark_window_end).

> +			if (time_after(data->wmark_high_ts, wmark_low_ts_window)) {
> +				dev_err(priv->dev, "Storming bit %d (label: %s) - interrupt masked permanently. Replace broken HW.",
> +					bit, data->label);
> +				/* Mark bit as storming. */
> +				item->storming_bits |= BIT(bit);

Why not using continue here for consistency with the other skip above?

If you add the continue, you can remove the else and deindent the zero 
assignment by one level:

> +			} else {
> +				data->wmark_low_cntr = 0;
> +			}
> +		}
> +		data->wmark_low_cntr++;
>  		if (regval & BIT(bit)) {
>  			if (item->inversed)
>  				mlxreg_hotplug_device_destroy(priv, data, item->kind);
> @@ -390,9 +419,9 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
>  	if (ret)
>  		goto out;
>  
> -	/* Unmask event. */
> +	/* Unmask event, exclude storming bits. */
>  	ret = regmap_write(priv->regmap, item->reg + MLXREG_HOTPLUG_MASK_OFF,
> -			   item->mask);
> +			   item->mask & ~item->storming_bits);
>  
>   out:
>  	if (ret)
> 


-- 
 i.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm
  2025-09-16 10:00   ` [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm Ilpo Järvinen
@ 2025-09-19 12:46     ` Ciju Rajan K
  2025-09-27 15:45     ` Thomas Gleixner
  1 sibling, 0 replies; 4+ messages in thread
From: Ciju Rajan K @ 2025-09-19 12:46 UTC (permalink / raw)
  To: Ilpo Järvinen, Hans de Goede, Thomas Gleixner, LKML,
	Andy Shevchenko
  Cc: christophe.jaillet@wanadoo.fr,
	platform-driver-x86@vger.kernel.org, Vadim Pasternak

Hi Ilpo,

Thank you very much for the review.

> > +
> > +             /* Interrupt storm handling logic. */
> > +             if (data->wmark_low_cntr == 0)
> > +                     data->wmark_low_ts = jiffies;
> > +
> > +             if (data->wmark_low_cntr == MLXREG_HOTPLUG_WM_COUNTER -
> 1) {
> > +                     data->wmark_high_ts = jiffies;
> 
> Why does this timestamp have to be saved?

Good point, I will remove it in the next version.

> 
> > +                     wmark_low_ts_window = data->wmark_low_ts +
> > +
> msecs_to_jiffies(MLXREG_HOTPLUG_WM_WINDOW);
> 
> Why not just calculate the ending of the window right at the beginning?
> I'd call the member e.g. ->wmark_window (or ->wmark_window_end).

ACK. Will introduce a new variable in the next version.

> 
> > +                     if (time_after(data->wmark_high_ts, wmark_low_ts_window)) {
> > +                             dev_err(priv->dev, "Storming bit %d (label: %s) - interrupt
> masked permanently. Replace broken HW.",
> > +                                     bit, data->label);
> > +                             /* Mark bit as storming. */
> > +                             item->storming_bits |= BIT(bit);
> 
> Why not using continue here for consistency with the other skip above?
> 
> If you add the continue, you can remove the else and deindent the zero
> assignment by one level:

Yes. Agree. Will change it in the next version.

Thanks
Ciju

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm
  2025-09-16 10:00   ` [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm Ilpo Järvinen
  2025-09-19 12:46     ` Ciju Rajan K
@ 2025-09-27 15:45     ` Thomas Gleixner
  2025-09-29 16:40       ` Ilpo Järvinen
  1 sibling, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2025-09-27 15:45 UTC (permalink / raw)
  To: Ilpo Järvinen, Ciju Rajan K, Hans de Goede, LKML,
	Andy Shevchenko
  Cc: christophe.jaillet, platform-driver-x86, vadimp

On Tue, Sep 16 2025 at 13:00, Ilpo Järvinen wrote:
> On Tue, 16 Sep 2025, Ciju Rajan K wrote:
>> This patch provides a mechanism to detect device causing interrupt
>> flooding and mask interrupt for this specific device, to isolate
>> from interrupt handling flow. Use the following criteria: if the
>> specific interrupt was generated 'N' times during 'T' seconds,
>> such device is to be considered as broken and will be closed for
>> getting interrupts. User will be notified through the log error
>> and will be instructed to replace broken device.
>> 
>> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
>> Signed-off-by: Ciju Rajan K <crajank@nvidia.com>
>
> I've a general question on this approach, probably more directed towards 
> Hans, Thomas, or others who might have some insight.
>
> Are drivers expected to build their own workarounds for interrupt storms 
> due to broken HW such as this? It sounds something that should be at least 
> in part handled by something generic, while the lower-most level masking 
> and detection might still need to be done by the driver to handle the HW 
> specific aspects, there seems to be a generic aspect in all this.
>
> Is there something generic for this already? If not, should there be 
> instead of adding this in full into an end-of-the-food-chain driver?

We don't have a generic mechanism for that. The core handles only the
case of unhandled runaway interrupts.

I agree, that there should be some generic infrastructure for that.

Something like the incomplete below, which obviously needs some thoughts
vs. shared interrupts and other details, but you get the idea.

Thanks,

        tglx
---
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1927,6 +1927,10 @@ static struct irqaction *__free_irq(stru
 		irq_release_resources(desc);
 		chip_bus_sync_unlock(desc);
 		irq_remove_timings(desc);
+		if (desc->irq_storm) {
+			kfree(desc->irq_storm);
+			desc->irq_storm = NULL;
+		}
 	}
 
 	mutex_unlock(&desc->request_mutex);
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -22,6 +22,44 @@ static DEFINE_TIMER(poll_spurious_irq_ti
 int irq_poll_cpu;
 static atomic_t irq_poll_active;
 
+/* Runaway interrupt detection */
+struct irq_storm {
+	unsigned long		max_cnt;
+	unsigned long		last_cnt;
+	irq_storm_cb_t		cb;
+	void			*cb_arg;
+};
+
+bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
+				  irq_storm_cb_t cb, void *cb_arg)
+{
+	struct irq_storm *is;
+	unsigned long cnt;
+	bool first;
+
+	if (max_freq < 500)
+		return false;
+
+	is = kzalloc(sizeof(*is), GFP_KERNEL);
+	if (!is)
+		return false;
+
+	/* Adjust to cnt/10ms */
+	is->max_cnt = max_freq / 100;
+	is->cb_arg = cb->arg;
+	is->cb = cb;
+
+	scoped_irqdesc_get_and_buslock(irq, 0) {
+		if (scoped_desc->action) {
+			is->last_cnt = scoped_desc->tot_cnt;
+			scoped_desc->irq_storm = is;
+			return true;
+		}
+	}
+	kfree(is);
+	return false;
+}
+
 /*
  * Recovery handler for misrouted interrupts.
  */
@@ -217,6 +255,19 @@ static inline bool try_misrouted_irq(uns
 	return action && (action->flags & IRQF_IRQPOLL);
 }
 
+static void irq_storm_check(struct irq_desc *desc)
+{
+	unsigned long delta, now = jiffies;
+
+	if (!time_after_irq(now, desc->irq_storm.next_period))
+		return;
+	desc->irq_storm.next_period = now + msec_to_jiffies(10);
+	delta = desc->tot_cnt - desc->irq_storm.last_cnt;
+	desc->irq_storm.last_cnt = desc->tot_cnt;
+	if (delta > desc->irq_storm.max_cnt)
+		desc->irq_storm.cb(irq_desc_get_irq(desc), delta * 100, desc->irq_storm.dev_id);
+}
+
 #define SPURIOUS_DEFERRED	0x80000000
 
 void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
@@ -231,6 +282,9 @@ void note_interrupt(struct irq_desc *des
 		return;
 	}
 
+	if (desc->irq_storm && action_ret == IRQ_HANDLED)
+		irq_storm_check(desc);
+
 	/*
 	 * We cannot call note_interrupt from the threaded handler
 	 * because we need to look at the compound of all handlers

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm
  2025-09-27 15:45     ` Thomas Gleixner
@ 2025-09-29 16:40       ` Ilpo Järvinen
  0 siblings, 0 replies; 4+ messages in thread
From: Ilpo Järvinen @ 2025-09-29 16:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ciju Rajan K
  Cc: Hans de Goede, LKML, Andy Shevchenko, christophe.jaillet,
	platform-driver-x86, vadimp

[-- Attachment #1: Type: text/plain, Size: 4738 bytes --]

On Sat, 27 Sep 2025, Thomas Gleixner wrote:

> On Tue, Sep 16 2025 at 13:00, Ilpo Järvinen wrote:
> > On Tue, 16 Sep 2025, Ciju Rajan K wrote:
> >> This patch provides a mechanism to detect device causing interrupt
> >> flooding and mask interrupt for this specific device, to isolate
> >> from interrupt handling flow. Use the following criteria: if the
> >> specific interrupt was generated 'N' times during 'T' seconds,
> >> such device is to be considered as broken and will be closed for
> >> getting interrupts. User will be notified through the log error
> >> and will be instructed to replace broken device.
> >> 
> >> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
> >> Signed-off-by: Ciju Rajan K <crajank@nvidia.com>
> >
> > I've a general question on this approach, probably more directed towards 
> > Hans, Thomas, or others who might have some insight.
> >
> > Are drivers expected to build their own workarounds for interrupt storms 
> > due to broken HW such as this? It sounds something that should be at least 
> > in part handled by something generic, while the lower-most level masking 
> > and detection might still need to be done by the driver to handle the HW 
> > specific aspects, there seems to be a generic aspect in all this.
> >
> > Is there something generic for this already? If not, should there be 
> > instead of adding this in full into an end-of-the-food-chain driver?
> 
> We don't have a generic mechanism for that. The core handles only the
> case of unhandled runaway interrupts.
> 
> I agree, that there should be some generic infrastructure for that.
> 
> Something like the incomplete below, which obviously needs some thoughts
> vs. shared interrupts and other details, but you get the idea.

Thanks Thomas!

Ciju, please base this storm detection on this kind of approach. I also 
started to wonder could some non-storming bit end up being detect as 
storming because your v3 approach doesn't do counters per asserted bits so 
if unlucky, non-storming bits might be asserted on the very moment the 
detection threshold was crossed and gets marked as storming even if it's 
not?

-- 
 i.

> ---
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1927,6 +1927,10 @@ static struct irqaction *__free_irq(stru
>  		irq_release_resources(desc);
>  		chip_bus_sync_unlock(desc);
>  		irq_remove_timings(desc);
> +		if (desc->irq_storm) {
> +			kfree(desc->irq_storm);
> +			desc->irq_storm = NULL;
> +		}
>  	}
>  
>  	mutex_unlock(&desc->request_mutex);
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -22,6 +22,44 @@ static DEFINE_TIMER(poll_spurious_irq_ti
>  int irq_poll_cpu;
>  static atomic_t irq_poll_active;
>  
> +/* Runaway interrupt detection */
> +struct irq_storm {
> +	unsigned long		max_cnt;
> +	unsigned long		last_cnt;
> +	irq_storm_cb_t		cb;
> +	void			*cb_arg;
> +};
> +
> +bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
> +				  irq_storm_cb_t cb, void *cb_arg)
> +{
> +	struct irq_storm *is;
> +	unsigned long cnt;
> +	bool first;
> +
> +	if (max_freq < 500)
> +		return false;
> +
> +	is = kzalloc(sizeof(*is), GFP_KERNEL);
> +	if (!is)
> +		return false;
> +
> +	/* Adjust to cnt/10ms */
> +	is->max_cnt = max_freq / 100;
> +	is->cb_arg = cb->arg;
> +	is->cb = cb;
> +
> +	scoped_irqdesc_get_and_buslock(irq, 0) {
> +		if (scoped_desc->action) {
> +			is->last_cnt = scoped_desc->tot_cnt;
> +			scoped_desc->irq_storm = is;
> +			return true;
> +		}
> +	}
> +	kfree(is);
> +	return false;
> +}
> +
>  /*
>   * Recovery handler for misrouted interrupts.
>   */
> @@ -217,6 +255,19 @@ static inline bool try_misrouted_irq(uns
>  	return action && (action->flags & IRQF_IRQPOLL);
>  }
>  
> +static void irq_storm_check(struct irq_desc *desc)
> +{
> +	unsigned long delta, now = jiffies;
> +
> +	if (!time_after_irq(now, desc->irq_storm.next_period))
> +		return;
> +	desc->irq_storm.next_period = now + msec_to_jiffies(10);
> +	delta = desc->tot_cnt - desc->irq_storm.last_cnt;
> +	desc->irq_storm.last_cnt = desc->tot_cnt;
> +	if (delta > desc->irq_storm.max_cnt)
> +		desc->irq_storm.cb(irq_desc_get_irq(desc), delta * 100, desc->irq_storm.dev_id);
> +}
> +
>  #define SPURIOUS_DEFERRED	0x80000000
>  
>  void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
> @@ -231,6 +282,9 @@ void note_interrupt(struct irq_desc *des
>  		return;
>  	}
>  
> +	if (desc->irq_storm && action_ret == IRQ_HANDLED)
> +		irq_storm_check(desc);
> +
>  	/*
>  	 * We cannot call note_interrupt from the threaded handler
>  	 * because we need to look at the compound of all handlers
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-09-29 16:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250916054731.1412031-1-crajank@nvidia.com>
     [not found] ` <20250916054731.1412031-3-crajank@nvidia.com>
2025-09-16 10:00   ` [PATCH platform-next 2/2] platform/mellanox: mlxreg-hotplug: Add support for handling interrupt storm Ilpo Järvinen
2025-09-19 12:46     ` Ciju Rajan K
2025-09-27 15:45     ` Thomas Gleixner
2025-09-29 16:40       ` Ilpo Järvinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox