* [PATCH platform-next v4 0/2] Interrupt storm detection
@ 2026-01-15 7:49 Ciju Rajan K
  2026-01-15 7:49 ` [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism Ciju Rajan K
  2026-01-15 7:49 ` [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection Ciju Rajan K
  0 siblings, 2 replies; 8+ messages in thread
From: Ciju Rajan K @ 2026-01-15 7:49 UTC (permalink / raw)
To: hdegoede, ilpo.jarvinen, tglx
Cc: christophe.jaillet, andriy.shevchenko, vadimp, platform-driver-x86, linux-kernel, Ciju Rajan K

This patchset contains:

Patch #1 Add generic interrupt storm detection mechanism
Patch #2 Enabling interrupt storm detection for mlxreg-hotplug driver

v0->v4
- Implemented generic interrupt storm detection as suggested by
  Thomas Gleixner.
- Updated the mlxreg-hotplug driver to make use of generic interrupt
  storm detection.
- Modified the logic in the mlxreg-hotplug driver to track per-device
  interrupts for the shared IRQ, based on the callback function
  provided by generic interrupt storm detection.
- Addressed the comments pointed out by Ilpo on earlier versions.

Ciju Rajan K (2):
  kernel/irq: Add generic interrupt storm detection mechanism
  platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection

 drivers/platform/mellanox/mlxreg-hotplug.c | 74 +++++++++++++++++-
 include/linux/interrupt.h                  | 13 ++++
 include/linux/irqdesc.h                    | 20 +++++
 include/linux/platform_data/mlxreg.h       |  4 +
 kernel/irq/manage.c                        |  4 +
 kernel/irq/spurious.c                      | 87 ++++++++++++++++++++++
 6 files changed, 200 insertions(+), 2 deletions(-)

-- 
2.47.3

^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism
  2026-01-15 7:49 [PATCH platform-next v4 0/2] Interrupt storm detection Ciju Rajan K
@ 2026-01-15 7:49 ` Ciju Rajan K
  2026-01-15 8:29 ` Andy Shevchenko
  ` (2 more replies)
  2026-01-15 7:49 ` [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection Ciju Rajan K
  1 sibling, 3 replies; 8+ messages in thread
From: Ciju Rajan K @ 2026-01-15 7:49 UTC (permalink / raw)
To: hdegoede, ilpo.jarvinen, tglx
Cc: christophe.jaillet, andriy.shevchenko, vadimp, platform-driver-x86, linux-kernel, Ciju Rajan K

If the hardware is broken, it is possible that faulty device will
flood interrupt handler with false events. For example, if fan or
power supply has damaged presence pin, it will cause permanent
generation of plugged in / plugged out events. As a result, interrupt
handler will consume a lot of CPU resources and will keep raising
"UDEV" events to the user space.

This patch provides a mechanism for detecting interrupt storm.
Use the following criteria: if the specific interrupt was generated
'N' times during 'T' seconds, such device is to be considered as
broken and user will be notified through a call back function.
This feature can be used by any kernel subsystems or drivers.

The implementation includes:

- irq_storm_cb_t: Callback function type for storm notifications
- struct irq_storm: Per-IRQ storm detection data structure
- irq_register_storm_detection(): Register storm detection with
  configurable parameters
- irq_unregister_storm_detection(): Unregister storm detection
- Integration with note_interrupt() for automatic storm checking

Callback API parameters:
- irq: interrupt number to monitor
- max_freq: maximum allowed frequency (interrupts per second)
- dev_id: device identifier passed to callback

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ciju Rajan K <crajank@nvidia.com>
---
 include/linux/interrupt.h | 13 ++++++
 include/linux/irqdesc.h   | 20 +++++++++
 kernel/irq/manage.c       |  4 ++
 kernel/irq/spurious.c     | 87 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 124 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 266f2b39213a..9fbda5d08a8f 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -20,6 +20,7 @@
 #include <asm/ptrace.h>
 #include <asm/irq.h>
 #include <asm/sections.h>
+#include <linux/jiffies.h>
 
 /*
  * These correspond to the IORESOURCE_IRQ_* defines in
@@ -139,6 +140,14 @@ struct irqaction {
 	struct proc_dir_entry	*dir;
 } ____cacheline_internodealigned_in_smp;
 
+/**
+ * irq_storm_cb_t - callback function type for interrupt storm detection
+ * @irq: interrupt number that is storming
+ * @freq: detected frequency (interrupts per second)
+ * @dev_id: device identifier passed during registration
+ */
+typedef void (*irq_storm_cb_t)(unsigned int irq, unsigned int freq, void *dev_id);
+
 extern irqreturn_t no_action(int cpl, void *dev_id);
 
 /*
@@ -331,6 +340,10 @@ extern int irq_force_affinity(unsigned int irq, const struct cpumask *cpumask);
 extern int irq_can_set_affinity(unsigned int irq);
 extern int irq_select_affinity(unsigned int irq);
 
+extern bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
+					 irq_storm_cb_t cb, void *dev_id);
+extern void irq_unregister_storm_detection(unsigned int irq);
+
 extern int __irq_apply_affinity_hint(unsigned int irq, const struct cpumask *m,
 				     bool setaffinity);
 
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 17902861de76..d27f02371a6c 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -17,6 +17,9 @@ struct irq_desc;
 struct irq_domain;
 struct pt_regs;
 
+/* Forward declaration - full definition in interrupt.h */
+typedef void (*irq_storm_cb_t)(unsigned int, unsigned int, void *);
+
 /**
  * struct irqstat - interrupt statistics
  * @cnt:	real-time interrupt count
@@ -29,6 +32,22 @@ struct irqstat {
 #endif
 };
 
+/**
+ * struct irq_storm - interrupt storm detection data
+ * @max_cnt: maximum interrupt count per time window
+ * @last_cnt: last total interrupt count snapshot
+ * @next_period: next time period boundary (jiffies)
+ * @cb: callback function to invoke on storm detection
+ * @dev_id: device identifier for callback
+ */
+struct irq_storm {
+	unsigned long	max_cnt;
+	unsigned long	last_cnt;
+	unsigned long	next_period;
+	irq_storm_cb_t	cb;
+	void		*dev_id;
+};
+
 /**
  * struct irq_desc - interrupt descriptor
  * @irq_common_data:	per irq and chip data passed down to chip functions
@@ -101,6 +120,7 @@ struct irq_desc {
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry	*dir;
 #endif
+	struct irq_storm	*irq_storm;
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
 	struct dentry		*debugfs_file;
 	const char		*dev_name;
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 349ae7979da0..d413bf11ffde 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1951,6 +1951,10 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
 		irq_release_resources(desc);
 		chip_bus_sync_unlock(desc);
 		irq_remove_timings(desc);
+		if (desc->irq_storm) {
+			kfree(desc->irq_storm);
+			desc->irq_storm = NULL;
+		}
 	}
 
 	mutex_unlock(&desc->request_mutex);
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index 73280ccb74b0..525dc0e384f1 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -22,6 +22,90 @@ static DEFINE_TIMER(poll_spurious_irq_timer, poll_spurious_irqs);
 int irq_poll_cpu;
 static atomic_t irq_poll_active;
 
+/* Minimum frequency threshold */
+#define IRQ_STORM_MIN_FREQ_HZ		50
+#define IRQ_STORM_MAX_FREQ_SCALE	(IRQ_STORM_MIN_FREQ_HZ * 2)
+/* Time window over which storm check is performed */
+#define IRQ_STORM_PERIOD_WINDOW_MS	(IRQ_STORM_MIN_FREQ_HZ * 20)
+
+
+/**
+ * irq_register_storm_detection - register interrupt storm detection for an IRQ
+ * @irq: interrupt number
+ * @max_freq: maximum allowed frequency (interrupts per second)
+ * @cb: callback function to invoke when storm is detected
+ * @dev_id: device identifier passed to callback
+ *
+ * Returns: true on success, false on failure
+ */
+bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
+				  irq_storm_cb_t cb, void *dev_id)
+{
+	struct irq_storm *storm;
+	bool ret = false;
+
+	if (max_freq < IRQ_STORM_MIN_FREQ_HZ || !cb)
+		return false;
+
+	storm = kzalloc(sizeof(*storm), GFP_KERNEL);
+	if (!storm)
+		return false;
+
+	/* Adjust to count per 10ms */
+	storm->max_cnt = max_freq / (IRQ_STORM_MAX_FREQ_SCALE);
+	storm->cb = cb;
+	storm->dev_id = dev_id;
+
+	scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
+		if (scoped_irqdesc->action && !scoped_irqdesc->irq_storm) {
+			storm->last_cnt = scoped_irqdesc->tot_count;
+			storm->next_period = jiffies + msecs_to_jiffies(IRQ_STORM_PERIOD_WINDOW_MS);
+			scoped_irqdesc->irq_storm = storm;
+			ret = true;
+		}
+	}
+
+	if (!ret)
+		kfree(storm);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(irq_register_storm_detection);
+
+/**
+ * irq_unregister_storm_detection - unregister interrupt storm detection
+ * @irq: interrupt number
+ */
+void irq_unregister_storm_detection(unsigned int irq)
+{
+	scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
+		if (scoped_irqdesc->irq_storm) {
+			kfree(scoped_irqdesc->irq_storm);
+			scoped_irqdesc->irq_storm = NULL;
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(irq_unregister_storm_detection);
+
+static void irq_storm_check(struct irq_desc *desc)
+{
+	struct irq_storm *storm = desc->irq_storm;
+	unsigned long delta, now = jiffies;
+
+	if (!time_after_eq(now, storm->next_period))
+		return;
+
+	storm->next_period = now + msecs_to_jiffies(IRQ_STORM_PERIOD_WINDOW_MS);
+	delta = desc->tot_count - storm->last_cnt;
+	storm->last_cnt = desc->tot_count;
+	if (delta > storm->max_cnt) {
+		/* Calculate actual frequency: interrupts per second */
+		storm->cb(irq_desc_get_irq(desc),
+			  (delta * (IRQ_STORM_MAX_FREQ_SCALE)),
+			  storm->dev_id);
+	}
+}
+
 /*
  * Recovery handler for misrouted interrupts.
  */
@@ -231,6 +315,9 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
 		return;
 	}
 
+	if (desc->irq_storm && action_ret == IRQ_HANDLED)
+		irq_storm_check(desc);
+
 	/*
 	 * We cannot call note_interrupt from the threaded handler
 	 * because we need to look at the compound of all handlers
-- 
2.47.3

^ permalink raw reply related [flat|nested] 8+ messages in thread
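
[Editorial sketch, not part of the posted patch.] The window arithmetic in
irq_register_storm_detection() and irq_storm_check() above can be modeled in
userspace roughly as follows. The three constants are copied from the patch;
struct storm_model, storm_model_init() and storm_model_check() are hypothetical
names introduced only for illustration. The model assumes time is given in
milliseconds rather than jiffies, ignores counter and time wraparound, and
returns the value that would be passed to the callback instead of invoking one.

```c
/* Constants copied from the posted kernel/irq/spurious.c hunk. */
#define IRQ_STORM_MIN_FREQ_HZ		50
#define IRQ_STORM_MAX_FREQ_SCALE	(IRQ_STORM_MIN_FREQ_HZ * 2)
#define IRQ_STORM_PERIOD_WINDOW_MS	(IRQ_STORM_MIN_FREQ_HZ * 20)

/* Hypothetical userspace stand-in for struct irq_storm (ms, not jiffies). */
struct storm_model {
	unsigned long max_cnt;		/* threshold per window */
	unsigned long last_cnt;		/* total count at the last window edge */
	unsigned long next_period_ms;	/* next window boundary */
};

/* Mirrors the setup done at registration time in the patch. */
static void storm_model_init(struct storm_model *s, unsigned int max_freq,
			     unsigned long now_ms, unsigned long tot_count)
{
	s->max_cnt = max_freq / IRQ_STORM_MAX_FREQ_SCALE;
	s->last_cnt = tot_count;
	s->next_period_ms = now_ms + IRQ_STORM_PERIOD_WINDOW_MS;
}

/*
 * Mirrors irq_storm_check(): returns the frequency that would be handed
 * to the callback, or 0 when no callback would be invoked.
 */
static unsigned long storm_model_check(struct storm_model *s,
				       unsigned long now_ms,
				       unsigned long tot_count)
{
	unsigned long delta;

	/* Window not elapsed yet (no wraparound handling in this model). */
	if (now_ms < s->next_period_ms)
		return 0;

	s->next_period_ms = now_ms + IRQ_STORM_PERIOD_WINDOW_MS;
	delta = tot_count - s->last_cnt;
	s->last_cnt = tot_count;
	if (delta > s->max_cnt)
		return delta * IRQ_STORM_MAX_FREQ_SCALE;

	return 0;
}
```

Under this model, max_freq = 100 (the value passed by patch 2/2) gives
max_cnt = 1 and a 1000 ms window, so a window in which the total count grows
by 80 yields a reported frequency of 80 * 100 = 8000.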
* Re: [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism
  2026-01-15 7:49 ` [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism Ciju Rajan K
@ 2026-01-15 8:29 ` Andy Shevchenko
  2026-01-15 14:00 ` kernel test robot
  2026-01-15 14:11 ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2026-01-15 8:29 UTC (permalink / raw)
To: Ciju Rajan K
Cc: hdegoede, ilpo.jarvinen, tglx, christophe.jaillet, vadimp, platform-driver-x86, linux-kernel

On Thu, Jan 15, 2026 at 09:49:08AM +0200, Ciju Rajan K wrote:
> If the hardware is broken, it is possible that faulty device will
> flood interrupt handler with false events. For example, if fan or
> power supply has damaged presence pin, it will cause permanent

I would say "has a floating presence pin" as the term floating is well
established and describes the case exactly how you put it further in the text.

> generation of plugged in / plugged out events. As a result, interrupt
> handler will consume a lot of CPU resources and will keep raising
> "UDEV" events to the user space.
>
> This patch provides a mechanism for detecting interrupt storm.
> Use the following criteria: if the specific interrupt was generated
> 'N' times during 'T' seconds, such device is to be considered as
> broken and user will be notified through a call back function.
> This feature can be used by any kernel subsystems or drivers.
>
> The implementation includes:
>
> - irq_storm_cb_t: Callback function type for storm notifications
> - struct irq_storm: Per-IRQ storm detection data structure
> - irq_register_storm_detection(): Register storm detection with
>   configurable parameters
> - irq_unregister_storm_detection(): Unregister storm detection
> - Integration with note_interrupt() for automatic storm checking
>
> Callback API parameters:
> - irq: interrupt number to monitor
> - max_freq: maximum allowed frequency (interrupts per second)
> - dev_id: device identifier passed to callback

...

> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -20,6 +20,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/irq.h>
>  #include <asm/sections.h>
> +#include <linux/jiffies.h>

I would not mix linux/* group with asm/* group and to me the best location for
this inclusion is just before kref.h.

Note, this header needs a bit more of a cleanup, as it includes basically
everything (due to kernel.h and maybe other messed up headers). But this is out
of scope of your series.

...

>  extern int irq_can_set_affinity(unsigned int irq);
>  extern int irq_select_affinity(unsigned int irq);
> +extern bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
> +					 irq_storm_cb_t cb, void *dev_id);
> +extern void irq_unregister_storm_detection(unsigned int irq);

Do we still need "extern" keyword?

>  extern int __irq_apply_affinity_hint(unsigned int irq, const struct cpumask *m,
>  				     bool setaffinity);

...

> +struct irq_storm {
> +	unsigned long	max_cnt;
> +	unsigned long	last_cnt;
> +	unsigned long	next_period;
> +	irq_storm_cb_t	cb;
> +	void		*dev_id;
> +};

I'm wondering if you have tried to shuffle this layout based on the frequency
of a use of each member. In some cases it might generate less code (can be
measured with bloat-o-meter).

...

>  static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
>  		irq_release_resources(desc);
>  		chip_bus_sync_unlock(desc);
>  		irq_remove_timings(desc);
> +		if (desc->irq_storm) {

Unneeded (duplicate) check.

> +			kfree(desc->irq_storm);
> +			desc->irq_storm = NULL;

Do we need this? If so, still can be done unconditionally.

> +		}
>  	}

> +/* Minimum frequency threshold */
> +#define IRQ_STORM_MIN_FREQ_HZ		50
> +#define IRQ_STORM_MAX_FREQ_SCALE	(IRQ_STORM_MIN_FREQ_HZ * 2)

Plain numbers are easier to read, hence 100

> +/* Time window over which storm check is performed */
> +#define IRQ_STORM_PERIOD_WINDOW_MS	(IRQ_STORM_MIN_FREQ_HZ * 20)

MS = HZ?! It's from some other universe with different physics laws.

...

> + * Returns: true on success, false on failure

Are the rest use "Returns" (with "s")? Because the main keyword is "Return".
(Yes, "Returns" works, but it's a secondary one, not even documented IIRC.)

...

> +bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
> +				  irq_storm_cb_t cb, void *dev_id)
> +{
> +	struct irq_storm *storm;
> +	bool ret = false;
> +
> +	if (max_freq < IRQ_STORM_MIN_FREQ_HZ || !cb)
> +		return false;
> +
> +	storm = kzalloc(sizeof(*storm), GFP_KERNEL);
> +	if (!storm)
> +		return false;
> +
> +	/* Adjust to count per 10ms */
> +	storm->max_cnt = max_freq / (IRQ_STORM_MAX_FREQ_SCALE);
> +	storm->cb = cb;
> +	storm->dev_id = dev_id;
> +
> +	scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
> +		if (scoped_irqdesc->action && !scoped_irqdesc->irq_storm) {
> +			storm->last_cnt = scoped_irqdesc->tot_count;
> +			storm->next_period = jiffies + msecs_to_jiffies(IRQ_STORM_PERIOD_WINDOW_MS);
> +			scoped_irqdesc->irq_storm = storm;
> +			ret = true;
> +		}
> +	}

> +	if (!ret)
> +		kfree(storm);
> +
> +	return ret;

Better to avoid negative conditionals in such contexts.

	if (ret)
		return ret;

	kfree(...);

But it's boolean and assigned inside scoped section. Why we can't return true
directly from scoped section?

> +}

...

> +void irq_unregister_storm_detection(unsigned int irq)
> +{
> +	scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
> +		if (scoped_irqdesc->irq_storm) {

Dup check, drop it.

> +			kfree(scoped_irqdesc->irq_storm);
> +			scoped_irqdesc->irq_storm = NULL;
> +		}
> +	}
> +}

...

> +static void irq_storm_check(struct irq_desc *desc)
> +{
> +	struct irq_storm *storm = desc->irq_storm;
> +	unsigned long delta, now = jiffies;
> +
> +	if (!time_after_eq(now, storm->next_period))
> +		return;

> +	storm->next_period = now + msecs_to_jiffies(IRQ_STORM_PERIOD_WINDOW_MS);

Just do

#define IRQ_STORM_PERIOD_WINDOW		msecs_to_jiffies(1000)

It will address my above comment and make this be read better.

> +	delta = desc->tot_count - storm->last_cnt;
> +	storm->last_cnt = desc->tot_count;
> +	if (delta > storm->max_cnt) {
> +		/* Calculate actual frequency: interrupts per second */
> +		storm->cb(irq_desc_get_irq(desc),
> +			  (delta * (IRQ_STORM_MAX_FREQ_SCALE)),
> +			  storm->dev_id);
> +	}
> +}

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism
  2026-01-15 7:49 ` [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism Ciju Rajan K
  2026-01-15 8:29 ` Andy Shevchenko
@ 2026-01-15 14:00 ` kernel test robot
  2026-01-15 14:11 ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2026-01-15 14:00 UTC (permalink / raw)
To: Ciju Rajan K, hdegoede, ilpo.jarvinen, tglx
Cc: oe-kbuild-all, christophe.jaillet, andriy.shevchenko, vadimp, platform-driver-x86, linux-kernel, Ciju Rajan K

Hi Ciju,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.19-rc5]
[cannot apply to tip/irq/core next-20260115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ciju-Rajan-K/kernel-irq-Add-generic-interrupt-storm-detection-mechanism/20260115-155438
base:   linus/master
patch link:    https://lore.kernel.org/r/20260115074909.245852-2-crajank%40nvidia.com
patch subject: [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism
config: arc-allnoconfig (https://download.01.org/0day-ci/archive/20260115/202601152136.LGHBo3k1-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260115/202601152136.LGHBo3k1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601152136.LGHBo3k1-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/irq/spurious.c:41:6: warning: no previous prototype for 'irq_register_storm_detection' [-Wmissing-prototypes]
      41 | bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/irq/spurious.c:79:6: warning: no previous prototype for 'irq_unregister_storm_detection' [-Wmissing-prototypes]
      79 | void irq_unregister_storm_detection(unsigned int irq)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/irq_register_storm_detection +41 kernel/irq/spurious.c

    30	
    31	
    32	/**
    33	 * irq_register_storm_detection - register interrupt storm detection for an IRQ
    34	 * @irq: interrupt number
    35	 * @max_freq: maximum allowed frequency (interrupts per second)
    36	 * @cb: callback function to invoke when storm is detected
    37	 * @dev_id: device identifier passed to callback
    38	 *
    39	 * Returns: true on success, false on failure
    40	 */
  > 41	bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
    42					  irq_storm_cb_t cb, void *dev_id)
    43	{
    44		struct irq_storm *storm;
    45		bool ret = false;
    46	
    47		if (max_freq < IRQ_STORM_MIN_FREQ_HZ || !cb)
    48			return false;
    49	
    50		storm = kzalloc(sizeof(*storm), GFP_KERNEL);
    51		if (!storm)
    52			return false;
    53	
    54		/* Adjust to count per 10ms */
    55		storm->max_cnt = max_freq / (IRQ_STORM_MAX_FREQ_SCALE);
    56		storm->cb = cb;
    57		storm->dev_id = dev_id;
    58	
    59		scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
    60			if (scoped_irqdesc->action && !scoped_irqdesc->irq_storm) {
    61				storm->last_cnt = scoped_irqdesc->tot_count;
    62				storm->next_period = jiffies + msecs_to_jiffies(IRQ_STORM_PERIOD_WINDOW_MS);
    63				scoped_irqdesc->irq_storm = storm;
    64				ret = true;
    65			}
    66		}
    67	
    68		if (!ret)
    69			kfree(storm);
    70	
    71		return ret;
    72	}
    73	EXPORT_SYMBOL_GPL(irq_register_storm_detection);
    74	
    75	/**
    76	 * irq_unregister_storm_detection - unregister interrupt storm detection
    77	 * @irq: interrupt number
    78	 */
  > 79	void irq_unregister_storm_detection(unsigned int irq)
    80	{
    81		scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
    82			if (scoped_irqdesc->irq_storm) {
    83				kfree(scoped_irqdesc->irq_storm);
    84				scoped_irqdesc->irq_storm = NULL;
    85			}
    86		}
    87	}
    88	EXPORT_SYMBOL_GPL(irq_unregister_storm_detection);
    89	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism
  2026-01-15 7:49 ` [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism Ciju Rajan K
  2026-01-15 8:29 ` Andy Shevchenko
  2026-01-15 14:00 ` kernel test robot
@ 2026-01-15 14:11 ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2026-01-15 14:11 UTC (permalink / raw)
To: Ciju Rajan K, hdegoede, ilpo.jarvinen, tglx
Cc: llvm, oe-kbuild-all, christophe.jaillet, andriy.shevchenko, vadimp, platform-driver-x86, linux-kernel, Ciju Rajan K

Hi Ciju,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.19-rc5]
[cannot apply to tip/irq/core next-20260115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ciju-Rajan-K/kernel-irq-Add-generic-interrupt-storm-detection-mechanism/20260115-155438
base:   linus/master
patch link:    https://lore.kernel.org/r/20260115074909.245852-2-crajank%40nvidia.com
patch subject: [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism
config: arm-allnoconfig (https://download.01.org/0day-ci/archive/20260115/202601152104.pBPeNPHR-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 9b8addffa70cee5b2acc5454712d9cf78ce45710)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260115/202601152104.pBPeNPHR-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601152104.pBPeNPHR-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/irq/spurious.c:41:6: warning: no previous prototype for function 'irq_register_storm_detection' [-Wmissing-prototypes]
      41 | bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
         |      ^
   kernel/irq/spurious.c:41:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
      41 | bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
         | ^
         | static
>> kernel/irq/spurious.c:79:6: warning: no previous prototype for function 'irq_unregister_storm_detection' [-Wmissing-prototypes]
      79 | void irq_unregister_storm_detection(unsigned int irq)
         |      ^
   kernel/irq/spurious.c:79:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
      79 | void irq_unregister_storm_detection(unsigned int irq)
         | ^
         | static
   2 warnings generated.

vim +/irq_register_storm_detection +41 kernel/irq/spurious.c

    30	
    31	
    32	/**
    33	 * irq_register_storm_detection - register interrupt storm detection for an IRQ
    34	 * @irq: interrupt number
    35	 * @max_freq: maximum allowed frequency (interrupts per second)
    36	 * @cb: callback function to invoke when storm is detected
    37	 * @dev_id: device identifier passed to callback
    38	 *
    39	 * Returns: true on success, false on failure
    40	 */
  > 41	bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
    42					  irq_storm_cb_t cb, void *dev_id)
    43	{
    44		struct irq_storm *storm;
    45		bool ret = false;
    46	
    47		if (max_freq < IRQ_STORM_MIN_FREQ_HZ || !cb)
    48			return false;
    49	
    50		storm = kzalloc(sizeof(*storm), GFP_KERNEL);
    51		if (!storm)
    52			return false;
    53	
    54		/* Adjust to count per 10ms */
    55		storm->max_cnt = max_freq / (IRQ_STORM_MAX_FREQ_SCALE);
    56		storm->cb = cb;
    57		storm->dev_id = dev_id;
    58	
    59		scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
    60			if (scoped_irqdesc->action && !scoped_irqdesc->irq_storm) {
    61				storm->last_cnt = scoped_irqdesc->tot_count;
    62				storm->next_period = jiffies + msecs_to_jiffies(IRQ_STORM_PERIOD_WINDOW_MS);
    63				scoped_irqdesc->irq_storm = storm;
    64				ret = true;
    65			}
    66		}
    67	
    68		if (!ret)
    69			kfree(storm);
    70	
    71		return ret;
    72	}
    73	EXPORT_SYMBOL_GPL(irq_register_storm_detection);
    74	
    75	/**
    76	 * irq_unregister_storm_detection - unregister interrupt storm detection
    77	 * @irq: interrupt number
    78	 */
  > 79	void irq_unregister_storm_detection(unsigned int irq)
    80	{
    81		scoped_irqdesc_get_and_buslock(irq, IRQ_GET_DESC_CHECK_GLOBAL) {
    82			if (scoped_irqdesc->irq_storm) {
    83				kfree(scoped_irqdesc->irq_storm);
    84				scoped_irqdesc->irq_storm = NULL;
    85			}
    86		}
    87	}
    88	EXPORT_SYMBOL_GPL(irq_unregister_storm_detection);
    89	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection
  2026-01-15 7:49 [PATCH platform-next v4 0/2] Interrupt storm detection Ciju Rajan K
  2026-01-15 7:49 ` [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism Ciju Rajan K
@ 2026-01-15 7:49 ` Ciju Rajan K
  2026-01-15 8:34 ` Andy Shevchenko
  2026-01-15 14:43 ` kernel test robot
  1 sibling, 2 replies; 8+ messages in thread
From: Ciju Rajan K @ 2026-01-15 7:49 UTC (permalink / raw)
To: hdegoede, ilpo.jarvinen, tglx
Cc: christophe.jaillet, andriy.shevchenko, vadimp, platform-driver-x86, linux-kernel, Ciju Rajan K

This patch enables the interrupt storm detection feature and
also adds the per device counter for tracking the faulty
devices. It also masks the faulty devices from generating
any further interrupts.

Add field for interrupt storm handling.
Extend structure mlxreg_core_data with the following field:
 'wmark_cntr' - interrupt storm counter.

Extend structure mlxreg_core_item with the following field:
 'storming_bits' - interrupt storming bits mask.

Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ciju Rajan K <crajank@nvidia.com>
--
---
 drivers/platform/mellanox/mlxreg-hotplug.c | 74 +++++++++++++++++++++-
 include/linux/platform_data/mlxreg.h       |  4 ++
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/mellanox/mlxreg-hotplug.c b/drivers/platform/mellanox/mlxreg-hotplug.c
index d246772aafd6..4752477207d4 100644
--- a/drivers/platform/mellanox/mlxreg-hotplug.c
+++ b/drivers/platform/mellanox/mlxreg-hotplug.c
@@ -30,6 +30,9 @@
 #define MLXREG_HOTPLUG_ATTRS_MAX	128
 #define MLXREG_HOTPLUG_NOT_ASSERT	3
 
+/* Interrupt storm frequency */
+#define MLXREG_HOTPLUG_INTR_FREQ_HZ	100
+
 /**
  * struct mlxreg_hotplug_priv_data - platform private data:
  * @irq: platform device interrupt number;
@@ -339,6 +342,57 @@ static int mlxreg_hotplug_attr_init(struct mlxreg_hotplug_priv_data *priv)
 	return 0;
 }
 
+/**
+ * mlxreg_hotplug_storm_handler - generic interrupt storm detection callback
+ * @irq: interrupt number experiencing storm
+ * @freq: detected frequency (interrupts per second)
+ * @dev_id: device data (mlxreg_hotplug_priv_data)
+ *
+ * This callback is invoked by the generic interrupt storm detection mechanism
+ * when an interrupt storm is detected on the shared IRQ line. The driver then
+ * analyzes per-device interrupt counters to identify which specific devices
+ * are causing excessive interrupts without blocking operations.
+ */
+static void mlxreg_hotplug_storm_handler(unsigned int irq, unsigned int freq, void *dev_id)
+{
+	struct mlxreg_hotplug_priv_data *priv = dev_id;
+	struct mlxreg_core_hotplug_platform_data *pdata;
+	struct mlxreg_core_item *item;
+	struct mlxreg_core_data *data;
+	unsigned long asserted;
+	u32 bit;
+
+	dev_warn(priv->dev,
+		 "Interrupt storm detected on IRQ %u (%u interrupts/sec)",
+		 irq, freq);
+
+	pdata = dev_get_platdata(&priv->pdev->dev);
+	item = pdata->items;
+	asserted = item->cache;
+
+	for_each_set_bit(bit, &asserted, 8) {
+		int pos;
+
+		pos = mlxreg_hotplug_item_label_index_get(item->mask, bit);
+		if (pos < 0)
+			goto out;
+
+		data = item->data + pos;
+		/* Check per device interrupt counter */
+		if (data->wmark_cntr >= MLXREG_HOTPLUG_INTR_FREQ_HZ - 1) {
+			dev_err(priv->dev,
+				"Storming bit %d (label: %s) - interrupt masked permanently. Replace broken HW.",
+				bit, data->label);
+			/* Mark bit as storming. */
+			item->storming_bits |= BIT(bit);
+		}
+		data->wmark_cntr = 0;
+	}
+	return;
+ out:
+	dev_err(priv->dev, "Failed to complete interrupt storm handler\n");
+}
+
 static void
 mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
 			   struct mlxreg_core_item *item)
@@ -371,6 +425,10 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
 			goto out;
 
 		data = item->data + pos;
+
+		/* Counter to keep track of interrupt storm */
+		data->wmark_cntr++;
+
 		if (regval & BIT(bit)) {
 			if (item->inversed)
 				mlxreg_hotplug_device_destroy(priv, data, item->kind);
@@ -390,9 +448,9 @@ mlxreg_hotplug_work_helper(struct mlxreg_hotplug_priv_data *priv,
 	if (ret)
 		goto out;
 
-	/* Unmask event. */
+	/* Unmask event, exclude storming bits. */
 	ret = regmap_write(priv->regmap, item->reg + MLXREG_HOTPLUG_MASK_OFF,
-			   item->mask);
+			   item->mask & ~item->storming_bits);
 
  out:
 	if (ret)
@@ -767,6 +825,15 @@ static int mlxreg_hotplug_probe(struct platform_device *pdev)
 
 	/* Perform initial interrupts setup. */
 	mlxreg_hotplug_set_irq(priv);
+
+	/* Register with generic interrupt storm detection */
+	if (!irq_register_storm_detection(priv->irq, MLXREG_HOTPLUG_INTR_FREQ_HZ,
+					  mlxreg_hotplug_storm_handler, priv)) {
+		dev_warn(&pdev->dev, "Failed to register generic interrupt storm detection\n");
+	} else {
+		dev_info(&pdev->dev, "Registered generic storm detection for IRQ %d\n", priv->irq);
+	}
+
 	priv->after_probe = true;
 
 	return 0;
@@ -776,6 +843,9 @@ static void mlxreg_hotplug_remove(struct platform_device *pdev)
 {
 	struct mlxreg_hotplug_priv_data *priv = dev_get_drvdata(&pdev->dev);
 
+	/* Unregister generic interrupt storm detection */
+	irq_unregister_storm_detection(priv->irq);
+
 	/* Clean interrupts setup. */
 	mlxreg_hotplug_unset_irq(priv);
 	devm_free_irq(&pdev->dev, priv->irq, priv);
diff --git a/include/linux/platform_data/mlxreg.h b/include/linux/platform_data/mlxreg.h
index f6cca7a035c7..592256570175 100644
--- a/include/linux/platform_data/mlxreg.h
+++ b/include/linux/platform_data/mlxreg.h
@@ -131,6 +131,7 @@ struct mlxreg_hotplug_device {
  * @regnum: number of registers occupied by multi-register attribute;
  * @slot: slot number, at which device is located;
  * @secured: if set indicates that entry access is secured;
+ * @wmark_cntr: interrupt storm counter;
  */
 struct mlxreg_core_data {
 	char label[MLXREG_CORE_LABEL_MAX_SIZE];
@@ -151,6 +152,7 @@ struct mlxreg_core_data {
 	u8 regnum;
 	u8 slot;
 	u8 secured;
+	unsigned int wmark_cntr;
 };
 
 /**
@@ -167,6 +169,7 @@ struct mlxreg_core_data {
  * @ind: element's index inside the group;
  * @inversed: if 0: 0 for signal status is OK, if 1 - 1 is OK;
  * @health: true if device has health indication, false in other case;
+ * @storming_bits: interrupt storming bits mask;
  */
 struct mlxreg_core_item {
 	struct mlxreg_core_data *data;
@@ -180,6 +183,7 @@ struct mlxreg_core_item {
 	u8 ind;
 	u8 inversed;
 	u8 health;
+	u32 storming_bits;
 };
 
 /**
-- 
2.47.3

^ permalink raw reply related [flat|nested] 8+ messages in thread
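
[Editorial sketch, not part of the posted patch.] The per-bit bookkeeping in
the driver patch above comes down to two small computations: the threshold
test on 'wmark_cntr' and the mask value written back when events are unmasked.
A minimal userspace illustration follows, assuming a local BIT() stand-in for
the kernel macro; is_storming() and unmask_value() are hypothetical helper
names that do not exist in the driver.

```c
#include <stdint.h>

/* Value copied from the posted mlxreg-hotplug.c hunk. */
#define MLXREG_HOTPLUG_INTR_FREQ_HZ	100

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n)	(1U << (n))

/* Hypothetical helper: the threshold test applied to each asserted bit. */
static int is_storming(unsigned int wmark_cntr)
{
	return wmark_cntr >= MLXREG_HOTPLUG_INTR_FREQ_HZ - 1;
}

/* Hypothetical helper: the value written to the mask register on unmask. */
static uint32_t unmask_value(uint32_t mask, uint32_t storming_bits)
{
	return mask & ~storming_bits;
}
```

For example, with a mask of 0x0f and bit 2 marked as storming, the value
written back is 0x0b, so bit 2 stays masked while the other three sources are
re-enabled.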
* Re: [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection
  2026-01-15  7:49 ` [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection Ciju Rajan K
@ 2026-01-15  8:34   ` Andy Shevchenko
  2026-01-15 14:43   ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2026-01-15  8:34 UTC
To: Ciju Rajan K
Cc: hdegoede, ilpo.jarvinen, tglx, christophe.jaillet, vadimp, platform-driver-x86, linux-kernel

On Thu, Jan 15, 2026 at 09:49:09AM +0200, Ciju Rajan K wrote:
> This patch enables the interrupt storm detection feature and
> also adds the per device counter for tracking the faulty
> devices. It also masks the faulty devices from generating
> any further interrupts.
>
> Add field for interrupt storm handling.
> Extend structure mlxreg_core_data with the following field:
>  'wmark_cntr' - interrupt storm counter.
>
> Extend structure mlxreg_core_item with the following field:
>  'storming_bits' - interrupt storming bits mask.

...

> +static void mlxreg_hotplug_storm_handler(unsigned int irq, unsigned int freq, void *dev_id)
> +{
> +	struct mlxreg_hotplug_priv_data *priv = dev_id;
> +	struct mlxreg_core_hotplug_platform_data *pdata;
> +	struct mlxreg_core_item *item;
> +	struct mlxreg_core_data *data;
> +	unsigned long asserted;
> +	u32 bit;
> +
> +	dev_warn(priv->dev,
> +		 "Interrupt storm detected on IRQ %u (%u interrupts/sec)",
> +		 irq, freq);

Below you put long line, here it seems wrapped by 80, why so inconsistent?
Please, choose one style and use it everywhere (inside the same file).

> +	pdata = dev_get_platdata(&priv->pdev->dev);
> +	item = pdata->items;
> +	asserted = item->cache;
> +
> +	for_each_set_bit(bit, &asserted, 8) {
> +		int pos;
> +
> +		pos = mlxreg_hotplug_item_label_index_get(item->mask, bit);
> +		if (pos < 0)
> +			goto out;

Used only once. Just drop the label and move the related code under the branch.

> +		data = item->data + pos;
> +		/* Check per device interrupt counter */
> +		if (data->wmark_cntr >= MLXREG_HOTPLUG_INTR_FREQ_HZ - 1) {
> +			dev_err(priv->dev,
> +				"Storming bit %d (label: %s) - interrupt masked permanently. Replace broken HW.",
> +				bit, data->label);
> +			/* Mark bit as storming. */
> +			item->storming_bits |= BIT(bit);
> +		}
> +		data->wmark_cntr = 0;
> +	}
> +	return;
> + out:
> +	dev_err(priv->dev, "Failed to complete interrupt storm handler\n");
> +}

...

> +	/* Register with generic interrupt storm detection */
> +	if (!irq_register_storm_detection(priv->irq, MLXREG_HOTPLUG_INTR_FREQ_HZ,
> +					  mlxreg_hotplug_storm_handler, priv)) {
> +		dev_warn(&pdev->dev, "Failed to register generic interrupt storm detection\n");
> +	} else {
> +		dev_info(&pdev->dev, "Registered generic storm detection for IRQ %d\n", priv->irq);
> +	}

Invert the conditional, it will be slightly easier to parse.

...

>  struct mlxreg_core_data {
>  	char label[MLXREG_CORE_LABEL_MAX_SIZE];

>  	u8 regnum;
>  	u8 slot;
>  	u8 secured;
> +	unsigned int wmark_cntr;
>  };

Have you run `pahole`? No issues / room to improve this layout?

...

>  struct mlxreg_core_item {
>  	struct mlxreg_core_data *data;

>  	u8 ind;
>  	u8 inversed;
>  	u8 health;
> +	u32 storming_bits;
>  };

Ditto.

-- 
With Best Regards,
Andy Shevchenko
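Editorial note: the `pahole` question above is about padding. A wider member appended after a run of `u8` fields has to be aligned, so the compiler may insert unused bytes in front of it. The sketch below shows how that can be checked in user space with `offsetof`. It is an illustration only: `struct tail_as_posted` is a hypothetical cut-down struct containing just the four members visible in the quoted hunk, so its offsets say nothing definite about the real `struct mlxreg_core_data`, whose earlier members are not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in: only the members visible in the quoted hunk. */
struct tail_as_posted {
	uint8_t regnum;
	uint8_t slot;
	uint8_t secured;
	unsigned int wmark_cntr;   /* needs the alignment of unsigned int */
};

/* Bytes the compiler leaves unused between 'secured' and 'wmark_cntr'. */
size_t padding_before_counter(void)
{
	return offsetof(struct tail_as_posted, wmark_cntr) -
	       (offsetof(struct tail_as_posted, secured) + sizeof(uint8_t));
}

/* Print the layout facts a tool such as pahole would report for this struct. */
void report_layout(void)
{
	printf("wmark_cntr offset: %zu\n",
	       offsetof(struct tail_as_posted, wmark_cntr));
	printf("padding before it: %zu\n", padding_before_counter());
	printf("total size:        %zu\n", sizeof(struct tail_as_posted));
}
```

On ABIs where `unsigned int` is 4-byte aligned, three leading bytes leave a one-byte hole in this cut-down struct. Whether the full kernel structures have holes that the new members could fill is exactly what running `pahole` on the built object would answer.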
* Re: [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection
  2026-01-15  7:49 ` [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection Ciju Rajan K
  2026-01-15  8:34   ` Andy Shevchenko
@ 2026-01-15 14:43   ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2026-01-15 14:43 UTC
To: Ciju Rajan K, hdegoede, ilpo.jarvinen, tglx
Cc: oe-kbuild-all, christophe.jaillet, andriy.shevchenko, vadimp, platform-driver-x86, linux-kernel, Ciju Rajan K

Hi Ciju,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.19-rc5]
[cannot apply to tip/irq/core next-20260115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ciju-Rajan-K/kernel-irq-Add-generic-interrupt-storm-detection-mechanism/20260115-155438
base:   linus/master
patch link:    https://lore.kernel.org/r/20260115074909.245852-3-crajank%40nvidia.com
patch subject: [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection
config: x86_64-randconfig-161-20260115 (https://download.01.org/0day-ci/archive/20260115/202601152235.2MC3FUQp-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
smatch version: v0.5.0-8985-g2614ff1a
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260115/202601152235.2MC3FUQp-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601152235.2MC3FUQp-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/platform/mellanox/mlxreg-hotplug.c:830:7: error: call to undeclared function 'irq_register_storm_detection'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     830 |         if (!irq_register_storm_detection(priv->irq, MLXREG_HOTPLUG_INTR_FREQ_HZ,
         |              ^
>> drivers/platform/mellanox/mlxreg-hotplug.c:847:2: error: call to undeclared function 'irq_unregister_storm_detection'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     847 |         irq_unregister_storm_detection(priv->irq);
         |         ^
   2 errors generated.

vim +/irq_register_storm_detection +830 drivers/platform/mellanox/mlxreg-hotplug.c

   762	
   763	static int mlxreg_hotplug_probe(struct platform_device *pdev)
   764	{
   765		struct mlxreg_core_hotplug_platform_data *pdata;
   766		struct mlxreg_hotplug_priv_data *priv;
   767		struct i2c_adapter *deferred_adap;
   768		int err;
   769	
   770		pdata = dev_get_platdata(&pdev->dev);
   771		if (!pdata) {
   772			dev_err(&pdev->dev, "Failed to get platform data.\n");
   773			return -EINVAL;
   774		}
   775	
   776		/* Defer probing if the necessary adapter is not configured yet. */
   777		deferred_adap = i2c_get_adapter(pdata->deferred_nr);
   778		if (!deferred_adap)
   779			return -EPROBE_DEFER;
   780		i2c_put_adapter(deferred_adap);
   781	
   782		priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
   783		if (!priv)
   784			return -ENOMEM;
   785	
   786		if (pdata->irq) {
   787			priv->irq = pdata->irq;
   788		} else {
   789			priv->irq = platform_get_irq(pdev, 0);
   790			if (priv->irq < 0)
   791				return priv->irq;
   792		}
   793	
   794		priv->regmap = pdata->regmap;
   795		priv->dev = pdev->dev.parent;
   796		priv->pdev = pdev;
   797	
   798		err = devm_request_irq(&pdev->dev, priv->irq,
   799				       mlxreg_hotplug_irq_handler, IRQF_TRIGGER_FALLING
   800				       | IRQF_SHARED, "mlxreg-hotplug", priv);
   801		if (err) {
   802			dev_err(&pdev->dev, "Failed to request irq: %d\n", err);
   803			return err;
   804		}
   805	
   806		disable_irq(priv->irq);
   807		spin_lock_init(&priv->lock);
   808		INIT_DELAYED_WORK(&priv->dwork_irq, mlxreg_hotplug_work_handler);
   809		dev_set_drvdata(&pdev->dev, priv);
   810	
   811		err = mlxreg_hotplug_attr_init(priv);
   812		if (err) {
   813			dev_err(&pdev->dev, "Failed to allocate attributes: %d\n",
   814				err);
   815			return err;
   816		}
   817	
   818		priv->hwmon = devm_hwmon_device_register_with_groups(&pdev->dev,
   819						"mlxreg_hotplug", priv, priv->groups);
   820		if (IS_ERR(priv->hwmon)) {
   821			dev_err(&pdev->dev, "Failed to register hwmon device %ld\n",
   822				PTR_ERR(priv->hwmon));
   823			return PTR_ERR(priv->hwmon);
   824		}
   825	
   826		/* Perform initial interrupts setup. */
   827		mlxreg_hotplug_set_irq(priv);
   828	
   829		/* Register with generic interrupt storm detection */
 > 830		if (!irq_register_storm_detection(priv->irq, MLXREG_HOTPLUG_INTR_FREQ_HZ,
   831						  mlxreg_hotplug_storm_handler, priv)) {
   832			dev_warn(&pdev->dev, "Failed to register generic interrupt storm detection\n");
   833		} else {
   834			dev_info(&pdev->dev, "Registered generic storm detection for IRQ %d\n", priv->irq);
   835		}
   836	
   837		priv->after_probe = true;
   838	
   839		return 0;
   840	}
   841	
   842	static void mlxreg_hotplug_remove(struct platform_device *pdev)
   843	{
   844		struct mlxreg_hotplug_priv_data *priv = dev_get_drvdata(&pdev->dev);
   845	
   846		/* Unregister generic interrupt storm detection */
 > 847		irq_unregister_storm_detection(priv->irq);
   848	
   849		/* Clean interrupts setup. */
   850		mlxreg_hotplug_unset_irq(priv);
   851		devm_free_irq(&pdev->dev, priv->irq, priv);
   852	}
   853	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
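Editorial note: the report above does not say why the two functions were undeclared in this randconfig build. One common cause of this class of error is a prototype that is only visible when some config option is enabled, with no fallback for the disabled case; whether that applies here cannot be determined from the report alone. The sketch below shows the usual header pattern for that situation, compiled as plain user-space C. Everything in it is an assumption made for illustration: the option name `CONFIG_HYPOTHETICAL_IRQ_STORM_DETECTION`, the callback typedef, the parameter names, and the `bool` return type (guessed only from the `!` at the call site).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type, inferred from the handler quoted in the thread. */
typedef void (*irq_storm_cb_t)(unsigned int irq, unsigned int freq, void *dev_id);

#ifdef CONFIG_HYPOTHETICAL_IRQ_STORM_DETECTION
/* Feature built in: real prototypes, defined elsewhere. */
bool irq_register_storm_detection(unsigned int irq, unsigned int max_freq,
				  irq_storm_cb_t cb, void *dev_id);
void irq_unregister_storm_detection(unsigned int irq);
#else
/* Feature compiled out: callers still build, registration reports failure. */
static inline bool irq_register_storm_detection(unsigned int irq,
						unsigned int max_freq,
						irq_storm_cb_t cb, void *dev_id)
{
	return false;
}

static inline void irq_unregister_storm_detection(unsigned int irq)
{
}
#endif
```

With stubs of this shape a driver can call both functions unconditionally, and a build without the feature simply takes the "failed to register" path at probe time instead of failing to compile.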
Thread overview: 8+ messages
2026-01-15  7:49 [PATCH platform-next v4 0/2] Interrupt storm detection Ciju Rajan K
2026-01-15  7:49 ` [PATCH platform-next v4 1/2] kernel/irq: Add generic interrupt storm detection mechanism Ciju Rajan K
2026-01-15  8:29   ` Andy Shevchenko
2026-01-15 14:00   ` kernel test robot
2026-01-15 14:11   ` kernel test robot
2026-01-15  7:49 ` [PATCH platform-next v4 2/2] platform/mellanox: mlxreg-hotplug: Enabling interrupt storm detection Ciju Rajan K
2026-01-15  8:34   ` Andy Shevchenko
2026-01-15 14:43   ` kernel test robot