linux-doc.vger.kernel.org archive mirror
* [PATCH v3] genirq: add support for warning on long-running IRQ handlers
@ 2025-07-23 18:28 Wladislav Wiebe
  2025-07-24  5:18 ` Jiri Slaby
  0 siblings, 1 reply; 5+ messages in thread
From: Wladislav Wiebe @ 2025-07-23 18:28 UTC (permalink / raw)
  To: tglx, corbet, jirislaby
  Cc: akpm, paulmck, rostedt, Neeraj.Upadhyay, david, bp, arnd, fvdl,
	linux-doc, linux-kernel, wladislav.wiebe, peterz

Introduce a mechanism to detect and warn about prolonged IRQ handlers.
With a new command-line parameter (irqhandler.duration_warn_us=),
users can configure a duration threshold in microseconds; a handler that
exceeds it triggers a warning in the following format:

"[CPU14] long duration of IRQ[159:bad_irq_handler [long_irq]], took: 1330 us"

The implementation uses local_clock() to measure the execution duration of the
generic IRQ per-CPU event handler.

Signed-off-by: Wladislav Wiebe <wladislav.wiebe@nokia.com>
---
V2 -> V3: Addressed review comments based on v2:
	  https://lore.kernel.org/lkml/20250714084209.918-1-wladislav.wiebe@nokia.com/
	  - refactor commit message
	  - switch from early_param() to __setup()
	  - comment on approximation of nano to microseconds conversion
	  - move ts_start to if() branch
	  - align pr_warn arguments
	  - surround else block with brackets as well
	  - invert the condition and drop the "else {}" in cmdline arg. check
	  - make struct irqaction *action function param. const
	    in irqhandler_duration_check()
	  - print smp_processor_id() return as unsigned int
	  - fix warning text "on IRQ[...]" -> "of IRQ[...]"
V1 -> V2: refactor to use local_clock() instead of jiffies and replace
	  Kconfig knobs by a new command-line parameter.
V1 link:  https://lore.kernel.org/lkml/20250630124721.18232-1-wladislav.wiebe@nokia.com/
---
 .../admin-guide/kernel-parameters.txt         |  5 ++
 kernel/irq/handle.c                           | 49 ++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 07e22ba5bfe3..7a2d0338ee91 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2543,6 +2543,11 @@
 			for it. Intended to get systems with badly broken
 			firmware running.
 
+	irqhandler.duration_warn_us= [KNL]
+			Warn if an IRQ handler exceeds the specified duration
+			threshold in microseconds. Useful for identifying
+			long-running IRQs in the system.
+
 	irqpoll		[HW]
 			When an interrupt is not handled search all handlers
 			for it. Also check all handlers each timer
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 9489f93b3db3..258f40ad8cb1 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -136,6 +136,44 @@ void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
 	wake_up_process(action->thread);
 }
 
+static DEFINE_STATIC_KEY_FALSE(irqhandler_duration_check_enabled);
+static u64 irqhandler_duration_threshold_us __ro_after_init;
+
+static int __init irqhandler_duration_check_setup(char *arg)
+{
+	unsigned long val;
+	int ret;
+
+	ret = kstrtoul(arg, 0, &val);
+	if (ret) {
+		pr_err("Unable to parse irqhandler.duration_warn_us setting: ret=%d\n", ret);
+		return 0;
+	}
+
+	if (!val) {
+		pr_err("Invalid irqhandler.duration_warn_us setting, must be > 0\n");
+		return 0;
+	}
+
+	irqhandler_duration_threshold_us = val;
+	static_branch_enable(&irqhandler_duration_check_enabled);
+
+	return 1;
+}
+__setup("irqhandler.duration_warn_us=", irqhandler_duration_check_setup);
+
+static inline void irqhandler_duration_check(u64 ts_start, unsigned int irq,
+					     const struct irqaction *action)
+{
+	/* Approx. conversion to microseconds */
+	u64 delta_us = (local_clock() - ts_start) >> 10;
+
+	if (unlikely(delta_us > irqhandler_duration_threshold_us)) {
+		pr_warn_ratelimited("[CPU%u] long duration of IRQ[%u:%ps], took: %llu us\n",
+				    smp_processor_id(), irq, action->handler, delta_us);
+	}
+}
+
 irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc)
 {
 	irqreturn_t retval = IRQ_NONE;
@@ -155,7 +193,16 @@ irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc)
 			lockdep_hardirq_threaded();
 
 		trace_irq_handler_entry(irq, action);
-		res = action->handler(irq, action->dev_id);
+
+		if (static_branch_unlikely(&irqhandler_duration_check_enabled)) {
+			u64 ts_start = local_clock();
+
+			res = action->handler(irq, action->dev_id);
+			irqhandler_duration_check(ts_start, irq, action);
+		} else {
+			res = action->handler(irq, action->dev_id);
+		}
+
 		trace_irq_handler_exit(irq, action, res);
 
 		if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pS enabled interrupts\n",
-- 
2.39.3.dirty



* Re: [PATCH v3] genirq: add support for warning on long-running IRQ handlers
  2025-07-23 18:28 [PATCH v3] genirq: add support for warning on long-running IRQ handlers Wladislav Wiebe
@ 2025-07-24  5:18 ` Jiri Slaby
  2025-07-24  5:30   ` Jiri Slaby
  2025-07-24  9:47   ` Thomas Gleixner
  0 siblings, 2 replies; 5+ messages in thread
From: Jiri Slaby @ 2025-07-24  5:18 UTC (permalink / raw)
  To: Wladislav Wiebe, tglx, corbet, jirislaby
  Cc: akpm, paulmck, rostedt, Neeraj.Upadhyay, david, bp, arnd, fvdl,
	linux-doc, linux-kernel, peterz

On 23. 07. 25, 20:28, Wladislav Wiebe wrote:
> Introduce a mechanism to detect and warn about prolonged IRQ handlers.
> With a new command-line parameter (irqhandler.duration_warn_us=),
> users can configure the duration threshold in microseconds when a warning
> in such format should be emitted:
> 
> "[CPU14] long duration of IRQ[159:bad_irq_handler [long_irq]], took: 1330 us"
> 
> The implementation uses local_clock() to measure the execution duration of the
> generic IRQ per-CPU event handler.
...
> +static inline void irqhandler_duration_check(u64 ts_start, unsigned int irq,
> +					     const struct irqaction *action)
> +{
> +	/* Approx. conversion to microseconds */
> +	u64 delta_us = (local_clock() - ts_start) >> 10;

Is this a micro-optimization? Have you measured what speedup it brings?
IOW, is it worth it instead of the cleaner "/ NSEC_PER_USEC"?
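
For illustration only (not part of the patch), a minimal userspace sketch of the two conversions being compared. The names ns_to_us_shift(), ns_to_us_div() and NSEC_PER_USEC_SKETCH are hypothetical; the last one is assumed to mirror the kernel's NSEC_PER_USEC value of 1000.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NSEC_PER_USEC (assumed 1000). */
#define NSEC_PER_USEC_SKETCH 1000ULL

/* Approximate ns -> us as in the v3 patch: divides by 1024, not 1000. */
static inline uint64_t ns_to_us_shift(uint64_t ns)
{
	return ns >> 10;
}

/* Exact ns -> us using a 64-bit division. */
static inline uint64_t ns_to_us_div(uint64_t ns)
{
	return ns / NSEC_PER_USEC_SKETCH;
}
```

For a delta of 1330000 ns the shift yields 1298 while the division yields 1330, so the shift under-reports by roughly 2.3% and a handler has to run slightly longer than the configured threshold before the warning fires.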

Or instead, you could store the threshold in nanoseconds, in
irqhandler_duration_threshold_ns (mind that "_ns"), compare the raw diff
against it, and avoid the shift and div completely.

And what about the wrap? Don't you need abs_diff()?

thanks,
-- 
js
suse labs



* Re: [PATCH v3] genirq: add support for warning on long-running IRQ handlers
  2025-07-24  5:18 ` Jiri Slaby
@ 2025-07-24  5:30   ` Jiri Slaby
  2025-07-24  9:47   ` Thomas Gleixner
  1 sibling, 0 replies; 5+ messages in thread
From: Jiri Slaby @ 2025-07-24  5:30 UTC (permalink / raw)
  To: Wladislav Wiebe, tglx, corbet
  Cc: akpm, paulmck, rostedt, Neeraj.Upadhyay, david, bp, arnd, fvdl,
	linux-doc, linux-kernel, peterz

On 24. 07. 25, 7:18, Jiri Slaby wrote:
> On 23. 07. 25, 20:28, Wladislav Wiebe wrote:
>> Introduce a mechanism to detect and warn about prolonged IRQ handlers.
>> With a new command-line parameter (irqhandler.duration_warn_us=),
>> users can configure the duration threshold in microseconds when a warning
>> in such format should be emitted:
>>
>> "[CPU14] long duration of IRQ[159:bad_irq_handler [long_irq]], took: 1330 us"
>>
>> The implementation uses local_clock() to measure the execution duration of the
>> generic IRQ per-CPU event handler.
> ...
>> +static inline void irqhandler_duration_check(u64 ts_start, unsigned int irq,
>> +                         const struct irqaction *action)
>> +{
>> +    /* Approx. conversion to microseconds */
>> +    u64 delta_us = (local_clock() - ts_start) >> 10;
> 
> Is this a microoptimization -- have you measured what speedup does it 
> bring? IOW is it worth it instead of cleaner "/ NSEC_PER_USEC"?
> 
> Or instead, you could store the diff in irqhandler_duration_threshold_ns 
> (mind that "_ns") and avoid the shift and div completely.
> 
> And what about the wrap? Don't you need abs_diff()?

Not that ^^^, it won't work, but something else. But if I am counting 
correctly, the wrap is in 584 years if counted from 0. Well, for 
native/tsc, "Intel guarantees that the time-stamp counter will not 
wraparound within 10 years after being reset". I have no idea what 
virtualizations return to local_clock(). This is not my call to decide, 
though.
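
As a rough check of the arithmetic above, here is a small userspace sketch. It assumes a counter that counts nanoseconds in 64 bits starting from 0, which is an assumption and not a statement about what local_clock() returns on any given platform. The helper names wrap_years() and elapsed_ns() are hypothetical.

```c
#include <stdint.h>

/* Years until a 64-bit nanosecond counter that starts at 0 wraps. */
static double wrap_years(void)
{
	/* Nanoseconds in a year of 365.25 days. */
	const double ns_per_year = 365.25 * 24 * 3600 * 1e9;

	return 18446744073709551616.0 / ns_per_year;	/* 2^64 ns */
}

/*
 * Unsigned subtraction is performed modulo 2^64, so a single wrap
 * between the two samples still gives the correct elapsed time.
 */
static uint64_t elapsed_ns(uint64_t start, uint64_t now)
{
	return now - start;
}
```

Under these assumptions wrap_years() comes out at a little over 584, and elapsed_ns() stays correct even when `now` is numerically smaller than `start` because of one wrap, which is why no abs_diff()-style helper is needed for that case.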

> thanks,
-- 
js
suse labs


* Re: [PATCH v3] genirq: add support for warning on long-running IRQ handlers
  2025-07-24  5:18 ` Jiri Slaby
  2025-07-24  5:30   ` Jiri Slaby
@ 2025-07-24  9:47   ` Thomas Gleixner
  2025-07-24 16:07     ` Wladislav Wiebe
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2025-07-24  9:47 UTC (permalink / raw)
  To: Jiri Slaby, Wladislav Wiebe, corbet, jirislaby
  Cc: akpm, paulmck, rostedt, Neeraj.Upadhyay, david, bp, arnd, fvdl,
	linux-doc, linux-kernel, peterz

On Thu, Jul 24 2025 at 07:18, Jiri Slaby wrote:

> On 23. 07. 25, 20:28, Wladislav Wiebe wrote:
>> Introduce a mechanism to detect and warn about prolonged IRQ handlers.
>> With a new command-line parameter (irqhandler.duration_warn_us=),
>> users can configure the duration threshold in microseconds when a warning
>> in such format should be emitted:
>> 
>> "[CPU14] long duration of IRQ[159:bad_irq_handler [long_irq]], took: 1330 us"
>> 
>> The implementation uses local_clock() to measure the execution duration of the
>> generic IRQ per-CPU event handler.
> ...
>> +static inline void irqhandler_duration_check(u64 ts_start, unsigned int irq,
>> +					     const struct irqaction *action)
>> +{
>> +	/* Approx. conversion to microseconds */
>> +	u64 delta_us = (local_clock() - ts_start) >> 10;
>
> Is this a microoptimization -- have you measured what speedup does it 
> bring? IOW is it worth it instead of cleaner "/ NSEC_PER_USEC"?

A 64-bit division is definitely more expensive than a shift operation
and on 32-bit w/o a 64-bit divide instruction it's more than horribly
slow.

> Or instead, you could store the diff in irqhandler_duration_threshold_ns 
> (mind that "_ns") and avoid the shift and div completely.

That's the right thing to do. The setup code can do a *1000 and be done.
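
A minimal userspace sketch of that idea follows; it is an illustration, not the code that went into a later revision. All names (duration_threshold_ns, set_threshold_us(), exceeds_threshold(), NSEC_PER_USEC_SKETCH) are hypothetical, and the overflow guard is an extra assumption that the patch itself does not contain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's NSEC_PER_USEC (assumed 1000). */
#define NSEC_PER_USEC_SKETCH 1000ULL

/* Hypothetical stand-in for the patch's threshold variable, kept in ns. */
static uint64_t duration_threshold_ns;

/*
 * One-time setup: reject 0 (as the patch does) and convert us -> ns once.
 * The overflow check is an addition made for this sketch.
 */
static int set_threshold_us(uint64_t threshold_us)
{
	if (threshold_us == 0 ||
	    threshold_us > UINT64_MAX / NSEC_PER_USEC_SKETCH)
		return -1;

	duration_threshold_ns = threshold_us * NSEC_PER_USEC_SKETCH;
	return 0;
}

/* Hot path: compare the raw nanosecond delta; no shift, no division. */
static bool exceeds_threshold(uint64_t ts_start_ns, uint64_t ts_end_ns)
{
	return (ts_end_ns - ts_start_ns) > duration_threshold_ns;
}
```

With the multiplication done once at setup, the per-interrupt path is reduced to a subtraction and a compare. A conversion back to microseconds would presumably only be needed in the rarely taken path that prints the warning.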

> And what about the wrap? Don't you need abs_diff()?

~500 years after boot :)

Thanks,

        tglx


* Re: [PATCH v3] genirq: add support for warning on long-running IRQ handlers
  2025-07-24  9:47   ` Thomas Gleixner
@ 2025-07-24 16:07     ` Wladislav Wiebe
  0 siblings, 0 replies; 5+ messages in thread
From: Wladislav Wiebe @ 2025-07-24 16:07 UTC (permalink / raw)
  To: Thomas Gleixner, Jiri Slaby, corbet
  Cc: akpm, paulmck, rostedt, Neeraj.Upadhyay, david, bp, arnd, fvdl,
	linux-doc, linux-kernel, peterz


On 24/07/2025 11:47, Thomas Gleixner wrote:
> On Thu, Jul 24 2025 at 07:18, Jiri Slaby wrote:
>
>> On 23. 07. 25, 20:28, Wladislav Wiebe wrote:
>>> Introduce a mechanism to detect and warn about prolonged IRQ handlers.
>>> With a new command-line parameter (irqhandler.duration_warn_us=),
>>> users can configure the duration threshold in microseconds when a warning
>>> in such format should be emitted:
>>>
>>> "[CPU14] long duration of IRQ[159:bad_irq_handler [long_irq]], took: 1330 us"
>>>
>>> The implementation uses local_clock() to measure the execution duration of the
>>> generic IRQ per-CPU event handler.
>> ...
>>> +static inline void irqhandler_duration_check(u64 ts_start, unsigned int irq,
>>> +                                         const struct irqaction *action)
>>> +{
>>> +    /* Approx. conversion to microseconds */
>>> +    u64 delta_us = (local_clock() - ts_start) >> 10;
>> Is this a microoptimization -- have you measured what speedup does it
>> bring? IOW is it worth it instead of cleaner "/ NSEC_PER_USEC"?
> A 64-bit division is definitely more expensive than a shift operation
> and on 32-bit w/o a 64-bit divide instruction it's more than horribly
> slow.
>
>> Or instead, you could store the diff in irqhandler_duration_threshold_ns
>> (mind that "_ns") and avoid the shift and div completely.
> That's the right thing to do. The setup code can do a *1000 and be done.

Excellent optimization proposal! It has been included in v4:
https://lore.kernel.org/lkml/20250724155059.2992-1-wladislav.wiebe@nokia.com/

Thanks,
- W.W.

