public inbox for linux-kernel@vger.kernel.org
* [PATCH v2] x86/irq: Optimize interrupts decimals printing
@ 2025-11-03 18:51 Dmitry Ilvokhin
  2026-02-25 17:22 ` Dmitry Ilvokhin
From: Dmitry Ilvokhin @ 2025-11-03 18:51 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin
  Cc: linux-kernel, kernel-team

Monitoring tools (such as Prometheus [1] or dynolog [2]) periodically
pull /proc/interrupts to export metrics as a timeseries for future
analysis and investigation.

In large fleets, /proc/interrupts is polled (often every few seconds) on
every machine. The cumulative overhead adds up quickly across thousands
of nodes, so reducing the cost of generating these stats does have a
measurable operational impact. With the ongoing trend toward higher core
counts per machine, this cost becomes even more noticeable over time,
since interrupt counters are per-CPU. In Meta's fleet, we have observed
this overhead at scale.

Although a binary /proc interface would be a better long-term solution
due to lower formatting (kernel side) and parsing (userspace side)
overhead, the text interface will remain in use for some time, even
once better solutions become available. Optimizing the /proc/interrupts
printing code is therefore still beneficial.

seq_printf() supports rich format strings for printing decimals, but
that is not required for the per-CPU counters in /proc/interrupts.
Because these counters need only very limited formatting,
seq_put_decimal_ull_width() can be used to print them instead. A
similar optimization is already used in show_interrupts().
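
To illustrate the idea outside the kernel, here is a minimal userspace
sketch. It is not the kernel implementation: put_decimal_sketch() and
put_decimal_printf() are hypothetical names made up for this example.
The first writes a delimiter and a right-aligned decimal directly, with
no format string to parse; the second produces the same field through
the generic printf() path for comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace stand-in (hypothetical) for a width-only decimal printer:
 * append one delimiter character, then the number right-aligned in a
 * field of at least `width` characters. No format string is parsed.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
static size_t put_decimal_sketch(char *buf, size_t size, char delimiter,
				 unsigned long long num, unsigned int width)
{
	char digits[20];	/* a 64-bit value has at most 20 digits */
	size_t ndigits = 0, pad, pos = 0;

	/* Generate digits in reverse order; zero still yields one digit. */
	do {
		digits[ndigits++] = (char)('0' + num % 10);
		num /= 10;
	} while (num);

	pad = width > ndigits ? width - ndigits : 0;
	assert(1 + pad + ndigits < size);	/* caller sizes the buffer */

	buf[pos++] = delimiter;
	memset(buf + pos, ' ', pad);
	pos += pad;
	while (ndigits)
		buf[pos++] = digits[--ndigits];
	buf[pos] = '\0';
	return pos;
}

/* Reference: the same field produced through the generic printf() path. */
static size_t put_decimal_printf(char *buf, size_t size,
				 unsigned long long num)
{
	return (size_t)snprintf(buf, size, " %10llu", num);
}
```

Both helpers emit a leading space followed by a 10-character field, so
for any counter value the two buffers should compare equal; only the
amount of work done per number differs.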

Performance counter stats (truncated) for 'sh -c cat /proc/interrupts
>/dev/null' (1000 runs) before and after applying the patch below.

Before:

      3.42 msec task-clock        #    0.802 CPUs utilized   ( +-  0.05% )
         1      context-switches  #  291.991 /sec            ( +-  0.74% )
         0      cpu-migrations    #    0.000 /sec
       343      page-faults       #  100.153 K/sec           ( +-  0.01% )
 8,932,242      instructions      #    1.66  insn per cycle  ( +-  0.34% )
 5,374,427      cycles            #    1.569 GHz             ( +-  0.04% )
 1,483,154      branches          #  433.068 M/sec           ( +-  0.22% )
    28,768      branch-misses     #    1.94% of all branches ( +-  0.31% )

0.00427182 +- 0.00000215 seconds time elapsed  ( +-  0.05% )

After:

      2.39 msec task-clock        #    0.796 CPUs utilized   ( +-  0.06% )
         1      context-switches  #  418.541 /sec            ( +-  0.70% )
         0      cpu-migrations    #    0.000 /sec
       343      page-faults       #  143.560 K/sec           ( +-  0.01% )
 7,020,982      instructions      #    1.30  insn per cycle  ( +-  0.52% )
 5,397,266      cycles            #    2.259 GHz             ( +-  0.06% )
 1,569,648      branches          #  656.962 M/sec           ( +-  0.08% )
    25,419      branch-misses     #    1.62% of all branches ( +-  0.72% )

0.00299996 +- 0.00000206 seconds time elapsed  ( +-  0.07% )

Elapsed time drops from about 4.27 ms to 3.00 ms, a reduction of roughly
29.8%.

[1]: https://github.com/prometheus/prometheus
[2]: https://github.com/facebookincubator/dynolog

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
Changes v2:
- Expanded commit message: add more rationale for the proposed change.
- Renamed helper put_spaced_decimal() -> put_decimal(), primarily to make
  checkpatch.pl --strict pass.

 arch/x86/kernel/irq.c | 107 ++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 50 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 10721a125226..4a8bac31be70 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -61,6 +61,18 @@ void ack_bad_irq(unsigned int irq)
 	apic_eoi();
 }
 
+/*
+ * Print a space followed by a decimal number, without the overhead of
+ * printf()-style format string parsing.
+ */
+static void put_decimal(struct seq_file *p, unsigned long long num)
+{
+	const char *delimiter = " ";
+	unsigned int width = 10;
+
+	seq_put_decimal_ull_width(p, delimiter, num, width);
+}
+
 #define irq_stats(x)		(&per_cpu(irq_stat, x))
 /*
  * /proc/interrupts printing for arch specific interrupts
@@ -69,103 +81,101 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 {
 	int j;
 
-	seq_printf(p, "%*s: ", prec, "NMI");
+	seq_printf(p, "%*s:", prec, "NMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
+		put_decimal(p, irq_stats(j)->__nmi_count);
 	seq_puts(p, "  Non-maskable interrupts\n");
 #ifdef CONFIG_X86_LOCAL_APIC
-	seq_printf(p, "%*s: ", prec, "LOC");
+	seq_printf(p, "%*s:", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_timer_irqs);
+		put_decimal(p, irq_stats(j)->apic_timer_irqs);
 	seq_puts(p, "  Local timer interrupts\n");
 
-	seq_printf(p, "%*s: ", prec, "SPU");
+	seq_printf(p, "%*s:", prec, "SPU");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_spurious_count);
+		put_decimal(p, irq_stats(j)->irq_spurious_count);
 	seq_puts(p, "  Spurious interrupts\n");
-	seq_printf(p, "%*s: ", prec, "PMI");
+	seq_printf(p, "%*s:", prec, "PMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_perf_irqs);
+		put_decimal(p, irq_stats(j)->apic_perf_irqs);
 	seq_puts(p, "  Performance monitoring interrupts\n");
-	seq_printf(p, "%*s: ", prec, "IWI");
+	seq_printf(p, "%*s:", prec, "IWI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_irq_work_irqs);
+		put_decimal(p, irq_stats(j)->apic_irq_work_irqs);
 	seq_puts(p, "  IRQ work interrupts\n");
-	seq_printf(p, "%*s: ", prec, "RTR");
+	seq_printf(p, "%*s:", prec, "RTR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->icr_read_retry_count);
+		put_decimal(p, irq_stats(j)->icr_read_retry_count);
 	seq_puts(p, "  APIC ICR read retries\n");
 	if (x86_platform_ipi_callback) {
-		seq_printf(p, "%*s: ", prec, "PLT");
+		seq_printf(p, "%*s:", prec, "PLT");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", irq_stats(j)->x86_platform_ipis);
+			put_decimal(p, irq_stats(j)->x86_platform_ipis);
 		seq_puts(p, "  Platform interrupts\n");
 	}
 #endif
 #ifdef CONFIG_SMP
-	seq_printf(p, "%*s: ", prec, "RES");
+	seq_printf(p, "%*s:", prec, "RES");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_resched_count);
+		put_decimal(p, irq_stats(j)->irq_resched_count);
 	seq_puts(p, "  Rescheduling interrupts\n");
-	seq_printf(p, "%*s: ", prec, "CAL");
+	seq_printf(p, "%*s:", prec, "CAL");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_call_count);
+		put_decimal(p, irq_stats(j)->irq_call_count);
 	seq_puts(p, "  Function call interrupts\n");
-	seq_printf(p, "%*s: ", prec, "TLB");
+	seq_printf(p, "%*s:", prec, "TLB");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_tlb_count);
+		put_decimal(p, irq_stats(j)->irq_tlb_count);
 	seq_puts(p, "  TLB shootdowns\n");
 #endif
 #ifdef CONFIG_X86_THERMAL_VECTOR
-	seq_printf(p, "%*s: ", prec, "TRM");
+	seq_printf(p, "%*s:", prec, "TRM");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_thermal_count);
+		put_decimal(p, irq_stats(j)->irq_thermal_count);
 	seq_puts(p, "  Thermal event interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE_THRESHOLD
-	seq_printf(p, "%*s: ", prec, "THR");
+	seq_printf(p, "%*s:", prec, "THR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
+		put_decimal(p, irq_stats(j)->irq_threshold_count);
 	seq_puts(p, "  Threshold APIC interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE_AMD
-	seq_printf(p, "%*s: ", prec, "DFR");
+	seq_printf(p, "%*s:", prec, "DFR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_deferred_error_count);
+		put_decimal(p, irq_stats(j)->irq_deferred_error_count);
 	seq_puts(p, "  Deferred Error APIC interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE
-	seq_printf(p, "%*s: ", prec, "MCE");
+	seq_printf(p, "%*s:", prec, "MCE");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(mce_exception_count, j));
+		put_decimal(p, per_cpu(mce_exception_count, j));
 	seq_puts(p, "  Machine check exceptions\n");
-	seq_printf(p, "%*s: ", prec, "MCP");
+	seq_printf(p, "%*s:", prec, "MCP");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
+		put_decimal(p, per_cpu(mce_poll_count, j));
 	seq_puts(p, "  Machine check polls\n");
 #endif
 #ifdef CONFIG_X86_HV_CALLBACK_VECTOR
 	if (test_bit(HYPERVISOR_CALLBACK_VECTOR, system_vectors)) {
-		seq_printf(p, "%*s: ", prec, "HYP");
+		seq_printf(p, "%*s:", prec, "HYP");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ",
-				   irq_stats(j)->irq_hv_callback_count);
+			put_decimal(p, irq_stats(j)->irq_hv_callback_count);
 		seq_puts(p, "  Hypervisor callback interrupts\n");
 	}
 #endif
 #if IS_ENABLED(CONFIG_HYPERV)
 	if (test_bit(HYPERV_REENLIGHTENMENT_VECTOR, system_vectors)) {
-		seq_printf(p, "%*s: ", prec, "HRE");
+		seq_printf(p, "%*s:", prec, "HRE");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ",
-				   irq_stats(j)->irq_hv_reenlightenment_count);
+			put_decimal(p,
+				    irq_stats(j)->irq_hv_reenlightenment_count);
 		seq_puts(p, "  Hyper-V reenlightenment interrupts\n");
 	}
 	if (test_bit(HYPERV_STIMER0_VECTOR, system_vectors)) {
-		seq_printf(p, "%*s: ", prec, "HVS");
+		seq_printf(p, "%*s:", prec, "HVS");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ",
-				   irq_stats(j)->hyperv_stimer0_count);
+			put_decimal(p, irq_stats(j)->hyperv_stimer0_count);
 		seq_puts(p, "  Hyper-V stimer0 interrupts\n");
 	}
 #endif
@@ -174,28 +184,25 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 	seq_printf(p, "%*s: %10u\n", prec, "MIS", atomic_read(&irq_mis_count));
 #endif
 #if IS_ENABLED(CONFIG_KVM)
-	seq_printf(p, "%*s: ", prec, "PIN");
+	seq_printf(p, "%*s:", prec, "PIN");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->kvm_posted_intr_ipis);
+		put_decimal(p, irq_stats(j)->kvm_posted_intr_ipis);
 	seq_puts(p, "  Posted-interrupt notification event\n");
 
-	seq_printf(p, "%*s: ", prec, "NPI");
+	seq_printf(p, "%*s:", prec, "NPI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ",
-			   irq_stats(j)->kvm_posted_intr_nested_ipis);
+		put_decimal(p, irq_stats(j)->kvm_posted_intr_nested_ipis);
 	seq_puts(p, "  Nested posted-interrupt event\n");
 
-	seq_printf(p, "%*s: ", prec, "PIW");
+	seq_printf(p, "%*s:", prec, "PIW");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ",
-			   irq_stats(j)->kvm_posted_intr_wakeup_ipis);
+		put_decimal(p, irq_stats(j)->kvm_posted_intr_wakeup_ipis);
 	seq_puts(p, "  Posted-interrupt wakeup event\n");
 #endif
 #ifdef CONFIG_X86_POSTED_MSI
-	seq_printf(p, "%*s: ", prec, "PMN");
+	seq_printf(p, "%*s:", prec, "PMN");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ",
-			   irq_stats(j)->posted_msi_notification_count);
+		put_decimal(p, irq_stats(j)->posted_msi_notification_count);
 	seq_puts(p, "  Posted MSI notification event\n");
 #endif
 	return 0;
-- 
2.47.3



* Re: [PATCH v2] x86/irq: Optimize interrupts decimals printing
  2025-11-03 18:51 [PATCH v2] x86/irq: Optimize interrupts decimals printing Dmitry Ilvokhin
@ 2026-02-25 17:22 ` Dmitry Ilvokhin
  2026-02-27 13:20   ` Thomas Gleixner
From: Dmitry Ilvokhin @ 2026-02-25 17:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin
  Cc: linux-kernel, kernel-team

I would like to follow up on this patch with additional data from production
deployment.

We have rolled out this change to a subset of the Meta fleet. On each machine,
dynolog [1] periodically reads /proc/interrupts to collect per-CPU interrupt
statistics, so this path executes frequently across the fleet.

After deploying the patch, we measured the reduction in CPU cycles
spent in the /proc/interrupts read path on machines with different
virtual core counts.

vCores   Cycle reduction
------   ---------------
36          -18.9%
72          -25.5%
252         -34.7%

As expected, the benefit increases with higher core counts, since the
formatting work scales with the number of CPUs.

We have not observed any functional regressions or changes in the output
format. Existing userspace parsers continue to work without
modification.

Please let me know if there are any concerns or if additional data would be
helpful.

[1]: https://github.com/facebookincubator/dynolog


* Re: [PATCH v2] x86/irq: Optimize interrupts decimals printing
  2026-02-25 17:22 ` Dmitry Ilvokhin
@ 2026-02-27 13:20   ` Thomas Gleixner
  2026-02-27 14:04     ` Dmitry Ilvokhin
From: Thomas Gleixner @ 2026-02-27 13:20 UTC (permalink / raw)
  To: Dmitry Ilvokhin, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin
  Cc: linux-kernel, kernel-team

On Wed, Feb 25 2026 at 17:22, Dmitry Ilvokhin wrote:

Sorry, this fell through the cracks.

> I would like to follow up on this patch with additional data from production
> deployment.

Instead of the non-interesting advertising blurb, you could have had the
courtesy to follow up with a new version of this patch after checking
whether it still applies, which it does not.

But please spare the effort because I already fixed it up locally and
thereby looked at the larger picture of /proc/interrupts.

It's amazing that all the people who "improve" that stuff never see the
obvious low hanging fruit there. I'll send out a series later.

Thanks,

        tglx



* Re: [PATCH v2] x86/irq: Optimize interrupts decimals printing
  2026-02-27 13:20   ` Thomas Gleixner
@ 2026-02-27 14:04     ` Dmitry Ilvokhin
From: Dmitry Ilvokhin @ 2026-02-27 14:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	linux-kernel, kernel-team

On Fri, Feb 27, 2026 at 02:20:57PM +0100, Thomas Gleixner wrote:
> On Wed, Feb 25 2026 at 17:22, Dmitry Ilvokhin wrote:
> 
> Sorry, this fell through the cracks.
> 
> > I would like to follow up on this patch with additional data from production
> > deployment.
> 
> Instead of the non-interesting advertising blurb, you could have had the
> courtesy to follow up with a new version of this patch after checking
> whether it still applies, which it does not.
> 
> But please spare the effort because I already fixed it up locally and
> thereby looked at the larger picture of /proc/interrupts.
> 
> It's amazing that all the people who "improve" that stuff never see the
> obvious low hanging fruit there. I'll send out a series later.

Hi Thomas,

Thanks for taking a look and for fixing it up locally.

I apologize for not checking whether the patch still applied before
following up; that was my mistake.

I'm glad to hear you're looking at the larger picture of
/proc/interrupts. If there's anything I can help with (testing,
benchmarking, etc.), I'd be happy to assist.

> 
> Thanks,
> 
>         tglx
> 
