From: Thomas Gleixner <tglx@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, Dmitry Ilvokhin <d@ilvokhin.com>,
Neil Horman <nhorman@tuxdriver.com>
Subject: [patch 00/14] genirq: Improve /proc/interrupts for real and add a binary interface
Date: Wed, 04 Mar 2026 19:55:27 +0100
Message-ID: <20260303150539.513068586@kernel.org>
I started to look into this a few days ago due to the recent patch reminder
from Dmitry:
https://lore.kernel.org/aQj5mGZ6_BBlAm3B@shell.ilvokhin.com
Let me start with the history of /proc/interrupts. I've been observing for
more than two decades that developers micro enhance performance of
/proc/interrupts readout and stop right there after gaining $N% in a
particular part of the code instead of actually looking at the bigger
picture. I understand that people have time constraints, but it's amazing
how much low hanging fruit has been left on the table due to that.
Not to mention more than a decade of talk about a new binary
interface which would address the underlying problem of the unfortunately
unchangeable /proc/interrupts ABI. Just for giggles, the patch above
even mentions it in the change log:
"Although a binary /proc interface would be a better long-term solution
due to lower formatting (kernel side) and parsing (user-space side)
overhead the text interface will remain in use for some time, even if
better solutions will be available. Optimizing the /proc/interrupts
printing code is therefore still beneficial."
Coming back to the above referenced patch which triggered me to actually
look into that. The patch achieves an impressive readout time improvement
of ~19% on my 256 CPUs test machine with a trivial C code test case which
essentially does:
	fd = open("/proc/interrupts", O_RDONLY);
	for (i = 0; i < 1000; i++) {
		t0 = now();
		read_all_data(fd);
		deltas[i] = now() - t0;
		lseek(fd, 0, SEEK_SET);
	}
	print_mean_and_rel_stddev(deltas);
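A minimal user space sketch of such a test loop, for anyone who wants to reproduce the measurement. It is not the original test program: the clock source (CLOCK_MONOTONIC), the 64k read buffer, the population standard deviation and the helper names now_us(), sqrt_newton(), mean_rel_stddev() and bench_readout() are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in microseconds. */
static double now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Drain the file from the current offset; returns bytes read or -1. */
static long read_all_data(int fd)
{
	static char buf[1 << 16];
	long total = 0;
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	return n < 0 ? -1 : total;
}

/* Newton iteration square root, so that no libm is needed. */
static double sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;

	for (int i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Mean and relative population standard deviation (in percent). */
void mean_rel_stddev(const double *v, size_t n, double *mean, double *rel)
{
	double sum = 0.0, var = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	*mean = sum / n;
	for (size_t i = 0; i < n; i++)
		var += (v[i] - *mean) * (v[i] - *mean);
	*rel = *mean != 0.0 ? 100.0 * sqrt_newton(var / n) / *mean : 0.0;
}

/* Time 'loops' full readouts of 'path' into deltas[]; 0 on success. */
int bench_readout(const char *path, size_t loops, double *deltas)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	for (size_t i = 0; i < loops; i++) {
		double t0 = now_us();
		long n = read_all_data(fd);

		/* Only the readout is timed, the rewind is not. */
		deltas[i] = now_us() - t0;
		if (n < 0 || lseek(fd, 0, SEEK_SET) < 0) {
			close(fd);
			return -1;
		}
	}
	close(fd);
	return 0;
}
```

A caller would allocate an array of 1000 doubles, run bench_readout("/proc/interrupts", 1000, deltas) and feed the result to mean_rel_stddev(). Keeping the file descriptor open across iterations is what takes the open/close and process setup costs out of the picture.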
I wrote that trivial test because the numbers provided in the patch above
are based on 'perf stat -r 1000 cat /proc/interrupts >/dev/null', which is
taking all the irrelevant setup and tear-down costs of 'cat' into
account. That makes it tedious to observe the actual problems via perf
[stat|top] because the setup/teardown overhead stats obfuscate the output.
For completeness' sake the numbers observed with that very same perf command
line are provided below for reference. They pretty much confirm the
findings of the narrowed down micro benchmark.
Let's take a look at the resulting numbers:
     Patch                                 t mean       rel. stddev  delta base  delta prev
     Baseline v7.0-rc1                     1310.363 us  +/-1.81%
   1 x86/irq: Optimize interrupts decimal  1059.238 us  +/-1.76%     -19%        -19%
Impressive, but not so impressive when looking at perf top output and
addressing all the other offenders one by one:
     Patch                                 t mean       rel. stddev  delta base  delta prev
     Baseline v7.0-rc1                     1310.363 us  +/-1.81%
   1 x86/irq: Optimize interrupts decimal  1059.238 us  +/-1.76%     -19%        -19%
   3 genirq/proc: Utilize irq_desc::tot_c   652.365 us  +/-0.81%     -50%        -38%
   4 x86/irq: Make irqstats array based     605.326 us  +/-1.63%     -54%        - 7%
   7 genirq: Calculate precision only whe   575.973 us  +/-1.12%     -56%        - 5%
  10 genirq/proc: Speed up /proc/interrup   209.907 us  +/-1.84%     -84%        -64%
Now let's look at how I got there by simply running the microbenchmark with
infinite loops and using perf top to analyze the hotspots.
Baseline
20.19% [k] mtree_load
19.34% [k] num_to_str
10.12% [k] number
9.18% [k] vsnprintf
7.55% [k] seq_put_decimal_ull_width
6.16% [k] format_decode
5.65% [k] show_interrupts
4.88% [k] _find_next_bit
3.46% [k] seq_read_iter
3.29% [k] seq_printf
1.58% [k] memcpy_orig
1.50% [k] __rcu_read_lock
1.26% [k] arch_show_interrupts
1.05% [k] __rcu_read_unlock
1.04% [k] int_seq_next
0.94% [k] rep_movs_alternative
0.82% [k] irq_to_desc
0.56% [k] irq_get_nr_irqs
It's interesting to me that vsnprintf() was the first item to look at.
1 x86/irq: Optimize interrupts decimals printing
29.94% [k] num_to_str
25.16% [k] mtree_load
12.48% [k] seq_put_decimal_ull_width
7.05% [k] show_interrupts
6.53% [k] _find_next_bit
4.48% [k] seq_read_iter
1.87% [k] __rcu_read_lock
1.84% [k] arch_show_interrupts
1.60% [k] vsnprintf
1.32% [k] __rcu_read_unlock
1.18% [k] rep_movs_alternative
1.16% [k] int_seq_next
1.01% [k] format_decode
0.98% [k] irq_to_desc
0.65% [k] irq_get_nr_irqs
So Dmitry's patch removed the vsnprintf() overhead and made num_to_str()
more prominent.
That number is insanely high. So I analyzed the /proc/interrupts output, which
gave an easy way out. There is a large number of interrupts with all but
one CPU having zero counts. That's normal for interrupt managed multiqueue
devices. Also low frequency interrupts tend to stay on their initial
affinity and are not moved around by balancers.
With single CPU effective affinity targets (x86 and other architectures)
the majority of lines have just _one_ non zero entry, unless the balancer
or admin changed the affinity.
So it was pretty obvious to use a fixed string and write it directly
instead of repeating the string conversion of zero over and over.
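The idea can be illustrated in user space. The following is only a sketch, not the kernel's irq_proc_emit_counts(): the function name emit_counts() and the 10 character column width are assumptions, and snprintf() stands in for the kernel's number formatting.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define COL_WIDTH 10	/* assumed column width, followed by one space */

/* Preformatted zero column: 9 spaces, '0', one separating space. */
static const char zero_col[] = "         0 ";
_Static_assert(sizeof(zero_col) == COL_WIDTH + 2, "zero column width");

/*
 * Append one right-aligned counter column per CPU to 'buf'. Zero
 * counts are copied from the fixed string instead of being formatted.
 * Stops early when the buffer is full. Returns the bytes written.
 */
size_t emit_counts(char *buf, size_t size, const unsigned int *cnt,
		   size_t ncpus)
{
	size_t pos = 0;

	if (!size)
		return 0;

	for (size_t i = 0; i < ncpus && pos + sizeof(zero_col) <= size; i++) {
		if (!cnt[i]) {
			memcpy(buf + pos, zero_col, sizeof(zero_col) - 1);
			pos += sizeof(zero_col) - 1;
		} else {
			pos += snprintf(buf + pos, size - pos, "%*u ",
					COL_WIDTH, cnt[i]);
		}
	}
	buf[pos] = '\0';
	return pos;
}
```

The output is byte for byte identical to formatting every value, so the text ABI is not affected. On a line where all but one CPU has a zero count, all but one column turn into a plain memcpy().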
3 genirq/proc: Utilize irq_desc::tot_count to avoid evaluation
41.04% [kernel] [k] mtree_load
10.92% [kernel] [k] num_to_str
5.97% [kernel] [k] show_interrupts
5.18% [kernel] [k] _find_next_bit
4.86% [kernel] [k] seq_put_decimal_ull_width
4.65% [kernel] [k] seq_read_iter
3.02% [kernel] [k] __rcu_read_lock
2.84% [kernel] [k] arch_show_interrupts
2.71% [kernel] [k] irq_proc_emit_counts
2.69% [kernel] [k] memcpy_orig
2.32% [kernel] [k] vsnprintf
2.01% [kernel] [k] __rcu_read_unlock
1.79% [kernel] [k] int_seq_next
1.71% [kernel] [k] format_decode
1.62% [kernel] [k] irq_to_desc
1.47% [kernel] [k] rep_movs_alternative
1.13% [kernel] [k] irq_get_nr_irqs
num_to_str() has lost the pole position and got replaced by mtree_load()
again. mtree_load() is used to get the interrupt descriptors, but there is
something seriously wrong with the high CPU usage. Though I decided to look
at that later.
When fixing up Dmitry's patch I noticed that the way x86 manages the
architecture specific statistics is suboptimal. x86 holds the per interrupt
counters in struct members and therefore requires an #ifdeffed series of
copied code per counter to emit them. The struct member based counters are
also in the way of implementing a binary interface without adding more
architecture specific duplicated code.
The same can be achieved with an array of counters. That's not changing the
actual code in the interrupt hotpath, which increments the counter, as the
array indices are constant so the compiler still calculates the offset from
the per CPU data pointer as before. This also fixes the out of sync
arch_show_interrupts() and arch_irq_stat_cpu() implementations as they now
use the same table.
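A reduced sketch of the array based approach. The enum values, the table contents and the fixed NR_CPUS storage are illustrative stand-ins and not the x86 definitions from the patch; only the principle (constant array index plus one shared description table) is taken from the text above.

```c
#include <stdio.h>

/* Hypothetical subset of counters; the real x86 list is longer. */
enum irq_stat_id {
	STAT_NMI,
	STAT_LOCAL_TIMER,
	STAT_RESCHEDULE,
	STAT_NR
};

struct irq_stat_info {
	const char *symbol;
	const char *text;
};

/* One table describes every counter and is shared by all readers. */
static const struct irq_stat_info irq_stat_info[STAT_NR] = {
	[STAT_NMI]         = { "NMI", "Non-maskable interrupts" },
	[STAT_LOCAL_TIMER] = { "LOC", "Local timer interrupts" },
	[STAT_RESCHEDULE]  = { "RES", "Rescheduling interrupts" },
};

#define NR_CPUS 4	/* stand-in for real per CPU storage */
static unsigned int irq_stats[NR_CPUS][STAT_NR];

/* The index is a compile time constant, so this is a fixed offset. */
#define inc_irq_stat(cpu, id)	(irq_stats[(cpu)][(id)]++)

/* Sum of one counter over all CPUs. */
unsigned long irq_stat_total(enum irq_stat_id id)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += irq_stats[cpu][id];
	return sum;
}

/* One loop replaces a copied print block per counter. */
void show_arch_stats(FILE *f)
{
	for (int id = 0; id < STAT_NR; id++) {
		fprintf(f, "%3s: ", irq_stat_info[id].symbol);
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			fprintf(f, "%10u ", irq_stats[cpu][id]);
		fprintf(f, "  %s\n", irq_stat_info[id].text);
	}
}
```

Because the text output and the summing code walk the same table, they cannot get out of sync, and a binary reader can reuse the loop without per counter code.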
4 x86/irq: Make irqstats array based
43.99% [kernel] [k] mtree_load
9.16% [kernel] [k] irq_proc_emit_counts
6.41% [kernel] [k] show_interrupts
5.86% [kernel] [k] _find_next_bit
4.91% [kernel] [k] seq_read_iter
3.98% [kernel] [k] memcpy_orig
3.28% [kernel] [k] __rcu_read_lock
2.95% [kernel] [k] num_to_str
2.28% [kernel] [k] vsnprintf
2.06% [kernel] [k] __rcu_read_unlock
2.02% [kernel] [k] rep_movs_alternative
1.99% [kernel] [k] int_seq_next
1.96% [kernel] [k] format_decode
1.69% [kernel] [k] irq_to_desc
1.58% [kernel] [k] seq_put_decimal_ull_width
1.20% [kernel] [k] irq_get_nr_irqs
num_to_str() got further demoted because x86 now uses
irq_proc_emit_counts() with the optimized 0 output as well and
arch_show_interrupts() is off the radar because it only contains a trivial
loop.
This reduces text size by ~2k, which obviously reduces the I-cache
footprint. The loop overhead is completely irrelevant compared to the actual
costs of doing the for_each_online_cpu() loop once per interrupt and in
each iteration access memory in the worst possible pattern especially when
reading counters from remote nodes. There is not much which can be done
about that. I tried to copy all per CPU counters into local data storage
first, but that just trades one memory/cacheline massacre against another
for no gain.
So back to mtree_load(). That high CPU usage doesn't make any sense because
the whole point of sparse interrupts and the underlying maple tree is to
have quick iterations through the maple tree to skip holes. So much for the
theory.
fs/proc/interrupts.c was never updated and still iterates over the possible
interrupt number space one by one, thereby defeating the whole purpose of
the maple tree. On the test machine that's almost 1000 interrupt descriptor
lookups, while only 153 descriptors are in use and exist in the maple tree.
The trivial fix would have been to use the proper iterator in
fs/proc/interrupts.c, but that would need some investigation vs. the
architectures which do not use the generic version of show_interrupts().
That can be done by those people if they actually care.
Aside from that it still would touch the maple tree twice for each interrupt.
First for finding the next number and then to actually load the interrupt
descriptor. This can be done smarter by retrieving the next descriptor
right away and storing it in the *v data pointer of the seq_file ops
instead of using the pointer to store the number.
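The difference between the two iteration patterns can be sketched with a sorted array standing in for the maple tree. Everything here is a user space illustration: struct irq_desc_stub, struct irq_table and the 'lookups' instrumentation counter are hypothetical, and find_desc_at_or_after() only mimics what an "at or after" lookup returns.

```c
#include <stddef.h>

struct irq_desc_stub {		/* hypothetical stand-in for irq_desc */
	unsigned int irq;
};

/* Sparse set of allocated descriptors, sorted by interrupt number. */
struct irq_table {
	const struct irq_desc_stub *descs;
	size_t nr;
	unsigned long lookups;	/* instrumentation only */
};

/* First descriptor with irq >= start, or NULL (binary search). */
const struct irq_desc_stub *find_desc_at_or_after(struct irq_table *t,
						  unsigned int start)
{
	size_t lo = 0, hi = t->nr;

	t->lookups++;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->descs[mid].irq < start)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < t->nr ? &t->descs[lo] : NULL;
}

/* Old pattern: probe every number in [0, nr_irqs), most probes miss. */
size_t walk_by_number(struct irq_table *t, unsigned int nr_irqs)
{
	size_t found = 0;

	for (unsigned int irq = 0; irq < nr_irqs; irq++) {
		const struct irq_desc_stub *d = find_desc_at_or_after(t, irq);

		if (d && d->irq == irq)
			found++;
	}
	return found;
}

/* New pattern: carry the descriptor pointer, one lookup per entry. */
size_t walk_by_desc(struct irq_table *t)
{
	const struct irq_desc_stub *d;
	size_t found = 0;

	for (d = find_desc_at_or_after(t, 0); d;
	     d = find_desc_at_or_after(t, d->irq + 1))
		found++;
	return found;
}
```

With five descriptors in a number space of 1000, walk_by_number() performs 1000 lookups while walk_by_desc() needs six (five hits plus the terminating miss). Both visit the same descriptors.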
But doing that required some preparatory changes and while thinking about
them I noticed a few obvious improvements to avoid doing the same thing
over and over for no reason and to reduce the number of conditional
branches in show_interrupts().
7 genirq: Calculate precision only when required
46.41% [k] mtree_load
10.42% [k] irq_proc_emit_counts
5.86% [k] show_interrupts
5.11% [k] seq_read_iter
5.05% [k] _find_next_bit
3.56% [k] memcpy_orig
3.23% [k] num_to_str
2.51% [k] __rcu_read_unlock
2.27% [k] vsnprintf
2.23% [k] __rcu_read_lock
2.00% [k] rep_movs_alternative
1.90% [k] int_seq_next
1.81% [k] seq_put_decimal_ull_width
1.58% [k] format_decode
1.09% [k] number
0.91% [k] string
0.76% [k] put_dec_trunc8
0.67% [k] seq_printf
0.60% [k] irq_to_desc
0.46% [k] irq_get_nr_irqs
show_interrupts() became slightly less expensive. The mtree_load()
leader position stays obviously the same.
With that addressed, adding optimized seq_file code to the interrupt core
became feasible. All it still required was a fast refcount
mechanism so that the pointer can safely be stored in the seq_file iterator
and cannot be freed between seq_next() and seq_show(). I briefly pondered
using the existing kobject in the interrupt descriptor, but that uses a
refcount_t underneath which is way more expensive than rcuref_t and
requires function calls. As the descriptors are RCU managed, rcuref_t is the
obvious choice.
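To illustrate why a "get unless already dead" reference is needed when a pointer is parked in an iterator, here is a generic C11 sketch. It is explicitly not rcuref_t and does not reflect how rcuref_t is implemented; it uses a plain compare-and-exchange loop, and the names struct desc_ref, desc_get() and desc_put() are made up.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct desc_ref {
	atomic_uint refs;	/* 0 means the object is going away */
};

/* Take a reference unless the count already dropped to zero. */
bool desc_get(struct desc_ref *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; true when the caller dropped the last one. */
bool desc_put(struct desc_ref *r)
{
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

In the seq_file case the lookup in the next() step would take the reference and keep the pointer, the show() step would use it, and the reference would be dropped when the iterator moves on or stops. A failed desc_get() means the descriptor is being torn down and has to be skipped.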
10 genirq/proc: Speed up /proc/interrupts iteration
26.77% [k] irq_proc_emit_counts
15.90% [k] _find_next_bit
10.56% [k] memcpy_orig
9.82% [k] num_to_str
5.68% [k] rep_movs_alternative
5.34% [k] vsnprintf
4.76% [k] seq_put_decimal_ull_width
4.01% [k] format_decode
3.17% [k] mt_find
2.51% [k] string
2.24% [k] number
1.52% [k] put_dec_trunc8
1.44% [k] seq_printf
0.87% [k] irq_seq_show
mt_find() is the one retrieving the next descriptor and the numbers are now
where one would expect them to be.
I might have missed some low hanging fruit there as well. If you find
it, you are entitled to keep it and fix it yourself. :)
That said, let's talk about the previously mentioned optimized binary
interface. As I'm not interested in reading "it would be better" over and
over for another ten years, I sat down and implemented a straightforward
interface which provides a binary dump with variable sized records of the
relevant statistics separated into three files:
1) device interrupts
2) per CPU interrupts (irq descriptor based)
3) architecture specific interrupts
The separation is done because for interrupt balancing purposes #1 is the
interesting part as #2 and #3 cannot be influenced by modifying
affinities. They can be monitored of course, but that's a different class
of events.
This interface comes with the following semantics:
  - each record starts with a pair of 'interrupt number' and 'number of
    entries'
  - each entry is a pair of 'cpu number' and 'interrupt counts'
  - records are only emitted for interrupts which have a total non-zero
    event count as that's what matters for observation.
  - records are only emitted if the counter(s) for the effective affinity
    mask CPU target(s) are non zero.
    Emitting counts from a previous affinity setting is irrelevant and can
    be easily cached by the monitoring application if needed.
  - completely lockless vs. the interrupt descriptor
    The readout does not touch the interrupt descriptor lock so it does
    not interfere with concurrent high frequency interrupts at all except
    for the memory access to the counter which obviously can't be avoided.
  - contrary to the nonsensical seq_file seek, which handles
    /proc/interrupts, only seeking back to the origin is allowed.
    /proc/interrupts is also a non constant record file and the seq_file
    lseek() implementation therefore does not guarantee a consistent
    lseek() at all, except for lseek(fd, 0, SEEK_SET).
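A consumer of such a record stream could look roughly like the sketch below. The struct layouts are pure assumptions (32 bit fields, native endianness, no padding); the authoritative definitions are the ones the series adds in include/uapi/linux/irqstats.h, which are not reproduced here. The names stat_record_hdr, stat_entry and sum_records() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout, for illustration only. */
struct stat_record_hdr {
	uint32_t irqnr;		/* interrupt number */
	uint32_t nr_entries;	/* number of entries which follow */
};

struct stat_entry {
	uint32_t cpu;		/* cpu number */
	uint32_t count;		/* interrupt count on that cpu */
};

/*
 * Walk a buffer of variable sized records and add up all counts.
 * Returns the number of records, or -1 if the buffer is truncated.
 */
long sum_records(const void *buf, size_t len, uint64_t *total)
{
	const unsigned char *p = buf;
	long records = 0;

	*total = 0;
	while (len) {
		struct stat_record_hdr hdr;

		if (len < sizeof(hdr))
			return -1;
		memcpy(&hdr, p, sizeof(hdr));
		p += sizeof(hdr);
		len -= sizeof(hdr);

		/* All announced entries must fit into the remainder. */
		if (hdr.nr_entries > len / sizeof(struct stat_entry))
			return -1;

		for (uint32_t i = 0; i < hdr.nr_entries; i++) {
			struct stat_entry e;

			memcpy(&e, p, sizeof(e));
			p += sizeof(e);
			len -= sizeof(e);
			*total += e.count;
		}
		records++;
	}
	return records;
}
```

There is no text to tokenize and no column to skip: the entry count in the record header tells the reader how far to advance, and interrupts which never fired do not show up at all.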
Reading those files with the same 1000 loops test takes on average:
    ~ 8 us  for the device interrupts
    ~37 us  for x86 architecture interrupts
    ------
    ~45 us  total
Compared to the fully applied /proc/interrupts enhancements:
    209us vs. 45us  =~ -79% --> ~4.5X
The comparison to the baseline v7.0-rc1 is:
    1310us vs. 45us =~ -96% --> ~29X
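For reference, the two figures per line are the relative change and the speedup factor. A small sketch with hypothetical helper names, useful for rechecking the rounded numbers:

```c
/* Relative change in percent; negative means 'cur' is faster. */
double delta_percent(double ref_us, double cur_us)
{
	return (cur_us - ref_us) / ref_us * 100.0;
}

/* Speedup factor of 'cur' over 'ref'. */
double speedup(double ref_us, double cur_us)
{
	return ref_us / cur_us;
}
```

Feeding in 209 and 45 gives about -78.5% and a factor of roughly 4.6, and 1310 and 45 give about -96.6% and a factor of roughly 29, which matches the rounded values quoted above.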
For completeness' sake the perf top list of the endless read/lseek loop
accessing /proc/irq/device_stats:
70.52% [kernel] [k] mt_find
6.36% [kernel] [k] _copy_to_iter
5.09% [kernel] [k] irq_stats_read
3.77% [kernel] [k] irq_find_desc_at_or_after
2.09% [kernel] [k] __rcu_read_lock
2.04% [kernel] [k] __rcu_read_unlock
1.67% [kernel] [k] __check_object_size
1.07% [kernel] [k] __virt_addr_valid
0.82% [kernel] [k] entry_SYSCALL_64
0.81% [kernel] [k] vfs_read
Here mt_find() dominates rightfully because the decision to output data is
trivial as all interrupts are single CPU target so there is only one
counter per interrupt to access. If it is non-zero, the store and the output
just vanish in the noise compared to the actual lookup costs. Based on
the 8us average that means:
    5.641 us  mt_find()
    0.509 us  copy_to_iter()
    0.407 us  irq_stats_read()
    0.302 us  irq_find_desc_at_or_after()
mt_find() and irq_find_desc_at_or_after() belong together so we have:
    5.943 us  lookup()
    0.509 us  copy_to_iter()    // Copies data to user space
    0.407 us  irq_stats_read()  // Counter analysis
Broken down to a per interrupt average with 153 requested interrupts on
that machine this means:
    38.8 ns  lookup time
     3.3 ns  copy to user
     2.7 ns  analysis
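The breakdown is simply the perf top share applied to the measured readout time, then divided by the number of interrupts. A sketch of that arithmetic with hypothetical helper names; it assumes, as the text does, that the sample share translates linearly into time:

```c
/* Time attributed to one symbol: readout time times its perf share. */
double share_us(double total_us, double percent)
{
	return total_us * percent / 100.0;
}

/* Average cost per item in nanoseconds. */
double per_item_ns(double us, unsigned int items)
{
	return us * 1000.0 / items;
}
```

For example share_us(8, 70.52) is about 5.64 us for mt_find(), and per_item_ns(5.943, 153) is about 38.8 ns of lookup time per interrupt.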
And the same for the endless read/lseek loop accessing
/proc/irq/arch_stats:
57.82% [k] _find_next_bit
37.20% [k] irq_arch_stats_read
1.06% [k] _copy_to_iter
1.03% [k] rep_movs_alternative
0.73% [k] irq_stat_copy_to_iter
0.27% [k] vfs_read
The most expensive parts of the latter are the for_each_online_cpu() loops
and the sub-optimal memory access patterns as explained above.
So with 37us readout time for 256 CPUs and 19 x86 architecture specific
counters this gives:
    21.393 us  for_each_online_cpu()
    13.764 us  irq_arch_stats_read()  // Counter analysis
That means an average per counter:
    1.12 us  for_each_online_cpu()
    0.72 us  irq_arch_stats_read()
In theory we could optimize the for_each_online_cpu() overhead, but that
creates a lot of inlined code for a meager 5% performance improvement over
the out of line version. Not really worth it.
Note that the /proc/interrupts numbers were taken on a mostly idle system
with almost zero probability to hit irq_desc::lock contention.
All test results are obviously skewed due to the repetitive invocations,
which prime the cache. But cache cold tests with no repeats are resulting
in roughly the same performance difference ratio for all scenarios.
A quick python hack computing the total number of interrupts from the
optimized /proc/interrupts and from the new binary interfaces yields:
    /proc/interrupts optimized    /proc/irq/[device+arch]_stats
    6.957 ms                      0.394 ms                         -94% (~17X)
which covers both the costs of reading and computing. The read advantage of
the binary interface is ~4.5X (see above), so the compute advantage of
avoiding the text parsing and not looking at pointless numbers amounts to
~3.7X which is unsurprisingly in the expected ballpark.
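To give an impression of what the text side of such a computation has to do, here is a C sketch which sums all per CPU counters from /proc/interrupts style text. It is not the Python hack mentioned above; the function name sum_text_counts() is made up and the caller is assumed to know the number of CPU columns.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum all per CPU counters in /proc/interrupts style text. The first
 * line is the CPU header, every other line is "label: n0 n1 ... text".
 */
uint64_t sum_text_counts(const char *text, unsigned int ncpus)
{
	const char *line = strchr(text, '\n');	/* skip the CPU header */
	uint64_t total = 0;

	while (line && *++line) {
		const char *p = strchr(line, ':');
		const char *eol = strchr(line, '\n');

		if (p && (!eol || p < eol)) {
			p++;
			for (unsigned int i = 0; i < ncpus; i++) {
				char *end;

				while (*p == ' ')
					p++;
				/* Lines like ERR: have fewer columns */
				if (!isdigit((unsigned char)*p))
					break;
				total += strtoull(p, &end, 10);
				p = end;
			}
		}
		line = eol;
	}
	return total;
}
```

Every column has to be located and converted back from decimal, including the many zero columns, and the trailing description has to be skipped. That is the per value work which a binary record stream avoids.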
The last four patches related to the binary interface obviously need some
thought vs. the interface and are therefore marked RFC.
The series applies on top of v7.0-rc1 and is also available via git:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git irq/core
Thanks
tglx
---
It's not that I'm so smart, it's just that I stay with problems longer.
- Albert Einstein
---
'perf stat -r 1000 cat /proc/interrupts' data series
Baseline v7.0-rc1
Performance counter stats for 'cat /proc/interrupts' (1000 runs):
2.55 msec task-clock # 0.830 CPUs utilized ( +- 0.09% )
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
94 page-faults # 36.877 K/sec ( +- 0.05% )
5,072,386 cycles # 1.990 GHz ( +- 0.08% )
357,009 stalled-cycles-frontend # 7.04% frontend cycles idle ( +- 0.29% )
17,197,829 instructions # 3.39 insn per cycle
# 0.02 stalled cycles per insn ( +- 0.01% )
3,323,704 branches # 1.304 G/sec ( +- 0.01% )
13,773 branch-misses # 0.41% of all branches ( +- 0.09% )
0.00307221 +- 0.00000369 seconds time elapsed ( +- 0.12% )
x86/irq: Optimize interrupts decimals printing
Performance counter stats for 'cat /proc/interrupts' (1000 runs):
2.10 msec task-clock # 0.809 CPUs utilized ( +- 0.18% )
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
94 page-faults # 44.720 K/sec ( +- 0.05% )
4,179,092 cycles # 1.988 GHz ( +- 0.13% )
360,135 stalled-cycles-frontend # 8.62% frontend cycles idle ( +- 0.35% )
13,174,711 instructions # 3.15 insn per cycle
# 0.03 stalled cycles per insn ( +- 0.02% )
2,596,179 branches # 1.235 G/sec ( +- 0.02% )
13,793 branch-misses # 0.53% of all branches ( +- 0.09% )
0.00259694 +- 0.00000597 seconds time elapsed ( +- 0.23% )
genirq/proc: Utilize irq_desc::tot_count to avoid evaluation
Performance counter stats for 'cat /proc/interrupts' (1000 runs):
1.51 msec task-clock # 0.753 CPUs utilized ( +- 0.23% )
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
94 page-faults # 62.425 K/sec ( +- 0.05% )
2,989,307 cycles # 1.985 GHz ( +- 0.18% )
331,156 stalled-cycles-frontend # 11.08% frontend cycles idle ( +- 0.55% )
7,564,373 instructions # 2.53 insn per cycle
# 0.04 stalled cycles per insn ( +- 0.03% )
1,554,530 branches # 1.032 G/sec ( +- 0.03% )
13,531 branch-misses # 0.87% of all branches ( +- 0.10% )
0.00199971 +- 0.00000644 seconds time elapsed ( +- 0.32% )
x86/irq: Make irqstats array based
Performance counter stats for 'cat /proc/interrupts' (1000 runs):
1.46 msec task-clock # 0.730 CPUs utilized ( +- 0.17% )
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
95 page-faults # 65.261 K/sec ( +- 0.05% )
2,891,394 cycles # 1.986 GHz ( +- 0.16% )
335,074 stalled-cycles-frontend # 11.59% frontend cycles idle ( +- 0.40% )
6,461,160 instructions # 2.23 insn per cycle
# 0.05 stalled cycles per insn ( +- 0.04% )
1,370,320 branches # 941.353 M/sec ( +- 0.03% )
13,409 branch-misses # 0.98% of all branches ( +- 0.10% )
0.00199471 +- 0.00000400 seconds time elapsed ( +- 0.20% )
genirq: Calculate precision only when required
Performance counter stats for 'cat /proc/interrupts' (1000 runs):
1.42 msec task-clock # 0.732 CPUs utilized ( +- 0.23% )
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
95 page-faults # 67.004 K/sec ( +- 0.05% )
2,817,515 cycles # 1.987 GHz ( +- 0.23% )
327,877 stalled-cycles-frontend # 11.64% frontend cycles idle ( +- 0.73% )
6,391,647 instructions # 2.27 insn per cycle
# 0.05 stalled cycles per insn ( +- 0.04% )
1,348,883 branches # 951.380 M/sec ( +- 0.03% )
13,483 branch-misses # 1.00% of all branches ( +- 0.09% )
0.00193706 +- 0.00000522 seconds time elapsed ( +- 0.27% )
genirq/proc: Speed up /proc/interrupts iteration
Performance counter stats for 'cat /proc/interrupts' (1000 runs):
1.05 msec task-clock # 0.671 CPUs utilized ( +- 0.23% )
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
95 page-faults # 90.552 K/sec ( +- 0.05% )
2,077,859 cycles # 1.981 GHz ( +- 0.18% )
313,913 stalled-cycles-frontend # 15.11% frontend cycles idle ( +- 0.35% )
3,891,909 instructions # 1.87 insn per cycle
# 0.08 stalled cycles per insn ( +- 0.06% )
744,475 branches # 709.616 M/sec ( +- 0.06% )
12,962 branch-misses # 1.74% of all branches ( +- 0.10% )
0.00156440 +- 0.00000414 seconds time elapsed ( +- 0.26% )
---
arch/x86/Kconfig | 1
arch/x86/events/amd/core.c | 2
arch/x86/events/amd/ibs.c | 2
arch/x86/events/core.c | 2
arch/x86/events/intel/core.c | 2
arch/x86/events/intel/knc.c | 2
arch/x86/events/intel/p4.c | 2
arch/x86/events/zhaoxin/core.c | 2
arch/x86/hyperv/hv_init.c | 2
arch/x86/include/asm/hardirq.h | 76 ++--
arch/x86/include/asm/mce.h | 3
arch/x86/kernel/apic/apic.c | 4
arch/x86/kernel/apic/ipi.c | 2
arch/x86/kernel/cpu/acrn.c | 2
arch/x86/kernel/cpu/mce/amd.c | 2
arch/x86/kernel/cpu/mce/core.c | 8
arch/x86/kernel/cpu/mce/threshold.c | 2
arch/x86/kernel/cpu/mshyperv.c | 4
arch/x86/kernel/irq.c | 225 ++++----------
arch/x86/kernel/irq_work.c | 2
arch/x86/kernel/kvm.c | 2
arch/x86/kernel/nmi.c | 4
arch/x86/kernel/smp.c | 6
arch/x86/mm/tlb.c | 2
arch/x86/xen/enlighten_hvm.c | 2
arch/x86/xen/enlighten_pv.c | 2
arch/x86/xen/smp.c | 6
arch/x86/xen/smp_pv.c | 2
fs/proc/Makefile | 4
include/linux/interrupt.h | 1
include/linux/irq.h | 18 +
include/linux/irqdesc.h | 8
include/uapi/linux/irqstats.h | 27 +
kernel/irq/Kconfig | 6
kernel/irq/chip.c | 2
kernel/irq/internals.h | 24 +
kernel/irq/irqdesc.c | 67 ++--
kernel/irq/manage.c | 16 -
kernel/irq/proc.c | 556 +++++++++++++++++++++++++++++++++---
kernel/irq/settings.h | 14
40 files changed, 815 insertions(+), 301 deletions(-)