From: Dmitry Ilvokhin <d@ilvokhin.com>
To: Thomas Gleixner <tglx@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
x86@kernel.org, Neil Horman <nhorman@tuxdriver.com>,
Radu Rendec <radu@rendec.net>
Subject: Re: [patch v2 12/14] [RFC] genirq/proc: Provide binary statistic interface
Date: Wed, 1 Apr 2026 16:42:01 +0000 [thread overview]
Message-ID: <ac1K2YdCv-IngSi7@shell.ilvokhin.com> (raw)
In-Reply-To: <20260320132102.841834115@kernel.org>
On Fri, Mar 20, 2026 at 02:22:24PM +0100, Thomas Gleixner wrote:
> /proc/interrupts is expensive to evaluate for monitoring because:
>
> - it is text based and contains a lot of information which is not
> relevant for interrupt frequency analysis. Due to the extra information
> like chip name, hardware interrupt number, interrupt action names, it
> has to take the interrupt descriptor lock to output those items into
> the seq_file buffer. That obviously interferes with high frequency
> interrupt workloads.
>
> - it contains both device interrupts, per CPU and architecture specific
> interrupt counters without being able to look at them separately. The
> file is seekable by some definition of seekable as the position can
> change when interrupts are requested or freed, so the data has to be
> read completely to get a coherent picture.
>
> - it emits records for requested interrupts even if their interrupt count
> is zero.
>
> - it always prints the per CPU counters even if all but one of them are
> zero.
>
> - converting numbers to text and then parsing the text back to numbers in
> user space is a pretty wasteful exercise
>
> Provide a new interface which addresses the above pain points:
>
> 1) The interface is binary and emits variable length records per
> interrupt. Each record starts with a header containing the interrupt
> number and the number of data entries following the header. The data
> entries consist of a CPU number and count pair.
>
> 2) Interrupts with a total count of zero are skipped and produce no
> output at all.
>
> 3) Interrupts which have a single CPU affinity either due to a restricted
> affinity mask or due to the underlying interrupt chip restricting a
> mask to a single CPU target emit only one data entry.
>
> That means they are not emitting the stale counts on previous target
> CPUs but they are not really interesting for interrupt frequency
> analysis as they are not changing and therefore pointless for
> accounting.
>
> 4) The interface separates device interrupts, per CPU interrupts and
> architecture specific interrupts.
>
> Per CPU and architecture specific interrupts can only be monitored,
> while device interrupts can also be steered by changing the affinity
> unless they are affinity managed by the kernel.
>
> Per CPU interrupts are only available on architectures, e.g. ARM64,
> which use the regular interrupt descriptor mechanism for per CPU
> interrupt handling.
>
> Architectures which have their own mechanics, e.g. x86, do not enable
> and provide the per CPU interface as those interrupts are covered by
> the architecture specific accounting.
>
> 5) The readout is fully lockless so it does not interfere with concurrent
> interrupt handling.
>
> 6) Seek is restricted to seek(fd, 0, SEEK_SET) as that's the only
> operation which makes sense due to the variable record length and the
> dynamics of interrupt request/free operations which influence the
> position of the records in the output. For all other seek()
> invocations return the current file position, which makes e.g. python
> happy as an error code causes the file open checks to mark the
> resulting file object non-seekable.
>
> Implement support for /proc/irq/device_stats and /proc/irq/percpu_stats.
>
> The support for architecture specific interrupt statistics is added in a
> separate step.
>
> Reading /proc/irq/device_stats on a 256 CPU x86 machine with 83 requested
> interrupts produces 13 records due to skipping zero count interrupts. It
> results in 13 * 16 = 208 bytes of data as all device interrupts on x86 are
> single CPU targeted. That readout takes ~8us time in the kernel, while the
> full /proc/interrupts readout takes about 360us.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> ---
> include/uapi/linux/irqstats.h | 27 +++
> kernel/irq/Kconfig | 3
> kernel/irq/proc.c | 314 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 344 insertions(+)
>
> --- /dev/null
> +++ b/include/uapi/linux/irqstats.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +#ifndef LINUX_UAPI_IRQSTATS_H
> +#define LINUX_UAPI_IRQSTATS_H
> +
> +/**
> + * irq_proc_stat_cpu - Data record for /proc/irq/stats
> + * @cpu: The CPU associated to @cnt
> + * @cnt: The count assiciated to @cpu
nit: s/assiciated/associated/
> + */
> +struct irq_proc_stat_cpu {
> + unsigned int cpu;
> + unsigned int cnt;
> +};
nit: UAPI structs should use __u32 instead of unsigned int.
> +
> +/**
> + * irq_proc_stat_data - Data header for /proc/irq/stats
> + * @irqnr: The interrupt number
> + * @entries: The number of records (max. nr_cpu_ids)
> + * @pcpu: Runtime sized array of per CPU stat records
> + */
> +struct irq_proc_stat_data {
> + unsigned int irqnr;
> + unsigned int entries;
> + struct irq_proc_stat_cpu pcpu[];
> +};
Same here.
Also, this struct has no extensibility mechanism. If irq_proc_stat_cpu
ever needs a new field, there's no way for userspace to detect the
layout change.
A __u32 entry_size set to sizeof(struct irq_proc_stat_cpu) would let
userspace stride through entries safely, even if the struct grows later.
> +
> +#endif
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -18,6 +18,9 @@ config GENERIC_IRQ_SHOW
> config GENERIC_IRQ_SHOW_LEVEL
> bool
>
> +config GENERIC_IRQ_STATS_PERCPU
> + bool
> +
[...]
> +static bool irq_stat_update_one(struct irq_proc_stat *s)
> +{
> + struct irq_proc_stat_data *d = s->data;
> +
> + if (IS_ENABLED(CONFIG_GENERIC_IRQ_PERCPU_STATS) && s->percpu)
> + irq_percpu_stat_update_one(s);
Should be GENERIC_IRQ_STATS_PERCPU, PERCPU and STATS are swapped with
each other.
next prev parent reply other threads:[~2026-04-01 16:42 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-20 13:21 [patch v2 00/14] Improve /proc/interrupts further and add a binary interface Thomas Gleixner
2026-03-20 13:21 ` [patch v2 01/14] x86/irq: Optimize interrupts decimals printing Thomas Gleixner
2026-03-21 16:10 ` Radu Rendec
2026-03-20 13:21 ` [patch v2 02/14] genirq/proc: Avoid formatting zero counts in /proc/interrupts Thomas Gleixner
2026-03-21 16:38 ` Radu Rendec
2026-03-20 13:21 ` [patch v2 03/14] genirq/proc: Utilize irq_desc::tot_count to avoid evaluation Thomas Gleixner
2026-03-22 19:59 ` Radu Rendec
2026-03-20 13:21 ` [patch v2 04/14] x86/irq: Make irqstats array based Thomas Gleixner
2026-03-20 16:39 ` Michael Kelley
2026-03-21 16:38 ` Thomas Gleixner
2026-03-21 20:32 ` Michael Kelley
2026-03-23 19:24 ` Radu Rendec
2026-03-24 19:54 ` Thomas Gleixner
2026-03-24 20:21 ` Thomas Gleixner
2026-03-24 20:32 ` Radu Rendec
2026-03-25 19:20 ` Radu Rendec
2026-03-25 22:52 ` Thomas Gleixner
2026-03-25 22:54 ` Florian Fainelli
2026-03-26 10:29 ` Thomas Gleixner
2026-03-26 23:00 ` Florian Fainelli
2026-03-27 11:03 ` Thomas Gleixner
2026-03-26 12:34 ` Radu Rendec
2026-03-20 13:21 ` [patch v2 05/14] genirq: Expose nr_irqs in core code Thomas Gleixner
2026-03-23 19:48 ` Radu Rendec
2026-03-23 21:27 ` Thomas Gleixner
2026-03-20 13:21 ` [patch v2 06/14] genirq: Cache the condition for /proc/interrupts exposure Thomas Gleixner
2026-03-23 20:58 ` Radu Rendec
2026-03-24 20:31 ` Thomas Gleixner
2026-03-24 20:36 ` Radu Rendec
2026-03-20 13:21 ` [patch v2 07/14] genirq: Calculate precision only when required Thomas Gleixner
2026-03-25 19:47 ` Radu Rendec
2026-03-20 13:22 ` [patch v2 08/14] genirq: Add rcuref count to struct irq_desc Thomas Gleixner
2026-03-26 18:43 ` Dmitry Ilvokhin
2026-03-20 13:22 ` [patch v2 09/14] genirq: Expose irq_find_desc_at_or_after() in core code Thomas Gleixner
2026-03-26 19:13 ` Dmitry Ilvokhin
2026-03-26 21:11 ` Thomas Gleixner
2026-03-26 21:25 ` Thomas Gleixner
2026-03-20 13:22 ` [patch v2 10/14] genirq/proc: Speed up /proc/interrupts iteration Thomas Gleixner
2026-03-20 13:22 ` [patch v2 11/14] [RFC] genirq: Cache target CPU for single CPU affinities Thomas Gleixner
2026-03-20 13:22 ` [patch v2 12/14] [RFC] genirq/proc: Provide binary statistic interface Thomas Gleixner
2026-04-01 16:42 ` Dmitry Ilvokhin [this message]
2026-03-20 13:22 ` [patch v2 13/14] [RFC] genirq/proc: Provide architecture specific binary statistics Thomas Gleixner
2026-04-01 16:51 ` Dmitry Ilvokhin
2026-04-01 19:33 ` Thomas Gleixner
2026-03-20 13:22 ` [patch v2 14/14] [RFC] x86/irq: Hook up architecture specific stats Thomas Gleixner
2026-03-20 16:45 ` [patch v2 00/14] Improve /proc/interrupts further and add a binary interface Michael Kelley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ac1K2YdCv-IngSi7@shell.ilvokhin.com \
--to=d@ilvokhin.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
--cc=radu@rendec.net \
--cc=tglx@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox