From: Dmitry Ilvokhin <d@ilvokhin.com>
To: Thomas Gleixner <tglx@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
x86@kernel.org, Neil Horman <nhorman@tuxdriver.com>,
Radu Rendec <radu@rendec.net>
Subject: Re: [patch v2 12/14] [RFC] genirq/proc: Provide binary statistic interface
Date: Wed, 1 Apr 2026 16:42:01 +0000
Message-ID: <ac1K2YdCv-IngSi7@shell.ilvokhin.com>
In-Reply-To: <20260320132102.841834115@kernel.org>

On Fri, Mar 20, 2026 at 02:22:24PM +0100, Thomas Gleixner wrote:
> /proc/interrupts is expensive to evaluate for monitoring because:
>
> - it is text based and contains a lot of information which is not
> relevant for interrupt frequency analysis. Due to the extra information
> like chip name, hardware interrupt number, interrupt action names, it
> has to take the interrupt descriptor lock to output those items into
> the seq_file buffer. That obviously interferes with high frequency
> interrupt workloads.
>
> - it contains both device interrupts, per CPU and architecture specific
> interrupt counters without being able to look at them separately. The
> file is seekable by some definition of seekable as the position can
> change when interrupts are requested or freed, so the data has to be
> read completely to get a coherent picture.
>
> - it emits records for requested interrupts even if their interrupt count
> is zero.
>
> - it always prints the per CPU counters even if all but one of them are
> zero.
>
> - converting numbers to text and then parsing the text back to numbers in
> user space is a pretty wasteful exercise
>
> Provide a new interface which addresses the above pain points:
>
> 1) The interface is binary and emits variable length records per
> interrupt. Each record starts with a header containing the interrupt
> number and the number of data entries following the header. The data
> entries consist of a CPU number and count pair.
>
> 2) Interrupts with a total count of zero are skipped and produce no
> output at all.
>
> 3) Interrupts which have a single CPU affinity either due to a restricted
> affinity mask or due to the underlying interrupt chip restricting a
> mask to a single CPU target emit only one data entry.
>
> That means they are not emitting the stale counts on previous target
> CPUs but they are not really interesting for interrupt frequency
> analysis as they are not changing and therefore pointless for
> accounting.
>
> 4) The interface separates device interrupts, per CPU interrupts and
> architecture specific interrupts.
>
> Per CPU and architecture specific interrupts can only be monitored,
> while device interrupts can also be steered by changing the affinity
> unless they are affinity managed by the kernel.
>
> Per CPU interrupts are only available on architectures, e.g. ARM64,
> which use the regular interrupt descriptor mechanism for per CPU
> interrupt handling.
>
> Architectures which have their own mechanics, e.g. x86, do not enable
> and provide the per CPU interface as those interrupts are covered by
> the architecture specific accounting.
>
> 5) The readout is fully lockless so it does not interfere with concurrent
> interrupt handling.
>
> 6) Seek is restricted to seek(fd, 0, SEEK_SET) as that's the only
> operation which makes sense due to the variable record length and the
> dynamics of interrupt request/free operations which influence the
> position of the records in the output. For all other seek()
> invocations return the current file position, which makes e.g. python
> happy as an error code causes the file open checks to mark the
> resulting file object non-seekable.
>
> Implement support for /proc/irq/device_stats and /proc/irq/percpu_stats.
>
> The support for architecture specific interrupt statistics is added in a
> separate step.
>
> Reading /proc/irq/device_stats on a 256 CPU x86 machine with 83 requested
> interrupts produces 13 records due to skipping zero count interrupts. It
> results in 13 * 16 = 208 bytes of data as all device interrupts on x86 are
> single CPU targeted. That readout takes ~8us time in the kernel, while the
> full /proc/interrupts readout takes about 360us.
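For reference, the 13 * 16 figure follows from the record layout: an 8 byte
header plus one 8 byte CPU/count pair per single-CPU record. A minimal
userspace sketch of walking such a buffer, assuming the v2 layout from this
patch and native endianness. walk_records() is a made-up helper name, not
part of the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the v2 layout from this patch (not a final UAPI). */
struct irq_proc_stat_cpu {
	unsigned int cpu;
	unsigned int cnt;
};

struct irq_proc_stat_data {
	unsigned int irqnr;
	unsigned int entries;
	struct irq_proc_stat_cpu pcpu[];
};

/* Header plus a single entry: the 16 bytes per record quoted above. */
static_assert(sizeof(struct irq_proc_stat_data) +
	      sizeof(struct irq_proc_stat_cpu) == 16,
	      "single-CPU record is expected to be 16 bytes");

/*
 * Hypothetical helper: walk a buffer of variable length records.
 * Returns the number of complete records, or -1 if a record is truncated.
 */
static int walk_records(const unsigned char *buf, size_t len)
{
	size_t pos = 0;
	int nrec = 0;

	while (pos < len) {
		struct irq_proc_stat_data hdr;
		size_t payload;

		if (len - pos < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + pos, sizeof(hdr));
		pos += sizeof(hdr);

		payload = (size_t)hdr.entries * sizeof(struct irq_proc_stat_cpu);
		if (len - pos < payload)
			return -1;

		for (unsigned int i = 0; i < hdr.entries; i++) {
			struct irq_proc_stat_cpu e;

			memcpy(&e, buf + pos + i * sizeof(e), sizeof(e));
			printf("irq %u cpu %u count %u\n",
			       hdr.irqnr, e.cpu, e.cnt);
		}
		pos += payload;
		nrec++;
	}
	return nrec;
}
```

The next record can only be located after reading the header of the
current one, which is why the parser has to consume the buffer front to
back.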
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> ---
> include/uapi/linux/irqstats.h | 27 +++
> kernel/irq/Kconfig | 3
> kernel/irq/proc.c | 314 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 344 insertions(+)
>
> --- /dev/null
> +++ b/include/uapi/linux/irqstats.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +#ifndef LINUX_UAPI_IRQSTATS_H
> +#define LINUX_UAPI_IRQSTATS_H
> +
> +/**
> + * irq_proc_stat_cpu - Data record for /proc/irq/stats
> + * @cpu: The CPU associated to @cnt
> + * @cnt: The count assiciated to @cpu
nit: s/assiciated/associated/
> + */
> +struct irq_proc_stat_cpu {
> +	unsigned int cpu;
> +	unsigned int cnt;
> +};
nit: UAPI structs should use __u32 instead of unsigned int.
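Roughly what I have in mind, as an untested sketch. The __has_include
fallback is only there so the snippet builds outside a kernel tree; the
real header would simply include <linux/types.h>:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * __u32 normally comes from <linux/types.h>. Fall back to a local typedef
 * so this sketch also builds where the kernel headers are not installed.
 */
#if defined(__has_include)
# if __has_include(<linux/types.h>)
#  include <linux/types.h>
#  define SKETCH_HAVE_LINUX_TYPES 1
# endif
#endif
#ifndef SKETCH_HAVE_LINUX_TYPES
typedef uint32_t __u32;
#endif

struct irq_proc_stat_cpu {
	__u32 cpu;
	__u32 cnt;
};

/* The fixed-width types pin the layout independent of the C data model. */
static_assert(sizeof(struct irq_proc_stat_cpu) == 8,
	      "two 32-bit fields, no padding");
static_assert(offsetof(struct irq_proc_stat_cpu, cnt) == 4,
	      "cnt directly follows cpu");
```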
> +
> +/**
> + * irq_proc_stat_data - Data header for /proc/irq/stats
> + * @irqnr: The interrupt number
> + * @entries: The number of records (max. nr_cpu_ids)
> + * @pcpu: Runtime sized array of per CPU stat records
> + */
> +struct irq_proc_stat_data {
> +	unsigned int irqnr;
> +	unsigned int entries;
> +	struct irq_proc_stat_cpu pcpu[];
> +};
Same here: __u32 for irqnr and entries.

Also, this struct has no extensibility mechanism. If irq_proc_stat_cpu
ever needs a new field, userspace has no way to detect the layout
change.

A __u32 entry_size field in the header, set to
sizeof(struct irq_proc_stat_cpu), would let userspace stride through the
entries safely even if the struct grows later.
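To make the entry_size idea concrete, here is an untested userspace sketch.
The struct names, the placement of entry_size in the header and the
sum_record() helper are all made up for illustration and are not part of
the patch. uint32_t stands in for __u32 on the userspace side:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extended header; entry_size is the proposed addition. */
struct irq_stat_hdr_sketch {
	uint32_t irqnr;
	uint32_t entries;
	uint32_t entry_size;	/* size of one entry as emitted by the kernel */
};

/* The two fields an existing reader knows about. */
struct irq_stat_cpu_sketch {
	uint32_t cpu;
	uint32_t cnt;
};

/*
 * Sum the counts of one record. Returns the number of bytes consumed,
 * or 0 if the record is malformed or truncated.
 */
static size_t sum_record(const unsigned char *buf, size_t len, uint64_t *sum)
{
	struct irq_stat_hdr_sketch hdr;
	size_t pos = sizeof(hdr);

	if (len < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr));

	/* Entries may grow, but never shrink below the known fields. */
	if (hdr.entry_size < sizeof(struct irq_stat_cpu_sketch))
		return 0;
	if ((len - pos) / hdr.entry_size < hdr.entries)
		return 0;

	*sum = 0;
	for (uint32_t i = 0; i < hdr.entries; i++) {
		struct irq_stat_cpu_sketch e;

		memcpy(&e, buf + pos, sizeof(e));
		*sum += e.cnt;
		/* Stride by the kernel's entry size, not by sizeof(e). */
		pos += hdr.entry_size;
	}
	return pos;
}
```

A reader built against the two-field layout keeps working when the kernel
later appends fields, because it skips the unknown tail of each entry.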
> +
> +#endif
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -18,6 +18,9 @@ config GENERIC_IRQ_SHOW
> config GENERIC_IRQ_SHOW_LEVEL
> bool
>
> +config GENERIC_IRQ_STATS_PERCPU
> + bool
> +
[...]
> +static bool irq_stat_update_one(struct irq_proc_stat *s)
> +{
> +	struct irq_proc_stat_data *d = s->data;
> +
> +	if (IS_ENABLED(CONFIG_GENERIC_IRQ_PERCPU_STATS) && s->percpu)
> +		irq_percpu_stat_update_one(s);
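This should be CONFIG_GENERIC_IRQ_STATS_PERCPU: PERCPU and STATS are
swapped. IS_ENABLED() on an undefined symbol evaluates to 0, so this
builds without complaint and irq_percpu_stat_update_one() is never called.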
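The effect is easy to reproduce in userspace. The macros below are a
cut-down model of the kernel's IS_ENABLED() that handles only the built-in
(=y) case. SKETCH_IS_ENABLED and the two helper functions are names
invented for this example:

```c
#include <assert.h>

/* Cut-down model of IS_ENABLED(): 1 if the symbol is defined to 1, else 0. */
#define ARG_PLACEHOLDER_1			0,
#define take_second_arg(ignored, val, ...)	val
#define is_defined(x)				is_defined_(x)
#define is_defined_(val)			is_defined__(ARG_PLACEHOLDER_##val)
#define is_defined__(arg1_or_junk)		take_second_arg(arg1_or_junk 1, 0, 0)
#define SKETCH_IS_ENABLED(option)		is_defined(option)

/* What Kconfig would generate when the option from this patch is selected. */
#define CONFIG_GENERIC_IRQ_STATS_PERCPU 1

/* Swapped name as in the patch: the symbol does not exist, result is 0. */
static int percpu_path_with_typo(int percpu)
{
	return SKETCH_IS_ENABLED(CONFIG_GENERIC_IRQ_PERCPU_STATS) && percpu;
}

/* Name matching the Kconfig entry: the branch depends on percpu only. */
static int percpu_path_with_fix(int percpu)
{
	return SKETCH_IS_ENABLED(CONFIG_GENERIC_IRQ_STATS_PERCPU) && percpu;
}
```

Both variants compile without a warning, which is why the typo is easy to
miss in review and in build testing.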