public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Ilvokhin <d@ilvokhin.com>
To: Thomas Gleixner <tglx@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, Neil Horman <nhorman@tuxdriver.com>,
	Radu Rendec <radu@rendec.net>
Subject: Re: [patch v2 12/14] [RFC] genirq/proc: Provide binary statistic interface
Date: Wed, 1 Apr 2026 16:42:01 +0000	[thread overview]
Message-ID: <ac1K2YdCv-IngSi7@shell.ilvokhin.com> (raw)
In-Reply-To: <20260320132102.841834115@kernel.org>

On Fri, Mar 20, 2026 at 02:22:24PM +0100, Thomas Gleixner wrote:
> /proc/interrupts is expensive to evaluate for monitoring because:
> 
>   - it is text based and contains a lot of information which is not
>     relevant for interrupt frequency analysis. Due to the extra information
>     like chip name, hardware interrupt number, interrupt action names, it
>     has to take the interrupt descriptor lock to output those items into
>     the seq_file buffer. That obviously interferes with high frequency
>     interrupt workloads.
> 
>   - it contains both device interrupts, per CPU and architecture specific
>     interrupt counters without being able to look at them separately. The
>     file is seekable by some definition of seekable as the position can
>     change when interrupts are requested or freed, so the data has to be
>     read completely to get a coherent picture.
> 
>   - it emits records for requested interrupts even if their interrupt count
>     is zero.
> 
>   - it always prints the per CPU counters even if all but one of them are
>     zero.
> 
>   - converting numbers to text and then parsing the text back to numbers in
>     user space is a pretty wasteful exercise
> 
> Provide a new interface which addresses the above pain points:
> 
>   1) The interface is binary and emits variable length records per
>      interrupt. Each record starts with a header containing the interrupt
>      number and the number of data entries following the header. The data
>      entries consist of a CPU number and count pair.
> 
>   2) Interrupts with a total count of zero are skipped and produce no
>      output at all.
> 
>   3) Interrupts which have a single CPU affinity either due to a restricted
>      affinity mask or due to the underlying interrupt chip restricting a
>      mask to a single CPU target emit only one data entry.
> 
>      That means they are not emitting the stale counts on previous target
>      CPUs but they are not really interesting for interrupt frequency
>      analysis as they are not changing and therefore pointless for
>      accounting.
> 
>   4) The interface separates device interrupts, per CPU interrupts and
>      architecture specific interrupts.
> 
>      Per CPU and architecture specific interrupts can only be monitored,
>      while device interrupts can also be steered by changing the affinity
>      unless they are affinity managed by the kernel.
> 
>      Per CPU interrupts are only available on architectures, e.g. ARM64,
>      which use the regular interrupt descriptor mechanism for per CPU
>      interrupt handling.
> 
>      Architectures which have their own mechanics, e.g. x86, do not enable
>      and provide the per CPU interface as those interrupts are covered by
>      the architecture specific accounting.
> 
>   5) The readout is fully lockless so it does not interfere with concurrent
>      interrupt handling.
> 
>   6) Seek is restricted to seek(fd, 0, SEEK_SET) as that's the only
>      operation which makes sense due to the variable record length and the
>      dynamics of interrupt request/free operations which influence the
>      position of the records in the output. For all other seek()
>      invocations return the current file position, which makes e.g. python
>      happy as an error code causes the file open checks to mark the
>      resulting file object non-seekable.
> 
> Implement support for /proc/irq/device_stats and /proc/irq/percpu_stats.
> 
> The support for architecture specific interrupt statistics is added in a
> separate step.
> 
> Reading /proc/irq/device_stats on a 256 CPU x86 machine with 83 requested
> interrupts produces 13 records due to skipping zero count interrupts. It
> results in 13 * 16 = 208 bytes of data as all device interrupts on x86 are
> single CPU targeted. That readout takes ~8us time in the kernel, while the
> full /proc/interrupts readout takes about 360us.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> ---
>  include/uapi/linux/irqstats.h |   27 +++
>  kernel/irq/Kconfig            |    3 
>  kernel/irq/proc.c             |  314 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 344 insertions(+)
> 
> --- /dev/null
> +++ b/include/uapi/linux/irqstats.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +#ifndef LINUX_UAPI_IRQSTATS_H
> +#define LINUX_UAPI_IRQSTATS_H
> +
> +/**
> + * irq_proc_stat_cpu - Data record for /proc/irq/stats
> + * @cpu:	The CPU associated to @cnt
> + * @cnt:	The count assiciated to @cpu

nit: s/assiciated/associated/

> + */
> +struct irq_proc_stat_cpu {
> +	unsigned int	cpu;
> +	unsigned int	cnt;
> +};

nit: UAPI structs should use __u32 instead of unsigned int.

> +
> +/**
> + * irq_proc_stat_data - Data header for /proc/irq/stats
> + * @irqnr:	The interrupt number
> + * @entries:	The number of records (max. nr_cpu_ids)
> + * @pcpu:	Runtime sized array of per CPU stat records
> + */
> +struct irq_proc_stat_data {
> +	unsigned int			irqnr;
> +	unsigned int			entries;
> +	struct irq_proc_stat_cpu	pcpu[];
> +};

Same here.

Also, this struct has no extensibility mechanism. If irq_proc_stat_cpu
ever needs a new field, there's no way for userspace to detect the
layout change.

A __u32 entry_size set to sizeof(struct irq_proc_stat_cpu) would let
userspace stride through entries safely, even if the struct grows later.

> +
> +#endif
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -18,6 +18,9 @@ config GENERIC_IRQ_SHOW
>  config GENERIC_IRQ_SHOW_LEVEL
>         bool
>  
> +config GENERIC_IRQ_STATS_PERCPU
> +       bool
> +

[...]

> +static bool irq_stat_update_one(struct irq_proc_stat *s)
> +{
> +	struct irq_proc_stat_data *d = s->data;
> +
> +	if (IS_ENABLED(CONFIG_GENERIC_IRQ_PERCPU_STATS) && s->percpu)
> +		irq_percpu_stat_update_one(s);

Should be GENERIC_IRQ_STATS_PERCPU, PERCPU and STATS are swapped with
each other.

  reply	other threads:[~2026-04-01 16:42 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-20 13:21 [patch v2 00/14] Improve /proc/interrupts further and add a binary interface Thomas Gleixner
2026-03-20 13:21 ` [patch v2 01/14] x86/irq: Optimize interrupts decimals printing Thomas Gleixner
2026-03-21 16:10   ` Radu Rendec
2026-03-20 13:21 ` [patch v2 02/14] genirq/proc: Avoid formatting zero counts in /proc/interrupts Thomas Gleixner
2026-03-21 16:38   ` Radu Rendec
2026-03-20 13:21 ` [patch v2 03/14] genirq/proc: Utilize irq_desc::tot_count to avoid evaluation Thomas Gleixner
2026-03-22 19:59   ` Radu Rendec
2026-03-20 13:21 ` [patch v2 04/14] x86/irq: Make irqstats array based Thomas Gleixner
2026-03-20 16:39   ` Michael Kelley
2026-03-21 16:38     ` Thomas Gleixner
2026-03-21 20:32       ` Michael Kelley
2026-03-23 19:24   ` Radu Rendec
2026-03-24 19:54     ` Thomas Gleixner
2026-03-24 20:21       ` Thomas Gleixner
2026-03-24 20:32         ` Radu Rendec
2026-03-25 19:20           ` Radu Rendec
2026-03-25 22:52             ` Thomas Gleixner
2026-03-25 22:54               ` Florian Fainelli
2026-03-26 10:29                 ` Thomas Gleixner
2026-03-26 23:00                   ` Florian Fainelli
2026-03-27 11:03                     ` Thomas Gleixner
2026-03-26 12:34               ` Radu Rendec
2026-03-20 13:21 ` [patch v2 05/14] genirq: Expose nr_irqs in core code Thomas Gleixner
2026-03-23 19:48   ` Radu Rendec
2026-03-23 21:27     ` Thomas Gleixner
2026-03-20 13:21 ` [patch v2 06/14] genirq: Cache the condition for /proc/interrupts exposure Thomas Gleixner
2026-03-23 20:58   ` Radu Rendec
2026-03-24 20:31     ` Thomas Gleixner
2026-03-24 20:36       ` Radu Rendec
2026-03-20 13:21 ` [patch v2 07/14] genirq: Calculate precision only when required Thomas Gleixner
2026-03-25 19:47   ` Radu Rendec
2026-03-20 13:22 ` [patch v2 08/14] genirq: Add rcuref count to struct irq_desc Thomas Gleixner
2026-03-26 18:43   ` Dmitry Ilvokhin
2026-03-20 13:22 ` [patch v2 09/14] genirq: Expose irq_find_desc_at_or_after() in core code Thomas Gleixner
2026-03-26 19:13   ` Dmitry Ilvokhin
2026-03-26 21:11     ` Thomas Gleixner
2026-03-26 21:25       ` Thomas Gleixner
2026-03-20 13:22 ` [patch v2 10/14] genirq/proc: Speed up /proc/interrupts iteration Thomas Gleixner
2026-03-20 13:22 ` [patch v2 11/14] [RFC] genirq: Cache target CPU for single CPU affinities Thomas Gleixner
2026-03-20 13:22 ` [patch v2 12/14] [RFC] genirq/proc: Provide binary statistic interface Thomas Gleixner
2026-04-01 16:42   ` Dmitry Ilvokhin [this message]
2026-03-20 13:22 ` [patch v2 13/14] [RFC] genirq/proc: Provide architecture specific binary statistics Thomas Gleixner
2026-04-01 16:51   ` Dmitry Ilvokhin
2026-04-01 19:33     ` Thomas Gleixner
2026-03-20 13:22 ` [patch v2 14/14] [RFC] x86/irq: Hook up architecture specific stats Thomas Gleixner
2026-03-20 16:45 ` [patch v2 00/14] Improve /proc/interrupts further and add a binary interface Michael Kelley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac1K2YdCv-IngSi7@shell.ilvokhin.com \
    --to=d@ilvokhin.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=radu@rendec.net \
    --cc=tglx@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox