linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] platform wide software interrupt moderation
@ 2025-11-12 19:24 Luigi Rizzo
  2025-11-12 19:24 ` [PATCH 1/6] genirq: platform wide interrupt moderation: Documentation, Kconfig, irq_desc Luigi Rizzo
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Luigi Rizzo @ 2025-11-12 19:24 UTC (permalink / raw)
  To: Thomas Gleixner, Marc Zyngier, Luigi Rizzo, Paolo Abeni,
	Andrew Morton, Sean Christopherson, Jacob Pan
  Cc: linux-kernel, linux-arch, Bjorn Helgaas, Willem de Bruijn,
	Luigi Rizzo

Platform wide software interrupt moderation specifically addresses a
limitation of platforms, from many vendors, whose I/O performance drops
significantly when the total rate of MSI-X interrupts is too high (e.g
1-3M intr/s depending on the platform).

Conventional interrupt moderation operates separately on each source,
hence the configuration should target the worst case. On large servers
with hundreds of interrupt sources, keeping the total rate bounded would
require delays of 100-200us; and adaptive moderation would have to reach
those delays with as little as 10K intr/s per source. These values are
unacceptable for RPC or transactional workloads.

To address this problem, this code measures efficiently the total and
per-CPU interrupt rates, so that individual moderation delays can be
adjusted based on actual global and local load. This way, the system
controls both global interrupt rates and individual CPU load, and
tunes delays so they are normally 0 or very small except during actual
local/global overload.

Configuration is easy and robust. System administrators specify the
maximum targets (moderation delay; interrupt rate; and fraction of time
spent in hardirq), and per-CPU control loops adjust actual delays to try
and keep metrics within the bounds.

There is no need for exact targets, because the system is adaptive.
Values like delay_us=100, target_irq_rate=1000000, hardirq_percent=70
are good almost everywhere.

The system does not rely on any special hardware feature except from
MSI-X Pending Bit Array (PBA), a mandatory component of MSI-X

Boot defaults are set via module parameters (/sys/module/irq_moderation
and /sys/module/${DRIVER}) or at runtime via /proc/irq/moderation, which
is also used to export statistics.  Moderation on individual irq can be
turned on/off via /proc/irq/NN/moderation .

PERFORMANCE BENEFITS:
Below are some experimental results under high load (before/after rates
are measured with conventional moderation or with this change):

- 100Gbps NIC, 32 queues: rx goes from 50-60Gbps to 92.8 Gbps (line rate).
- 200Gbps NIC, 10 VMs (total 160 queues): rx goes from 30 Gbps to 190 Gbps (line rate).
- 12 SSD, 96 queues: 4K random read goes from 6M to 20.5 MIOPS (device max).

In all cases, latency up to p95 is unaffected at low/moderate load,
even if compared with no moderation at all.

IMPLEMENTATION
- Most of the code, including module parameters and procfs hooks for
  configuration and telemetry, is in two files
  include/linux/irq_moderation.h and kernel/irq/irq_moderation.c.

- struct irq_desc is extended with a list entry and one field indicating
  whether this source should use moderation

- handle_irq_event() and sysrec_posted_msi_notification() have small
  inline hooks to track interrupts and trigger moderation as needed.

- per-CPU state is initialized via hooks in per-architecture files

- optional device driver module parameters can be added to set driver
  defaults to enable/disable moderation

Luigi Rizzo (6):
  genirq: platform wide interrupt moderation: Documentation, Kconfig,
    irq_desc
  genirq: soft_moderation: add base files, procfs hooks
  genirq: soft_moderation: activate hooks in handle_irq_event()
  genirq: soft_moderation: implement adaptive moderation
  x86/irq: soft_moderation: add support for posted_msi (intel)
  genirq: soft_moderation: implement per-driver defaults (nvme and vfio)

 Documentation/core-api/irq/index.rst          |   1 +
 Documentation/core-api/irq/irq-moderation.rst | 215 ++++++++
 arch/x86/kernel/cpu/common.c                  |   1 +
 arch/x86/kernel/irq.c                         |  12 +
 drivers/irqchip/irq-gic-v3.c                  |   2 +
 drivers/nvme/host/pci.c                       |   3 +
 drivers/vfio/pci/vfio_pci_intrs.c             |   3 +
 include/linux/interrupt.h                     |  28 +
 include/linux/irq_moderation.h                | 265 ++++++++++
 include/linux/irqdesc.h                       |   5 +
 kernel/irq/Kconfig                            |  11 +
 kernel/irq/Makefile                           |   1 +
 kernel/irq/handle.c                           |   3 +
 kernel/irq/irq_moderation.c                   | 482 ++++++++++++++++++
 kernel/irq/irqdesc.c                          |   1 +
 kernel/irq/proc.c                             |   2 +
 16 files changed, 1035 insertions(+)
 create mode 100644 Documentation/core-api/irq/irq-moderation.rst
 create mode 100644 include/linux/irq_moderation.h
 create mode 100644 kernel/irq/irq_moderation.c

-- 
2.51.2.1041.gc1ab5b90ca-goog


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-11-14  8:28 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-12 19:24 [PATCH 0/6] platform wide software interrupt moderation Luigi Rizzo
2025-11-12 19:24 ` [PATCH 1/6] genirq: platform wide interrupt moderation: Documentation, Kconfig, irq_desc Luigi Rizzo
2025-11-13  8:17   ` Thomas Gleixner
2025-11-13  9:44   ` Thomas Gleixner
2025-11-13 13:25   ` Marc Zyngier
2025-11-13 13:33     ` Luigi Rizzo
2025-11-13 14:42       ` Marc Zyngier
2025-11-13 14:55         ` Luigi Rizzo
2025-11-13 19:02           ` Marc Zyngier
2025-11-12 19:24 ` [PATCH 2/6] genirq: soft_moderation: add base files, procfs hooks Luigi Rizzo
2025-11-13  9:29   ` Thomas Gleixner
2025-11-13 10:24     ` Thomas Gleixner
2025-11-13 22:42       ` Luigi Rizzo
2025-11-13 22:32     ` Luigi Rizzo
2025-11-13  9:40   ` Thomas Gleixner
2025-11-12 19:24 ` [PATCH 3/6] genirq: soft_moderation: activate hooks in handle_irq_event() Luigi Rizzo
2025-11-13  9:45   ` Thomas Gleixner
2025-11-14  8:27     ` Luigi Rizzo
2025-11-12 19:24 ` [PATCH 4/6] genirq: soft_moderation: implement adaptive moderation Luigi Rizzo
2025-11-13 10:15   ` Thomas Gleixner
2025-11-12 19:24 ` [PATCH 5/6] x86/irq: soft_moderation: add support for posted_msi (intel) Luigi Rizzo
2025-11-12 19:24 ` [PATCH 6/6] genirq: soft_moderation: implement per-driver defaults (nvme and vfio) Luigi Rizzo
2025-11-13 10:18   ` Thomas Gleixner
2025-11-13 10:42     ` Luigi Rizzo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).