* + sched-psi-skip-irqtime-accounting-when-no-new-irq-time-has-elapsed.patch added to mm-new branch
From: Andrew Morton @ 2026-06-25 3:29 UTC
To: mm-commits, vschneid, vincent.guittot, surenb, shakeel.butt,
rostedt, riel, peterz, mingo, mgorman, kprateek.nayak, juri.lelli,
hannes, dietmar.eggemann, david, chengming.zhou, bsegall,
usama.arif, akpm
The patch titled
Subject: sched/psi: skip irqtime accounting when no new irq time has elapsed
has been added to the -mm mm-new branch. Its filename is
sched-psi-skip-irqtime-accounting-when-no-new-irq-time-has-elapsed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/sched-psi-skip-irqtime-accounting-when-no-new-irq-time-has-elapsed.patch
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next.
If a few days of testing in mm-new is successful, the patch will be moved
into mm.git's mm-unstable branch, which is included in linux-next.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included in linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days.
------------------------------------------------------
From: Usama Arif <usama.arif@linux.dev>
Subject: sched/psi: skip irqtime accounting when no new irq time has elapsed
Date: Wed, 17 Jun 2026 10:50:06 -0700
psi_account_irqtime() reads irq_time_read() into a per-rq cumulative
counter and only bails out when the delta vs. the previously accounted
amount is negative. A delta of exactly zero is treated as "do the work":
psi_write_begin() is taken, cpu_clock(cpu) is read (which on x86 ends up
in native_sched_clock() / rdtsc) and the cgroup ancestor chain is walked
to add zero to every group's PSI_IRQ_FULL bucket.
The zero-delta case is common in practice -- it fires every time a context
switch crosses a PSI group boundary on a CPU that hasn't serviced an
interrupt between the two switches.
Measured on a 176-thread AMD EPYC 9D64 server running a compute intensive
production workload, instrumented with bpftrace over a 30s window
(irq_time_read() read directly from the per-CPU cpu_irqtime so that delta
== 0 and delta < 0 could be separated):
    @total              17,229,311  (100.0%)
    @ret_curr_swapper    7,864,195  ( 45.6%)  curr->pid == 0
    @ret_samegrp           323,299  (  1.9%)  same cgroup as prev
    @reached_delta       9,041,817  ( 52.5%)
    @delta_positive      6,358,192  ( 36.9%)  real work
    @delta_zero          2,683,625  ( 15.6%)  work wasted (this patch)
    @delta_negative            (0)  (  0.0%)  monotonic clock
So 15.6% of all psi_account_irqtime() calls - and 29.7% of the calls that
get past the early returns - hit the delta == 0 case; delta < 0 did not
occur once in the 30s window. Under the current code each of those ~89k
calls per second performs the full seqcount write + cpu_clock() read +
cgroup-chain walk just to add 0 to every group's PSI_IRQ_FULL counter.
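
The percentages and the per-second rate quoted above can be re-derived
from the raw counts in the table. The following is only a small
arithmetic sketch: the helper names permille() and per_second() are
made up for illustration and appear nowhere in the kernel; the
constants are copied from the bpftrace output.

```c
#include <assert.h>
#include <stdint.h>

/* Counts copied from the bpftrace table in the changelog. */
static const uint64_t total_calls   = 17229311; /* @total */
static const uint64_t reached_delta = 9041817;  /* @reached_delta */
static const uint64_t delta_zero    = 2683625;  /* @delta_zero */
static const uint64_t window_secs   = 30;       /* measurement window */

/* Share of part in whole, in tenths of a percent, rounded to nearest. */
static uint64_t permille(uint64_t part, uint64_t whole)
{
	assert(whole != 0);
	return (part * 1000 + whole / 2) / whole;
}

/* Average number of events per second over the window (truncated). */
static uint64_t per_second(uint64_t events, uint64_t secs)
{
	assert(secs != 0);
	return events / secs;
}
```

permille(delta_zero, total_calls) gives 156 (15.6%),
permille(delta_zero, reached_delta) gives 297 (29.7%), and
per_second(delta_zero, window_secs) gives 89454, i.e. the ~89k calls
per second mentioned above.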
Extend the early-return to also cover delta == 0. rq->psi_irq_time does
not need updating in that case (it would store the same value back) and no
PSI bucket would change. The existing behaviour for delta > 0 is
untouched.
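
For readers without the source at hand, the effect of the changed
comparison can be modelled in userspace roughly as follows. This is a
toy sketch, not the kernel code: struct toy_rq, its slow_path counter
and toy_account_irqtime() are hypothetical stand-ins, and the caller
passes in the value that irq_time_read(cpu) would have returned.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rq; only what the model needs. */
struct toy_rq {
	uint64_t psi_irq_time;   /* irq time already accounted, in ns */
	unsigned long slow_path; /* how often the expensive path ran */
};

/*
 * Toy model of the early return in psi_account_irqtime().
 * Returns 1 if the expensive path ran, 0 if the call bailed out early.
 */
static int toy_account_irqtime(struct toy_rq *rq, uint64_t irq)
{
	/*
	 * Unsigned subtraction, then reinterpret as signed, mirroring the
	 * (s64) cast in the patch. A counter that went backwards shows up
	 * as a negative delta on the usual two's-complement targets.
	 */
	int64_t delta = (int64_t)(irq - rq->psi_irq_time);

	if (delta <= 0)          /* before the patch: delta < 0 */
		return 0;

	rq->psi_irq_time = irq;
	rq->slow_path++;         /* stands in for psi_write_begin(),
	                          * cpu_clock() and the cgroup walk */
	return 1;
}
```

Calling the model twice with the same irq value runs the slow path only
on the first call; with the old "delta < 0" test it would have run on
both.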
Link: https://lore.kernel.org/20260617175219.2494857-2-usama.arif@linux.dev
Signed-off-by: Usama Arif <usama.arif@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/sched/psi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/psi.c~sched-psi-skip-irqtime-accounting-when-no-new-irq-time-has-elapsed
+++ a/kernel/sched/psi.c
@@ -1023,7 +1023,7 @@ void psi_account_irqtime(struct rq *rq,
 	irq = irq_time_read(cpu);
 	delta = (s64)(irq - rq->psi_irq_time);
-	if (delta < 0)
+	if (delta <= 0)
 		return;
 	rq->psi_irq_time = irq;
_
Patches currently in -mm which might be from usama.arif@linux.dev are
sched-psi-skip-irqtime-accounting-when-no-new-irq-time-has-elapsed.patch
* Re: + sched-psi-skip-irqtime-accounting-when-no-new-irq-time-has-elapsed.patch added to mm-new branch
From: Peter Zijlstra @ 2026-06-25 8:44 UTC
To: Andrew Morton
Cc: mm-commits, vschneid, vincent.guittot, surenb, shakeel.butt,
rostedt, riel, mingo, mgorman, kprateek.nayak, juri.lelli, hannes,
dietmar.eggemann, david, chengming.zhou, bsegall, usama.arif
On Wed, Jun 24, 2026 at 08:29:51PM -0700, Andrew Morton wrote:
>
> The patch titled
> Subject: sched/psi: skip irqtime accounting when no new irq time has elapsed
> has been added to the -mm mm-new branch. Its filename is
> sched-psi-skip-irqtime-accounting-when-no-new-irq-time-has-elapsed.patch
>
Andrew, seriously.