From: Usama Arif <usama.arif@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
david@kernel.org, linux-mm@kvack.org, bsegall@google.com,
dietmar.eggemann@arm.com, hannes@cmpxchg.org,
juri.lelli@redhat.com, kprateek.nayak@amd.com,
linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
peterz@infradead.org, rostedt@goodmis.org, surenb@google.com,
vincent.guittot@linaro.org, vschneid@redhat.com
Cc: shakeel.butt@linux.dev, riel@surriel.com, kernel-team@meta.com,
Usama Arif <usama.arif@linux.dev>
Subject: [PATCH 1/1] sched/psi: skip irqtime accounting when no new irq time has elapsed
Date: Wed, 17 Jun 2026 10:50:06 -0700
Message-ID: <20260617175219.2494857-2-usama.arif@linux.dev>
In-Reply-To: <20260617175219.2494857-1-usama.arif@linux.dev>

psi_account_irqtime() reads irq_time_read() into a per-rq cumulative
counter and only bails out when the delta vs. the previously accounted
amount is negative. A delta of exactly zero is treated as "do the
work": psi_write_begin() is called, cpu_clock(cpu) is read (which on
x86 ends up in native_sched_clock() / rdtsc) and the cgroup ancestor
chain is walked to add zero to every group's PSI_IRQ_FULL bucket.

The zero-delta case is common in practice -- it fires every time a
context switch crosses a PSI group boundary on a CPU that hasn't
serviced an interrupt between the two switches.

Measured on a 176-thread AMD EPYC 9D64 server running a
compute-intensive production workload, instrumented with bpftrace over
a 30s window (irq_time_read() read directly from the per-CPU
cpu_irqtime so that delta == 0 and delta < 0 could be separated):

    @total              17,229,311  (100.0%)
    @ret_curr_swapper    7,864,195  ( 45.6%)  curr->pid == 0
    @ret_samegrp           323,299  (  1.9%)  same cgroup as prev
    @reached_delta       9,041,817  ( 52.5%)
      @delta_positive    6,358,192  ( 36.9%)  real work
      @delta_zero        2,683,625  ( 15.6%)  work wasted (this patch)
      @delta_negative            0  (  0.0%)  monotonic clock

So 15.6% of all psi_account_irqtime() calls -- and 29.7% of the
calls that get past the early returns -- hit the delta == 0 case;
delta < 0 did not occur once in the 30s window. Under the current
code each of those ~89k calls per second performs the full seqcount
write + cpu_clock() read + cgroup-chain walk just to add 0 to every
group's PSI_IRQ_FULL counter.
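
The percentages and the per-second rate quoted above can be re-derived
from the raw counts in the table. The following is only a userspace
sketch for checking the arithmetic; permille() and per_second() are
hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Counts copied from the bpftrace table (30 s window). */
enum { WINDOW_SECS = 30 };
static const uint64_t total_calls   = 17229311;
static const uint64_t reached_delta = 9041817;
static const uint64_t delta_zero    = 2683625;

/* Share of 'part' in 'whole' in tenths of a percent, rounded to nearest. */
static uint64_t permille(uint64_t part, uint64_t whole)
{
	return (part * 1000 + whole / 2) / whole;
}

/* Average number of events per second over the window (truncated). */
static uint64_t per_second(uint64_t count)
{
	return count / WINDOW_SECS;
}

/* Re-derives the figures quoted in the text from the raw counts. */
static void check_quoted_figures(void)
{
	assert(permille(delta_zero, total_calls) == 156);   /* 15.6% */
	assert(permille(delta_zero, reached_delta) == 297); /* 29.7% */
	assert(per_second(delta_zero) == 89454);            /* ~89k/s */
}
```

Calling check_quoted_figures() from a test harness should pass
silently if the table and the prose agree.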

Extend the early return to also cover delta == 0. rq->psi_irq_time
does not need updating in that case (it would store the same value
back) and no PSI bucket would change. The existing behaviour for
delta > 0 is untouched.
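
For illustration only, the effect of the changed comparison can be
modelled in userspace. This is a simplified sketch, not the kernel
code: struct rq_model, slow_path_runs and account_irqtime_model() are
hypothetical names, and the slow path (seqcount write, cpu_clock()
read, group walk) is reduced to a counter. Only psi_irq_time and the
delta test mirror the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-rq state touched by the check. */
struct rq_model {
	uint64_t psi_irq_time;        /* cumulative irq time already accounted */
	unsigned long slow_path_runs; /* how often the expensive path was taken */
};

/*
 * 'irq' plays the role of irq_time_read(cpu). 'skip_zero' selects the
 * old test (delta < 0) or the new one (delta <= 0). Returns the delta
 * that was charged, or 0 if the call returned early.
 */
static int64_t account_irqtime_model(struct rq_model *rq, uint64_t irq,
				     bool skip_zero)
{
	int64_t delta = (int64_t)(irq - rq->psi_irq_time);

	if (delta < 0 || (skip_zero && delta == 0))
		return 0;

	rq->psi_irq_time = irq;
	rq->slow_path_runs++; /* stands in for the seqcount/clock/walk work */
	return delta;
}

/* Two switches with no interrupt in between: irq time is unchanged. */
static void demo_zero_delta(void)
{
	struct rq_model old_rq = { .psi_irq_time = 1000 };
	struct rq_model new_rq = { .psi_irq_time = 1000 };

	assert(account_irqtime_model(&old_rq, 1000, false) == 0);
	assert(account_irqtime_model(&new_rq, 1000, true) == 0);

	/* Same resulting state, but only the old test ran the slow path. */
	assert(old_rq.psi_irq_time == new_rq.psi_irq_time);
	assert(old_rq.slow_path_runs == 1);
	assert(new_rq.slow_path_runs == 0);
}
```

In this model both variants leave psi_irq_time identical and charge
nothing for a zero delta; the only difference is whether the slow path
runs.
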
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
kernel/sched/psi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index d9c9d9480a45..848955f8893d 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1023,7 +1023,7 @@ void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_st
 	irq = irq_time_read(cpu);
 	delta = (s64)(irq - rq->psi_irq_time);
-	if (delta < 0)
+	if (delta <= 0)
 		return;
 	rq->psi_irq_time = irq;
--
2.53.0-Meta