From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
Kieran Bingham <kbingham@kernel.org>,
Ben Segall <bsegall@google.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Xin Zhao <jackzxcui1989@163.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
linuxppc-dev@lists.ozlabs.org,
"Paul E . McKenney" <paulmck@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Uladzislau Rezki <urezki@gmail.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Heiko Carstens <hca@linux.ibm.com>,
linux-pm@vger.kernel.org,
Alexander Gordeev <agordeev@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Nicholas Piggin <npiggin@gmail.com>,
linux-s390@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>
Subject: [PATCH 00/15] tick/sched: Refactor idle cputime accounting
Date: Fri, 16 Jan 2026 15:51:53 +0100 [thread overview]
Message-ID: <20260116145208.87445-1-frederic@kernel.org> (raw)
Hi,
After the issue reported here:
https://lore.kernel.org/all/20251210083135.3993562-1-jackzxcui1989@163.com/
It occurs that the idle cputime accounting is a big mess that
accumulates within two concurrent statistics, each having their own
shortcomings:
* The accounting for online CPUs which is based on the delta between
tick_nohz_start_idle() and tick_nohz_stop_idle().
Pros:
- Works when the tick is off
- Has nsecs granularity
Cons:
- Account idle steal time but doesn't substract it from idle
cputime.
- Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but
the IRQ time is simply ignored when
CONFIG_IRQ_TIME_ACCOUNTING=n
- The windows between 1) idle task scheduling and the first call
to tick_nohz_start_idle() and 2) idle task between the last
tick_nohz_stop_idle() and the rest of the idle time are
blindspots wrt. cputime accounting (though mostly insignificant
amount)
- Relies on private fields outside of kernel stats, with specific
accessors.
* The accounting for offline CPUs which is based on ticks and the
jiffies delta during which the tick was stopped.
Pros:
- Handles steal time correctly
- Handle CONFIG_IRQ_TIME_ACCOUNTING=y and
CONFIG_IRQ_TIME_ACCOUNTING=n correctly.
- Handles the whole idle task
- Accounts directly to kernel stats, without midlayer accumulator.
Cons:
- Doesn't elapse when the tick is off, which doesn't make it
suitable for online CPUs.
- Has TICK_NSEC granularity (jiffies)
- Needs to track the dyntick-idle ticks that were accounted and
substract them from the total jiffies time spent while the tick
was stopped. This is an ugly workaround.
Having two different accounting for a single context is not the only
problem: since those accountings are of different natures, it is
possible to observe the global idle time going backward after a CPU goes
offline, as reported by Xin Zhao.
Clean up the situation with introducing a hybrid approach that stays
coherent, fixes the backward jumps and works for both online and offline
CPUs:
* Tick based or native vtime accounting operate before the tick is
stopped and resumes once the tick is restarted.
* When the idle loop starts, switch to dynticks-idle accounting as is
done currently, except that the statistics accumulate directly to the
relevant kernel stat fields.
* Private dyntick cputime accounting fields are removed.
* Works on both online and offline case.
* Move most of the relevant code to the common sched/cputime subsystem
* Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
dynticks-idle accounting still elapses while on IRQs.
* Correctly substract idle steal cputime from idle time
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/core
HEAD: 6a3d814ef2f6142714bef862be36def5ca4c9d96
Thanks,
Frederic
---
Frederic Weisbecker (15):
sched/idle: Handle offlining first in idle loop
sched/cputime: Remove superfluous and error prone kcpustat_field() parameter
sched/cputime: Correctly support generic vtime idle time
powerpc/time: Prepare to stop elapsing in dynticks-idle
s390/time: Prepare to stop elapsing in dynticks-idle
tick/sched: Unify idle cputime accounting
cpufreq: ondemand: Simplify idle cputime granularity test
tick/sched: Remove nohz disabled special case in cputime fetch
tick/sched: Move dyntick-idle cputime accounting to cputime code
tick/sched: Remove unused fields
tick/sched: Account tickless idle cputime only when tick is stopped
tick/sched: Consolidate idle time fetching APIs
sched/cputime: Consolidate get_cpu_[idle|iowait]_time_us()
sched/cputime: Handle idle irqtime gracefully
sched/cputime: Handle dyntick-idle steal time correctly
arch/powerpc/kernel/time.c | 41 +++++
arch/s390/include/asm/idle.h | 11 +-
arch/s390/kernel/idle.c | 13 +-
arch/s390/kernel/vtime.c | 57 ++++++-
drivers/cpufreq/cpufreq.c | 29 +---
drivers/cpufreq/cpufreq_governor.c | 6 +-
drivers/cpufreq/cpufreq_ondemand.c | 7 +-
drivers/macintosh/rack-meter.c | 2 +-
fs/proc/stat.c | 40 +----
fs/proc/uptime.c | 8 +-
include/linux/kernel_stat.h | 76 ++++++++--
include/linux/tick.h | 4 -
include/linux/vtime.h | 20 ++-
kernel/rcu/tree.c | 9 +-
kernel/rcu/tree_stall.h | 7 +-
kernel/sched/cputime.c | 302 +++++++++++++++++++++++++++++++------
kernel/sched/idle.c | 11 +-
kernel/sched/sched.h | 1 +
kernel/time/tick-sched.c | 203 +++++--------------------
kernel/time/tick-sched.h | 12 --
kernel/time/timer_list.c | 6 +-
scripts/gdb/linux/timerlist.py | 4 -
22 files changed, 505 insertions(+), 364 deletions(-)
next reply other threads:[~2026-01-16 14:52 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-16 14:51 Frederic Weisbecker [this message]
2026-01-16 14:51 ` [PATCH 01/15] sched/idle: Handle offlining first in idle loop Frederic Weisbecker
2026-01-19 12:53 ` Peter Zijlstra
2026-01-19 21:04 ` Frederic Weisbecker
2026-01-20 4:26 ` K Prateek Nayak
2026-01-20 14:52 ` Frederic Weisbecker
2026-01-16 14:51 ` [PATCH 02/15] sched/cputime: Remove superfluous and error prone kcpustat_field() parameter Frederic Weisbecker
2026-01-16 14:51 ` [PATCH 03/15] sched/cputime: Correctly support generic vtime idle time Frederic Weisbecker
2026-01-19 13:02 ` Peter Zijlstra
2026-01-19 21:35 ` Frederic Weisbecker
2026-01-16 14:51 ` [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle Frederic Weisbecker
2026-02-25 17:53 ` Christophe Leroy (CS GROUP)
2026-01-16 14:51 ` [PATCH 05/15] s390/time: " Frederic Weisbecker
2026-01-21 12:17 ` Heiko Carstens
2026-01-21 18:04 ` Frederic Weisbecker
2026-01-22 14:40 ` Heiko Carstens
2026-01-27 14:45 ` Frederic Weisbecker
2026-01-16 14:51 ` [PATCH 06/15] tick/sched: Unify idle cputime accounting Frederic Weisbecker
2026-01-19 14:26 ` Peter Zijlstra
2026-01-19 22:00 ` Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 07/15] cpufreq: ondemand: Simplify idle cputime granularity test Frederic Weisbecker
2026-01-19 5:37 ` Viresh Kumar
2026-01-19 12:30 ` Rafael J. Wysocki
2026-01-19 22:06 ` Frederic Weisbecker
2026-01-20 12:32 ` Rafael J. Wysocki
2026-01-20 14:28 ` Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 08/15] tick/sched: Remove nohz disabled special case in cputime fetch Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 09/15] tick/sched: Move dyntick-idle cputime accounting to cputime code Frederic Weisbecker
2026-01-19 14:35 ` Peter Zijlstra
2026-01-19 22:08 ` Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 10/15] tick/sched: Remove unused fields Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 11/15] tick/sched: Account tickless idle cputime only when tick is stopped Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 12/15] tick/sched: Consolidate idle time fetching APIs Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 13/15] sched/cputime: Consolidate get_cpu_[idle|iowait]_time_us() Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 14/15] sched/cputime: Handle idle irqtime gracefully Frederic Weisbecker
2026-01-16 14:52 ` [PATCH 15/15] sched/cputime: Handle dyntick-idle steal time correctly Frederic Weisbecker
2026-01-16 14:57 ` [PATCH 00/15] tick/sched: Refactor idle cputime accounting Frederic Weisbecker
2026-01-20 12:42 ` Shrikanth Hegde
2026-01-21 16:55 ` Frederic Weisbecker
2026-01-19 14:53 ` Peter Zijlstra
2026-01-19 22:12 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260116145208.87445-1-frederic@kernel.org \
--to=frederic@kernel.org \
--cc=agordeev@linux.ibm.com \
--cc=anna-maria@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=borntraeger@linux.ibm.com \
--cc=bsegall@google.com \
--cc=chleroy@kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=jackzxcui1989@163.com \
--cc=jan.kiszka@siemens.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kbingham@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=maddy@linux.ibm.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=neeraj.upadhyay@kernel.org \
--cc=npiggin@gmail.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=svens@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=urezki@gmail.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.