From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: Frederic Weisbecker <frederic@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Ben Segall <bsegall@google.com>,
Boqun Feng <boqun.feng@gmail.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Heiko Carstens <hca@linux.ibm.com>,
Ingo Molnar <mingo@redhat.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Juri Lelli <juri.lelli@redhat.com>,
Kieran Bingham <kbingham@kernel.org>,
Mel Gorman <mgorman@suse.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Nicholas Piggin <npiggin@gmail.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Sven Schnelle <svens@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Uladzislau Rezki <urezki@gmail.com>,
Valentin Schneider <vschneid@redhat.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Xin Zhao <jackzxcui1989@163.com>,
linux-pm@vger.kernel.org, linux-s390@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle
Date: Fri, 20 Feb 2026 00:00:06 +0530 [thread overview]
Message-ID: <9413517d-963b-4e6d-b11b-b440acd7cb5a@linux.ibm.com> (raw)
In-Reply-To: <20260206142245.58987-5-frederic@kernel.org>
On 2/6/26 7:52 PM, Frederic Weisbecker wrote:
> Currently the tick subsystem stores the idle cputime accounting in
> private fields, allowing cohabitation with architecture idle vtime
> accounting. The former is fetched on online CPUs, the latter on offline
> CPUs.
>
> For consolidation purpose, architecture vtime accounting will continue
> to account the cputime but will make a break when the idle tick is
> stopped. The dyntick cputime accounting will then be relayed by the tick
> subsystem so that the idle cputime is still seen advancing coherently
> even when the tick isn't there to flush the idle vtime.
>
> Prepare for that and introduce three new APIs which will be used in
> subsequent patches:
>
> _ vtime_dynticks_start() is deemed to be called when idle enters in
> dyntick mode. The idle cputime that elapsed so far is accumulated.
>
> - vtime_dynticks_stop() is deemed to be called when idle exits from
> dyntick mode. The vtime entry clocks are fast-forward to current time
> so that idle accounting restarts elapsing from now.
>
> - vtime_reset() is deemed to be called from dynticks idle IRQ entry to
> fast-forward the clock to current time so that the IRQ time is still
> accounted by vtime while nohz cputime is paused.
>
> Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
> accounting twice the idle cputime, along with nohz accounting.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> arch/powerpc/kernel/time.c | 41 ++++++++++++++++++++++++++++++++++++++
> include/linux/vtime.h | 6 ++++++
> 2 files changed, 47 insertions(+)
>
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 4bbeb8644d3d..18506740f4a4 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -376,6 +376,47 @@ void vtime_task_switch(struct task_struct *prev)
> acct->starttime = acct0->starttime;
> }
> }
> +
> +#ifdef CONFIG_NO_HZ_COMMON
> +/**
> + * vtime_reset - Fast forward vtime entry clocks
> + *
> + * Called from dynticks idle IRQ entry to fast-forward the clocks to current time
> + * so that the IRQ time is still accounted by vtime while nohz cputime is paused.
> + */
> +void vtime_reset(void)
> +{
> + struct cpu_accounting_data *acct = get_accounting(current);
> +
> + acct->starttime = mftb();
I figured out why those huge values happen.
This happens because mftb is from when the system is booted.
I was doing kexec to start the new kernel and mftb wasn't getting
reset.
I thought about this. This is concern for pseries too, where LPAR's
restart but system won't restart and mftb will continue to run instead of
reset.
I think we should be using sched_clock instead of mftb here.
Though we need it a few more places and some cosmetic changes around it.
Note: Some values being huge exists without series for few CPUs, with series it
shows up in most of the CPUs.
So I am planning send out fix below fix separately keeping your
series as dependency.
---
arch/powerpc/include/asm/accounting.h | 4 ++--
arch/powerpc/include/asm/cputime.h | 14 +++++++-------
arch/powerpc/kernel/time.c | 22 +++++++++++-----------
3 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/include/asm/accounting.h b/arch/powerpc/include/asm/accounting.h
index 6d79c31700e2..50f120646e6d 100644
--- a/arch/powerpc/include/asm/accounting.h
+++ b/arch/powerpc/include/asm/accounting.h
@@ -21,8 +21,8 @@ struct cpu_accounting_data {
unsigned long steal_time;
unsigned long idle_time;
/* Internal counters */
- unsigned long starttime; /* TB value snapshot */
- unsigned long starttime_user; /* TB value on exit to usermode */
+ unsigned long starttime; /* Time value snapshot */
+ unsigned long starttime_user; /* Time value on exit to usermode */
#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
unsigned long startspurr; /* SPURR value snapshot */
unsigned long utime_sspurr; /* ->user_time when ->startspurr set */
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index aff858ca99c0..eb6b629b113f 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -20,9 +20,9 @@
#include <asm/time.h>
#include <asm/param.h>
#include <asm/firmware.h>
+#include <linux/sched/clock.h>
#ifdef __KERNEL__
-#define cputime_to_nsecs(cputime) tb_to_ns(cputime)
/*
* PPC64 uses PACA which is task independent for storing accounting data while
@@ -44,20 +44,20 @@
*/
static notrace inline void account_cpu_user_entry(void)
{
- unsigned long tb = mftb();
+ unsigned long now = sched_clock();
struct cpu_accounting_data *acct = raw_get_accounting(current);
- acct->utime += (tb - acct->starttime_user);
- acct->starttime = tb;
+ acct->utime += (now - acct->starttime_user);
+ acct->starttime = now;
}
static notrace inline void account_cpu_user_exit(void)
{
- unsigned long tb = mftb();
+ unsigned long now = sched_clock();
struct cpu_accounting_data *acct = raw_get_accounting(current);
- acct->stime += (tb - acct->starttime);
- acct->starttime_user = tb;
+ acct->stime += (now - acct->starttime);
+ acct->starttime_user = now;
}
static notrace inline void account_stolen_time(void)
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 18506740f4a4..fb67cdae3bcb 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -215,7 +215,7 @@ static unsigned long vtime_delta(struct cpu_accounting_data *acct,
WARN_ON_ONCE(!irqs_disabled());
- now = mftb();
+ now = sched_clock();
stime = now - acct->starttime;
acct->starttime = now;
@@ -299,9 +299,9 @@ static void vtime_flush_scaled(struct task_struct *tsk,
{
#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
if (acct->utime_scaled)
- tsk->utimescaled += cputime_to_nsecs(acct->utime_scaled);
+ tsk->utimescaled += acct->utime_scaled;
if (acct->stime_scaled)
- tsk->stimescaled += cputime_to_nsecs(acct->stime_scaled);
+ tsk->stimescaled += acct->stime_scaled;
acct->utime_scaled = 0;
acct->utime_sspurr = 0;
@@ -321,28 +321,28 @@ void vtime_flush(struct task_struct *tsk)
struct cpu_accounting_data *acct = get_accounting(tsk);
if (acct->utime)
- account_user_time(tsk, cputime_to_nsecs(acct->utime));
+ account_user_time(tsk, acct->utime);
if (acct->gtime)
- account_guest_time(tsk, cputime_to_nsecs(acct->gtime));
+ account_guest_time(tsk, acct->gtime);
if (IS_ENABLED(CONFIG_PPC_SPLPAR) && acct->steal_time) {
- account_steal_time(cputime_to_nsecs(acct->steal_time));
+ account_steal_time(acct->steal_time);
acct->steal_time = 0;
}
if (acct->idle_time)
- account_idle_time(cputime_to_nsecs(acct->idle_time));
+ account_idle_time(acct->idle_time);
if (acct->stime)
- account_system_index_time(tsk, cputime_to_nsecs(acct->stime),
+ account_system_index_time(tsk, acct->stime,
CPUTIME_SYSTEM);
if (acct->hardirq_time)
- account_system_index_time(tsk, cputime_to_nsecs(acct->hardirq_time),
+ account_system_index_time(tsk, acct->hardirq_time,
CPUTIME_IRQ);
if (acct->softirq_time)
- account_system_index_time(tsk, cputime_to_nsecs(acct->softirq_time),
+ account_system_index_time(tsk, acct->softirq_time,
CPUTIME_SOFTIRQ);
vtime_flush_scaled(tsk, acct);
@@ -388,7 +388,7 @@ void vtime_reset(void)
{
struct cpu_accounting_data *acct = get_accounting(current);
- acct->starttime = mftb();
+ acct->starttime = sched_clock();
#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
acct->startspurr = read_spurr(acct->starttime);
#endif
--
2.43.0
next prev parent reply other threads:[~2026-02-19 18:31 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-06 14:22 [PATCH 00/15 v2] tick/sched: Refactor idle cputime accounting Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 01/15] sched/idle: Handle offlining first in idle loop Frederic Weisbecker
2026-02-18 18:22 ` Shrikanth Hegde
2026-02-06 14:22 ` [PATCH 02/15] sched/cputime: Remove superfluous and error prone kcpustat_field() parameter Frederic Weisbecker
2026-02-18 18:25 ` Shrikanth Hegde
2026-02-06 14:22 ` [PATCH 03/15] sched/cputime: Correctly support generic vtime idle time Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle Frederic Weisbecker
2026-02-19 18:30 ` Shrikanth Hegde [this message]
2026-02-24 15:41 ` Christophe Leroy (CS GROUP)
2026-02-25 7:46 ` Shrikanth Hegde
2026-02-25 9:45 ` Christophe Leroy (CS GROUP)
2026-02-25 10:34 ` Shrikanth Hegde
2026-02-25 11:14 ` Christophe Leroy (CS GROUP)
2026-02-25 13:33 ` Shrikanth Hegde
2026-02-25 13:54 ` Christophe Leroy (CS GROUP)
2026-02-25 17:47 ` Shrikanth Hegde
2026-02-25 17:59 ` Christophe Leroy (CS GROUP)
2026-02-26 4:06 ` Shrikanth Hegde
2026-02-26 7:32 ` Christophe Leroy (CS GROUP)
2026-02-26 12:57 ` Shrikanth Hegde
2026-02-06 14:22 ` [PATCH 05/15] s390/time: " Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 06/15] tick/sched: Unify idle cputime accounting Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 07/15] cpufreq: ondemand: Simplify idle cputime granularity test Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 08/15] tick/sched: Remove nohz disabled special case in cputime fetch Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 09/15] tick/sched: Move dyntick-idle cputime accounting to cputime code Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 10/15] tick/sched: Remove unused fields Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 11/15] tick/sched: Account tickless idle cputime only when tick is stopped Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 12/15] tick/sched: Consolidate idle time fetching APIs Frederic Weisbecker
2026-02-06 22:35 ` Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 13/15] sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 14/15] sched/cputime: Handle idle irqtime gracefully Frederic Weisbecker
2026-03-03 11:11 ` Shrikanth Hegde
2026-03-20 14:32 ` Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 15/15] sched/cputime: Handle dyntick-idle steal time correctly Frederic Weisbecker
2026-03-03 11:17 ` Shrikanth Hegde
2026-03-24 14:53 ` Frederic Weisbecker
2026-02-11 13:43 ` [PATCH 00/15 v2] tick/sched: Refactor idle cputime accounting Shrikanth Hegde
2026-02-11 17:06 ` Frederic Weisbecker
2026-02-12 7:02 ` Shrikanth Hegde
2026-02-18 18:11 ` Shrikanth Hegde
-- strict thread matches above, loose matches on Subject: below --
2026-01-16 14:51 [PATCH 00/15] " Frederic Weisbecker
2026-01-16 14:51 ` [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle Frederic Weisbecker
2026-02-25 17:53 ` Christophe Leroy (CS GROUP)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9413517d-963b-4e6d-b11b-b440acd7cb5a@linux.ibm.com \
--to=sshegde@linux.ibm.com \
--cc=agordeev@linux.ibm.com \
--cc=anna-maria@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=borntraeger@linux.ibm.com \
--cc=bsegall@google.com \
--cc=chleroy@kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=frederic@kernel.org \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=jackzxcui1989@163.com \
--cc=jan.kiszka@siemens.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kbingham@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=maddy@linux.ibm.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=neeraj.upadhyay@kernel.org \
--cc=npiggin@gmail.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=svens@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=urezki@gmail.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox