From: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
To: Shrikanth Hegde <sshegde@linux.ibm.com>,
Frederic Weisbecker <frederic@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Ben Segall <bsegall@google.com>,
Boqun Feng <boqun.feng@gmail.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Heiko Carstens <hca@linux.ibm.com>,
Ingo Molnar <mingo@redhat.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Juri Lelli <juri.lelli@redhat.com>,
Kieran Bingham <kbingham@kernel.org>,
Mel Gorman <mgorman@suse.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Nicholas Piggin <npiggin@gmail.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Sven Schnelle <svens@linux.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Uladzislau Rezki <urezki@gmail.com>,
Valentin Schneider <vschneid@redhat.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Xin Zhao <jackzxcui1989@163.com>,
linux-pm@vger.kernel.org, linux-s390@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle
Date: Tue, 24 Feb 2026 16:41:32 +0100 [thread overview]
Message-ID: <9ab1e7d7-57ee-49f9-963c-3a1b96dda684@kernel.org> (raw)
In-Reply-To: <9413517d-963b-4e6d-b11b-b440acd7cb5a@linux.ibm.com>
Hi Hegde,
Le 19/02/2026 à 19:30, Shrikanth Hegde a écrit :
>
>
> On 2/6/26 7:52 PM, Frederic Weisbecker wrote:
>> Currently the tick subsystem stores the idle cputime accounting in
>> private fields, allowing cohabitation with architecture idle vtime
>> accounting. The former is fetched on online CPUs, the latter on offline
>> CPUs.
>>
>> For consolidation purpose, architecture vtime accounting will continue
>> to account the cputime but will make a break when the idle tick is
>> stopped. The dyntick cputime accounting will then be relayed by the tick
>> subsystem so that the idle cputime is still seen advancing coherently
>> even when the tick isn't there to flush the idle vtime.
>>
>> Prepare for that and introduce three new APIs which will be used in
>> subsequent patches:
>>
>> _ vtime_dynticks_start() is deemed to be called when idle enters in
>> dyntick mode. The idle cputime that elapsed so far is accumulated.
>>
>> - vtime_dynticks_stop() is deemed to be called when idle exits from
>> dyntick mode. The vtime entry clocks are fast-forward to current time
>> so that idle accounting restarts elapsing from now.
>>
>> - vtime_reset() is deemed to be called from dynticks idle IRQ entry to
>> fast-forward the clock to current time so that the IRQ time is still
>> accounted by vtime while nohz cputime is paused.
>>
>> Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
>> accounting twice the idle cputime, along with nohz accounting.
>>
>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>
>> ---
>> arch/powerpc/kernel/time.c | 41 ++++++++++++++++++++++++++++++++++++++
>> include/linux/vtime.h | 6 ++++++
>> 2 files changed, 47 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
>> index 4bbeb8644d3d..18506740f4a4 100644
>> --- a/arch/powerpc/kernel/time.c
>> +++ b/arch/powerpc/kernel/time.c
>> @@ -376,6 +376,47 @@ void vtime_task_switch(struct task_struct *prev)
>> acct->starttime = acct0->starttime;
>> }
>> }
>> +
>> +#ifdef CONFIG_NO_HZ_COMMON
>> +/**
>> + * vtime_reset - Fast forward vtime entry clocks
>> + *
>> + * Called from dynticks idle IRQ entry to fast-forward the clocks to
>> current time
>> + * so that the IRQ time is still accounted by vtime while nohz
>> cputime is paused.
>> + */
>> +void vtime_reset(void)
>> +{
>> + struct cpu_accounting_data *acct = get_accounting(current);
>> +
>> + acct->starttime = mftb();
>
> I figured out why those huge values happen.
>
> This happens because mftb is from when the system is booted.
> I was doing kexec to start the new kernel and mftb wasn't getting
> reset.
>
> I thought about this. This is concern for pseries too, where LPAR's
> restart but system won't restart and mftb will continue to run instead of
> reset.
>
> I think we should be using sched_clock instead of mftb here.
> Though we need it a few more places and some cosmetic changes around it.
>
> Note: Some values being huge exists without series for few CPUs, with
> series it
> shows up in most of the CPUs.
>
> So I am planning send out fix below fix separately keeping your
> series as dependency.
>
> ---
> arch/powerpc/include/asm/accounting.h | 4 ++--
> arch/powerpc/include/asm/cputime.h | 14 +++++++-------
> arch/powerpc/kernel/time.c | 22 +++++++++++-----------
> 3 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/accounting.h b/arch/powerpc/
> include/asm/accounting.h
> index 6d79c31700e2..50f120646e6d 100644
> --- a/arch/powerpc/include/asm/accounting.h
> +++ b/arch/powerpc/include/asm/accounting.h
> @@ -21,8 +21,8 @@ struct cpu_accounting_data {
> unsigned long steal_time;
> unsigned long idle_time;
> /* Internal counters */
> - unsigned long starttime; /* TB value snapshot */
> - unsigned long starttime_user; /* TB value on exit to usermode */
> + unsigned long starttime; /* Time value snapshot */
> + unsigned long starttime_user; /* Time value on exit to usermode */
> #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
> unsigned long startspurr; /* SPURR value snapshot */
> unsigned long utime_sspurr; /* ->user_time when ->startspurr
> set */
> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/
> asm/cputime.h
> index aff858ca99c0..eb6b629b113f 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -20,9 +20,9 @@
> #include <asm/time.h>
> #include <asm/param.h>
> #include <asm/firmware.h>
> +#include <linux/sched/clock.h>
>
> #ifdef __KERNEL__
> -#define cputime_to_nsecs(cputime) tb_to_ns(cputime)
>
> /*
> * PPC64 uses PACA which is task independent for storing accounting
> data while
> @@ -44,20 +44,20 @@
> */
> static notrace inline void account_cpu_user_entry(void)
> {
> - unsigned long tb = mftb();
> + unsigned long now = sched_clock();
Now way !
By doing that you'll kill performance for no reason. All we need when
accounting time spent in kernel or in user is the difference between
time at entry and time at exit, no mater what the time was at boot time.
Also sched_clock() returns nanoseconds which implies calculation from
timebase. This is pointless CPU consumption. The current implementation
calculates nanoseconds at task switch when calling vtime_flush().Your
change will now do it at every kernel entry and kernel exit by calling
sched_clock().
Another point is that sched_clock() returns a long long not a long.
And also sched_clock() uses get_tb() which does mftb and mftbu. Which is
pointless for calculating time deltas unless your application spends
hours without being re-scheduled.
> struct cpu_accounting_data *acct = raw_get_accounting(current);
>
> - acct->utime += (tb - acct->starttime_user);
> - acct->starttime = tb;
> + acct->utime += (now - acct->starttime_user);
> + acct->starttime = now;
> }
>
> static notrace inline void account_cpu_user_exit(void)
> {
> - unsigned long tb = mftb();
> + unsigned long now = sched_clock();
> struct cpu_accounting_data *acct = raw_get_accounting(current);
>
> - acct->stime += (tb - acct->starttime);
> - acct->starttime_user = tb;
> + acct->stime += (now - acct->starttime);
> + acct->starttime_user = now;
> }
>
> static notrace inline void account_stolen_time(void)
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 18506740f4a4..fb67cdae3bcb 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -215,7 +215,7 @@ static unsigned long vtime_delta(struct
> cpu_accounting_data *acct,
>
> WARN_ON_ONCE(!irqs_disabled());
>
> - now = mftb();
> + now = sched_clock();
> stime = now - acct->starttime;
> acct->starttime = now;
>
> @@ -299,9 +299,9 @@ static void vtime_flush_scaled(struct task_struct *tsk,
> {
> #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
> if (acct->utime_scaled)
> - tsk->utimescaled += cputime_to_nsecs(acct->utime_scaled);
> + tsk->utimescaled += acct->utime_scaled;
> if (acct->stime_scaled)
> - tsk->stimescaled += cputime_to_nsecs(acct->stime_scaled);
> + tsk->stimescaled += acct->stime_scaled;
>
> acct->utime_scaled = 0;
> acct->utime_sspurr = 0;
> @@ -321,28 +321,28 @@ void vtime_flush(struct task_struct *tsk)
> struct cpu_accounting_data *acct = get_accounting(tsk);
>
> if (acct->utime)
> - account_user_time(tsk, cputime_to_nsecs(acct->utime));
> + account_user_time(tsk, acct->utime);
>
> if (acct->gtime)
> - account_guest_time(tsk, cputime_to_nsecs(acct->gtime));
> + account_guest_time(tsk, acct->gtime);
>
> if (IS_ENABLED(CONFIG_PPC_SPLPAR) && acct->steal_time) {
> - account_steal_time(cputime_to_nsecs(acct->steal_time));
> + account_steal_time(acct->steal_time);
> acct->steal_time = 0;
> }
>
> if (acct->idle_time)
> - account_idle_time(cputime_to_nsecs(acct->idle_time));
> + account_idle_time(acct->idle_time);
>
> if (acct->stime)
> - account_system_index_time(tsk, cputime_to_nsecs(acct->stime),
> + account_system_index_time(tsk, acct->stime,
> CPUTIME_SYSTEM);
>
> if (acct->hardirq_time)
> - account_system_index_time(tsk, cputime_to_nsecs(acct-
> >hardirq_time),
> + account_system_index_time(tsk, acct->hardirq_time,
> CPUTIME_IRQ);
> if (acct->softirq_time)
> - account_system_index_time(tsk, cputime_to_nsecs(acct-
> >softirq_time),
> + account_system_index_time(tsk, acct->softirq_time,
> CPUTIME_SOFTIRQ);
>
> vtime_flush_scaled(tsk, acct);
> @@ -388,7 +388,7 @@ void vtime_reset(void)
> {
> struct cpu_accounting_data *acct = get_accounting(current);
>
> - acct->starttime = mftb();
> + acct->starttime = sched_clock();
> #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
> acct->startspurr = read_spurr(acct->starttime);
> #endif
next prev parent reply other threads:[~2026-02-24 15:41 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-06 14:22 [PATCH 00/15 v2] tick/sched: Refactor idle cputime accounting Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 01/15] sched/idle: Handle offlining first in idle loop Frederic Weisbecker
2026-02-18 18:22 ` Shrikanth Hegde
2026-02-06 14:22 ` [PATCH 02/15] sched/cputime: Remove superfluous and error prone kcpustat_field() parameter Frederic Weisbecker
2026-02-18 18:25 ` Shrikanth Hegde
2026-02-06 14:22 ` [PATCH 03/15] sched/cputime: Correctly support generic vtime idle time Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle Frederic Weisbecker
2026-02-19 18:30 ` Shrikanth Hegde
2026-02-24 15:41 ` Christophe Leroy (CS GROUP) [this message]
2026-02-25 7:46 ` Shrikanth Hegde
2026-02-25 9:45 ` Christophe Leroy (CS GROUP)
2026-02-25 10:34 ` Shrikanth Hegde
2026-02-25 11:14 ` Christophe Leroy (CS GROUP)
2026-02-25 13:33 ` Shrikanth Hegde
2026-02-25 13:54 ` Christophe Leroy (CS GROUP)
2026-02-25 17:47 ` Shrikanth Hegde
2026-02-25 17:59 ` Christophe Leroy (CS GROUP)
2026-02-26 4:06 ` Shrikanth Hegde
2026-02-26 7:32 ` Christophe Leroy (CS GROUP)
2026-02-26 12:57 ` Shrikanth Hegde
2026-02-06 14:22 ` [PATCH 05/15] s390/time: " Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 06/15] tick/sched: Unify idle cputime accounting Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 07/15] cpufreq: ondemand: Simplify idle cputime granularity test Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 08/15] tick/sched: Remove nohz disabled special case in cputime fetch Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 09/15] tick/sched: Move dyntick-idle cputime accounting to cputime code Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 10/15] tick/sched: Remove unused fields Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 11/15] tick/sched: Account tickless idle cputime only when tick is stopped Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 12/15] tick/sched: Consolidate idle time fetching APIs Frederic Weisbecker
2026-02-06 22:35 ` Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 13/15] sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 14/15] sched/cputime: Handle idle irqtime gracefully Frederic Weisbecker
2026-03-03 11:11 ` Shrikanth Hegde
2026-03-20 14:32 ` Frederic Weisbecker
2026-02-06 14:22 ` [PATCH 15/15] sched/cputime: Handle dyntick-idle steal time correctly Frederic Weisbecker
2026-03-03 11:17 ` Shrikanth Hegde
2026-03-24 14:53 ` Frederic Weisbecker
2026-02-11 13:43 ` [PATCH 00/15 v2] tick/sched: Refactor idle cputime accounting Shrikanth Hegde
2026-02-11 17:06 ` Frederic Weisbecker
2026-02-12 7:02 ` Shrikanth Hegde
2026-02-18 18:11 ` Shrikanth Hegde
-- strict thread matches above, loose matches on Subject: below --
2026-01-16 14:51 [PATCH 00/15] " Frederic Weisbecker
2026-01-16 14:51 ` [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle Frederic Weisbecker
2026-02-25 17:53 ` Christophe Leroy (CS GROUP)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9ab1e7d7-57ee-49f9-963c-3a1b96dda684@kernel.org \
--to=chleroy@kernel.org \
--cc=agordeev@linux.ibm.com \
--cc=anna-maria@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=borntraeger@linux.ibm.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=frederic@kernel.org \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=jackzxcui1989@163.com \
--cc=jan.kiszka@siemens.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kbingham@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=maddy@linux.ibm.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=neeraj.upadhyay@kernel.org \
--cc=npiggin@gmail.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=sshegde@linux.ibm.com \
--cc=svens@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=urezki@gmail.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox