From: Frederic Weisbecker <frederic@kernel.org>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Vasily Gorbik <gor@linux.ibm.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Kieran Bingham <kbingham@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Xin Zhao <jackzxcui1989@163.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Sven Schnelle <svens@linux.ibm.com>,
Boqun Feng <boqun.feng@gmail.com>, Mel Gorman <mgorman@suse.de>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ben Segall <bsegall@google.com>,
Michael Ellerman <mpe@ellerman.id.au>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"Paul E . McKenney" <paulmck@kernel.org>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
linux-s390@vger.kernel.org, Jan Kiszka <jan.kiszka@siemens.com>,
Juri Lelli <juri.lelli@redhat.com>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
linux-pm@vger.kernel.org, Uladzislau Rezki <urezki@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Nicholas Piggin <npiggin@gmail.com>,
Heiko Carstens <hca@linux.ibm.com>,
linuxppc-dev@lists.ozlabs.org,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Valentin Schneider <vschneid@redhat.com>,
Viresh Kumar <viresh.kumar@linaro.org>
Subject: Re: [PATCH 00/15 v2] tick/sched: Refactor idle cputime accounting
Date: Wed, 11 Feb 2026 18:06:33 +0100
Message-ID: <aYy3GTXDwZFM3VLy@localhost.localdomain>
In-Reply-To: <f5f7cc0e-81c1-49c4-9bfa-61b111c69ae2@linux.ibm.com>
On Wed, Feb 11, 2026 at 07:13:45PM +0530, Shrikanth Hegde wrote:
> Hi Frederic,
> Gave this series a spin on the same system as v1.
>
> On 2/6/26 7:52 PM, Frederic Weisbecker wrote:
> > Hi,
> >
> > After the issue reported here:
> >
> > https://lore.kernel.org/all/20251210083135.3993562-1-jackzxcui1989@163.com/
> >
> > It turns out that the idle cputime accounting is a big mess that
> > accumulates within two concurrent statistics, each having their own
> > shortcomings:
> >
> > * The accounting for online CPUs which is based on the delta between
> > tick_nohz_start_idle() and tick_nohz_stop_idle().
> >
> > Pros:
> > - Works when the tick is off
> >
> > - Has nsecs granularity
> >
> > Cons:
> > - Accounts idle steal time but doesn't subtract it from idle
> > cputime.
> >
> > - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but
> > the IRQ time is simply ignored when
> > CONFIG_IRQ_TIME_ACCOUNTING=n
> >
> > - The windows between 1) idle task scheduling and the first call
> > to tick_nohz_start_idle() and 2) idle task between the last
> > tick_nohz_stop_idle() and the rest of the idle time are
> > blindspots wrt. cputime accounting (though mostly insignificant
> > amount)
> >
> > - Relies on private fields outside of kernel stats, with specific
> > accessors.
> >
> > * The accounting for offline CPUs which is based on ticks and the
> > jiffies delta during which the tick was stopped.
> >
> > Pros:
> > - Handles steal time correctly
> >
> > - Handles CONFIG_IRQ_TIME_ACCOUNTING=y and
> > CONFIG_IRQ_TIME_ACCOUNTING=n correctly.
> >
> > - Handles the whole idle task
> >
> > - Accounts directly to kernel stats, without midlayer accumulator.
> >
> > Cons:
> > - Doesn't elapse when the tick is off, which doesn't make it
> > suitable for online CPUs.
> >
> > - Has TICK_NSEC granularity (jiffies)
> >
> > - Needs to track the dyntick-idle ticks that were accounted and
> > subtract them from the total jiffies time spent while the tick
> > was stopped. This is an ugly workaround.
> >
> > Having two different accountings for a single context is not the only
> > problem: since those accountings are of different natures, it is
> > possible to observe the global idle time going backward after a CPU goes
> > offline, as reported by Xin Zhao.
> >
> > Clean up the situation by introducing a hybrid approach that stays
> > coherent, fixes the backward jumps and works for both online and offline
> > CPUs:
> >
> > * Tick-based or native vtime accounting operates before the tick is
> > stopped and resumes once the tick is restarted.
> >
> > * When the idle loop starts, switch to dynticks-idle accounting as is
> > done currently, except that the statistics accumulate directly to the
> > relevant kernel stat fields.
> >
> > * Private dyntick cputime accounting fields are removed.
> >
> > * Works in both the online and offline cases.
> >
> > * Move most of the relevant code to the common sched/cputime subsystem
> >
> > * Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
> > dynticks-idle accounting still elapses while on IRQs.
> >
> > * Correctly subtract idle steal cputime from idle time
> >
> > Changes since v1:
> >
> > - Fix deadlock involving double seq count lock on idle
> >
> > - Fix build breakage on powerpc
> >
> > - Fix build breakage on s390 (Heiko)
> >
> > - Fix broken sysfs s390 idle time file (Heiko)
> >
> > - Convert most ktime usage here into u64 (Peterz)
> >
> > - Add missing (or too implicit) <linux/sched/clock.h> (Peterz)
> >
> > - Fix whole idle time accounting breakage due to missing TS_FLAG_ set
> > on idle entry (Shrikanth Hegde)
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > timers/core-v2
> >
> > HEAD: 21458b98c80a0567d48131240317b7b73ba34c3c
> > Thanks,
> > Frederic
>
> idle and runtime utilization with mpstat while running stress-ng looks
> correct now.
>
> However, when running hackbench I am noticing the below data. hackbench shows
> severe regressions.
>
> base: tip/master at 9c61ebbdb587a3950072700ab74a9310afe3ad73.
> (nit: patch 7 is already part of tip, so I skipped applying it)
> +-----------------------------------------------+-------+---------+-----------+
> | Test | base | +series | % Diff |
> +-----------------------------------------------+-------+---------+-----------+
> | HackBench Process 10 groups | 2.23 | 3.05 | -36.77% |
> | HackBench Process 20 groups | 4.17 | 5.82 | -39.57% |
> | HackBench Process 30 groups | 6.04 | 8.49 | -40.56% |
> | HackBench Process 40 groups | 7.90 | 11.10 | -40.51% |
> | HackBench thread 10 | 2.44 | 3.36 | -37.70% |
> | HackBench thread 20 | 4.57 | 6.35 | -38.95% |
> | HackBench Process(Pipe) 10 | 1.76 | 2.29 | -30.11% |
> | HackBench Process(Pipe) 20 | 3.49 | 4.76 | -36.39% |
> | HackBench Process(Pipe) 30 | 5.21 | 7.13 | -36.85% |
> | HackBench Process(Pipe) 40 | 6.89 | 9.31 | -35.12% |
> | HackBench thread(Pipe) 10 | 1.91 | 2.50 | -30.89% |
> | HackBench thread(Pipe) 20 | 3.74 | 5.16 | -37.97% |
> +-----------------------------------------------+-------+---------+-----------+
>
> I have these in .config and I don't have nohz_full or isolated cpus.
>
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> # CONFIG_NO_HZ_IDLE is not set
> CONFIG_NO_HZ_FULL=y
>
> # CPU/Task time and stats accounting
> #
> CONFIG_VIRT_CPU_ACCOUNTING=y
> CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> CONFIG_IRQ_TIME_ACCOUNTING=y
> CONFIG_HAVE_SCHED_AVG_IRQ=y
>
> I did a git bisect and below is what it says.
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [6821315886a3b5267ea31d29dba26fd34647fbbc] sched/cputime: Handle dyntick-idle steal time correctly
> git bisect bad 6821315886a3b5267ea31d29dba26fd34647fbbc
> # status: waiting for good commit(s), bad commit known
> # good: [9c61ebbdb587a3950072700ab74a9310afe3ad73] Merge branch into tip/master: 'x86/sev'
> git bisect good 9c61ebbdb587a3950072700ab74a9310afe3ad73
> # good: [dc8bb3c84d162f7d9aa6becf9f8392474f92655a] tick/sched: Remove nohz disabled special case in cputime fetch
> git bisect good dc8bb3c84d162f7d9aa6becf9f8392474f92655a
> # good: [5070a778a581cd668f5d717f85fb22b078d8c20c] tick/sched: Account tickless idle cputime only when tick is stopped
> git bisect good 5070a778a581cd668f5d717f85fb22b078d8c20c
> # bad: [1e0ccc25a9a74b188b239c4de716fde279adbf8e] sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case
> git bisect bad 1e0ccc25a9a74b188b239c4de716fde279adbf8e
> # bad: [ee7c735b76071000d401869fc2883c451ee3fa61] tick/sched: Consolidate idle time fetching APIs
> git bisect bad ee7c735b76071000d401869fc2883c451ee3fa61
> # first bad commit: [ee7c735b76071000d401869fc2883c451ee3fa61] tick/sched: Consolidate idle time fetching APIs
I see. Can you try this? (or fetch timers/core-v3 from my tree)
Perhaps that mistake had some impact on cpufreq.
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 057fdc00dbc6..08550a6d9469 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -524,7 +524,7 @@ static u64 get_cpu_sleep_time_us(int cpu, enum cpu_usage_stat idx,
 	do_div(res, NSEC_PER_USEC);
 
 	if (last_update_time)
-		*last_update_time = res;
+		*last_update_time = ktime_to_us(now);
 
 	return res;
 }
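
To illustrate why this one-liner could matter for cpufreq, here is a minimal userspace sketch (not kernel code). It assumes a governor that derives load from two values per sample: the cumulative idle time returned by the getter and the timestamp written through last_update_time. All names below (cpu_sample, sample_idle, load_pct, window_load) are hypothetical and exist only for this illustration. With the old assignment the reported timestamp equals the idle time, so the wall delta always equals the idle delta and the computed load collapses to 0.

```c
#include <stdint.h>

/* Hypothetical per-CPU snapshot a governor might keep (illustrative only). */
struct cpu_sample {
	uint64_t idle_us;	/* cumulative idle time returned by the getter */
	uint64_t wall_us;	/* timestamp reported through last_update_time */
};

/*
 * Stand-in for the getter touched by the patch. "buggy" selects the old
 * behaviour (timestamp = idle time) or the fixed one (timestamp = now).
 */
static uint64_t sample_idle(uint64_t idle_ns, uint64_t now_ns, int buggy,
			    uint64_t *last_update_time)
{
	uint64_t res = idle_ns / 1000;	/* nsecs to usecs */

	if (last_update_time)
		*last_update_time = buggy ? res : now_ns / 1000;

	return res;
}

/* Busy percentage over the window between two samples. */
static unsigned int load_pct(const struct cpu_sample *prev,
			     const struct cpu_sample *cur)
{
	uint64_t wall = cur->wall_us - prev->wall_us;
	uint64_t idle = cur->idle_us - prev->idle_us;

	if (!wall || idle > wall)
		return 0;

	return (unsigned int)(100 * (wall - idle) / wall);
}

/* Take two samples and return the load seen between them. */
static unsigned int window_load(uint64_t idle0_ns, uint64_t now0_ns,
				uint64_t idle1_ns, uint64_t now1_ns, int buggy)
{
	struct cpu_sample prev, cur;

	prev.idle_us = sample_idle(idle0_ns, now0_ns, buggy, &prev.wall_us);
	cur.idle_us = sample_idle(idle1_ns, now1_ns, buggy, &cur.wall_us);

	return load_pct(&prev, &cur);
}
```

For a 10 ms window in which the CPU was idle for 1 ms, window_load(5000000, 50000000, 6000000, 60000000, 0) yields 90, while the same call with buggy set to 1 yields 0. A governor that sees a load of 0 would have little reason to raise the frequency, which would be consistent with the hackbench numbers above, though that link remains a guess until the fix is tested.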