From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Ben Segall <bsegall@google.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-pm@vger.kernel.org, Sashiko@localhost.localdomain,
	Ingo Molnar <mingo@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Boqun Feng <boqun.feng@gmail.com>,
	Valentin Schneider <vschneid@redhat.com>,
	linuxppc-dev@lists.ozlabs.org,
	Sven Schnelle <svens@linux.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Mel Gorman <mgorman@suse.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Kieran Bingham <kbingham@kernel.org>,
	Xin Zhao <jackzxcui1989@163.com>,
	linux-s390@vger.kernel.org, Heiko Carstens <hca@linux.ibm.com>
Subject: Re: [PATCH 00/15 v4] tick/sched: Refactor idle cputime accounting
Date: Tue, 26 May 2026 12:42:16 +0200	[thread overview]
Message-ID: <ahV5CFMvMEKtuVct@localhost.localdomain> (raw)
In-Reply-To: <20260508131647.43868-1-frederic@kernel.org>

Hi,

I don't see any further concern. What should we do with this? It could
either go through the scheduler tree or the timer tree.

Thanks.


On Fri, May 08, 2026 at 03:16:32PM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> After the issue reported here:
> 
>         https://lore.kernel.org/all/20251210083135.3993562-1-jackzxcui1989@163.com/
> 
> It appears that the idle cputime accounting is a big mess that
> accumulates within two concurrent statistics, each having its own
> shortcomings:
> 
> * The accounting for online CPUs which is based on the delta between
>   tick_nohz_start_idle() and tick_nohz_stop_idle().
> 
>   Pros:
>        - Works when the tick is off
> 
>        - Has nsecs granularity
> 
>   Cons:
>        - Accounts idle steal time but doesn't subtract it from idle
>          cputime.
> 
>        - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs but
>          the IRQ time is simply ignored when
>          CONFIG_IRQ_TIME_ACCOUNTING=n
> 
>        - The windows 1) between idle task scheduling and the first call
>          to tick_nohz_start_idle() and 2) between the last
>          tick_nohz_stop_idle() and the end of the idle task's run are
>          blind spots wrt. cputime accounting (though mostly an
>          insignificant amount)
> 
>        - Relies on private fields outside of kernel stats, with specific
>          accessors.
> 
> * The accounting for offline CPUs which is based on ticks and the
>   jiffies delta during which the tick was stopped.
> 
>   Pros:
>        - Handles steal time correctly
> 
>        - Handles CONFIG_IRQ_TIME_ACCOUNTING=y and
>          CONFIG_IRQ_TIME_ACCOUNTING=n correctly.
> 
>        - Handles the whole idle task
> 
>        - Accounts directly to kernel stats, without midlayer accumulator.
> 
>   Cons:
>        - Doesn't elapse when the tick is off, which doesn't make it
>          suitable for online CPUs.
> 
>        - Has TICK_NSEC granularity (jiffies)
> 
>        - Needs to track the dyntick-idle ticks that were accounted and
>          subtract them from the total jiffies time spent while the tick
>          was stopped. This is an ugly workaround.
> 
> Having two different accountings for a single context is not the only
> problem: since those accountings are of different natures, it is
> possible to observe the global idle time going backward after a CPU goes
> offline, as reported by Xin Zhao.
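
The backward jump described above can be illustrated with a small userspace toy model. This is only a sketch under assumptions, not kernel code: the struct, the `toy_*` helpers, and the 250 Hz tick length are hypothetical, and the model merely assumes that a reader switches from a nanosecond-granular counter to a tick-granular one when the CPU goes offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TICK_NSEC 4000000ULL	/* hypothetical 250 Hz tick length */

/* Toy model, not kernel code: one CPU with two independent idle counters. */
struct toy_cpu {
	bool online;
	uint64_t nohz_idle_ns;	/* nsec-granular, start/stop idle delta based */
	uint64_t tick_idle_ns;	/* tick-granular, kernel-stat style */
};

/* Account one idle period in both counters, each at its own granularity. */
static void toy_account_idle(struct toy_cpu *cpu, uint64_t ns)
{
	cpu->nohz_idle_ns += ns;
	/* Only whole ticks are seen by the tick-based counter. */
	cpu->tick_idle_ns += (ns / TOY_TICK_NSEC) * TOY_TICK_NSEC;
}

/* The reader picks its source depending on the CPU's online state. */
static uint64_t toy_read_idle(const struct toy_cpu *cpu)
{
	return cpu->online ? cpu->nohz_idle_ns : cpu->tick_idle_ns;
}
```

In this model, accounting 10.5 ms of idle time and then clearing `online` makes the reading drop from 10500000 ns to 8000000 ns, which is the kind of non-monotonic value a reader summing idle time over all CPUs could observe.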
> 
> Clean up the situation by introducing a hybrid approach that stays
> coherent, fixes the backward jumps and works for both online and offline
> CPUs:
> 
> * Tick based or native vtime accounting operates before the tick is
>   stopped and resumes once the tick is restarted.
> 
> * When the idle loop starts, switch to dynticks-idle accounting as is
>   done currently, except that the statistics accumulate directly to the
>   relevant kernel stat fields.
> 
> * Private dyntick cputime accounting fields are removed.
> 
> * Works for both the online and offline cases.
> 
> * Move most of the relevant code to the common sched/cputime subsystem
> 
> * Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
>   dynticks-idle accounting still elapses while on IRQs.
> 
> * Correctly subtract idle steal cputime from idle time
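
The hybrid scheme listed above can be pictured with a userspace toy model. It is a sketch under stated assumptions rather than the series' actual implementation: every name here (`toy_idle_stat`, the `toy_*` helpers, the 250 Hz tick length) is hypothetical, and the steal time is simply passed in by the caller. The point it tries to show is that both the tick-based path and the dyntick-idle path feed one and the same counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TICK_NSEC 4000000ULL	/* hypothetical 250 Hz tick length */

/* Toy model, not kernel code: a single idle counter fed by both schemes. */
struct toy_idle_stat {
	uint64_t idle_ns;	/* stands in for the kernel stat idle field */
	bool tick_stopped;	/* dyntick-idle accounting currently active */
	uint64_t stop_ts_ns;	/* timestamp at which the tick was stopped */
};

/* Tick-based accounting: only elapses while the tick is running. */
static void toy_tick(struct toy_idle_stat *st)
{
	if (!st->tick_stopped)
		st->idle_ns += TOY_TICK_NSEC;
}

/* Tick stops: switch to timestamp-delta accounting. */
static void toy_tick_stop(struct toy_idle_stat *st, uint64_t now_ns)
{
	st->tick_stopped = true;
	st->stop_ts_ns = now_ns;
}

/* Tick restarts: flush the delta, minus steal time, into the same counter. */
static void toy_tick_restart(struct toy_idle_stat *st, uint64_t now_ns,
			     uint64_t steal_ns)
{
	uint64_t delta = now_ns - st->stop_ts_ns;

	if (steal_ns > delta)	/* never subtract more than what elapsed */
		steal_ns = delta;
	st->idle_ns += delta - steal_ns;
	st->tick_stopped = false;
}

/* Reader: add the in-flight delta while the tick is stopped. */
static uint64_t toy_read_idle(const struct toy_idle_stat *st, uint64_t now_ns)
{
	uint64_t idle = st->idle_ns;

	if (st->tick_stopped)
		idle += now_ns - st->stop_ts_ns;
	return idle;
}
```

Because there is only one counter in this model, a reader never has to switch sources between the online and offline cases, so the discontinuity of the two-counter design cannot arise here.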
> 
> Changes since v3 (among which a lot of relevant reviews from Sashiko):
> 
> - Add new tags
> 
> - Rebase on latest -rc1
> 
> - Add "tick/sched: Fix TOCTOU in nohz idle time fetch" (Sashiko)
> 
> - Fix buggy state refetch in kcpustat_cpu_fetch_vtime() (Sashiko)
> 
> - Fix build issue on powerpc (Christophe Leroy)
> 
> - Fix s390 lost steal time occurring on idle IRQs (call vtime_flush() on
>   vtime_account_hardirq() and vtime_account_softirq()) (Sashiko)
> 
> - Fix build issue on s390
> 
> - Fix uninitialized idle_sleeptime_seq (Sashiko)
> 
> - Fix irqtime being disabled or enabled in the middle of an idle IRQ
>   (Sashiko)
> 
> - Fix tick restart and then restop in the same idle loop (Sashiko)
> 
> - Fix "sched/cputime: Handle idle irqtime gracefully" changelog (Sashiko)
> 
> - Fix idle steal time subtracted from the wrong index between idle and
>   iowait kcpustat. (Sashiko)
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> 	timers/core-v4
> 
> HEAD: e64ba052ce04e363ff76d3cb8bedc5f812188acb
> Thanks,
> 	Frederic
> ---
> 
> Frederic Weisbecker (15):
>       tick/sched: Fix TOCTOU in nohz idle time fetch
>       sched/idle: Handle offlining first in idle loop
>       sched/cputime: Remove superfluous and error prone kcpustat_field() parameter
>       sched/cputime: Correctly support generic vtime idle time
>       powerpc/time: Prepare to stop elapsing in dynticks-idle
>       s390/time: Prepare to stop elapsing in dynticks-idle
>       tick/sched: Unify idle cputime accounting
>       tick/sched: Remove nohz disabled special case in cputime fetch
>       tick/sched: Move dyntick-idle cputime accounting to cputime code
>       tick/sched: Remove unused fields
>       tick/sched: Account tickless idle cputime only when tick is stopped
>       tick/sched: Consolidate idle time fetching APIs
>       sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case
>       sched/cputime: Handle idle irqtime gracefully
>       sched/cputime: Handle dyntick-idle steal time correctly
> 
>  arch/powerpc/kernel/time.c         |  41 +++++
>  arch/s390/include/asm/idle.h       |   2 +
>  arch/s390/kernel/idle.c            |   5 +-
>  arch/s390/kernel/vtime.c           |  75 ++++++++-
>  drivers/cpufreq/cpufreq.c          |  29 +---
>  drivers/cpufreq/cpufreq_governor.c |   6 +-
>  drivers/macintosh/rack-meter.c     |   2 +-
>  fs/proc/stat.c                     |  40 +----
>  fs/proc/uptime.c                   |   8 +-
>  include/linux/kernel_stat.h        |  76 +++++++--
>  include/linux/tick.h               |   4 -
>  include/linux/vtime.h              |  22 ++-
>  kernel/rcu/tree.c                  |   9 +-
>  kernel/rcu/tree_stall.h            |   7 +-
>  kernel/sched/core.c                |   6 +-
>  kernel/sched/cputime.c             | 308 +++++++++++++++++++++++++++++++------
>  kernel/sched/idle.c                |  13 +-
>  kernel/time/tick-sched.c           | 212 ++++++-------------------
>  kernel/time/tick-sched.h           |  12 --
>  kernel/time/timer_list.c           |   6 +-
>  scripts/gdb/linux/timerlist.py     |   4 -
>  21 files changed, 529 insertions(+), 358 deletions(-)

-- 
Frederic Weisbecker
SUSE Labs

Thread overview: 48+ messages
2026-05-08 13:16 [PATCH 00/15 v4] tick/sched: Refactor idle cputime accounting Frederic Weisbecker
2026-05-08 13:16 ` Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 01/15] tick/sched: Fix TOCTOU in nohz idle time fetch Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 02/15] sched/idle: Handle offlining first in idle loop Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 03/15] sched/cputime: Remove superfluous and error prone kcpustat_field() parameter Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 04/15] sched/cputime: Correctly support generic vtime idle time Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 05/15] powerpc/time: Prepare to stop elapsing in dynticks-idle Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 06/15] s390/time: " Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 07/15] tick/sched: Unify idle cputime accounting Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 08/15] tick/sched: Remove nohz disabled special case in cputime fetch Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 09/15] tick/sched: Move dyntick-idle cputime accounting to cputime code Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 10/15] tick/sched: Remove unused fields Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 11/15] tick/sched: Account tickless idle cputime only when tick is stopped Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 12/15] tick/sched: Consolidate idle time fetching APIs Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 13/15] sched/cputime: Provide get_cpu_[idle|iowait]_time_us() off-case Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 14/15] sched/cputime: Handle idle irqtime gracefully Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-08 13:16 ` [PATCH 15/15] sched/cputime: Handle dyntick-idle steal time correctly Frederic Weisbecker
2026-05-08 13:16   ` Frederic Weisbecker
2026-06-02 19:30   ` [tip: timers/nohz] " tip-bot2 for Frederic Weisbecker
2026-05-26 10:42 ` Frederic Weisbecker [this message]
