From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, "Rafael J . Wysocki", Boqun Feng,
    Thomas Gleixner, Steven Rostedt, "Christophe Leroy (CS GROUP)",
    Kieran Bingham, Ben Segall, Michael Ellerman, Ingo Molnar,
    Vincent Guittot, Juri Lelli, Neeraj Upadhyay, Xin Zhao,
    Madhavan Srinivasan, Mel Gorman, Valentin Schneider,
    Christian Borntraeger, Jan Kiszka, linuxppc-dev@lists.ozlabs.org,
    "Paul E . McKenney", Viresh Kumar, Anna-Maria Behnsen,
    Uladzislau Rezki, Dietmar Eggemann, Heiko Carstens,
    linux-pm@vger.kernel.org, Alexander Gordeev, Sven Schnelle,
    Vasily Gorbik, Joel Fernandes, Nicholas Piggin,
    linux-s390@vger.kernel.org, Peter Zijlstra
Subject: [PATCH 00/15] tick/sched: Refactor idle cputime accounting
Date: Fri, 16 Jan 2026 15:51:53 +0100
Message-ID: <20260116145208.87445-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

After the issue reported here:

    https://lore.kernel.org/all/20251210083135.3993562-1-jackzxcui1989@163.com/

It appears that the idle cputime accounting is a big mess that
accumulates within two concurrent statistics, each with its own
shortcomings:

* The accounting for online CPUs which is based on the delta between
  tick_nohz_start_idle() and tick_nohz_stop_idle().

  Pros:
       - Works when the tick is off
       - Has nsecs granularity

  Cons:
       - Accounts idle steal time but doesn't subtract it from idle
         cputime.
       - Assumes CONFIG_IRQ_TIME_ACCOUNTING by not accounting IRQs, but
         the IRQ time is simply ignored when
         CONFIG_IRQ_TIME_ACCOUNTING=n
       - The windows between 1) idle task scheduling and the first call
         to tick_nohz_start_idle() and 2) the last
         tick_nohz_stop_idle() and the rest of the idle time are
         blindspots wrt.
         cputime accounting (though a mostly insignificant amount)
       - Relies on private fields outside of kernel stats, with specific
         accessors.

* The accounting for offline CPUs which is based on ticks and the
  jiffies delta during which the tick was stopped.

  Pros:
       - Handles steal time correctly
       - Handles CONFIG_IRQ_TIME_ACCOUNTING=y and
         CONFIG_IRQ_TIME_ACCOUNTING=n correctly.
       - Handles the whole idle task
       - Accounts directly to kernel stats, without midlayer
         accumulator.

  Cons:
       - Doesn't elapse when the tick is off, which doesn't make it
         suitable for online CPUs.
       - Has TICK_NSEC granularity (jiffies)
       - Needs to track the dyntick-idle ticks that were accounted and
         subtract them from the total jiffies time spent while the tick
         was stopped. This is an ugly workaround.

Having two different accountings for a single context is not the only
problem: since those accountings are of different natures, it is
possible to observe the global idle time going backward after a CPU goes
offline, as reported by Xin Zhao.

Clean up the situation by introducing a hybrid approach that stays
coherent, fixes the backward jumps and works for both online and offline
CPUs:

* Tick based or native vtime accounting operates before the tick is
  stopped and resumes once the tick is restarted.
* When the idle loop starts, switch to dynticks-idle accounting as is
  done currently, except that the statistics accumulate directly to the
  relevant kernel stat fields.
* Private dyntick cputime accounting fields are removed.
* Works in both the online and offline cases.
* Move most of the relevant code to the common sched/cputime subsystem
* Handle CONFIG_IRQ_TIME_ACCOUNTING=n correctly such that the
  dynticks-idle accounting still elapses while on IRQs.
* Correctly subtract idle steal cputime from idle time

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
    timers/core

HEAD: 6a3d814ef2f6142714bef862be36def5ca4c9d96

Thanks,
    Frederic
---

Frederic Weisbecker (15):
  sched/idle: Handle offlining first in idle loop
  sched/cputime: Remove superfluous and error prone kcpustat_field() parameter
  sched/cputime: Correctly support generic vtime idle time
  powerpc/time: Prepare to stop elapsing in dynticks-idle
  s390/time: Prepare to stop elapsing in dynticks-idle
  tick/sched: Unify idle cputime accounting
  cpufreq: ondemand: Simplify idle cputime granularity test
  tick/sched: Remove nohz disabled special case in cputime fetch
  tick/sched: Move dyntick-idle cputime accounting to cputime code
  tick/sched: Remove unused fields
  tick/sched: Account tickless idle cputime only when tick is stopped
  tick/sched: Consolidate idle time fetching APIs
  sched/cputime: Consolidate get_cpu_[idle|iowait]_time_us()
  sched/cputime: Handle idle irqtime gracefully
  sched/cputime: Handle dyntick-idle steal time correctly

 arch/powerpc/kernel/time.c         |  41 +++++
 arch/s390/include/asm/idle.h       |  11 +-
 arch/s390/kernel/idle.c            |  13 +-
 arch/s390/kernel/vtime.c           |  57 ++++++-
 drivers/cpufreq/cpufreq.c          |  29 +---
 drivers/cpufreq/cpufreq_governor.c |   6 +-
 drivers/cpufreq/cpufreq_ondemand.c |   7 +-
 drivers/macintosh/rack-meter.c     |   2 +-
 fs/proc/stat.c                     |  40 +----
 fs/proc/uptime.c                   |   8 +-
 include/linux/kernel_stat.h        |  76 ++++++++--
 include/linux/tick.h               |   4 -
 include/linux/vtime.h              |  20 ++-
 kernel/rcu/tree.c                  |   9 +-
 kernel/rcu/tree_stall.h            |   7 +-
 kernel/sched/cputime.c             | 302 +++++++++++++++++++++++++++++++------
 kernel/sched/idle.c                |  11 +-
 kernel/sched/sched.h               |   1 +
 kernel/time/tick-sched.c           | 203 +++++--------------------
 kernel/time/tick-sched.h           |  12 --
 kernel/time/timer_list.c           |   6 +-
 scripts/gdb/linux/timerlist.py     |   4 -
 22 files changed, 505 insertions(+), 364 deletions(-)
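
For readers who want to see the backward jump in isolation, here is a
minimal userspace sketch, not kernel code. It assumes a toy model in
which a reader reports a nanosecond-granular counter while the CPU is
online and a tick-granular counter once it is offline, and compares that
with a single accumulator. Every name below (struct model_cpu,
model_idle(), MODEL_TICK_NSEC, ...) is hypothetical and the 4 ms tick
period is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick period (HZ=250); any value shows the same effect. */
#define MODEL_TICK_NSEC 4000000ULL

/* Hypothetical model of one CPU; none of these names exist in the kernel. */
struct model_cpu {
	bool online;
	uint64_t dyntick_idle_ns;  /* nsec-granular, private accumulator */
	uint64_t tick_idle_ns;     /* tick-granular, kernel-stat style */
	uint64_t unified_idle_ns;  /* single accumulator for both states */
};

/* Spend @ns idle: the tick-based counter only advances by whole ticks. */
static void model_idle(struct model_cpu *c, uint64_t ns)
{
	c->dyntick_idle_ns += ns;
	c->tick_idle_ns += (ns / MODEL_TICK_NSEC) * MODEL_TICK_NSEC;
	c->unified_idle_ns += ns;
}

/* Take the CPU offline; split readers now see the tick-based counter. */
static void model_offline(struct model_cpu *c)
{
	assert(c->online);
	c->online = false;
}

/* Two-source read: the reported counter depends on the online state. */
static uint64_t model_read_split(const struct model_cpu *c)
{
	return c->online ? c->dyntick_idle_ns : c->tick_idle_ns;
}

/* Single-source read: the same counter whatever the online state. */
static uint64_t model_read_unified(const struct model_cpu *c)
{
	return c->unified_idle_ns;
}
```

In this model, after model_idle(&c, 10500000) on an online CPU,
model_read_split() reports 10500000 ns, then 8000000 ns once
model_offline() has run, so a sum over CPUs goes backward.
model_read_unified() reports 10500000 ns in both states.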