From: Ingo Molnar <mingo@kernel.org>
To: Xin Zhao <jackzxcui1989@163.com>
Cc: anna-maria@linutronix.de, frederic@kernel.org,
tglx@linutronix.de, kuba@kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 2/2] timers/nohz: Avoid /proc/stat idle/iowait fluctuation when cpu hotplug
Date: Mon, 1 Dec 2025 22:17:02 +0100
Message-ID: <aS4FztjNAwVNfoUk@gmail.com>
In-Reply-To: <20251129133526.1460119-3-jackzxcui1989@163.com>
* Xin Zhao <jackzxcui1989@163.com> wrote:
> The idle and iowait statistics in /proc/stat are obtained through
> get_idle_time and get_iowait_time. Assuming CONFIG_NO_HZ_COMMON is
> enabled, when CPU is online, the idle and iowait values use the
> idle_sleeptime and iowait_sleeptime statistics from tick_cpu_sched, but
> use CPUTIME_IDLE and CPUTIME_IOWAIT items from kernel_cpustat when CPU
> is offline. Although /proc/stat does not print statistics of offline CPUs,
> it still prints aggregated statistics for all possible CPUs.
> tick_cpu_sched and kernel_cpustat are maintained by different logic,
> leading to a significant gap. The first line of the data below shows the
> /proc/stat output when only one CPU remains after CPU offline, the second
> line shows the /proc/stat output after all CPUs are brought back online:
>
> cpu 2408558 2 916619 4275883 5403 123758 64685 0 0 0
> cpu 2408588 2 916693 4200737 4184 123762 64686 0 0 0

Yeah, that outlier indeed looks suboptimal, and there's
very little user-space tooling can do to detect it. I
think your suggestion, to use the 'frozen' values of an
offline CPU, might as well be the right approach.

What value is printed if the CPU was never online? Is
it properly initialized to zero?

> Obviously, other values do not experience significant fluctuations, while
> idle/iowait statistics show a substantial decrease, which makes system CPU
> monitoring troublesome.
>
> Introduce get_cpu_idle_time_us_raw and get_cpu_iowait_time_us_raw, so that
> /proc/stat logic can use them to get the last raw value of idle_sleeptime
> and iowait_sleeptime from tick_cpu_sched without any calculation when CPU
> is offline. It avoids /proc/stat idle/iowait fluctuation during CPU hotplug.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
> ---
>  fs/proc/stat.c           |  4 ++++
>  include/linux/tick.h     |  4 ++++
>  kernel/time/tick-sched.c | 46 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+)
>
> diff --git a/fs/proc/stat.c b/fs/proc/stat.c
> index 8b444e862..de13a2e1c 100644
> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -28,6 +28,8 @@ u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
>
>  	if (cpu_online(cpu))
>  		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> +	else
> +		idle_usecs = get_cpu_idle_time_us_raw(cpu);
>
>  	if (idle_usecs == -1ULL)
>  		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
> @@ -44,6 +46,8 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
>
>  	if (cpu_online(cpu))
>  		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
> +	else
> +		iowait_usecs = get_cpu_iowait_time_us_raw(cpu);

So why not just use the get_cpu_idle_time_us() and
get_cpu_iowait_time_us() values unconditionally, for
all possible_cpus?

The raw/non-raw distinction makes very little sense in
this context: the read_seqlock_retry loop will always
succeed after a single step (because there are no
writers), so the behavior of the full get_cpu_idle/iowait_time_us()
functions should be close to the _raw() variants.

The patch would be much simpler that way.

Thanks,
Ingo
Thread overview: 5+ messages
2025-11-29 13:35 [PATCH 0/2] Optimize /proc/stat idle/iowait fluctuation Xin Zhao
2025-11-29 13:35 ` [PATCH 1/2] timers/nohz: Revise a comment about broken iowait counter update race Xin Zhao
2025-11-29 13:35 ` [PATCH 2/2] timers/nohz: Avoid /proc/stat idle/iowait fluctuation when cpu hotplug Xin Zhao
2025-12-01 21:17 ` Ingo Molnar [this message]
2025-12-04 18:21 ` Xin Zhao