From: Frederic Weisbecker <frederic@kernel.org>
To: Heiko Carstens <hca@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH] tick-sched: fix idle and iowait sleeptime accounting vs CPU hotplug
Date: Wed, 17 Jan 2024 01:43:41 +0100
Message-ID: <ZacivexXEcL1KvOc@pavilion.home>
In-Reply-To: <20240115163555.1004144-1-hca@linux.ibm.com>
On Mon, Jan 15, 2024 at 05:35:55PM +0100, Heiko Carstens wrote:
> When offlining and onlining CPUs the overall reported idle and iowait
> times as reported by /proc/stat jump backward and forward:
>
> > cat /proc/stat
> cpu 132 0 176 225249 47 6 6 21 0 0
> cpu0 80 0 115 112575 33 3 4 18 0 0
> cpu1 52 0 60 112673 13 3 1 2 0 0
>
> > chcpu -d 1
> > cat /proc/stat
> cpu 133 0 177 226681 47 6 6 21 0 0
> cpu0 80 0 116 113387 33 3 4 18 0 0
>
> > chcpu -e 1
> > cat /proc/stat
> cpu 133 0 178 114431 33 6 6 21 0 0 <---- jump backward
> cpu0 80 0 116 114247 33 3 4 18 0 0
> cpu1 52 0 61 183 0 3 1 2 0 0 <---- idle + iowait start with 0
>
> > chcpu -d 1
> > cat /proc/stat
> cpu 133 0 178 228956 47 6 6 21 0 0 <---- jump forward
> cpu0 81 0 117 114929 33 3 4 18 0 0
>
> Reason for this is that get_idle_time() in fs/proc/stat.c has different
> sources for both values depending on if a CPU is online or offline:
>
> - if a CPU is online the values may be taken from its per cpu
> tick_cpu_sched structure
>
> - if a CPU is offline the values are taken from its per cpu cpustat
> structure
>
> The problem is that the per cpu tick_cpu_sched structure is set to zero on
> CPU offline. See tick_cancel_sched_timer() in kernel/time/tick-sched.c.
>
> Therefore when a CPU is brought offline and online afterwards both its idle
> and iowait sleeptime will be zero, causing a jump backward in total system
> idle and iowait sleeptime. In a similar way if a CPU is then brought
> offline again the total idle and iowait sleeptimes will jump forward.
>
> It looks like this behavior was introduced with commit 4b0c0f294f60
> ("tick: Cleanup NOHZ per cpu data on cpu down").
>
> This was only noticed now on s390, since we switched to generic idle time
> reporting with commit be76ea614460 ("s390/idle: remove arch_cpu_idle_time()
> and corresponding code").
>
> Fix this by preserving the values of idle_sleeptime and iowait_sleeptime
> members of the per-cpu tick_sched structure on CPU hotplug.
>
> Fixes: 4b0c0f294f60 ("tick: Cleanup NOHZ per cpu data on cpu down")
> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
> ---
> kernel/time/tick-sched.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index a17d26002831..d2501673028d 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1576,13 +1576,18 @@ void tick_setup_sched_timer(void)
> void tick_cancel_sched_timer(int cpu)
> {
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> + ktime_t idle_sleeptime, iowait_sleeptime;
>
> # ifdef CONFIG_HIGH_RES_TIMERS
> if (ts->sched_timer.base)
> hrtimer_cancel(&ts->sched_timer);
> # endif
>
> + idle_sleeptime = ts->idle_sleeptime;
> + iowait_sleeptime = ts->iowait_sleeptime;
> memset(ts, 0, sizeof(*ts));
> + ts->idle_sleeptime = idle_sleeptime;
> + ts->iowait_sleeptime = iowait_sleeptime;
And this is safe because it runs in global stop machine, so we are
guaranteed that nobody sees the transitioning state. In the worst
case, ts->idle_sleeptime_seq is observed as reset to 0 in read_seqcount_retry()
and the values are simply fetched again.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
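For readers following along, the save/clear/restore pattern and the retrying read can be sketched in plain userspace C. This is only a single-threaded model under stated assumptions: `struct tick_sched_model`, `model_cancel_sched_timer()`, `model_read_idle()` and `model_selfcheck()` are hypothetical names, the structure is reduced to four fields, and the real seqcount helpers additionally use memory barriers and wait out an in-progress write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct tick_sched. */
struct tick_sched_model {
	int64_t idle_sleeptime;   /* cumulative idle time */
	int64_t iowait_sleeptime; /* cumulative iowait time */
	unsigned int seq;         /* stands in for idle_sleeptime_seq */
	int tick_stopped;         /* example of state that must be reset */
};

/* Reset everything except the two cumulative counters. */
static void model_cancel_sched_timer(struct tick_sched_model *ts)
{
	int64_t idle = ts->idle_sleeptime;
	int64_t iowait = ts->iowait_sleeptime;

	memset(ts, 0, sizeof(*ts));
	ts->idle_sleeptime = idle;
	ts->iowait_sleeptime = iowait;
}

/*
 * Seqcount-style read: fetch again if the sequence changed meanwhile.
 * Model only: no barriers, no handling of an odd (write in progress) value.
 */
static int64_t model_read_idle(const struct tick_sched_model *ts)
{
	unsigned int seq;
	int64_t idle;

	do {
		seq = ts->seq;
		idle = ts->idle_sleeptime;
	} while (seq != ts->seq);

	return idle;
}

/* Usage: the counters survive the reset, everything else is cleared. */
static void model_selfcheck(void)
{
	struct tick_sched_model ts = {
		.idle_sleeptime = 500, .iowait_sleeptime = 7,
		.seq = 4, .tick_stopped = 1,
	};

	model_cancel_sched_timer(&ts);
	assert(model_read_idle(&ts) == 500);
	assert(ts.iowait_sleeptime == 7);
	assert(ts.seq == 0 && ts.tick_stopped == 0);
}
```

The values 500 and 7 are arbitrary illustration values, not taken from any measurement.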
This makes me think that we should always use cpustat[CPUTIME_IDLE] instead of
maintaining a separate ts->idle_sleeptime field. kcpustat even has a seqcount
that would make ts->idle_sleeptime_seq obsolete. The tick-based idle accounting
could then disappear on nohz, along with a few hacks. Instead, we are
currently maintaining two different idle accounting schemes that are roughly the same.
But anyway, this is a different story; I'm just mumbling to myself ahead of the next
nohz cleanups.
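To illustrate why a single source of truth would sidestep this class of bug, here is a toy userspace model. It is a sketch under stated assumptions, not the kernel's get_idle_time(): `struct cpu_model`, `idle_two_sources()`, `idle_single_source()`, `hotplug_cycle()` and `compare_readers()` are hypothetical names, and the counter values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state: two independently kept idle counters. */
struct cpu_model {
	bool online;
	uint64_t ts_idle;      /* models ts->idle_sleeptime */
	uint64_t cpustat_idle; /* models cpustat[CPUTIME_IDLE] */
};

/* Two-source reader: picks the counter based on the online state. */
static uint64_t idle_two_sources(const struct cpu_model *c)
{
	return c->online ? c->ts_idle : c->cpustat_idle;
}

/* Single-source reader: always reports the cpustat-style counter. */
static uint64_t idle_single_source(const struct cpu_model *c)
{
	return c->cpustat_idle;
}

/* Offline then online again; 'wipe' models the pre-fix memset of ts. */
static void hotplug_cycle(struct cpu_model *c, bool wipe)
{
	c->online = false;
	if (wipe)
		c->ts_idle = 0;
	c->online = true;
}

/* Usage: only the two-source reader goes backward after a wipe. */
static void compare_readers(void)
{
	struct cpu_model c = {
		.online = true, .ts_idle = 1000, .cpustat_idle = 1000,
	};

	hotplug_cycle(&c, true);
	assert(idle_two_sources(&c) == 0);      /* jumped backward */
	assert(idle_single_source(&c) == 1000); /* unchanged */
}
```

With `wipe` set to false, which models the preserved fields from the patch, both readers report the same value after the cycle.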
Thanks!