From: Johannes Weiner <hannes@cmpxchg.org>
To: tim <xiejingfeng@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH] psi:fix divide by zero in psi_update_stats
Date: Tue, 12 Nov 2019 11:08:21 -0500
Message-ID: <20191112160821.GE168812@cmpxchg.org>
In-Reply-To: <20191112154844.GD168812@cmpxchg.org>

On Tue, Nov 12, 2019 at 10:48:46AM -0500, Johannes Weiner wrote:
> On Tue, Nov 12, 2019 at 10:41:46AM -0500, Johannes Weiner wrote:
> > On Fri, Nov 08, 2019 at 03:33:24PM +0800, tim wrote:
> > > In psi_update_stats, it is possible that period has value like
> > > 0xXXXXXXXX00000000 where the lower 32 bit is 0, then it calls div_u64 which
> > > truncates u64 period to u32, results in zero divisor.
> > > Use div64_u64() instead of div_u64()  if the divisor is u64 to avoid
> > > truncation to 32-bit on 64-bit platforms.
> > > 
> > > Signed-off-by: xiejingfeng <xiejingfeng@linux.alibaba.com>
> > 
> > This is legit. When we stop the periodic averaging worker due to an
> > idle CPU, the period after restart can be much longer than the ~4 sec
> > in the lower 32 bits. See the missed_periods logic in update_averages.
> 
> Argh, that's not right. Of course I notice right after hitting send.
> 
> missed_periods are subtracted out of the difference between now and
> the last update, so period should be not much bigger than 2s.
> 
> Something else is going on here.

Tim, does this happen right after boot? I wonder if it's because we're
not initializing avg_last_update, and the initial delta between the
last update (0) and the first scheduled update (sched_clock() + 2s)
ends up bigger than 4 seconds somehow. Later on, the delta between the
last and the scheduled update should always be ~2s. But for that to
happen, it would require a pretty slow boot, or a sched_clock() that
does not start at 0.
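
To make the arithmetic above concrete, here is a minimal userspace sketch
(not kernel code) of the period computation in update_averages() for the
very first update. The helper name first_period() and the constant
PSI_PERIOD_NS are illustrative assumptions; psi_period is taken to be 2s
in nanoseconds, and the first update is assumed to fire exactly on time.

```c
#include <stdint.h>

/* Assumed averaging period: 2 s in nanoseconds (hypothetical constant name). */
#define PSI_PERIOD_NS 2000000000ULL

/*
 * Hypothetical helper: period seen by the first averaging run when
 * group_init() ran at clock value init_clock_ns and the run fires exactly
 * one period later, so no periods were missed.
 * last_update_ns is the value avg_last_update holds at that point.
 */
static uint64_t first_period(uint64_t init_clock_ns, uint64_t last_update_ns)
{
	uint64_t now = init_clock_ns + PSI_PERIOD_NS;	/* first scheduled update */
	uint64_t missed_periods = 0;			/* woke up on time */

	return now - (last_update_ns + missed_periods * PSI_PERIOD_NS);
}
```

With avg_last_update left at 0 and a clock that already reads 3s at init,
first_period(3000000000ULL, 0) is 5000000000 ns, which no longer fits in
32 bits (2^32 ns is about 4.29s). With avg_last_update set to the init
clock, first_period(3000000000ULL, 3000000000ULL) is exactly 2000000000 ns.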

Tim, if you have a coredump, can you extract the values of the other
variables printed in the following patch?

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 84af7aa158bf..1b6836d23091 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -374,6 +374,10 @@ static u64 update_averages(struct psi_group *group, u64 now)
 	 */
 	avg_next_update = expires + ((1 + missed_periods) * psi_period);
 	period = now - (group->avg_last_update + (missed_periods * psi_period));
+
+	WARN(period >> 32, "period=%llu now=%llu expires=%llu last=%llu missed=%lu\n",
+	     period, now, expires, group->avg_last_update, missed_periods);
+
 	group->avg_last_update = now;
 
 	for (s = 0; s < NR_PSI_STATES - 1; s++) {
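
For reference, the WARN condition above targets the case from the original
report: a period with bits set above bit 31. The userspace sketch below
mimics what happens to such a value when it is passed to a helper that
takes a 32-bit divisor, as div_u64() does. divisor_as_u32() and
period_overflows_u32() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value a u64 divisor takes once it is passed as a 32-bit parameter. */
static uint32_t divisor_as_u32(uint64_t period)
{
	return (uint32_t)period;	/* keeps only the low 32 bits */
}

/* Same test as the WARN condition: is any bit above bit 31 set? */
static bool period_overflows_u32(uint64_t period)
{
	return (period >> 32) != 0;
}
```

A period of 0x100000000 turns into a divisor of 0 and trips the check,
while a normal 2s period (2000000000 ns) passes through unchanged.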

And we may need something like this to make the tick initialization
more robust regardless of the reported bug here:

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 84af7aa158bf..ce8f6748678a 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -185,7 +185,8 @@ static void group_init(struct psi_group *group)
 
 	for_each_possible_cpu(cpu)
 		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
-	group->avg_next_update = sched_clock() + psi_period;
+	group->avg_last_update = sched_clock();
+	group->avg_next_update = group->avg_last_update + psi_period;
 	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
 	mutex_init(&group->avgs_lock);
 	/* Init trigger-related members */
