public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Giovanni Gherdovich <ggherdovich@suse.cz>
To: Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Harald Arnesen <harald@skogtun.org>,
	Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BISECTED]: Kernel panic (was: Linux 5.7-rc2)
Date: Wed, 22 Apr 2020 11:02:30 +0200	[thread overview]
Message-ID: <1587546150.9537.84.camel@suse.cz> (raw)
In-Reply-To: <20200421212347.GV2483@worktop.programming.kicks-ass.net>

Linus, Peter:

the panic seen by Harald Arnesen (and Dave Kleikamp) is unrelated to
virtualization, and happens on all machines that have a CPU with less than 4
physical cores. It's a very serious (and very stupid) bug in the original
version of my code and is fixed by patch 2/4 "x86, sched: Account for CPUs
with less than 4 cores in freq. invariance" in the series 

https://lore.kernel.org/lkml/20200416054745.740-1-ggherdovich@suse.cz

Harald, Dave:

for peace of mind, can you please share the output of

  turbostat --interval 1 sleep 0

from your machines? This should print a long list of power management-related
information, including your CPU model and the content of MSR_TURBO_RATIO_LIMIT.
In all likelihood that MSR says that the "4 cores turbo frequency" of your CPU
is zero, and the buggy code divides by that value.

Harald:

I'll echo Linus' request of testing that the patch series linked above fixes
the problem on your machine. Since you're testing -rc kernels and bisecting
bugs I assume you're comfortable with patching and compiling kernels, but if
that is not the case I am more than happy to assist by providing either an RPM
or a DEB package, depending on the distribution you're running. Let me know.

More comments below.

On Tue, 2020-04-21 at 23:23 +0200, Peter Zijlstra wrote:
> On Tue, Apr 21, 2020 at 12:03:10PM -0700, Linus Torvalds wrote:
> > On Mon, Apr 20, 2020 at 1:52 AM Harald Arnesen <harald@skogtun.org> wrote:
> > > 
> > > Neither rc1 nor rc2 will boot on my laptop. The attached picture is all
> > > I have been able to capture.
> > 
> > I know you saw the reply about this probably being fixed by
> > 
> >   https://lore.kernel.org/lkml/20200416054745.740-1-ggherdovich@suse.cz/
> > 
> > but it would be lovely if you could actually verify that that series
> > of four patches does indeed fix it for you.
> 
> (not seeing the original report in the archives or my list copy)
> 
> I'm assuming it's some sort of dodgy virt setup, actual real proper
> hardware should never get here like that.

I think Peter is mentioning virtualization because in another patch of my
bugfix series I address a separate issue, indeed arising in hypervisors:
patch 1/4 "x86, sched: Bail out of frequency invariance if base frequency is
unknown".

The thing is that my original code does two divisions (just grep for "div" in
the patch 1567c3e3467c "x86, sched: Add support for frequency invariance").
One of them divides by zero when num_cpus < 4, the other divides by zero on
some hypervisors (basically).

Here are the incriminated statements (file arch/x86/kernel/smpboot.c):

  (1)  arch_turbo_freq_ratio = div_u64(turbo_freq * SCHED_CAPACITY_SCALE,
                                       base_freq);
       /* ... stuff ... */
       
       mcnt *= arch_max_freq_ratio;
  (2)  freq_scale = div64_u64(acnt, mcnt);

Hypervisors:
If base_freq is zero, division (1) fails. This is addressed by patch 1/4 in the
bugfix series linked above. base_freq is read from MSR_PLATFORM_INFO (no BIOS
involved, it's straight from the CPU) and is the max CPU clock frequency
without counting turbo. The Intel SDM specifies this MSR's content, none of the
machines I tested said "my base clock frequency is zero", but I didn't think
of hypervisors.

Small machines:
If turbo_freq is zero, division (2) fails. This is addressed by patch 2/4
"x86, sched: Account for CPUs with less than 4 cores in freq. invariance".
I read turbo_freq from MSR_TURBO_RATIO_LIMIT (again: no BIOS involved), at a
specific offset where it should contain the clock frequency sustainable by 4
cores simultaneously. If the machines has less than for cores, this data is
rightfully zero and thus the panic that Harald Arnesen and Dave Kleikamp
observed. It's an embarrassing mistake I made, I tested on several machines
including a consumer-level Atom Airmont (Dell Wyse thin client) but all had at
least 4 cores so I didn't see it.

For the records: there is a report for the "small machines" div-by-zero bug
from a 24 core Atom P-Series (new product line from 2020), but the fix works
on that machine, see
https://lore.kernel.org/lkml/bf43772d-48e5-01d4-dd03-330110e487fa@linux.intel.com/

> 
> > Your oops is on that divide instruction:
> > 
> >         freq_scale = div64_u64(acnt, mcnt);
> > 
> > and while we had a check for mcnt not being zero earlier, we did
> > 
> >         mcnt *= arch_max_freq_ratio;
> > 
> > after that check. I could see it becoming zero either due to an
> > overflow, or due to arch_max_freq_ratio being 0.
> 
> Right, so that's not supposed to happen, as you say, we should not
> enable this code if the ratio is 0, and we should not overflow mcnt due
> to reading that reg once per tick.
> 
> But yeha, virt, anything can happen :/
> 
> > I think the first commit in that series is supposed to fix that
> > arch_max_freq_ratio being 0 case, but it still feels like the code
> > that does the divide is checking for zero in the wrong place...
> 
> Yeah, we can certainly modify that. As is, real actual hardware should
> never even hit that case either. So we might as well move that check and
> then also make it disable all this frequency scaling stuff if we ever do
> hit it.

I'm going to send the following patch. There are a number of reasons why mcnt
overflowing and landing exactly at zero is unlikely (we read MPERF once every
tick, and arch_max_freq_ratio is a small number) but it's still a real problem
(eg. tickless kernels) and the zero check is 4 lines above where it should be,
it just needs to move down.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8c89e4d9ad28..fb71395cbcad 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -2055,14 +2055,14 @@ void arch_scale_freq_tick(void)
 
        acnt = aperf - this_cpu_read(arch_prev_aperf);
        mcnt = mperf - this_cpu_read(arch_prev_mperf);
-       if (!mcnt)
-               return;
 
        this_cpu_write(arch_prev_aperf, aperf);
        this_cpu_write(arch_prev_mperf, mperf);
 
        acnt <<= 2*SCHED_CAPACITY_SHIFT;
        mcnt *= arch_max_freq_ratio;
+       if (!mcnt)
+               return;
 
        freq_scale = div64_u64(acnt, mcnt);



Giovanni

  parent reply	other threads:[~2020-04-22  9:02 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-19 21:58 Linux 5.7-rc2 Linus Torvalds
     [not found] ` <428bac87-b6dd-0867-c8f8-622cd606de3e@skogtun.org>
2020-04-21 19:03   ` [BISECTED]: Kernel panic (was: Linux 5.7-rc2) Linus Torvalds
2020-04-21 21:23     ` Peter Zijlstra
2020-04-21 21:30       ` Linus Torvalds
2020-04-21 21:41         ` Peter Zijlstra
2020-04-22  9:02       ` Giovanni Gherdovich [this message]
2020-04-22  9:37         ` Harald Arnesen
2020-04-22 10:22           ` Harald Arnesen
2020-04-22 11:01             ` Giovanni Gherdovich
2020-04-22 14:12         ` Dave Kleikamp
2020-04-22 14:49           ` Giovanni Gherdovich
2020-04-22  9:32     ` Harald Arnesen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1587546150.9537.84.camel@suse.cz \
    --to=ggherdovich@suse.cz \
    --cc=dave.kleikamp@oracle.com \
    --cc=harald@skogtun.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox