All of lore.kernel.org
 help / color / mirror / Atom feed
From: Giovanni Gherdovich <ggherdovich@suse.cz>
To: Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Harald Arnesen <harald@skogtun.org>,
	Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BISECTED]: Kernel panic (was: Linux 5.7-rc2)
Date: Wed, 22 Apr 2020 11:02:30 +0200	[thread overview]
Message-ID: <1587546150.9537.84.camel@suse.cz> (raw)
In-Reply-To: <20200421212347.GV2483@worktop.programming.kicks-ass.net>

Linus, Peter:

the panic seen by Harald Arnesen (and Dave Kleikamp) is unrelated to
virtualization, and happens on all machines that have a CPU with less than 4
physical cores. It's a very serious (and very stupid) bug in the original
version of my code and is fixed by patch 2/4 "x86, sched: Account for CPUs
with less than 4 cores in freq. invariance" in the series 

https://lore.kernel.org/lkml/20200416054745.740-1-ggherdovich@suse.cz

Harald, Dave:

for peace of mind, can you please share the output of

  turbostat --interval 1 sleep 0

from your machines? This should print a long list of power management-related
information, including your CPU model and the content of MSR_TURBO_RATIO_LIMIT.
In all likelihood that MSR says that the "4 cores turbo frequency" of your CPU
is zero, and the buggy code divides by that value.

Harald:

I'll echo Linus' request of testing that the patch series linked above fixes
the problem on your machine. Since you're testing -rc kernels and bisecting
bugs I assume you're comfortable with patching and compiling kernels, but if
that is not the case I am more than happy to assist by providing either an RPM
or a DEB package, depending on the distribution you're running. Let me know.

More comments below.

On Tue, 2020-04-21 at 23:23 +0200, Peter Zijlstra wrote:
> On Tue, Apr 21, 2020 at 12:03:10PM -0700, Linus Torvalds wrote:
> > On Mon, Apr 20, 2020 at 1:52 AM Harald Arnesen <harald@skogtun.org> wrote:
> > > 
> > > Neither rc1 nor rc2 will boot on my laptop. The attached picture is all
> > > I have been able to capture.
> > 
> > I know you saw the reply about this probably being fixed by
> > 
> >   https://lore.kernel.org/lkml/20200416054745.740-1-ggherdovich@suse.cz/
> > 
> > but it would be lovely if you could actually verify that that series
> > of four patches does indeed fix it for you.
> 
> (not seeing the original report in the archives or my list copy)
> 
> I'm assuming it's some sort of dodgy virt setup, actual real proper
> hardware should never get here like that.

I think Peter is mentioning virtualization because in another patch of my
bugfix series I address a separate issue, indeed arising in hypervisors:
patch 1/4 "x86, sched: Bail out of frequency invariance if base frequency is
unknown".

The thing is that my original code does two divisions (just grep for "div" in
the patch 1567c3e3467c "x86, sched: Add support for frequency invariance").
One of them divides by zero when num_cpus < 4, the other divides by zero on
some hypervisors (basically).

Here are the incriminated statements (file arch/x86/kernel/smpboot.c):

  (1)  arch_turbo_freq_ratio = div_u64(turbo_freq * SCHED_CAPACITY_SCALE,
                                       base_freq);
       /* ... stuff ... */
       
       mcnt *= arch_max_freq_ratio;
  (2)  freq_scale = div64_u64(acnt, mcnt);

Hypervisors:
If base_freq is zero, division (1) fails. This is addressed by patch 1/4 in the
bugfix series linked above. base_freq is read from MSR_PLATFORM_INFO (no BIOS
involved, it's straight from the CPU) and is the max CPU clock frequency
without counting turbo. The Intel SDM specifies this MSR's content, none of the
machines I tested said "my base clock frequency is zero", but I didn't think
of hypervisors.

Small machines:
If turbo_freq is zero, division (2) fails. This is addressed by patch 2/4
"x86, sched: Account for CPUs with less than 4 cores in freq. invariance".
I read turbo_freq from MSR_TURBO_RATIO_LIMIT (again: no BIOS involved), at a
specific offset where it should contain the clock frequency sustainable by 4
cores simultaneously. If the machines has less than for cores, this data is
rightfully zero and thus the panic that Harald Arnesen and Dave Kleikamp
observed. It's an embarrassing mistake I made, I tested on several machines
including a consumer-level Atom Airmont (Dell Wyse thin client) but all had at
least 4 cores so I didn't see it.

For the records: there is a report for the "small machines" div-by-zero bug
from a 24 core Atom P-Series (new product line from 2020), but the fix works
on that machine, see
https://lore.kernel.org/lkml/bf43772d-48e5-01d4-dd03-330110e487fa@linux.intel.com/

> 
> > Your oops is on that divide instruction:
> > 
> >         freq_scale = div64_u64(acnt, mcnt);
> > 
> > and while we had a check for mcnt not being zero earlier, we did
> > 
> >         mcnt *= arch_max_freq_ratio;
> > 
> > after that check. I could see it becoming zero either due to an
> > overflow, or due to arch_max_freq_ratio being 0.
> 
> Right, so that's not supposed to happen, as you say, we should not
> enable this code if the ratio is 0, and we should not overflow mcnt due
> to reading that reg once per tick.
> 
> But yeha, virt, anything can happen :/
> 
> > I think the first commit in that series is supposed to fix that
> > arch_max_freq_ratio being 0 case, but it still feels like the code
> > that does the divide is checking for zero in the wrong place...
> 
> Yeah, we can certainly modify that. As is, real actual hardware should
> never even hit that case either. So we might as well move that check and
> then also make it disable all this frequency scaling stuff if we ever do
> hit it.

I'm going to send the following patch. There are a number of reasons why mcnt
overflowing and landing exactly at zero is unlikely (we read MPERF once every
tick, and arch_max_freq_ratio is a small number) but it's still a real problem
(eg. tickless kernels) and the zero check is 4 lines above where it should be,
it just needs to move down.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8c89e4d9ad28..fb71395cbcad 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -2055,14 +2055,14 @@ void arch_scale_freq_tick(void)
 
        acnt = aperf - this_cpu_read(arch_prev_aperf);
        mcnt = mperf - this_cpu_read(arch_prev_mperf);
-       if (!mcnt)
-               return;
 
        this_cpu_write(arch_prev_aperf, aperf);
        this_cpu_write(arch_prev_mperf, mperf);
 
        acnt <<= 2*SCHED_CAPACITY_SHIFT;
        mcnt *= arch_max_freq_ratio;
+       if (!mcnt)
+               return;
 
        freq_scale = div64_u64(acnt, mcnt);



Giovanni

  parent reply	other threads:[~2020-04-22  9:02 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-19 21:58 Linux 5.7-rc2 Linus Torvalds
     [not found] ` <428bac87-b6dd-0867-c8f8-622cd606de3e@skogtun.org>
2020-04-21 19:03   ` [BISECTED]: Kernel panic (was: Linux 5.7-rc2) Linus Torvalds
2020-04-21 21:23     ` Peter Zijlstra
2020-04-21 21:30       ` Linus Torvalds
2020-04-21 21:41         ` Peter Zijlstra
2020-04-22  9:02       ` Giovanni Gherdovich [this message]
2020-04-22  9:37         ` Harald Arnesen
2020-04-22 10:22           ` Harald Arnesen
2020-04-22 11:01             ` Giovanni Gherdovich
2020-04-22 14:12         ` Dave Kleikamp
2020-04-22 14:49           ` Giovanni Gherdovich
2020-04-22  9:32     ` Harald Arnesen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1587546150.9537.84.camel@suse.cz \
    --to=ggherdovich@suse.cz \
    --cc=dave.kleikamp@oracle.com \
    --cc=harald@skogtun.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.