From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>,
Larry Finger <Larry.Finger@lwfinger.net>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
LKML <linux-kernel@vger.kernel.org>,
Linux PM list <linux-pm@vger.kernel.org>
Subject: Re: Regression in 4.8 - CPU speed set very low
Date: Mon, 26 Sep 2016 14:46:24 -0700 [thread overview]
Message-ID: <1474926384.35049.128.camel@linux.intel.com> (raw)
In-Reply-To: <CAJZ5v0iEchM2ye==020gLj2bZc3ddCFa29jQZh6c+9X+OhfcoQ@mail.gmail.com>
On Mon, 2016-09-26 at 23:37 +0200, Rafael J. Wysocki wrote:
> On Mon, Sep 26, 2016 at 11:28 PM, Larry Finger
> <Larry.Finger@lwfinger.net> wrote:
> >
> > On 09/26/2016 04:06 PM, Rafael J. Wysocki wrote:
> > >
> > >
> > > On Monday, September 26, 2016 11:15:45 AM Larry Finger wrote:
> > > >
> > > >
> > > > On 09/26/2016 06:37 AM, Rafael J. Wysocki wrote:
> > > > >
> > > > >
> > > > > On Friday, September 23, 2016 09:45:09 PM Larry Finger wrote:
> > > > > >
> > > > > >
> > > > > > On 09/18/2016 09:54 PM, Larry Finger wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 09/14/2016 11:00 AM, Larry Finger wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 09/09/2016 12:39 PM, Larry Finger wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I have found a regression in kernel 4.8-rc2 that
> > > > > > > > > causes the speed of
> > > > > > > > > my laptop
> > > > > > > > > with an Intel(R) Core(TM) i7-4600M CPU @ 2.90GHz to
> > > > > > > > > suddenly have a
> > > > > > > > > maximum cpu
> > > > > > > > > frequency of ~400 MHz. Unfortunately, I do not know
> > > > > > > > > how to trigger
> > > > > > > > > this problem,
> > > > > > > > > thus a bisection is not possible. It usually happens
> > > > > > > > > under heavy
> > > > > > > > > load, such as a
> > > > > > > > > kernel build or the RPM build of VirtualBox, but it
> > > > > > > > > does not always
> > > > > > > > > fail with
> > > > > > > > > these loads. In my most recent failure, 'hwinfo --
> > > > > > > > > cpu' reports cpu
> > > > > > > > > MHz of
> > > > > > > > > 396.130 for #3. The bogomips value is 5787.73, and
> > > > > > > > > the cpu clock
> > > > > > > > > before the
> > > > > > > > > fault is 3437 MHz. Nothing is logged when this
> > > > > > > > > happens.
> > > > > > > > >
> > > > > > > > > If I were to get a patch that would show a backtrace
> > > > > > > > > when the
> > > > > > > > > maximum CPU
> > > > > > > > > frequency is changed, perhaps it would be possible to
> > > > > > > > > track this
> > > > > > > > > bug.
> > > > > > > >
> > > > > > > >
> > > > > > > > I have not yet found the bad commit, but I have reduced
> > > > > > > > the range of
> > > > > > > > commits a
> > > > > > > > bit. This bug has been difficult to trigger. So far, it
> > > > > > > > has not taken
> > > > > > > > over 1/2
> > > > > > > > day to appear in bad kernels, thus I am allowing three
> > > > > > > > days before
> > > > > > > > deciding that
> > > > > > > > a given trial is good. I never saw the problem with 4.7
> > > > > > > > kernels, but
> > > > > > > > I did in
> > > > > > > > 4.8-rc1. I also know that it appeared before commit
> > > > > > > > 581e0cd. Commit
> > > > > > > > 1b05cf6 did
> > > > > > > > not show the bug.
> > > > > > > >
> > > > > > > > Testing continues.
> > > > > > >
> > > > > > >
> > > > > > > And still does. My bisection seemed to be trending toward
> > > > > > > an
> > > > > > > improbable set of
> > > > > > > commits, and I needed to do some other work with the
> > > > > > > machine, thus I
> > > > > > > started
> > > > > > > running 4.8-rc6. It failed nearly 48 hours after the
> > > > > > > reboot, which
> > > > > > > indicated
> > > > > > > that using 3 days to indicate a "good" trial was likely
> > > > > > > too short. I
> > > > > > > am
> > > > > > > currently testing the first of the trial and will run it
> > > > > > > for at least
> > > > > > > a week. It
> > > > > > > is unlikely that these tests will be complete before 4,8
> > > > > > > is released,
> > > > > > > even if
> > > > > > > -rc8 is needed. I will keep attempting to find the faulty
> > > > > > > commit.
> > > > > >
> > > > > >
> > > > > > My debugging continues. After 7 days of beating on commit
> > > > > > f7816ad, I
> > > > > > have
> > > > > > concluded that it is likely good. Thus I think the bug lies
> > > > > > between
> > > > > > commit
> > > > > > 581e0cd (bad) and f7816ad (good). I will need to do a long
> > > > > > test on
> > > > > > commit
> > > > > > 1b05cf6, which did not fail with a shorter run.
> > > > >
> > > > >
> > > > > 581e0cd is not a valid mainline commit hash AFAICS.
> > > >
> > > >
> > > > That was a typo. The correct value is 581e0c7.
> > > > >
> > > > >
> > > > >
> > > > > What cpufreq driver do you use?
> > > >
> > > >
> > > > My "Default CPUFreq governor" is on demand.
> > > >
> > > > Running the command 'egrep -r "CPU_FREQ|CPUFREQ" .config'
> > > > results in
> > > >
> > > > CONFIG_ACPI_CPU_FREQ_PSS=y
> > > > CONFIG_CPU_FREQ=y
> > > > CONFIG_CPU_FREQ_GOV_ATTR_SET=y
> > > > CONFIG_CPU_FREQ_GOV_COMMON=y
> > > > # CONFIG_CPU_FREQ_STAT is not set
> > > > # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
> > > > # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
> > > > # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
> > > > CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
> > > > # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
> > > > # CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
> > > > CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
> > > > CONFIG_CPU_FREQ_GOV_POWERSAVE=m
> > > > CONFIG_CPU_FREQ_GOV_USERSPACE=m
> > > > CONFIG_CPU_FREQ_GOV_ONDEMAND=y
> > > > CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m
> > > > # CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set
> > > > CONFIG_X86_PCC_CPUFREQ=m
> > > > CONFIG_X86_ACPI_CPUFREQ=m
> > > > CONFIG_X86_ACPI_CPUFREQ_CPB=y
> > > >
> > > > Commit 1b05cf6 did fail on longer testing, thus my bisection
> > > > had ended up
> > > > going
> > > > wrong. Further tests have shown that commit 351a4ded is bad.
> > > > Once again,
> > > > by
> > > > bisection seems to be converging to a set of commits that seem
> > > > unlikely
> > > > to cause
> > > > this problem. Perhaps commit f7816ad is not really good even
> > > > though it
> > > > survived
> > > > 7 days of heavy CPU usage.
> > > >
> > > > I have been reluctant to post my entire .config on the list. It
> > > > is
> > > > available at
> > > > http://pastebin.com/aMZaAKwL.
> > >
> > >
> > > If the governor is ondemand, the driver is acpi-cpufreq, most
> > > likely.
> > >
> > > How do you measure the frequency?
> >
> >
> > Mostly I use a KDE applet named "System load" and look at the
> > "average
> > clock", but the same info is also available in /proc/cpuinfo as
> > "cpu MHz".
> > When the bug triggers, the system gets very slow, and the cpu fan
> > stops even
> > though the cpu is still busy.
>
> That sounds like thermal throttling kicking in.
>
This will help to know, if there is thermal throttle from OS.
# cat /sys/devices/system/cpu/cpufreq/policy?/scaling_max_freq
# grep -r . /sys/class/thermal/thermal_zone*
Thanks,
Srinivas
next prev parent reply other threads:[~2016-09-26 21:46 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-09 17:39 Regression in 4.8 - CPU speed set very low Larry Finger
2016-09-14 16:00 ` Larry Finger
2016-09-19 2:54 ` Larry Finger
2016-09-24 2:45 ` Larry Finger
2016-09-26 11:37 ` Rafael J. Wysocki
2016-09-26 16:15 ` Larry Finger
2016-09-26 21:06 ` Rafael J. Wysocki
2016-09-26 21:26 ` Srinivas Pandruvada
2016-09-26 21:30 ` Rafael J. Wysocki
2016-09-26 21:41 ` Srinivas Pandruvada
2016-09-26 21:46 ` Rafael J. Wysocki
2016-09-26 21:28 ` Larry Finger
2016-09-26 21:37 ` Rafael J. Wysocki
2016-09-26 21:46 ` Srinivas Pandruvada [this message]
2016-09-26 22:15 ` Larry Finger
2016-09-26 22:09 ` Larry Finger
2016-09-26 22:16 ` Rafael J. Wysocki
2016-09-26 23:53 ` Larry Finger
2016-09-27 0:21 ` Rafael J. Wysocki
2016-09-27 0:48 ` Larry Finger
2016-09-27 1:30 ` Srinivas Pandruvada
2016-09-27 2:53 ` Larry Finger
2016-09-27 3:12 ` Doug Smythies
2016-09-27 8:48 ` Larry Finger
2016-09-27 11:46 ` Rafael J. Wysocki
2016-09-29 2:22 ` Larry Finger
2016-09-29 12:19 ` Rafael J. Wysocki
2016-09-29 15:09 ` Larry Finger
2016-09-29 15:56 ` Srinivas Pandruvada
2016-09-29 16:24 ` Larry Finger
2016-09-29 20:52 ` Rafael J. Wysocki
2016-09-30 21:52 ` Larry Finger
2016-09-27 14:51 ` Lennart Sorensen
2016-09-29 2:26 ` Larry Finger
2016-09-29 14:17 ` Lennart Sorensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1474926384.35049.128.camel@linux.intel.com \
--to=srinivas.pandruvada@linux.intel.com \
--cc=Larry.Finger@lwfinger.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=rafael@kernel.org \
--cc=rjw@rjwysocki.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.