* 3.13.?: Strange / dangerous fan policy...
@ 2014-03-07 19:33 Manuel Krause
2014-03-07 20:55 ` Guenter Roeck
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-07 19:33 UTC (permalink / raw)
To: linux-kernel, linux-pm
Please have a short look at the following BUG report + the
comments -- this message here is a kind of FWD-ing it:
https://bugs.archlinux.org/task/39005
I came late to test kernel 3.13 with the .5 one, as it was the
time that the related -CK/BFS patch became available.
I'm not using Archlinux, but openSUSE, and my problems are quite
the same. Especially these with smelling melting plastics.
My own reports went to Con Kolivas' Blog first:
"I get weird temperatures and abrupt 100% fan actions with
vanilla 3.13.5 with this CK and most recent BFQ at my HP Notebook.
In gkrellm the highest T had been @74°C, so far (3.12.13), and is
now growing to 94°C. Then, the fan goes to 100% for 10~30secs
cooling it to approx. 82°C.
That is not good, if I compare 74 to 94 °C.
Have I missed a .CONFIG option for 3.13, especially?"
I'd get the same without (Con's && BFQ's) patches.
Machine: HP Notebook with Core2Duo CPU (Penryn)
Distro: openSUSE 13.1, 64bit, continuously updated
Desktop: KDE 4.12.3
MESA & drm & Xorg: most recent ones from:
http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
Current kernel: 3.13.6 vanilla from openSUSE repos, with
-ck1 and BFQ patches
Same behaviour: without these patches
Last good kernel: 3.12.13 vanilla + CK2 + BFQ
Please, _always_CC_me_ -- as I'm not on the linux-kernel /
linux-pm mailing lists.
And please, if you know any person in charge of this -- lead this
message to him/her.
Thank you in advance and best regards,
Manuel Krause
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-07 19:33 3.13.?: Strange / dangerous fan policy Manuel Krause
@ 2014-03-07 20:55 ` Guenter Roeck
2014-03-07 22:04 ` Manuel Krause
0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-07 20:55 UTC (permalink / raw)
To: Manuel Krause; +Cc: linux-kernel, linux-pm, lm-sensors
On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> Please have a short look at the following BUG report + the comments
> -- this message here is a kind of FWD-ing it:
> https://bugs.archlinux.org/task/39005
>
> I came late to test kernel 3.13 with the .5 one, as it was the time
> that the related -CK/BFS patch became available.
>
> I'm not using Archlinux, but openSUSE, and my problems are quite the
> same. Especially these with smelling melting plastics.
>
> My own reports went to Con Kolivas' Blog first:
> "I get weird temperatures and abrupt 100% fan actions with vanilla
> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> cooling it to approx. 82°C.
> That is not good, if I compare 74 to 94 °C.
> Have I missed a .CONFIG option for 3.13, especially?"
>
> I'd get the same without (Con's && BFQ's) patches.
>
> Machine: HP Notebook with Core2Duo CPU (Penryn)
> Distro: openSUSE 13.1, 64bit, continuously updated
> Desktop: KDE 4.12.3
> MESA & drm & Xorg: most recent ones from:
> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
>
> Current kernel: 3.13.6 vanilla from openSUSE repos, with
> -ck1 and BFQ patches
> Same behaviour: without these patches
>
> Last good kernel: 3.12.13 vanilla + CK2 + BFQ
>
Can you add more information about your fan control policy ?
Do you rely on the hardware for automatic fan speed control,
or do you run the fancontrol script ?
What is the output from the 'sensors' command ?
Thanks,
Guenter
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-07 20:55 ` Guenter Roeck
@ 2014-03-07 22:04 ` Manuel Krause
2014-03-07 22:52 ` Guenter Roeck
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-07 22:04 UTC (permalink / raw)
To: Guenter Roeck; +Cc: linux-kernel, linux-pm, lm-sensors
On 2014-03-07 21:55, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
>> Please have a short look at the following BUG report + the comments
>> -- this message here is a kind of FWD-ing it:
>> https://bugs.archlinux.org/task/39005
>>
>> I came late to test kernel 3.13 with the .5 one, as it was the time
>> that the related -CK/BFS patch became available.
>>
>> I'm not using Archlinux, but openSUSE, and my problems are quite the
>> same. Especially these with smelling melting plastics.
>>
>> My own reports went to Con Kolivas' Blog first:
>> "I get weird temperatures and abrupt 100% fan actions with vanilla
>> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
>> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
>> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
>> cooling it to approx. 82°C.
>> That is not good, if I compare 74 to 94 °C.
>> Have I missed a .CONFIG option for 3.13, especially?"
>>
>> I'd get the same without (Con's && BFQ's) patches.
>>
>> Machine: HP Notebook with Core2Duo CPU (Penryn)
>> Distro: openSUSE 13.1, 64bit, continuously updated
>> Desktop: KDE 4.12.3
>> MESA & drm & Xorg: most recent ones from:
>> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
>>
>> Current kernel: 3.13.6 vanilla from openSUSE repos, with
>> -ck1 and BFQ patches
>> Same behaviour: without these patches
>>
>> Last good kernel: 3.12.13 vanilla + CK2 + BFQ
>>
>
> Can you add more information about your fan control policy ?
> Do you rely on the hardware for automatic fan speed control,
> or do you run the fancontrol script ?
>
> What is the output from the 'sensors' command ?
>
> Thanks,
> Guenter
>
Hi, and thanks for the quick response!
No special fancy "fan control policy". 'fancontrol' isn't up or
running.
Vanilla kernels 3.11.* and 3.12.* had been working on here
without any extra work.
--
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +71.0°C (crit = +256.0°C)
temp2: +69.0°C (crit = +110.0°C)
temp3: +52.0°C (crit = +105.0°C)
temp4: +25.0°C (crit = +110.0°C)
temp5: +58.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
--
My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
This is with 3.12.13 with my normal workload.
Please trust my above-mentioned values of 94 °C vs. 74°C, as I
don't like to boot 3.13.6 anymore, to avoid harm to the
notebook's casing.
But I'd do so to test any improvement patch.
Manuel Krause
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-07 22:04 ` Manuel Krause
@ 2014-03-07 22:52 ` Guenter Roeck
2014-03-08 11:08 ` [lm-sensors] " Jean Delvare
0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-07 22:52 UTC (permalink / raw)
To: Manuel Krause; +Cc: linux-kernel, linux-pm, lm-sensors
On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> On 2014-03-07 21:55, Guenter Roeck wrote:
> >On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> >>Please have a short look at the following BUG report + the comments
> >>-- this message here is a kind of FWD-ing it:
> >>https://bugs.archlinux.org/task/39005
> >>
> >>I came late to test kernel 3.13 with the .5 one, as it was the time
> >>that the related -CK/BFS patch became available.
> >>
> >>I'm not using Archlinux, but openSUSE, and my problems are quite the
> >>same. Especially these with smelling melting plastics.
> >>
> >>My own reports went to Con Kolivas' Blog first:
> >>"I get weird temperatures and abrupt 100% fan actions with vanilla
> >>3.13.5 with this CK and most recent BFQ at my HP Notebook.
> >>In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> >>now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> >>cooling it to approx. 82°C.
> >>That is not good, if I compare 74 to 94 °C.
> >>Have I missed a .CONFIG option for 3.13, especially?"
> >>
> >>I'd get the same without (Con's && BFQ's) patches.
> >>
> >>Machine: HP Notebook with Core2Duo CPU (Penryn)
> >>Distro: openSUSE 13.1, 64bit, continuously updated
> >>Desktop: KDE 4.12.3
> >>MESA & drm & Xorg: most recent ones from:
> >>http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
> >>
> >>Current kernel: 3.13.6 vanilla from openSUSE repos, with
> >> -ck1 and BFQ patches
> >>Same behaviour: without these patches
> >>
> >>Last good kernel: 3.12.13 vanilla + CK2 + BFQ
> >>
> >
> >Can you add more information about your fan control policy ?
> >Do you rely on the hardware for automatic fan speed control,
> >or do you run the fancontrol script ?
> >
> >What is the output from the 'sensors' command ?
> >
> >Thanks,
> >Guenter
> >
>
> Hi, and thanks for the quick response!
> No special fancy "fan control policy". 'fancontrol' isn't up or
> running.
> Vanilla kernels 3.11.* and 3.12.* had been working on here without
> any extra work.
> --
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1: +71.0°C (crit = +256.0°C)
> temp2: +69.0°C (crit = +110.0°C)
> temp3: +52.0°C (crit = +105.0°C)
> temp4: +25.0°C (crit = +110.0°C)
> temp5: +58.0°C (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
> --
> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> This is with 3.12.13 with my normal workload.
>
> Please trust my above-mentioned values of 94 °C vs. 74°C, as I
> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> casing.
>
Understood. Unfortunately, we'll need to get information
from the new kernel to be able to track down the problem.
> But I'd do so to test any improvement patch.
>
So far I have no idea what is going on. I don't see anything in the
drivers providing above data that would explain the behavior,
but I might be missing something.
Of course, if output is different in 3.13, that would be important
to know. Maybe someone else can post related information for both
kernel versions on an affected system.
Guenter
* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
2014-03-07 22:52 ` Guenter Roeck
@ 2014-03-08 11:08 ` Jean Delvare
2014-03-08 12:36 ` Rafael J. Wysocki
2014-03-08 15:59 ` Guenter Roeck
0 siblings, 2 replies; 22+ messages in thread
From: Jean Delvare @ 2014-03-08 11:08 UTC (permalink / raw)
To: Manuel Krause; +Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm
On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > Hi, and thanks for the quick response!
> > No special fancy "fan control policy". 'fancontrol' isn't up or
> > running.
> > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > any extra work.
> > --
> > # sensors
> > acpitz-virtual-0
> > Adapter: Virtual device
> > temp1: +71.0°C (crit = +256.0°C)
> > temp2: +69.0°C (crit = +110.0°C)
> > temp3: +52.0°C (crit = +105.0°C)
> > temp4: +25.0°C (crit = +110.0°C)
> > temp5: +58.0°C (crit = +110.0°C)
> >
> > coretemp-isa-0000
> > Adapter: ISA adapter
> > Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
> > Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
> > --
> > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > This is with 3.12.13 with my normal workload.
> >
> > Please trust my above-mentioned values of 94 °C vs. 74°C, as I
> > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > casing.
>
> Understood. Unfortunately, we'll need to get information
> from the new kernel to be able to track down the problem.
Indeed. Not only the run-time temperatures, but also the high and crit
limits.
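To make such a comparison less error-prone, the 'sensors' output from both kernels could be parsed and laid side by side. What follows is only a minimal sketch: the function names parse_sensors and compare are made up for illustration, and the two sample strings are the readings posted in this thread (GOOD from 3.12.13, BAD from 3.13.x).

```python
import re

# Matches e.g. "temp2: +69.0°C (crit = +110.0°C)" or
# "Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)".
READING = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)°C"
    r"(?:\s+\((?P<limits>[^)]*)\))?"
)
LIMIT = re.compile(r"(\w+)\s*=\s*([+-]?\d+(?:\.\d+)?)°C")

GOOD = """acpitz-virtual-0
Adapter: Virtual device
temp2: +69.0°C (crit = +110.0°C)
temp5: +58.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
"""

BAD = """acpitz-virtual-0
Adapter: Virtual device
temp2: +92.0°C (crit = +110.0°C)
temp5: +25.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +86.0°C (high = +105.0°C, crit = +105.0°C)
"""


def parse_sensors(text):
    """Return {chip: {label: {"value": float, "<limit name>": float}}}."""
    chips = {}
    chip = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = READING.match(line)
        if match and chip is not None:
            entry = {"value": float(match.group("value"))}
            for name, limit in LIMIT.findall(match.group("limits") or ""):
                entry[name] = float(limit)
            chips[chip][match.group("label").strip()] = entry
        elif ":" not in line:
            # A line without a colon is taken to be a chip header.
            chip = line
            chips[chip] = {}
    return chips


def compare(good, bad):
    """Yield (chip, label, good_value, bad_value) for readings in both."""
    for chip, labels in good.items():
        for label, entry in labels.items():
            other = bad.get(chip, {}).get(label)
            if other is not None:
                yield chip, label, entry["value"], other["value"]


if __name__ == "__main__":
    for chip, label, old, new in compare(parse_sensors(GOOD), parse_sensors(BAD)):
        print(f"{chip} {label}: {old:.1f} -> {new:.1f} ({new - old:+.1f})")
```

The parser keeps the high and crit limits next to each reading, so a change in the limits between kernels would show up as well as a change in the temperatures.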
> > But I'd do so to test any improvement patch.
>
> So far I have no idea what is going on. I don't see anything in the
> drivers providing above data that would explain the behavior,
> but I might be missing something.
Looks like a regression in the acpi subsystem or in power management,
not hwmon. Hwmon is merely reporting the temperatures, it's not
responsible for the actual temperatures.
A bisection would certainly help, but of course that would require
booting to a bad kernel half of the time, which I understand Manuel
wouldn't enjoy.
The only two components which I think can reach such high temperatures
in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
74°C" refers to acpitz's temp1? If the temperatures reported by
coretemp remain the same, then I can only suppose that temp1 is the GPU
temperature. Please tell us which GPU is in this laptop, and which
driver you're using.
--
Jean Delvare
SUSE L3 Support
* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
2014-03-08 11:08 ` [lm-sensors] " Jean Delvare
@ 2014-03-08 12:36 ` Rafael J. Wysocki
2014-03-08 15:59 ` Guenter Roeck
1 sibling, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2014-03-08 12:36 UTC (permalink / raw)
To: Jean Delvare, Manuel Krause
Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm
On Saturday, March 08, 2014 12:08:31 PM Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> > On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > > Hi, and thanks for the quick response!
> > > No special fancy "fan control policy". 'fancontrol' isn't up or
> > > running.
> > > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > > any extra work.
> > > --
> > > # sensors
> > > acpitz-virtual-0
> > > Adapter: Virtual device
> > > temp1: +71.0°C (crit = +256.0°C)
> > > temp2: +69.0°C (crit = +110.0°C)
> > > temp3: +52.0°C (crit = +105.0°C)
> > > temp4: +25.0°C (crit = +110.0°C)
> > > temp5: +58.0°C (crit = +110.0°C)
> > >
> > > coretemp-isa-0000
> > > Adapter: ISA adapter
> > > Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
> > > Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
> > > --
> > > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > > This is with 3.12.13 with my normal workload.
> > >
> > > Please trust my above-mentioned values of 94 °C vs. 74°C, as I
> > > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > > casing.
> >
> > Understood. Unfortunately, we'll need to get information
> > from the new kernel to be able to track down the problem.
>
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
>
> > > But I'd do so to test any improvement patch.
> >
> > So far I have no idea what is going on. I don't see anything in the
> > drivers providing above data that would explain the behavior,
> > but I might be missing something.
>
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
>
> A bisection would certainly help, but of course that would require
> booting to a bad kernel half of the time, which I understand Manuel
> wouldn't enjoy.
>
> The only two components which I think can reach such high temperatures
> in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
> 74°C" refers to acpitz's temp1? If the temperatures reported by
> coretemp remain the same, then I can only suppose that temp1 is the GPU
> temperature. Please tell us which GPU is in this laptop, and which
> driver you're using.
Also it would be good to know which cpufreq and cpuidle drivers are in use
and whether or not 3.14-rc5 has the problem.
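For anyone who wants to collect that on both kernels without extra tools, here is a rough sketch that reads the corresponding sysfs attributes directly. It assumes the usual layout under /sys/devices/system/cpu; which of these files exist depends on the kernel version and configuration, and the helper names read_attr and power_drivers are hypothetical.

```python
from pathlib import Path

# sysfs attributes of the cpufreq and cpuidle cores, relative to the CPU
# directory. Not every kernel exposes all of them.
ATTRIBUTES = {
    "cpufreq driver": "cpu0/cpufreq/scaling_driver",
    "cpufreq governor": "cpu0/cpufreq/scaling_governor",
    "cpuidle driver": "cpuidle/current_driver",
    "cpuidle governor": "cpuidle/current_governor_ro",
}


def read_attr(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def power_drivers(root="/sys/devices/system/cpu"):
    """Map each attribute name to its current value (None if missing)."""
    return {name: read_attr(Path(root) / rel) for name, rel in ATTRIBUTES.items()}


if __name__ == "__main__":
    for name, value in power_drivers().items():
        print(f"{name}: {value if value is not None else 'n/a'}")
```

Running it once under the good kernel and once under the bad one gives two short listings that can simply be diffed.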
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
2014-03-08 11:08 ` [lm-sensors] " Jean Delvare
2014-03-08 12:36 ` Rafael J. Wysocki
@ 2014-03-08 15:59 ` Guenter Roeck
2014-03-09 0:10 ` Manuel Krause
1 sibling, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-08 15:59 UTC (permalink / raw)
To: Jean Delvare, Manuel Krause; +Cc: lm-sensors, linux-kernel, linux-pm
On 03/08/2014 03:08 AM, Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>> Hi, and thanks for the quick response!
>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>> running.
>>> Vanilla kernels 3.11.* and 3.12.* had been working on here without
>>> any extra work.
>>> --
>>> # sensors
>>> acpitz-virtual-0
>>> Adapter: Virtual device
>>> temp1: +71.0°C (crit = +256.0°C)
>>> temp2: +69.0°C (crit = +110.0°C)
>>> temp3: +52.0°C (crit = +105.0°C)
>>> temp4: +25.0°C (crit = +110.0°C)
>>> temp5: +58.0°C (crit = +110.0°C)
>>>
>>> coretemp-isa-0000
>>> Adapter: ISA adapter
>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
>>> --
>>> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
>>> This is with 3.12.13 with my normal workload.
>>>
>>> Please trust my above-mentioned values of 94 °C vs. 74°C, as I
>>> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
>>> casing.
>>
>> Understood. Unfortunately, we'll need to get information
>> from the new kernel to be able to track down the problem.
>
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
>
>>> But I'd do so to test any improvement patch.
>>
>> So far I have no idea what is going on. I don't see anything in the
>> drivers providing above data that would explain the behavior,
>> but I might be missing something.
>
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
>
I would agree. I don't think we have enough information to be sure,
though. There might be some unintended interaction or interference.
gpu is a good hint ... for example, look at commit b9ed919f1c8
(drm/nouveau/drm/pm: remove everything except the hwmon interfaces
to THERM). nouveau does export pwm and fan control information,
so any change in that code may have unintended side effects.
Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
use devm_hwmon_register_with_groups) could have the observed impact,
as it is purely passive, but I'd rather be safe than sorry.
This problem has now been submitted into bugzilla as
https://bugzilla.kernel.org/show_bug.cgi?id=71711.
Guenter
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-08 15:59 ` Guenter Roeck
@ 2014-03-09 0:10 ` Manuel Krause
2014-03-09 17:28 ` Guenter Roeck
2014-03-09 17:58 ` Rafael J. Wysocki
0 siblings, 2 replies; 22+ messages in thread
From: Manuel Krause @ 2014-03-09 0:10 UTC (permalink / raw)
To: Guenter Roeck, linux-kernel, linux-pm; +Cc: Rafael J. Wysocki, lm-sensors
On 2014-03-08 16:59, Guenter Roeck wrote:
> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>> Hi, and thanks for the quick response!
>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>> running.
>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>> without
>>>> any extra work.
>>>> --
>>>> # sensors
>>>> acpitz-virtual-0
>>>> Adapter: Virtual device
>>>> temp1: +71.0°C (crit = +256.0°C)
>>>> temp2: +69.0°C (crit = +110.0°C)
>>>> temp3: +52.0°C (crit = +105.0°C)
>>>> temp4: +25.0°C (crit = +110.0°C)
>>>> temp5: +58.0°C (crit = +110.0°C)
>>>>
>>>> coretemp-isa-0000
>>>> Adapter: ISA adapter
>>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
>>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
>>>> --
>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>> sensor.
>>>> This is with 3.12.13 with my normal workload.
>>>>
>>>> Please trust my above-mentioned values of 94 °C vs. 74°C, as I
>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>> notebook's
>>>> casing.
>>>
>>> Understood. Unfortunately, we'll need to get information
>>> from the new kernel to be able to track down the problem.
>>
>> Indeed. Not only the run-time temperatures, but also the high
>> and crit
>> limits.
>>
>>>> But I'd do so to test any improvement patch.
>>>
>>> So far I have no idea what is going on. I don't see anything
>>> in the
>>> drivers providing above data that would explain the behavior,
>>> but I might be missing something.
>>
>> Looks like a regression in the acpi subsystem or in power
>> management,
>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>> responsible for the actual temperatures.
>>
>
> I would agree. I don't think we have enough information to be sure,
> though. There might be some unintended interaction or interference.
>
> gpu is a good hint ... for example, look at commit b9ed919f1c8
> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> to THERM). nouveau does export pwm and fan control information,
> so any change in that code may have unintended side effects.
> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> use devm_hwmon_register_with_groups) could have the observed impact,
> as it is purely passive, but I'd rather be safe than sorry.
>
> This problem has now been submitted into bugzilla as
> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>
> Guenter
>
Sorry for being late; I had to search for and gather a lot of info
for you...
I hope you don't mind that I put it all into one answer, CCing all of you.
My GFX is a GM45 Intel (mobile), shared memory, running the
opensource Mesa drivers/extensions.
kernel-module: i915
According to the output of 'cpupower': I have
CPUidle driver: acpi_idle
CPUidle governor: menu
CPUfreq:
driver: acpi-cpufreq
available cpufreq governors: ondemand, performance
-
And "ondemand" is running.
--
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +41.0°C (crit = +256.0°C)
temp2: +92.0°C (crit = +110.0°C)
temp3: +71.0°C (crit = +105.0°C)
temp4: +26.5°C (crit = +110.0°C)
temp5: +25.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +86.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +84.0°C (high = +105.0°C, crit = +105.0°C)
FROM a critical "smelly" situation today, kernel-compilation, fan
@100%.
--
Additional findings:
Identification from bootup ACPI initialisation vs. sensors:
temp1 = DTSZ
temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
temp3 = SKNZ
temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan
(25 - 45 - 58 - max?)
Core 0 & Core 1 are the internal CPU T sensors.
With the 3.13.x (.5+) kernels the cooling settings first gathered
at bootup stay forever. That means rebooting a hot
system will get an FDTZ @45°C+ and won't cause any problems, as it
does cool enough (even for kernel compiling on here). If it gets
25°C @bootup, the system goes into emergency cooling at some point.
The same happens with a suspend/resume.
Kernel 3.12.13 adjusts the cooling on its own, and appropriately.
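The zone mapping and the "stuck" cooling state described above could be cross-checked with a snapshot of the thermal sysfs tree, taken once per kernel. This is only a sketch under assumptions: it expects the generic layout (thermal_zone*/type, thermal_zone*/temp in millidegrees Celsius, cooling_device*/cur_state and max_state), and the function names read_attr and thermal_snapshot are invented for the example.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def thermal_snapshot(root="/sys/class/thermal"):
    """Collect zone temperatures and cooling-device states in one dict."""
    snapshot = {}
    base = Path(root)
    for zone in sorted(base.glob("thermal_zone*")):
        millideg = read_attr(zone / "temp")
        snapshot[zone.name] = {
            "type": read_attr(zone / "type"),
            # The kernel reports millidegrees Celsius.
            "temp_c": int(millideg) / 1000 if millideg is not None else None,
        }
    for dev in sorted(base.glob("cooling_device*")):
        snapshot[dev.name] = {
            "type": read_attr(dev / "type"),
            "cur_state": read_attr(dev / "cur_state"),
            "max_state": read_attr(dev / "max_state"),
        }
    return snapshot


if __name__ == "__main__":
    for name, attrs in thermal_snapshot().items():
        print(name, attrs)
```

If the cooling devices keep the same cur_state under 3.13.x while the zone temperatures climb, that would match the behaviour reported here; on 3.12.13 the states should change with the load.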
Thank you all for your engagement, best regards,
Manuel Krause.
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-09 0:10 ` Manuel Krause
@ 2014-03-09 17:28 ` Guenter Roeck
2014-03-09 17:58 ` Rafael J. Wysocki
1 sibling, 0 replies; 22+ messages in thread
From: Guenter Roeck @ 2014-03-09 17:28 UTC (permalink / raw)
To: Manuel Krause, linux-kernel, linux-pm
Cc: Jean Delvare, lm-sensors, Rafael J. Wysocki
On 03/08/2014 04:10 PM, Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>> Hi, and thanks for the quick response!
>>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>>> running.
>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>> without
>>>>> any extra work.
>>>>> --
>>>>> # sensors
>>>>> acpitz-virtual-0
>>>>> Adapter: Virtual device
>>>>> temp1: +71.0°C (crit = +256.0°C)
>>>>> temp2: +69.0°C (crit = +110.0°C)
>>>>> temp3: +52.0°C (crit = +105.0°C)
>>>>> temp4: +25.0°C (crit = +110.0°C)
>>>>> temp5: +58.0°C (crit = +110.0°C)
>>>>>
>>>>> coretemp-isa-0000
>>>>> Adapter: ISA adapter
>>>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
>>>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
>>>>> --
>>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>>> sensor.
>>>>> This is with 3.12.13 with my normal workload.
>>>>>
>>>>> Please trust my above-mentioned values of 94 °C vs. 74°C, as I
>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>> notebook's
>>>>> casing.
>>>>
>>>> Understood. Unfortunately, we'll need to get information
>>>> from the new kernel to be able to track down the problem.
>>>
>>> Indeed. Not only the run-time temperatures, but also the high
>>> and crit
>>> limits.
>>>
>>>>> But I'd do so to test any improvement patch.
>>>>
>>>> So far I have no idea what is going on. I don't see anything
>>>> in the
>>>> drivers providing above data that would explain the behavior,
>>>> but I might be missing something.
>>>
>>> Looks like a regression in the acpi subsystem or in power
>>> management,
>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>> responsible for the actual temperatures.
>>>
>>
>> I would agree. I don't think we have enough information to be sure,
>> though. There might be some unintended interaction or interference.
>>
>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
>> to THERM). nouveau does export pwm and fan control information,
>> so any change in that code may have unintended side effects.
>> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>> use devm_hwmon_register_with_groups) could have the observed impact,
>> as it is purely passive, but I'd rather be safe than sorry.
>>
>> This problem has now been submitted into bugzilla as
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>
>> Guenter
>>
>
> Sorry for being late; I had to search for and gather a lot of info for you...
> I hope you don't mind that I put it all into one answer, CCing all of you.
>
> My GFX is a GM45 Intel (mobile), shared memory, running the opensource Mesa drivers/extensions.
> kernel-module: i915
>
> According to the output of 'cpupower': I have
> CPUidle driver: acpi_idle
> CPUidle governor: menu
>
> CPUfreq:
> driver: acpi-cpufreq
> available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
>
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1: +41.0°C (crit = +256.0°C)
> temp2: +92.0°C (crit = +110.0°C)
> temp3: +71.0°C (crit = +105.0°C)
> temp4: +26.5°C (crit = +110.0°C)
> temp5: +25.0°C (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0: +86.0°C (high = +105.0°C, crit = +105.0°C)
> Core 1: +84.0°C (high = +105.0°C, crit = +105.0°C)
>
> FROM a critical "smelly" situation today, kernel-compilation, fan @100%.
> --
>
> Additional findings:
>
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
>
> With the 3.13.x (.5+) kernels the cooling settings first gathered at bootup stay forever. That means rebooting a hot system will get an FDTZ @45°C+ and won't cause any problems, as it does cool enough (even for kernel compiling on here). If it gets 25°C @bootup, the system goes into emergency cooling at some point. The same happens with a suspend/resume.
>
> Kernel 3.12.13 adjusts the cooling on its own, and appropriately.
>
Hi Manuel,
thanks a lot for the additional information.
I added this exchange to bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=71711).
This is pretty much all I can do at this point; I have no idea what
is going on. Some change in ACPI would be my guess, but I did not see
anything catching my eye when looking through the ACPI code.
Guenter
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-09 0:10 ` Manuel Krause
2014-03-09 17:28 ` Guenter Roeck
@ 2014-03-09 17:58 ` Rafael J. Wysocki
2014-03-10 1:49 ` Manuel Krause
1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2014-03-09 17:58 UTC (permalink / raw)
To: Manuel Krause
Cc: Guenter Roeck, linux-kernel, linux-pm, Jean Delvare, lm-sensors,
rui.zhang
On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
> > On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> >>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> >>>> Hi, and thanks for the quick response!
> >>>> No special fancy "fan control policy". 'fancontrol' isn't up or
> >>>> running.
> >>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
> >>>> without
> >>>> any extra work.
> >>>> --
> >>>> # sensors
> >>>> acpitz-virtual-0
> >>>> Adapter: Virtual device
> >>>> temp1: +71.0°C (crit = +256.0°C)
> >>>> temp2: +69.0°C (crit = +110.0°C)
> >>>> temp3: +52.0°C (crit = +105.0°C)
> >>>> temp4: +25.0°C (crit = +110.0°C)
> >>>> temp5: +58.0°C (crit = +110.0°C)
> >>>>
> >>>> coretemp-isa-0000
> >>>> Adapter: ISA adapter
> >>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
> >>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
> >>>> --
> >>>> My notebook (HP/Compaq 6730b) does not have a separate fan
> >>>> sensor.
> >>>> This is with 3.12.13 with my normal workload.
> >>>>
> >>>> Please trust my above-mentioned values of 94 °C vs. 74°C, as I
> >>>> don't like to boot 3.13.6 anymore, to avoid harm to the
> >>>> notebook's
> >>>> casing.
> >>>
> >>> Understood. Unfortunately, we'll need to get information
> >>> from the new kernel to be able to track down the problem.
> >>
> >> Indeed. Not only the run-time temperatures, but also the high
> >> and crit
> >> limits.
> >>
> >>>> But I'd do so to test any improvement patch.
> >>>
> >>> So far I have no idea what is going on. I don't see anything
> >>> in the
> >>> drivers providing above data that would explain the behavior,
> >>> but I might be missing something.
> >>
> >> Looks like a regression in the acpi subsystem or in power
> >> management,
> >> not hwmon. Hwmon is merely reporting the temperatures, it's not
> >> responsible for the actual temperatures.
> >>
> >
> > I would agree. I don't think we have enough information to be sure,
> > though. There might be some unintended interaction or interference.
> >
> > gpu is a good hint ... for example, look at commit b9ed919f1c8
> > (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> > to THERM). nouveau does export pwm and fan control information,
> > so any change in that code may have unintended side effects.
> > Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> > use devm_hwmon_register_with_groups) could have the observed impact,
> > as it is purely passive, but I'd rather be safe than sorry.
> >
> > This problem has now been submitted into bugzilla as
> > https://bugzilla.kernel.org/show_bug.cgi?id=71711.
> >
> > Guenter
> >
>
> Sorry for being late, I had to search for/accumulate a lot of info
> for you...
> I hope you don't mind me putting it all into one answer, CCing all
> of you.
>
> My GFX is a GM45 Intel (mobile), shared memory, running the
> opensource Mesa drivers/extensions.
> kernel-module: i915
>
> According to the output of 'cpupower': I have
> CPUidle driver: acpi_idle
> CPUidle governor: menu
>
> CPUfreq:
> driver: acpi-cpufreq
> available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
>
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1: +41.0°C (crit = +256.0°C)
> temp2: +92.0°C (crit = +110.0°C)
> temp3: +71.0°C (crit = +105.0°C)
> temp4: +26.5°C (crit = +110.0°C)
> temp5: +25.0°C (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0: +86.0°C (high = +105.0°C, crit = +105.0°C)
> Core 1: +84.0°C (high = +105.0°C, crit = +105.0°C)
>
> FROM a critical "smelly" situation today, kernel-compilation, fan
> @100%.
> --
>
> Additional findings:
>
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan
> (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
>
> With the 3.13.x (.5+) kernels the first cooling settings
> gathered at bootup stay forever. That means rebooting a hot
> system will get a FDTZ @45°C+ and won't cause any problems, as it
> does cool enough (even for kernel compiling on here). If it gets
> 25°C @bootup the system goes into emergency cooling at some point.
> The same happens with a suspend/resume.
>
> Kernel 3.12.13 adjusts the cooling on its own, and appropriately.
This almost certainly is an ACPI regression, but I'm not sure whether
thermal management or CPU power management is broken on your system.
Can you compare the contents of /sys/class/thermal/ from working and
not working kernels, please?
Rafael
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-09 17:58 ` Rafael J. Wysocki
@ 2014-03-10 1:49 ` Manuel Krause
2014-03-11 21:59 ` Manuel Krause
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-10 1:49 UTC (permalink / raw)
To: Rafael J. Wysocki, linux-kernel, linux-pm
Cc: Guenter Roeck, Jean Delvare, lm-sensors, rui.zhang
On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> This almost certainly is an ACPI regression, but I'm not sure whether
> thermal management or CPU power management is broken on your system.
>
> Can you compare the contents of /sys/class/thermal/ from working and
> not working kernels, please?
>
> Rafael
>
Hi again,

unfortunately you didn't specify how deeply I should dig into
/sys/class/thermal, so you get everything between # BOF # and
# EOF # below. I hope the listings are readable without further
comments.

The most remarkable changes, in my eyes, happened within
"thermal_zone1".

Best regards,
Manuel Krause


# BOF #
The following are all from /sys/class/thermal/, whose entries are
links to -> ../../devices/virtual/thermal/

I've listed the directories in separate cooling_device and
thermal_zone sections for each bad/good kernel, for emailing
purposes only. You can merge them into a spreadsheet for your own
evaluation. I've left out some subdirs and subdir values that
_really_ didn't seem to need attention.

I've also collected the `sensors` output for each readout, having
reproduced nearly the same workload, as represented by the
"Fan speed" (thermal_zone4==FDTZ).

And I've done my very best not to produce typos or c&p errors.

3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir |-           /type      /cur_state  /max_state
cooling_device0  Processor  0           10
cooling_device1  Processor  0           10
cooling_device2  Fan        0           1
cooling_device3  Fan        1           1
cooling_device4  Fan        0           1
cooling_device5  Fan        0           1
cooling_device6  Fan        0           1
cooling_device7  LCD        0           24

3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir |-           /type      /cur_state  /max_state
cooling_device0  Processor  0           10
cooling_device1  Processor  0           10
cooling_device2  Fan        0           1
cooling_device3  Fan        1           1
cooling_device4  Fan        1           1
cooling_device5  Fan        1           1
cooling_device6  Fan        1           1
cooling_device7  LCD        0           24

3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir |-         /passive  /temp  |-   /cdev?_  /trip_   /trip_
                                     trip_    point_   point_
                                     point    ?_temp   ?_type
thermal_zone0  0         68000  ?=0  n.a.     256000   critical
thermal_zone1  n.a.      70000  |-
                                ?=0  6        110000   critical
                                ?=1  5        107000   passive
                                ?=2  4        90000    active
                                ?=3  3        75000    active
                                ?=4  2        55000    active
                                ?=5  1        45000    active
                                ?=6  1        30000    active
thermal_zone2  n.a.      54000  |-
                                ?=0  1        105000   critical
                                ?=1  1        95000    passive
thermal_zone3  n.a.      25800  |-
                                ?=0  1        110000   critical
                                ?=1  1        60000    passive
thermal_zone4  0         58000  ?=0  n.a.     110000   critical

3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir |-         /passive  /temp  |-   /cdev?_  /trip_   /trip_
                                     trip_    point_   point_
                                     point    ?_temp   ?_type
thermal_zone0  0         50000  ?=0  n.a.     256000   critical
thermal_zone1  n.a.      70000  |-
                                ?=0  1        110000   critical
                                ?=1  1        107000   passive
                                ?=2  2        90000    active
                                ?=3  3        67000    active
                                ?=4  4        55000    active
                                ?=5  5        45000    active
                                ?=6  6        30000    active
thermal_zone2  n.a.      53000  |-
                                ?=0  1        105000   critical
                                ?=1  1        95000    passive
thermal_zone3  n.a.      25600  |-
                                ?=0  1        110000   critical
                                ?=1  1        60000    passive
thermal_zone4  0         58000  ?=0  n.a.     110000   critical

---
Legend here:
/type is always acpitz
/mode enabled
/policy step_wise
- from kernel ACPI initialisation: thermal_zone0==DTSZ,
thermal_zone1==CPUZ, thermal_zone2==SKNZ,
thermal_zone3==BATZ, thermal_zone4==FDTZ
- n.a. means file or value is not available
___
Legend in general:
/power/control is always auto
/power/runtime_status unsupported
/uevent ''==empty
----------------------------------------------------------------
3.13.5 -- 20140309 -- 20:52 -- bad
=============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +68.0°C (crit = +256.0°C)
temp2: +70.0°C (crit = +110.0°C)
temp3: +54.0°C (crit = +105.0°C)
temp4: +25.8°C (crit = +110.0°C)
temp5: +58.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +66.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +63.0°C (high = +105.0°C, crit = +105.0°C)
3.12.13 -- 20140310 -- 00:26 -- good
==============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +50.0°C (crit = +256.0°C)
temp2: +70.0°C (crit = +110.0°C)
temp3: +53.0°C (crit = +105.0°C)
temp4: +25.6°C (crit = +110.0°C)
temp5: +58.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +65.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +61.0°C (high = +105.0°C, crit = +105.0°C)
# EOF #
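
For anyone who wants to produce a comparable listing on another
machine, here is a minimal sketch in Python. It is not necessarily
how the tables above were gathered. It only assumes the attribute
names that appear in the listing (type, cur_state, max_state,
passive, temp, cdev?_trip_point, trip_point_?_temp,
trip_point_?_type); the helper names index_of, read_attr and
dump_thermal are made up for this illustration.

```python
import re
from pathlib import Path


def index_of(path: Path) -> int:
    """Numeric suffix of a sysfs entry name, e.g. cooling_device10 -> 10."""
    match = re.search(r"(\d+)$", path.name)
    return int(match.group(1)) if match else -1


def read_attr(path: Path) -> str:
    """Stripped attribute content, or 'n.a.' if the file is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return "n.a."


def dump_thermal(root: str = "/sys/class/thermal") -> list[str]:
    """Return one text line per cooling device, thermal zone and trip point."""
    base = Path(root)
    lines = []
    # Cooling devices: type, current state, maximum state.
    for dev in sorted(base.glob("cooling_device*"), key=index_of):
        lines.append(" ".join([dev.name,
                               read_attr(dev / "type"),
                               read_attr(dev / "cur_state"),
                               read_attr(dev / "max_state")]))
    # Thermal zones: passive flag and temperature, then one line per trip point.
    for zone in sorted(base.glob("thermal_zone*"), key=index_of):
        lines.append(" ".join([zone.name,
                               read_attr(zone / "passive"),
                               read_attr(zone / "temp")]))
        trips = sorted(zone.glob("trip_point_*_type"),
                       key=lambda p: int(p.name.split("_")[2]))
        for trip in trips:
            n = trip.name.split("_")[2]
            lines.append(" ".join([f"  ?={n}",
                                   read_attr(zone / f"cdev{n}_trip_point"),
                                   read_attr(zone / f"trip_point_{n}_temp"),
                                   read_attr(trip)]))
    return lines
```

Calling print("\n".join(dump_thermal())) once under each kernel and
diffing the two outputs should give roughly the comparison shown
above, with "n.a." standing in for files that do not exist.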
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-10 1:49 ` Manuel Krause
@ 2014-03-11 21:59 ` Manuel Krause
[not found] ` <532B4DC5.4010705@netscape.net>
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-11 21:59 UTC (permalink / raw)
To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
Cc: Guenter Roeck, Jean Delvare, lm-sensors
On 2014-03-10 02:49, Manuel Krause wrote:
> [SNIP]
Hi, and thank you for your attention ^^

At the bottom of this email you'll find the current values for the
new 3.12.14 kernel at two different levels of usage and ambient
temperature.

As you'll see, in kernel 3.12.14 the /cdev?_trip_point enumeration
has changed to match that of 3.13.?, and so has one
/trip_point_?_temp. But 3.12.14 works as well as 3.12.13. (So the
first thing that caught my eye didn't lead anywhere useful.)

I'm not capable of finding or understanding the related code,
but, please, let me present an idea of what MAY be going on:

In 3.12.13+, on my system, the effective cooling fan speed seems
to be an accumulation, maybe bitwise, of
cooling_device[2-6]/cur_state, each of which gets activated (=1) at
a certain temperature value or level; each
cooling_device[2-6]/cur_state stays at 1 as long as the temperature
does not drop below its reference temperature. On my system this
reference temperature would most likely be compared against
temp2 == thermal_zone1/temp [CPUZ].

In 3.13.? only one of cooling_device[2-6]/cur_state seems to get
set to 1, while the others are left at or rewritten with 0. The fan
speed algorithm then accumulates only a single 1, without taking
into account the [_LEVEL_] number of cooling_device[2-6]... or
re-requesting the related trigger temperature.
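
To make the guess above concrete, here is a small Python model of
the two behaviours. It is only a sketch of the hypothesis, not of
the actual kernel code. The trip temperatures are the active trip
points of thermal_zone1 from the 3.12.14 listing; pairing them with
cooling_device2..6 in this order, ignoring any hysteresis, and
letting the "single" variant keep the device of the highest reached
trip are all assumptions made for illustration.

```python
# Active trip temperatures of thermal_zone1 (CPUZ) in millidegrees Celsius.
# The pairing with cooling_device2..6 is an assumption, not read from sysfs.
ACTIVE_TRIPS = {
    "cooling_device2": 90000,
    "cooling_device3": 75000,
    "cooling_device4": 55000,
    "cooling_device5": 45000,
    "cooling_device6": 30000,
}


def cumulative_states(temp_mc: int) -> dict[str, int]:
    """Guessed 3.12.x behaviour: every device whose trip is reached is on."""
    return {dev: int(temp_mc >= trip) for dev, trip in ACTIVE_TRIPS.items()}


def single_states(temp_mc: int) -> dict[str, int]:
    """Guessed 3.13.x behaviour: only one device is on (here: highest reached trip)."""
    states = dict.fromkeys(ACTIVE_TRIPS, 0)
    reached = [dev for dev, trip in ACTIVE_TRIPS.items() if temp_mc >= trip]
    if reached:
        hottest = max(reached, key=ACTIVE_TRIPS.get)
        states[hottest] = 1
    return states


def fan_level(states: dict[str, int]) -> int:
    """Accumulate the cur_state values into one effective fan level."""
    return sum(states.values())
```

With CPUZ at 73000, cumulative_states() switches on
cooling_device4 to 6, which matches the "normal use" listing below,
and fan_level() gives 3; single_states() at the same temperature
gives a fan level of only 1, which would fit a fan that spins far
too slowly for the load.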
I hope this brings you developers closer to a conclusion on how to
fix it,
best regards, Manuel Krause
_____________________________
3.12.14 -- 20140311 -- 19:07 -- changed, not broken -- normal use
=============================
/sys/class/thermal/* which
are links to -> ../../devices/virtual/thermal/*
dir |-           /type  /cur_state  /max_state  Maybe
                                                trigger
                                                /PWM
...
cooling_device2  Fan    0           1           not yet
                                                observed
cooling_device3  Fan    0           1           FDTZ==58°C
cooling_device4  Fan    1           1           FDTZ==45°C
cooling_device5  Fan    1           1           FDTZ==34°C
cooling_device6  Fan    1           1           FDTZ==25°C
...

dir |-         /passive  /temp  |-   /cdev?_  /trip_   /trip_
                                     trip_    point_   point_
                                     point    ?_temp   ?_type
...
thermal_zone1  n.a.      73000  |-
(CPUZ)
                                ?=0  6        110000   critical
                                ?=1  5        107000   passive
                                ?=2  4        90000    active
                                ?=3  3        75000    active
                                ?=4  2        55000    active
                                ?=5  1        45000    active
                                ?=6  1        30000    active
...
thermal_zone4  n.a.      45000  ?=0  n.a.     110000   critical
(FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +46.0°C (crit = +256.0°C)
temp2: +73.0°C (crit = +110.0°C)
temp3: +57.0°C (crit = +105.0°C)
temp4: +26.3°C (crit = +110.0°C)
temp5: +45.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +68.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +66.0°C (high = +105.0°C, crit = +105.0°C)
_____________________________
3.12.14 -- 20140311 -- 21:09 -- changed, not broken -- idle state
=============================
dir |-           /type  /cur_state  /max_state  Maybe
                                                trigger
                                                /PWM
...
cooling_device2  Fan    0           1           not yet
                                                observed
cooling_device3  Fan    0           1           FDTZ==58°C
cooling_device4  Fan    0           1           FDTZ==45°C
cooling_device5  Fan    0           1           FDTZ==34°C
cooling_device6  Fan    1           1           FDTZ==25°C
...

dir |-         /passive  /temp
thermal_zone1  n.a.      46000  ... (CPUZ)
...
thermal_zone4  n.a.      25000  ... (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +50.0°C (crit = +256.0°C)
temp2: +46.0°C (crit = +110.0°C)
temp3: +44.0°C (crit = +105.0°C)
temp4: +25.7°C (crit = +110.0°C)
temp5: +25.0°C (crit = +110.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +41.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +41.0°C (high = +105.0°C, crit = +105.0°C)
_____________________________
* Re: 3.13.?: Strange / dangerous fan policy...
[not found] ` <532B4DC5.4010705@netscape.net>
@ 2014-03-31 23:37 ` Manuel Krause
2014-03-31 23:47 ` Guenter Roeck
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-31 23:37 UTC (permalink / raw)
To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
Cc: Guenter Roeck, Jean Delvare, lm-sensors
On 2014-03-20 21:21, Manuel Krause wrote:
> On 2014-03-11 22:59, Manuel Krause wrote:
>> On 2014-03-10 02:49, Manuel Krause wrote:
>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>> wrote:
> [SNIP]
>
> Long time no reply from you... Have I overlooked an unwritten
> convention? Or were my charts that unusable for your analysis/work?
>
> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
> persists. "Strange / dangerous fan policy..."
>
> Since kernel 3.13.6 I've managed to 'fix' the potential
> overheating problem by manually issuing a:
> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> _before_ obviously critical temperatures occur. Note: this
> particular setting may only work for my system! ...and it keeps
> working for 3.14-rc.
>
> In the following I'd like to present a modified output of my
> /sys/class/thermal, produced by a script I've written (for my
> system), that shows the results in the layout of
> linux/Documentation/thermal/sysfs-api.txt, point 3:
> {I've uploaded the files to pastebin, so as not to swamp you and
> the lists with so many lines of logs.}
>
> For the last good kernel -- 3.12.14 -- in-use:
> http://pastebin.com/HL1PNcda
> For my first bad kernel revision 3.13 -- at critical temp:
> http://pastebin.com/98hgf1a9
> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
> http://pastebin.com/MuTwTnjD
> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
> *) command:
> http://pastebin.com/2peda54z
>
> Please, have a look at them! And maybe, give me hints on how I
> can help you to further debug this issue, as my manual method
> works but it's annoying.
>
> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> Email-thread to someone in charge.
>
> Thank you for your work && best regards,
> Manuel Krause
>
This is still BUG 71711
https://bugzilla.kernel.org/show_bug.cgi?id=71711
3.12.15 works very well
3.13.7 fails
3.14.0-rc8 fails
I've now tried the tmon tool, too. Nice eye candy, and good for
monitoring!

I've tried reverting all "thermal"-related patches between 3.12.14
and 3.13.7 from 3.13.7, but they don't seem to matter. (The same
holds if I apply the patches the other way round, to 3.12.15.)
So "thermal" is out?

For the failing kernels: none of the (active) trip points that get
reached triggers a SINGLE fan action!

Next to be investigated would be ACPI.
THX for your attention,
Manuel Krause
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-31 23:37 ` Manuel Krause
@ 2014-03-31 23:47 ` Guenter Roeck
2014-04-06 2:37 ` Manuel Krause
0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-31 23:47 UTC (permalink / raw)
To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm,
rui.zhang
Cc: Jean Delvare, lm-sensors
On 03/31/2014 04:37 PM, Manuel Krause wrote:
> This is still BUG 71711
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> 3.12.15 works very well
> 3.13.7 fails
> 3.14.0-rc8 fails
>
Best you can do would really be to bisect the problem.
Unfortunately only you (or someone else with an affected system)
can do that. Once the culprit is known it would be much easier
to get it fixed.
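
As a rough illustration of why bisecting is feasible even across a
whole kernel release, here is a minimal Python sketch of the binary
search that `git bisect` performs. It is a simplification: it
treats the history as a plain list, whereas real kernel history
contains merges, and the names first_bad and is_bad are invented
for this example. is_bad stands for "build this commit, boot it,
and check whether the fan misbehaves".

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of tests needed).

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2      # test the commit halfway between
        steps += 1
        if is_bad(commits[mid]):
            bad = mid                # culprit is at mid or earlier
        else:
            good = mid               # culprit is after mid
    return commits[bad], steps
```

Each test halves the remaining range, so a range of some ten
thousand commits (an illustrative figure) needs only about 14 test
builds. With git itself the same procedure is driven by
`git bisect start`, `git bisect good <rev>` and
`git bisect bad <rev>`.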
To answer your earlier question: I don't think you did anything wrong.
I guess everyone else is just as clueless as I am (if not, speak up
and help ;-).
Guenter
* Re: 3.13.?: Strange / dangerous fan policy...
2014-03-31 23:47 ` Guenter Roeck
@ 2014-04-06 2:37 ` Manuel Krause
2014-04-06 2:43 ` Guenter Roeck
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-06 2:37 UTC (permalink / raw)
To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm,
rui.zhang, Jean Delvare, lm-sensors
On 2014-04-01 01:47, Guenter Roeck wrote:
> Best you can do would really be to bisect the problem.
> Unfortunately only you (or someone else with an affected system)
> can do that. Once the culprit is known it would be much easier
> to get it fixed.
>
> To answer your earlier question: I don't think you did anything
> wrong.
> I guess everyone else is just as clueless as I am (if not, speak up
> and help ;-).
>
> Guenter
>
I've now bisected two times, from two different kernel origins,
just to be sure, as I'm new to this stupid-and-lengthy method,
and to be sure that I haven't given a false positive in between
due to boredom.
In the end it says each time:
# git bisect bad | tee -a /var/log/bisect.log
cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
commit cc8ef52707341e67a12067d6ead991d56ea017ca
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Wed Sep 25 20:39:45 2013 +0800

    ACPI / AC: convert ACPI ac driver to platform bus

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

:040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
Please help me, on how I can help debug this more, and please
also read the newest from
https://bugzilla.kernel.org/show_bug.cgi?id=71711
Manuel Krause
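Since the bisect output above is appended to a log file with tee, the
reported hash can be pulled back out of that log later. A minimal sketch,
assuming the log contains the usual "<40-hex-digit hash> is the first bad
commit" line; first_bad_commit is a hypothetical helper name, not a git
command:

```shell
#!/bin/sh
# Print the hash from a "<sha1> is the first bad commit" line read on stdin.
# If several bisect runs were appended to the same log, only the first
# match is printed. Prints nothing if no such line is present.
first_bad_commit() {
    sed -n 's/^\([0-9a-f]\{40\}\) is the first bad commit$/\1/p' | head -n 1
}

# Example with the log path from the tee command above:
#   first_bad_commit < /var/log/bisect.log
```

The hash can then be passed to "git show" to read the full commit.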
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 3.13.?: Strange / dangerous fan policy...
2014-04-06 2:37 ` Manuel Krause
@ 2014-04-06 2:43 ` Guenter Roeck
2014-04-06 23:17 ` Manuel Krause
0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-04-06 2:43 UTC (permalink / raw)
To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm,
rui.zhang, Jean Delvare, lm-sensors
On 04/05/2014 07:37 PM, Manuel Krause wrote:
> On 2014-04-01 01:47, Guenter Roeck wrote:
>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>> wrote:
>>>> [SNIP]
>>>>
>>>> Long time no reply from you... Have I overseen a unwritten
>>>> convention? Or were my charts that unusable for your
>>>> analysis/work?
>>>>
>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>> persists. "Strange / dangerous fan policy..."
>>>>
>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>> overheating problem by manually issuing a:
>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>> _before_ obviously critical temperatures occur. Remind: This
>>>> particular setting may only work for my system! ...and keeps
>>>> working for 3.14-rc.
>>>>
>>>> In the following I'd like to present you a modified output of my
>>>> /sys/class/thermal, that I've written a script for (for my
>>>> system), that shows the results in the way of
>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>> {I've uploded the files to pastebin, to not swamp you and the
>>>> lists with so many lines of logs.}
>>>>
>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>> http://pastebin.com/HL1PNcda
>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>> http://pastebin.com/98hgf1a9
>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>> http://pastebin.com/MuTwTnjD
>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>> *) command:
>>>> http://pastebin.com/2peda54z
>>>>
>>>> Please, have a look at them! And maybe, give me hints on how I
>>>> can help you to further debug this issue, as my manual method
>>>> works but it's annoying.
>>>>
>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>> Email-thread to someone in charge.
>>>>
>>>> Thank you for your work && best regards,
>>>> Manuel Krause
>>>>
>>>
>>> This is still BUG 71711
>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>
>>> 3.12.15 works very well
>>> 3.13.7 fails
>>> 3.14.0-rc8 fails
>>>
>>
>> Best you can do would really be to bisect the problem.
>> Unfortunately only you (or someone else with an affected system)
>> can do that. Once the culprit is known it would be much easier
>> to get it fixed.
>>
>> To answer your earlier question: I don't think you did anything
>> wrong.
>> I guess everyone else is just as clueless as I am (if not, speak up
>> and help ;-).
>>
>> Guenter
>>
>
> I've now bisected two times. From two different kernel origins, just to be sure, as I'm new to this stupid-and-lengthy method, and, to be sure, I haven't given a false positive inbetween due to boredom.
>
Not really. Keep in mind that you were able to track down the bad commit
among more than 10,000 commits in a reasonably short period of time.
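For a sense of scale: bisection halves the candidate range at every step,
so the number of kernels to build and test grows only with the logarithm
of the number of commits. A rough sketch of that arithmetic follows;
bisect_steps is a hypothetical helper, not a git command, and git's own
estimate can differ slightly on non-linear history:

```shell
#!/bin/sh
# Estimate how many test rounds a bisection over N commits needs,
# by halving (rounding up) until a single candidate is left.
bisect_steps() {
    n="$1"
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# bisect_steps 10000   # prints 14
```

So roughly 14 build-and-boot cycles cover a range of 10,000 commits.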
> In the end it says each time:
> # git bisect bad | tee -a /var/log/bisect.log
> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> Author: Zhang Rui <rui.zhang@intel.com>
> Date: Wed Sep 25 20:39:45 2013 +0800
>
> ACPI / AC: convert ACPI ac driver to platform bus
>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
Off to the two of you...
Guenter
> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
>
>
> Please help me, on how I can help debug this more, and please also read the newest from
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> Manuel Krause
>
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 3.13.?: Strange / dangerous fan policy...
2014-04-06 2:43 ` Guenter Roeck
@ 2014-04-06 23:17 ` Manuel Krause
2014-04-07 11:45 ` Rafael J. Wysocki
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-06 23:17 UTC (permalink / raw)
To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm,
rui.zhang, Jean Delvare, lm-sensors
On 2014-04-06 04:43, Guenter Roeck wrote:
> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>>> wrote:
>>>>> [SNIP]
>>>>>
>>>>> Long time no reply from you... Have I overseen a unwritten
>>>>> convention? Or were my charts that unusable for your
>>>>> analysis/work?
>>>>>
>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>>> persists. "Strange / dangerous fan policy..."
>>>>>
>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>> overheating problem by manually issuing a:
>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>> particular setting may only work for my system! ...and keeps
>>>>> working for 3.14-rc.
>>>>>
>>>>> In the following I'd like to present you a modified output
>>>>> of my
>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>> system), that shows the results in the way of
>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>> {I've uploded the files to pastebin, to not swamp you and the
>>>>> lists with so many lines of logs.}
>>>>>
>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>> http://pastebin.com/HL1PNcda
>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>> http://pastebin.com/98hgf1a9
>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>> http://pastebin.com/MuTwTnjD
>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>> *) command:
>>>>> http://pastebin.com/2peda54z
>>>>>
>>>>> Please, have a look at them! And maybe, give me hints on how I
>>>>> can help you to further debug this issue, as my manual method
>>>>> works but it's annoying.
>>>>>
>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>> Email-thread to someone in charge.
>>>>>
>>>>> Thank you for your work && best regards,
>>>>> Manuel Krause
>>>>>
>>>>
>>>> This is still BUG 71711
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>
>>>> 3.12.15 works very well
>>>> 3.13.7 fails
>>>> 3.14.0-rc8 fails
>>>>
>>>
>>> Best you can do would really be to bisect the problem.
>>> Unfortunately only you (or someone else with an affected system)
>>> can do that. Once the culprit is known it would be much easier
>>> to get it fixed.
>>>
>>> To answer your earlier question: I don't think you did anything
>>> wrong.
>>> I guess everyone else is just as clueless as I am (if not,
>>> speak up
>>> and help ;-).
>>>
>>> Guenter
>>>
>>
>> I've now bisected two times. From two different kernel origins,
>> just to be sure, as I'm new to this stupid-and-lengthy method,
>> and, to be sure, I haven't given a false positive inbetween due
>> to boredom.
>>
>
> Not really. Keep in mint that you were able to track down the bad
> commit
> among more than 10,000 commits in a reasonably short period of time.
>
>> In the end it says each time:
>> # git bisect bad | tee -a /var/log/bisect.log
>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>> Author: Zhang Rui <rui.zhang@intel.com>
>> Date: Wed Sep 25 20:39:45 2013 +0800
>>
>> ACPI / AC: convert ACPI ac driver to platform bus
>>
>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
> Off to the two of you...
>
> Guenter
>
>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
>>
>>
>> Please help me, on how I can help debug this more, and please
>> also read the newest from
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>
>> Manuel Krause
>>
>>
>>
>
Sorry that I forgot to add the following last night: after the
first bisection round, I was so glad to have a result that I
reverted the mentioned patch from the 3.13.8 kernel, but this
didn't fix it. It must be something that came later, but you all
understand more of what you've coded.
Best regards, Manuel Krause
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 3.13.?: Strange / dangerous fan policy...
2014-04-06 23:17 ` Manuel Krause
@ 2014-04-07 11:45 ` Rafael J. Wysocki
2014-04-10 22:51 ` Manuel Krause
0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2014-04-07 11:45 UTC (permalink / raw)
To: Manuel Krause
Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare,
lm-sensors
On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
> On 2014-04-06 04:43, Guenter Roeck wrote:
> > On 04/05/2014 07:37 PM, Manuel Krause wrote:
> >> On 2014-04-01 01:47, Guenter Roeck wrote:
> >>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
> >>>> On 2014-03-20 21:21, Manuel Krause wrote:
> >>>>> On 2014-03-11 22:59, Manuel Krause wrote:
> >>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
> >>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> >>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
> >>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
> >>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> >>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
> >>>>>>>>>>>> wrote:
> >>>>> [SNIP]
> >>>>>
> >>>>> Long time no reply from you... Have I overseen a unwritten
> >>>>> convention? Or were my charts that unusable for your
> >>>>> analysis/work?
> >>>>>
> >>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
> >>>>> persists. "Strange / dangerous fan policy..."
> >>>>>
> >>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
> >>>>> overheating problem by manually issuing a:
> >>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> >>>>> _before_ obviously critical temperatures occur. Remind: This
> >>>>> particular setting may only work for my system! ...and keeps
> >>>>> working for 3.14-rc.
> >>>>>
> >>>>> In the following I'd like to present you a modified output
> >>>>> of my
> >>>>> /sys/class/thermal, that I've written a script for (for my
> >>>>> system), that shows the results in the way of
> >>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
> >>>>> {I've uploded the files to pastebin, to not swamp you and the
> >>>>> lists with so many lines of logs.}
> >>>>>
> >>>>> For the last good kernel -- 3.12.14 -- in-use:
> >>>>> http://pastebin.com/HL1PNcda
> >>>>> For my first bad kernel revision 3.13 -- at critical temp:
> >>>>> http://pastebin.com/98hgf1a9
> >>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
> >>>>> http://pastebin.com/MuTwTnjD
> >>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
> >>>>> *) command:
> >>>>> http://pastebin.com/2peda54z
> >>>>>
> >>>>> Please, have a look at them! And maybe, give me hints on how I
> >>>>> can help you to further debug this issue, as my manual method
> >>>>> works but it's annoying.
> >>>>>
> >>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> >>>>> Email-thread to someone in charge.
> >>>>>
> >>>>> Thank you for your work && best regards,
> >>>>> Manuel Krause
> >>>>>
> >>>>
> >>>> This is still BUG 71711
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>>>
> >>>> 3.12.15 works very well
> >>>> 3.13.7 fails
> >>>> 3.14.0-rc8 fails
> >>>>
> >>>
> >>> Best you can do would really be to bisect the problem.
> >>> Unfortunately only you (or someone else with an affected system)
> >>> can do that. Once the culprit is known it would be much easier
> >>> to get it fixed.
> >>>
> >>> To answer your earlier question: I don't think you did anything
> >>> wrong.
> >>> I guess everyone else is just as clueless as I am (if not,
> >>> speak up
> >>> and help ;-).
> >>>
> >>> Guenter
> >>>
> >>
> >> I've now bisected two times. From two different kernel origins,
> >> just to be sure, as I'm new to this stupid-and-lengthy method,
> >> and, to be sure, I haven't given a false positive inbetween due
> >> to boredom.
> >>
> >
> > Not really. Keep in mint that you were able to track down the bad
> > commit
> > among more than 10,000 commits in a reasonably short period of time.
> >
> >> In the end it says each time:
> >> # git bisect bad | tee -a /var/log/bisect.log
> >> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> >> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> >> Author: Zhang Rui <rui.zhang@intel.com>
> >> Date: Wed Sep 25 20:39:45 2013 +0800
> >>
> >> ACPI / AC: convert ACPI ac driver to platform bus
> >>
> >> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> > Off to the two of you...
> >
> > Guenter
> >
> >> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
> >> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
> >>
> >>
> >> Please help me, on how I can help debug this more, and please
> >> also read the newest from
> >> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>
> >> Manuel Krause
> >>
> >>
> >>
> >
>
> Sorry, that I've forgotton to add the following last night: After
> the first bisection round, I was so glad about a result that
> time, that I reverted this mentioned patch from the 3.13.8
> kernel, but this didn't fix it.
This means that the commit in question didn't introduce the problem
you're seeing.
Please check out commit 7f2dc5c4bcbf (Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
build a kernel from that and see if you can reproduce the problem with it.
If so, it can be used as your new "first known bad" kernel for bisection.
Otherwise, you can use it as the "first good" one and commit cc8ef52707341
as "first known bad".
Thanks!
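The two cases above can be sketched as a small helper that prints the
next "git bisect start <bad> <good>" command. The hashes are the ones
quoted in this message; v3.12 as the known-good end is an assumption
based on the reports that 3.12.x works, and next_bisect_command is a
hypothetical name:

```shell
#!/bin/sh
MERGE=7f2dc5c4bcbf      # merge of tag 'dm-3.13-changes'
SUSPECT=cc8ef52707341   # "ACPI / AC: convert ACPI ac driver to platform bus"
KNOWN_GOOD=v3.12        # assumption: a kernel known to behave correctly

# $1 = "bad" if the problem reproduces on a kernel built from $MERGE,
#      "good" if it does not.
next_bisect_command() {
    case "$1" in
        bad)  echo "git bisect start $MERGE $KNOWN_GOOD" ;;
        good) echo "git bisect start $SUSPECT $MERGE" ;;
        *)    echo "usage: next_bisect_command bad|good" >&2; return 2 ;;
    esac
}
```

The helper only prints the command, so the result can be reviewed
before anything is run in the kernel tree.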
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 3.13.?: Strange / dangerous fan policy...
2014-04-07 11:45 ` Rafael J. Wysocki
@ 2014-04-10 22:51 ` Manuel Krause
2014-04-13 0:05 ` Manuel Krause
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-10 22:51 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare,
lm-sensors
On 2014-04-07 13:45, Rafael J. Wysocki wrote:
> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>>>>> wrote:
>>>>>>> [SNIP]
>>>>>>>
>>>>>>> Long time no reply from you... Have I overseen a unwritten
>>>>>>> convention? Or were my charts that unusable for your
>>>>>>> analysis/work?
>>>>>>>
>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>
>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>> overheating problem by manually issuing a:
>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>> working for 3.14-rc.
>>>>>>>
>>>>>>> In the following I'd like to present you a modified output
>>>>>>> of my
>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>> system), that shows the results in the way of
>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>> {I've uploded the files to pastebin, to not swamp you and the
>>>>>>> lists with so many lines of logs.}
>>>>>>>
>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>> http://pastebin.com/HL1PNcda
>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>> http://pastebin.com/98hgf1a9
>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>> http://pastebin.com/MuTwTnjD
>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>> *) command:
>>>>>>> http://pastebin.com/2peda54z
>>>>>>>
>>>>>>> Please, have a look at them! And maybe, give me hints on how I
>>>>>>> can help you to further debug this issue, as my manual method
>>>>>>> works but it's annoying.
>>>>>>>
>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>> Email-thread to someone in charge.
>>>>>>>
>>>>>>> Thank you for your work && best regards,
>>>>>>> Manuel Krause
>>>>>>>
>>>>>>
>>>>>> This is still BUG 71711
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>
>>>>>> 3.12.15 works very well
>>>>>> 3.13.7 fails
>>>>>> 3.14.0-rc8 fails
>>>>>>
>>>>>
>>>>> Best you can do would really be to bisect the problem.
>>>>> Unfortunately only you (or someone else with an affected system)
>>>>> can do that. Once the culprit is known it would be much easier
>>>>> to get it fixed.
>>>>>
>>>>> To answer your earlier question: I don't think you did anything
>>>>> wrong.
>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>> speak up
>>>>> and help ;-).
>>>>>
>>>>> Guenter
>>>>>
>>>>
>>>> I've now bisected two times. From two different kernel origins,
>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>> and, to be sure, I haven't given a false positive inbetween due
>>>> to boredom.
>>>>
>>>
>>> Not really. Keep in mint that you were able to track down the bad
>>> commit
>>> among more than 10,000 commits in a reasonably short period of time.
>>>
>>>> In the end it says each time:
>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>> Date: Wed Sep 25 20:39:45 2013 +0800
>>>>
>>>> ACPI / AC: convert ACPI ac driver to platform bus
>>>>
>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>
>>> Off to the two of you...
>>>
>>> Guenter
>>>
>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
>>>>
>>>>
>>>> Please help me, on how I can help debug this more, and please
>>>> also read the newest from
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>
>>>> Manuel Krause
>>>>
>>>>
>>>>
>>>
>>
>> Sorry, that I've forgotton to add the following last night: After
>> the first bisection round, I was so glad about a result that
>> time, that I reverted this mentioned patch from the 3.13.8
>> kernel, but this didn't fix it.
>
> This means that the commit in question didn't introduce the problem
> you're seeing.
>
> Please check out commit 7f2dc5c4bcbf (Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
> build a kernel from that and see if you can reprocude the problem with it.
> If so, it can be used as your new "first known bad" kernel for bisection.
> Otherwise, you can use it as the "first good" one and commit cc8ef52707341
> as "first known bad".
>
> Thanks!
>
Sorry for any inconvenience, but please disregard what I wrote
about reverting the patch in question from 3.13.x not fixing it.
Of course it didn't fix it, as the patch doesn't revert cleanly
from release kernels at all. My mistake!
I've been guided by Guenter Roeck through two more bisecting
sessions on this, which always pointed to the commit in question.
Some quotes:
Me:
>>> O.k. I've now followed your latest directions:
>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was BAD =>
>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was GOOD
>>>
[ ...]
>>> Reverting that commit in question from this very git tree makes the
>>> kernel work as expected.
[ ... ]
Guenter:
>> Report the results you have above. That should show without question
>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>> and it should be easy to reproduce.
That seems to be all I can do for you for now. Please let me know
of any preliminary patches to test!
And I want to add special thanks to Guenter Roeck for his
always-just-in-time assistance over so many days,
Manuel Krause
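For anyone who wants to repeat the check quoted above, the two git steps
can be sketched as follows. This is an outline under assumptions, not
the exact procedure used: the function names are hypothetical, the
kernel build and boot between the steps are left out, and --no-edit is
added only to skip the editor prompt for the revert message.

```shell
#!/bin/sh
# Set GIT=echo for a dry run that only prints the git arguments.
GIT="${GIT:-git}"
COMMIT=cc8ef52707341e67a12067d6ead991d56ea017ca

# Step 1: create a branch whose tip is the suspect commit,
# then build and test that kernel (reported result: BAD).
checkout_suspect() {
    "$GIT" checkout -b testing "$COMMIT"
}

# Step 2: revert the commit on that branch,
# then rebuild and test again (reported result: GOOD).
revert_suspect() {
    "$GIT" revert --no-edit "$COMMIT"
}
```

Testing at the commit itself avoids the revert conflicts seen on the
release kernels, because nothing later has touched the same code yet.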
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 3.13.?: Strange / dangerous fan policy...
2014-04-10 22:51 ` Manuel Krause
@ 2014-04-13 0:05 ` Manuel Krause
2014-04-16 18:32 ` Zhang Rui
0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-13 0:05 UTC (permalink / raw)
To: rui.zhang
Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
Jean Delvare, lm-sensors
On 2014-04-11 00:51, Manuel Krause wrote:
> On 2014-04-07 13:45, Rafael J. Wysocki wrote:
>> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
>>>>>>>>>>>>>>> Krause
>>>>>>>>>>>>>>> wrote:
>>>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> Long time no reply from you... Have I overseen a unwritten
>>>>>>>> convention? Or were my charts that unusable for your
>>>>>>>> analysis/work?
>>>>>>>>
>>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
>>>>>>>> problem
>>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>>
>>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>>> overheating problem by manually issuing a:
>>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>>> working for 3.14-rc.
>>>>>>>>
>>>>>>>> In the following I'd like to present you a modified output
>>>>>>>> of my
>>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>>> system), that shows the results in the way of
>>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>>> {I've uploded the files to pastebin, to not swamp you and
>>>>>>>> the
>>>>>>>> lists with so many lines of logs.}
>>>>>>>>
>>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>> http://pastebin.com/HL1PNcda
>>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>> http://pastebin.com/98hgf1a9
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>> http://pastebin.com/MuTwTnjD
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>> *) command:
>>>>>>>> http://pastebin.com/2peda54z
>>>>>>>>
>>>>>>>> Please, have a look at them! And maybe, give me hints on
>>>>>>>> how I
>>>>>>>> can help you to further debug this issue, as my manual
>>>>>>>> method
>>>>>>>> works but it's annoying.
>>>>>>>>
>>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>>> Email-thread to someone in charge.
>>>>>>>>
>>>>>>>> Thank you for your work && best regards,
>>>>>>>> Manuel Krause
>>>>>>>>
>>>>>>>
>>>>>>> This is still BUG 71711
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>
>>>>>>> 3.12.15 works very well
>>>>>>> 3.13.7 fails
>>>>>>> 3.14.0-rc8 fails
>>>>>>>
>>>>>>
>>>>>> Best you can do would really be to bisect the problem.
>>>>>> Unfortunately only you (or someone else with an affected
>>>>>> system)
>>>>>> can do that. Once the culprit is known it would be much easier
>>>>>> to get it fixed.
>>>>>>
>>>>>> To answer your earlier question: I don't think you did
>>>>>> anything
>>>>>> wrong.
>>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>>> speak up
>>>>>> and help ;-).
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>>
>>>>> I've now bisected two times. From two different kernel origins,
>>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>>> and, to be sure, I haven't given a false positive inbetween due
>>>>> to boredom.
>>>>>
>>>>
>>>> Not really. Keep in mint that you were able to track down the
>>>> bad
>>>> commit
>>>> among more than 10,000 commits in a reasonably short period
>>>> of time.
>>>>
>>>>> In the end it says each time:
>>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
>>>>> commit
>>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>>> Date: Wed Sep 25 20:39:45 2013 +0800
>>>>>
>>>>> ACPI / AC: convert ACPI ac driver to platform bus
>>>>>
>>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>> Signed-off-by: Rafael J. Wysocki
>>>>> <rafael.j.wysocki@intel.com>
>>>>>
>>>> Off to the two of you...
>>>>
>>>> Guenter
>>>>
>>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
>>>>>
>>>>>
>>>>> Please help me, on how I can help debug this more, and please
>>>>> also read the newest from
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>
>>>>> Manuel Krause
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> Sorry, that I've forgotton to add the following last night: After
>>> the first bisection round, I was so glad about a result that
>>> time, that I reverted this mentioned patch from the 3.13.8
>>> kernel, but this didn't fix it.
>>
>> This means that the commit in question didn't introduce the
>> problem
>> you're seeing.
>>
>> Please check out commit 7f2dc5c4bcbf (Merge tag
>> 'dm-3.13-changes' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
>>
>> build a kernel from that and see if you can reprocude the
>> problem with it.
>> If so, it can be used as your new "first known bad" kernel for
>> bisection.
>> Otherwise, you can use it as the "first good" one and commit
>> cc8ef52707341
>> as "first known bad".
>>
>> Thanks!
>>
>
> Sorry, for any inconvenience, but you should forget about what
> I've written, that reverting the patch in question from 3.13.x
> didn't fix it. Of course it didn't fix it, as the patch doesn't
> cleanly revert from release-kernels at all. My mistake!
>
> I' ve been guided by Guenter Roeck through two more bisecting
> sessions/ways on this, that always pointed to the commit in
> question.
>
> Some citation:
> Me:
>>>> O.k. I've now followed your latest directions:
>>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was BAD =>
>>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was GOOD
>>>>
> [ ...]
>>>> Reverting that commit in question from this very git tree
>>>> makes the
>>>> kernel work as expected.
> [ ... ]
> Guenter:
>>> Report the results you have above. That should show without
>>> question
>>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>>> and it should be easy to reproduce.
>
> That seems to be all I can do for you for now. Please let me know
> of any preliminary patches to test!
> And I want to add special thanks to Guenter Roeck for his
> always-just-in-time assistance over so many days,
>
> Manuel Krause
>
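The checkout-and-revert check cited above can be written as two small
shell helpers. This is only a minimal sketch, not the exact procedure
used in this thread: the function names checkout_suspect and
revert_suspect and the default branch name "testing" are assumptions,
and the kernel rebuild and boot test between the two steps are left to
the tester.

```shell
#!/bin/sh
# Hypothetical helpers; names and the default branch are illustrative only.

# Step 1: create a test branch whose tip is the suspect commit.
# A kernel built from this state is expected to show the problem.
checkout_suspect() {
    git checkout -q -b "${2:-testing}" "$1"
}

# Step 2: add a new commit on top that undoes the suspect commit.
# A kernel built from this state differs from step 1 by that change only.
revert_suspect() {
    git revert --no-edit "$1"
}

# Intended use inside a kernel git tree:
#   checkout_suspect cc8ef52707341e67a12067d6ead991d56ea017ca
#   (rebuild, boot, watch temperature and fan: reported BAD in this thread)
#   revert_suspect cc8ef52707341e67a12067d6ead991d56ea017ca
#   (rebuild, boot, watch temperature and fan: reported GOOD in this thread)
```

Because the branch starts exactly at the suspect commit, the revert
cannot conflict with later changes, which is presumably why it worked
here while reverting from a release kernel did not.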
BTW -- applying the patch in question to a 3.12.17 kernel, which
worked optimally WITHOUT it, makes it FAIL as described for 3.13.x
kernels. (And, yes, the patch applied cleanly, compiled fine and
boots nicely.)
Manuel Krause
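
A test like the one described above (one mainline commit on top of an
older release) could be set up with git roughly as follows. This is a
sketch under assumptions: the thread does not say how the patch was
actually applied, the helper name apply_on_base and the branch name are
made up for illustration, and the tag v3.12.17 is assumed to be present
in the local tree (for example a clone of the stable tree).

```shell
#!/bin/sh
# Hypothetical helper: apply one or more commits on top of a base revision.
# Usage: apply_on_base <base> <commit>...
# BRANCH may be set in the environment to choose the test branch name.
apply_on_base() {
    base=$1
    shift
    # Start a fresh branch at the known-good base.
    git checkout -q -b "${BRANCH:-backport-test}" "$base" || return 1
    for c in "$@"; do
        # Stop at the first commit that does not apply cleanly.
        git cherry-pick "$c" || return 1
    done
}

# Intended use inside a kernel git tree (tag name is an assumption):
#   apply_on_base v3.12.17 cc8ef52707341e67a12067d6ead991d56ea017ca
```

Further commits can be listed after the first one; they are applied in
the order given, so a follow-up fix goes last.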
* Re: 3.13.?: Strange / dangerous fan policy...
2014-04-13 0:05 ` Manuel Krause
@ 2014-04-16 18:32 ` Zhang Rui
2014-04-16 22:17 ` Manuel Krause
0 siblings, 1 reply; 22+ messages in thread
From: Zhang Rui @ 2014-04-16 18:32 UTC (permalink / raw)
To: Manuel Krause
Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
Jean Delvare, lm-sensors
On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote:
> On 2014-04-11 00:51, Manuel Krause wrote:
> > On 2014-04-07 13:45, Rafael J. Wysocki wrote:
> >> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
> >>> On 2014-04-06 04:43, Guenter Roeck wrote:
> >>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
> >>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
> >>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
> >>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
> >>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
> >>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
> >>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> >>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
> >>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
> >>>>>>>>>>>>>>> Krause
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>> [SNIP]
> >>>>>>>>
> >>>>>>>> Long time no reply from you... Have I overlooked an unwritten
> >>>>>>>> convention? Or were my charts that unusable for your
> >>>>>>>> analysis/work?
> >>>>>>>>
> >>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
> >>>>>>>> problem
> >>>>>>>> persists. "Strange / dangerous fan policy..."
> >>>>>>>>
> >>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
> >>>>>>>> overheating problem by manually issuing a:
> >>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> >>>>>>>> _before_ obviously critical temperatures occur. Reminder: This
> >>>>>>>> particular setting may only work for my system! ...and keeps
> >>>>>>>> working for 3.14-rc.
> >>>>>>>>
> >>>>>>>> In the following I'd like to present you a modified output
> >>>>>>>> of my
> >>>>>>>> /sys/class/thermal, that I've written a script for (for my
> >>>>>>>> system), that shows the results in the way of
> >>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
> >>>>>>>> {I've uploaded the files to pastebin, to not swamp you and
> >>>>>>>> the
> >>>>>>>> lists with so many lines of logs.}
> >>>>>>>>
> >>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
> >>>>>>>> http://pastebin.com/HL1PNcda
> >>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
> >>>>>>>> http://pastebin.com/98hgf1a9
> >>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
> >>>>>>>> http://pastebin.com/MuTwTnjD
> >>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
> >>>>>>>> *) command:
> >>>>>>>> http://pastebin.com/2peda54z
> >>>>>>>>
> >>>>>>>> Please, have a look at them! And maybe, give me hints on
> >>>>>>>> how I
> >>>>>>>> can help you to further debug this issue, as my manual
> >>>>>>>> method
> >>>>>>>> works but it's annoying.
> >>>>>>>>
> >>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> >>>>>>>> Email-thread to someone in charge.
> >>>>>>>>
> >>>>>>>> Thank you for your work && best regards,
> >>>>>>>> Manuel Krause
> >>>>>>>>
> >>>>>>>
> >>>>>>> This is still BUG 71711
> >>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>>>>>>
> >>>>>>> 3.12.15 works very well
> >>>>>>> 3.13.7 fails
> >>>>>>> 3.14.0-rc8 fails
> >>>>>>>
> >>>>>>
> >>>>>> Best you can do would really be to bisect the problem.
> >>>>>> Unfortunately only you (or someone else with an affected
> >>>>>> system)
> >>>>>> can do that. Once the culprit is known it would be much easier
> >>>>>> to get it fixed.
> >>>>>>
> >>>>>> To answer your earlier question: I don't think you did
> >>>>>> anything
> >>>>>> wrong.
> >>>>>> I guess everyone else is just as clueless as I am (if not,
> >>>>>> speak up
> >>>>>> and help ;-).
> >>>>>>
> >>>>>> Guenter
> >>>>>>
> >>>>>
> >>>>> I've now bisected two times. From two different kernel origins,
> >>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
> >>>>> and, to be sure, I haven't given a false positive inbetween due
> >>>>> to boredom.
> >>>>>
> >>>>
> >>>> Not really. Keep in mind that you were able to track down the
> >>>> bad
> >>>> commit
> >>>> among more than 10,000 commits in a reasonably short period
> >>>> of time.
> >>>>
> >>>>> In the end it says each time:
> >>>>> # git bisect bad | tee -a /var/log/bisect.log
> >>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
> >>>>> commit
> >>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> >>>>> Author: Zhang Rui <rui.zhang@intel.com>
> >>>>> Date: Wed Sep 25 20:39:45 2013 +0800
> >>>>>
> >>>>> ACPI / AC: convert ACPI ac driver to platform bus
> >>>>>
> >>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>>>> Signed-off-by: Rafael J. Wysocki
> >>>>> <rafael.j.wysocki@intel.com>
> >>>>>
> >>>> Off to the two of you...
> >>>>
> >>>> Guenter
> >>>>
> >>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
> >>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
> >>>>>
> >>>>>
> >>>>> Please help me, on how I can help debug this more, and please
> >>>>> also read the newest from
> >>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>>>>
> >>>>> Manuel Krause
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>> Sorry, that I've forgotten to add the following last night: After
> >>> the first bisection round, I was so glad about a result that
> >>> time, that I reverted this mentioned patch from the 3.13.8
> >>> kernel, but this didn't fix it.
> >>
> >> This means that the commit in question didn't introduce the
> >> problem
> >> you're seeing.
> >>
> >> Please check out commit 7f2dc5c4bcbf (Merge tag
> >> 'dm-3.13-changes' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
> >>
> >> build a kernel from that and see if you can reproduce the
> >> problem with it.
> >> If so, it can be used as your new "first known bad" kernel for
> >> bisection.
> >> Otherwise, you can use it as the "first good" one and commit
> >> cc8ef52707341
> >> as "first known bad".
> >>
> >> Thanks!
> >>
> >
> > Sorry, for any inconvenience, but you should forget about what
> > I've written, that reverting the patch in question from 3.13.x
> > didn't fix it. Of course it didn't fix it, as the patch doesn't
> > cleanly revert from release-kernels at all. My mistake!
> >
> > I've been guided by Guenter Roeck through two more bisecting
> > sessions/ways on this, that always pointed to the commit in
> > question.
> >
> > Some citation:
> > Me:
> >>>> O.k. I've now followed your latest directions:
> >>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
> >>>> => result after rebuild was BAD =>
> >>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
> >>>> => result after rebuild was GOOD
> >>>>
> > [ ...]
> >>>> Reverting that commit in question from this very git tree
> >>>> makes the
> >>>> kernel work as expected.
> > [ ... ]
> > Guenter:
> >>> Report the results you have above. That should show without
> >>> question
> >>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
> >>> and it should be easy to reproduce.
> >
> > That seems to be all I can do for you for now. Please let me know
> > of any preliminary patches to test!
> > And I want to add special thanks to Guenter Roeck for his
> > always-just-in-time assistance over so many days,
> >
> > Manuel Krause
> >
>
> BTW -- applying the patch in question to a 3.12.17 kernel, which
> worked optimally WITHOUT it, makes it FAIL as described for 3.13.x
> kernels. (And, yes, the patch applied cleanly, compiled fine and
> boots nicely.)
>
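For reference, the manual workaround quoted above writes to the
cur_state attribute of one cooling device. Which device number is the
fan differs between machines, so a listing such as the following may
help to find it. This is a minimal sketch: the function name
list_cooling_devices is made up, and the optional directory argument
exists only so the function can be tried on a copy of the tree. The
attributes type, cur_state and max_state are the ones described in
Documentation/thermal/sysfs-api.txt.

```shell
#!/bin/sh
# Hypothetical helper: print type and state of every thermal cooling device.
# Usage: list_cooling_devices [thermal-class-directory]
list_cooling_devices() {
    root=${1:-/sys/class/thermal}
    for dev in "$root"/cooling_device*; do
        # Skip the unexpanded pattern when no cooling device exists.
        [ -d "$dev" ] || continue
        printf '%s type=%s cur_state=%s max_state=%s\n' \
            "$(basename "$dev")" \
            "$(cat "$dev/type")" \
            "$(cat "$dev/cur_state")" \
            "$(cat "$dev/max_state")"
    done
}
```

Reading these files needs no special rights; writing cur_state, as in
the quoted workaround, normally requires root.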
Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb
on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check if
the problem still exists in the 3.12.17 kernel?
Thanks,
rui
> Manuel Krause
>
* Re: 3.13.?: Strange / dangerous fan policy...
2014-04-16 18:32 ` Zhang Rui
@ 2014-04-16 22:17 ` Manuel Krause
0 siblings, 0 replies; 22+ messages in thread
From: Manuel Krause @ 2014-04-16 22:17 UTC (permalink / raw)
To: Zhang Rui
Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
Jean Delvare, lm-sensors
On 2014-04-16 20:32, Zhang Rui wrote:
> On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote:
>> On 2014-04-11 00:51, Manuel Krause wrote:
>>> On 2014-04-07 13:45, Rafael J. Wysocki wrote:
>>>> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>>>>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>>>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
>>>>>>>>>>>>>>>>> Krause
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>> Long time no reply from you... Have I overlooked an unwritten
>>>>>>>>>> convention? Or were my charts that unusable for your
>>>>>>>>>> analysis/work?
>>>>>>>>>>
>>>>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
>>>>>>>>>> problem
>>>>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>>>>
>>>>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>>>>> overheating problem by manually issuing a:
>>>>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>>>>> _before_ obviously critical temperatures occur. Reminder: This
>>>>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>>>>> working for 3.14-rc.
>>>>>>>>>>
>>>>>>>>>> In the following I'd like to present you a modified output
>>>>>>>>>> of my
>>>>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>>>>> system), that shows the results in the way of
>>>>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>>>>> {I've uploaded the files to pastebin, to not swamp you and
>>>>>>>>>> the
>>>>>>>>>> lists with so many lines of logs.}
>>>>>>>>>>
>>>>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>>>> http://pastebin.com/HL1PNcda
>>>>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>>>> http://pastebin.com/98hgf1a9
>>>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>>>> http://pastebin.com/MuTwTnjD
>>>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>>>> *) command:
>>>>>>>>>> http://pastebin.com/2peda54z
>>>>>>>>>>
>>>>>>>>>> Please, have a look at them! And maybe, give me hints on
>>>>>>>>>> how I
>>>>>>>>>> can help you to further debug this issue, as my manual
>>>>>>>>>> method
>>>>>>>>>> works but it's annoying.
>>>>>>>>>>
>>>>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>>>>> Email-thread to someone in charge.
>>>>>>>>>>
>>>>>>>>>> Thank you for your work && best regards,
>>>>>>>>>> Manuel Krause
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is still BUG 71711
>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>>>
>>>>>>>>> 3.12.15 works very well
>>>>>>>>> 3.13.7 fails
>>>>>>>>> 3.14.0-rc8 fails
>>>>>>>>>
>>>>>>>>
>>>>>>>> Best you can do would really be to bisect the problem.
>>>>>>>> Unfortunately only you (or someone else with an affected
>>>>>>>> system)
>>>>>>>> can do that. Once the culprit is known it would be much easier
>>>>>>>> to get it fixed.
>>>>>>>>
>>>>>>>> To answer your earlier question: I don't think you did
>>>>>>>> anything
>>>>>>>> wrong.
>>>>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>>>>> speak up
>>>>>>>> and help ;-).
>>>>>>>>
>>>>>>>> Guenter
>>>>>>>>
>>>>>>>
>>>>>>> I've now bisected two times. From two different kernel origins,
>>>>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>>>>> and, to be sure, I haven't given a false positive inbetween due
>>>>>>> to boredom.
>>>>>>>
>>>>>>
>>>>>> Not really. Keep in mind that you were able to track down the
>>>>>> bad
>>>>>> commit
>>>>>> among more than 10,000 commits in a reasonably short period
>>>>>> of time.
>>>>>>
>>>>>>> In the end it says each time:
>>>>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
>>>>>>> commit
>>>>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>>>>> Date: Wed Sep 25 20:39:45 2013 +0800
>>>>>>>
>>>>>>> ACPI / AC: convert ACPI ac driver to platform bus
>>>>>>>
>>>>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>>>> Signed-off-by: Rafael J. Wysocki
>>>>>>> <rafael.j.wysocki@intel.com>
>>>>>>>
>>>>>> Off to the two of you...
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers
>>>>>>>
>>>>>>>
>>>>>>> Please help me, on how I can help debug this more, and please
>>>>>>> also read the newest from
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>
>>>>>>> Manuel Krause
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> Sorry, that I've forgotten to add the following last night: After
>>>>> the first bisection round, I was so glad about a result that
>>>>> time, that I reverted this mentioned patch from the 3.13.8
>>>>> kernel, but this didn't fix it.
>>>>
>>>> This means that the commit in question didn't introduce the
>>>> problem
>>>> you're seeing.
>>>>
>>>> Please check out commit 7f2dc5c4bcbf (Merge tag
>>>> 'dm-3.13-changes' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
>>>>
>>>> build a kernel from that and see if you can reproduce the
>>>> problem with it.
>>>> If so, it can be used as your new "first known bad" kernel for
>>>> bisection.
>>>> Otherwise, you can use it as the "first good" one and commit
>>>> cc8ef52707341
>>>> as "first known bad".
>>>>
>>>> Thanks!
>>>>
>>>
>>> Sorry, for any inconvenience, but you should forget about what
>>> I've written, that reverting the patch in question from 3.13.x
>>> didn't fix it. Of course it didn't fix it, as the patch doesn't
>>> cleanly revert from release-kernels at all. My mistake!
>>>
>>> I've been guided by Guenter Roeck through two more bisecting
>>> sessions/ways on this, that always pointed to the commit in
>>> question.
>>>
>>> Some citation:
>>> Me:
>>>>>> O.k. I've now followed your latest directions:
>>>>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>>> => result after rebuild was BAD =>
>>>>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>>> => result after rebuild was GOOD
>>>>>>
>>> [ ...]
>>>>>> Reverting that commit in question from this very git tree
>>>>>> makes the
>>>>>> kernel work as expected.
>>> [ ... ]
>>> Guenter:
>>>>> Report the results you have above. That should show without
>>>>> question
>>>>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>>>>> and it should be easy to reproduce.
>>>
>>> That seems to be all I can do for you for now. Please let me know
>>> of any preliminary patches to test!
>>> And I want to add special thanks to Guenter Roeck for his
>>> always-just-in-time assistance over so many days,
>>>
>>> Manuel Krause
>>>
>>
>> BTW -- applying the patch in question to a 3.12.17 kernel, which
>> worked optimally WITHOUT it, makes it FAIL as described for 3.13.x
>> kernels. (And, yes, the patch applied cleanly, compiled fine and
>> boots nicely.)
>>
> Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb
> on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check if
> the problem still exists in the 3.12.17 kernel?
>
> thanks,
> rui
I'm so sorry: 3.12.17 + cc8ef52707341e67a12067d6ead991d56ea017ca
+ 50a2bc5429f07ec4d53df2d287b03bdbceb281bb does NOT improve the
situation.
Thank you for your work,
Manuel
Thread overview: 22+ messages
2014-03-07 19:33 3.13.?: Strange / dangerous fan policy Manuel Krause
2014-03-07 20:55 ` Guenter Roeck
2014-03-07 22:04 ` Manuel Krause
2014-03-07 22:52 ` Guenter Roeck
2014-03-08 11:08 ` [lm-sensors] " Jean Delvare
2014-03-08 12:36 ` Rafael J. Wysocki
2014-03-08 15:59 ` Guenter Roeck
2014-03-09 0:10 ` Manuel Krause
2014-03-09 17:28 ` Guenter Roeck
2014-03-09 17:58 ` Rafael J. Wysocki
2014-03-10 1:49 ` Manuel Krause
2014-03-11 21:59 ` Manuel Krause
[not found] ` <532B4DC5.4010705@netscape.net>
2014-03-31 23:37 ` Manuel Krause
2014-03-31 23:47 ` Guenter Roeck
2014-04-06 2:37 ` Manuel Krause
2014-04-06 2:43 ` Guenter Roeck
2014-04-06 23:17 ` Manuel Krause
2014-04-07 11:45 ` Rafael J. Wysocki
2014-04-10 22:51 ` Manuel Krause
2014-04-13 0:05 ` Manuel Krause
2014-04-16 18:32 ` Zhang Rui
2014-04-16 22:17 ` Manuel Krause