linux-pm.vger.kernel.org archive mirror
* 3.13.?: Strange / dangerous fan policy...
@ 2014-03-07 19:33 Manuel Krause
  2014-03-07 20:55 ` Guenter Roeck
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-07 19:33 UTC
  To: linux-kernel, linux-pm

Please have a short look at the following BUG report + the 
comments -- this message here is a kind of FWD-ing it:
https://bugs.archlinux.org/task/39005

I came late to test kernel 3.13 with the .5 one, as it was the 
time that the related -CK/BFS patch became available.

I'm not using Archlinux, but openSUSE, and my problems are quite 
the same. Especially these with smelling melting plastics.

My own reports went to Con Kolivas' Blog first:
"I get weird temperatures and abrupt 100% fan actions with 
vanilla 3.13.5 with this CK and most recent BFQ at my HP Notebook.
In gkrellm the highest T had been @74°C, so far (3.12.13), and is 
now growing to 94°C. Then, the fan goes to 100% for 10~30secs 
cooling it to approx. 82°C.
That is not good, if I compare 74 to 94 °C.
Have I missed a .CONFIG option for 3.13, especially?"

I'd get the same without (Con's && BFQ's) patches.

Machine:           HP Notebook with Core2Duo CPU (Penryn)
Distro:            openSUSE 13.1, 64bit, continuously updated
Desktop:           KDE 4.12.3
MESA & drm & Xorg: most recent ones from:
http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/

Current kernel:    3.13.6 vanilla from openSUSE repos, with
                    -ck1 and BFQ patches
Same behaviour:    without these patches

Last good kernel:  3.12.13 vanilla + CK2 + BFQ


Please, _always_CC_me_ -- as I'm not on the linux-kernel / 
linux-pm mailing lists.

And please, if you know any person in charge of this -- lead this 
message to him/her.

Thank you in advance and best regards,
Manuel Krause



* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-07 19:33 3.13.?: Strange / dangerous fan policy Manuel Krause
@ 2014-03-07 20:55 ` Guenter Roeck
  2014-03-07 22:04   ` Manuel Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-07 20:55 UTC
  To: Manuel Krause; +Cc: linux-kernel, linux-pm, lm-sensors

On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> Please have a short look at the following BUG report + the comments
> -- this message here is a kind of FWD-ing it:
> https://bugs.archlinux.org/task/39005
> 
> I came late to test kernel 3.13 with the .5 one, as it was the time
> that the related -CK/BFS patch became available.
> 
> I'm not using Archlinux, but openSUSE, and my problems are quite the
> same. Especially these with smelling melting plastics.
> 
> My own reports went to Con Kolivas' Blog first:
> "I get weird temperatures and abrupt 100% fan actions with vanilla
> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> cooling it to approx. 82°C.
> That is not good, if I compare 74 to 94 °C.
> Have I missed a .CONFIG option for 3.13, especially?"
> 
> I'd get the same without (Con's && BFQ's) patches.
> 
> Machine:           HP Notebook with Core2Duo CPU (Penryn)
> Distro:            openSUSE 13.1, 64bit, continuously updated
> Desktop:           KDE 4.12.3
> MESA & drm & Xorg: most recent ones from:
> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
> 
> Current kernel:    3.13.6 vanilla from openSUSE repos, with
>                    -ck1 and BFQ patches
> Same behaviour:    without these patches
> 
> Last good kernel:  3.12.13 vanilla + CK2 + BFQ
> 

Can you add more information about your fan control policy ?
Do you rely on the hardware for automatic fan speed control,
or do you run the fancontrol script ?

What is the output from the 'sensors' command ?

Thanks,
Guenter


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-07 20:55 ` Guenter Roeck
@ 2014-03-07 22:04   ` Manuel Krause
  2014-03-07 22:52     ` Guenter Roeck
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-07 22:04 UTC
  To: Guenter Roeck; +Cc: linux-kernel, linux-pm, lm-sensors

On 2014-03-07 21:55, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
>> Please have a short look at the following BUG report + the comments
>> -- this message here is a kind of FWD-ing it:
>> https://bugs.archlinux.org/task/39005
>>
>> I came late to test kernel 3.13 with the .5 one, as it was the time
>> that the related -CK/BFS patch became available.
>>
>> I'm not using Archlinux, but openSUSE, and my problems are quite the
>> same. Especially these with smelling melting plastics.
>>
>> My own reports went to Con Kolivas' Blog first:
>> "I get weird temperatures and abrupt 100% fan actions with vanilla
>> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
>> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
>> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
>> cooling it to approx. 82°C.
>> That is not good, if I compare 74 to 94 °C.
>> Have I missed a .CONFIG option for 3.13, especially?"
>>
>> I'd get the same without (Con's && BFQ's) patches.
>>
>> Machine:           HP Notebook with Core2Duo CPU (Penryn)
>> Distro:            openSUSE 13.1, 64bit, continuously updated
>> Desktop:           KDE 4.12.3
>> MESA & drm & Xorg: most recent ones from:
>> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
>>
>> Current kernel:    3.13.6 vanilla from openSUSE repos, with
>>                     -ck1 and BFQ patches
>> Same behaviour:    without these patches
>>
>> Last good kernel:  3.12.13 vanilla + CK2 + BFQ
>>
>
> Can you add more information about your fan control policy ?
> Do you rely on the hardware for automatic fan speed control,
> or do you run the fancontrol script ?
>
> What is the output from the 'sensors' command ?
>
> Thanks,
> Guenter
>

Hi, and thanks for the quick response!
No special fancy "fan control policy". 'fancontrol' isn't up or 
running.
Vanilla kernels 3.11.* and 3.12.* had been working on here 
without any extra work.
--
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +71.0°C  (crit = +256.0°C)
temp2:        +69.0°C  (crit = +110.0°C)
temp3:        +52.0°C  (crit = +105.0°C)
temp4:        +25.0°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
--
My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
This is with 3.12.13 with my normal workload.

Please, trust my above mentioned values of 94 °C vs. 74°C as I 
don't like to boot 3.13.6 anymore, to avoid harm to the 
notebook's casing.

But I'd do to test any improvement-patch.

Manuel Krause
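
The `sensors` output above can be turned into numbers, which makes it easier to compare readings between kernels. A minimal sketch (not part of the original mail), assuming the text layout shown in this message; `parse_sensors`, `READING`, `LIMIT` and `SAMPLE` are names made up for illustration:

```python
import re

# Matches lines such as "temp1:        +71.0°C  (crit = +256.0°C)".
READING = re.compile(
    r"^(?P<label>[^:]+):\s+\+?(?P<value>-?\d+(?:\.\d+)?)°C\s*(?:\((?P<limits>.*)\))?"
)
# Matches one "name = +NN.N°C" pair inside the parentheses.
LIMIT = re.compile(r"(\w+)\s*=\s*\+?(-?\d+(?:\.\d+)?)°C")


def parse_sensors(text):
    """Return {chip: {label: {"value": float, <limit name>: float, ...}}}."""
    chips = {}
    chip = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            chip = None          # a blank line ends the current chip block
            continue
        if chip is None:
            chip = line          # first line of a block is the chip name
            chips[chip] = {}
            continue
        match = READING.match(line)
        if match is None:
            continue             # e.g. "Adapter: Virtual device"
        entry = {"value": float(match.group("value"))}
        for name, value in LIMIT.findall(match.group("limits") or ""):
            entry[name] = float(value)
        chips[chip][match.group("label")] = entry
    return chips


# Excerpt of the 3.12.13 output quoted in this message.
SAMPLE = """\
acpitz-virtual-0
Adapter: Virtual device
temp1:        +71.0°C  (crit = +256.0°C)
temp2:        +69.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
"""

readings = parse_sensors(SAMPLE)
```

Running the same parser on output captured under 3.12.x and 3.13.x would give two dictionaries that can be compared label by label.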




* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-07 22:04   ` Manuel Krause
@ 2014-03-07 22:52     ` Guenter Roeck
  2014-03-08 11:08       ` [lm-sensors] " Jean Delvare
  0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-07 22:52 UTC
  To: Manuel Krause; +Cc: linux-kernel, linux-pm, lm-sensors

On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> On 2014-03-07 21:55, Guenter Roeck wrote:
> >On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> >>Please have a short look at the following BUG report + the comments
> >>-- this message here is a kind of FWD-ing it:
> >>https://bugs.archlinux.org/task/39005
> >>
> >>I came late to test kernel 3.13 with the .5 one, as it was the time
> >>that the related -CK/BFS patch became available.
> >>
> >>I'm not using Archlinux, but openSUSE, and my problems are quite the
> >>same. Especially these with smelling melting plastics.
> >>
> >>My own reports went to Con Kolivas' Blog first:
> >>"I get weird temperatures and abrupt 100% fan actions with vanilla
> >>3.13.5 with this CK and most recent BFQ at my HP Notebook.
> >>In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> >>now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> >>cooling it to approx. 82°C.
> >>That is not good, if I compare 74 to 94 °C.
> >>Have I missed a .CONFIG option for 3.13, especially?"
> >>
> >>I'd get the same without (Con's && BFQ's) patches.
> >>
> >>Machine:           HP Notebook with Core2Duo CPU (Penryn)
> >>Distro:            openSUSE 13.1, 64bit, continuously updated
> >>Desktop:           KDE 4.12.3
> >>MESA & drm & Xorg: most recent ones from:
> >>http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
> >>
> >>Current kernel:    3.13.6 vanilla from openSUSE repos, with
> >>                    -ck1 and BFQ patches
> >>Same behaviour:    without these patches
> >>
> >>Last good kernel:  3.12.13 vanilla + CK2 + BFQ
> >>
> >
> >Can you add more information about your fan control policy ?
> >Do you rely on the hardware for automatic fan speed control,
> >or do you run the fancontrol script ?
> >
> >What is the output from the 'sensors' command ?
> >
> >Thanks,
> >Guenter
> >
> 
> Hi, and thanks for the quick response!
> No special fancy "fan control policy". 'fancontrol' isn't up or
> running.
> Vanilla kernels 3.11.* and 3.12.* had been working on here without
> any extra work.
> --
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +71.0°C  (crit = +256.0°C)
> temp2:        +69.0°C  (crit = +110.0°C)
> temp3:        +52.0°C  (crit = +105.0°C)
> temp4:        +25.0°C  (crit = +110.0°C)
> temp5:        +58.0°C  (crit = +110.0°C)
> 
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> --
> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> This is with 3.12.13 with my normal workload.
> 
> Please, trust my above mentioned values of 94 °C vs. 74°C as I
> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> casing.
> 
Understood. Unfortunately, we'll need to get information
from the new kernel to be able to track down the problem.

> But I'd do to test any improvement-patch.
> 
So far I have no idea what is going on. I don't see anything in the
drivers providing above data that would explain the behavior,
but I might be missing something.

Of course, if output is different in 3.13, that would be important
to know. Maybe someone else can post related information for both
kernel versions on an affected system.

Guenter


* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
  2014-03-07 22:52     ` Guenter Roeck
@ 2014-03-08 11:08       ` Jean Delvare
  2014-03-08 12:36         ` Rafael J. Wysocki
  2014-03-08 15:59         ` Guenter Roeck
  0 siblings, 2 replies; 22+ messages in thread
From: Jean Delvare @ 2014-03-08 11:08 UTC
  To: Manuel Krause; +Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm

On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > Hi, and thanks for the quick response!
> > No special fancy "fan control policy". 'fancontrol' isn't up or
> > running.
> > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > any extra work.
> > --
> > # sensors
> > acpitz-virtual-0
> > Adapter: Virtual device
> > temp1:        +71.0°C  (crit = +256.0°C)
> > temp2:        +69.0°C  (crit = +110.0°C)
> > temp3:        +52.0°C  (crit = +105.0°C)
> > temp4:        +25.0°C  (crit = +110.0°C)
> > temp5:        +58.0°C  (crit = +110.0°C)
> > 
> > coretemp-isa-0000
> > Adapter: ISA adapter
> > Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> > Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> > --
> > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > This is with 3.12.13 with my normal workload.
> > 
> > Please, trust my above mentioned values of 94 °C vs. 74°C as I
> > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > casing.
> 
> Understood. Unfortunately, we'll need to get information
> from the new kernel to be able to track down the problem.

Indeed. Not only the run-time temperatures, but also the high and crit
limits.

> > But I'd do to test any improvement-patch.
> 
> So far I have no idea what is going on. I don't see anything in the
> drivers providing above data that would explain the behavior,
> but I might be missing something.

Looks like a regression in the acpi subsystem or in power management,
not hwmon. Hwmon is merely reporting the temperatures, it's not
responsible for the actual temperatures.

A bisection would certainly help, but of course that would require
booting to a bad kernel half of the time, which I understand Manuel
wouldn't enjoy.

The only two components which I think can reach such high temperatures
in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
74°C" refers to acpitz's temp1? If the temperatures reported by
coretemp remain the same, then I can only suppose that temp1 is the GPU
temperature. Please tell us which GPU is in this laptop, and which
driver you're using.

-- 
Jean Delvare
SUSE L3 Support
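
The run-time temperatures and limits requested here can also be read straight from the kernel's thermal sysfs interface, independent of lm-sensors. A minimal sketch (not part of the original mail), assuming the usual `/sys/class/thermal/thermal_zone*/` layout with `type`, `temp` (millidegrees Celsius) and `trip_point_N_type` / `trip_point_N_temp` files; the function name `read_zones` and the `root` parameter are illustrative only:

```python
from pathlib import Path


def read_zones(root="/sys/class/thermal"):
    """Collect type, temperature and trip points for every thermal zone."""
    zones = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            info = {
                "zone": zone.name,
                "type": (zone / "type").read_text().strip(),
                # sysfs reports millidegrees Celsius
                "temp_c": int((zone / "temp").read_text()) / 1000.0,
                "trips": [],
            }
        except (OSError, ValueError):
            continue  # some zones refuse reads; skip them in this sketch
        for trip_type in sorted(zone.glob("trip_point_*_type")):
            trip_temp = zone / trip_type.name.replace("_type", "_temp")
            try:
                info["trips"].append(
                    (trip_type.read_text().strip(),
                     int(trip_temp.read_text()) / 1000.0)
                )
            except (OSError, ValueError):
                continue
        zones.append(info)
    return zones

# Usage on a live system:
#     for z in read_zones():
#         print(z["zone"], z["type"], z["temp_c"], z["trips"])
```

Capturing this once under a good kernel and once under a bad one would show whether the trip points themselves differ, or only the temperatures.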


* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
  2014-03-08 11:08       ` [lm-sensors] " Jean Delvare
@ 2014-03-08 12:36         ` Rafael J. Wysocki
  2014-03-08 15:59         ` Guenter Roeck
  1 sibling, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2014-03-08 12:36 UTC
  To: Jean Delvare, Manuel Krause
  Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm

On Saturday, March 08, 2014 12:08:31 PM Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> > On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > > Hi, and thanks for the quick response!
> > > No special fancy "fan control policy". 'fancontrol' isn't up or
> > > running.
> > > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > > any extra work.
> > > --
> > > # sensors
> > > acpitz-virtual-0
> > > Adapter: Virtual device
> > > temp1:        +71.0°C  (crit = +256.0°C)
> > > temp2:        +69.0°C  (crit = +110.0°C)
> > > temp3:        +52.0°C  (crit = +105.0°C)
> > > temp4:        +25.0°C  (crit = +110.0°C)
> > > temp5:        +58.0°C  (crit = +110.0°C)
> > > 
> > > coretemp-isa-0000
> > > Adapter: ISA adapter
> > > Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> > > Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> > > --
> > > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > > This is with 3.12.13 with my normal workload.
> > > 
> > > Please, trust my above mentioned values of 94 °C vs. 74°C as I
> > > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > > casing.
> > 
> > Understood. Unfortunately, we'll need to get information
> > from the new kernel to be able to track down the problem.
> 
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
> 
> > > But I'd do to test any improvement-patch.
> > 
> > So far I have no idea what is going on. I don't see anything in the
> > drivers providing above data that would explain the behavior,
> > but I might be missing something.
> 
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
> 
> A bisection would certainly help, but of course that would require
> booting to a bad kernel half of the time, which I understand Manuel
> wouldn't enjoy.
> 
> The only two components which I think can reach such high temperatures
> in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
> 74°C" refers to acpitz's temp1? If the temperatures reported by
> coretemp remain the same, then I can only suppose that temp1 is the GPU
> temperature. Please tell us which GPU is in this laptop, and which
> driver you're using.

Also it would be good to know which cpufreq and cpuidle drivers are in use
and whether or not 3.14-rc5 has the problem.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
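
The cpufreq and cpuidle drivers asked about here are normally exposed under `/sys/devices/system/cpu`. A minimal sketch (not part of the original mail), assuming the common file names `cpu0/cpufreq/scaling_driver`, `cpu0/cpufreq/scaling_governor`, `cpuidle/current_driver` and `cpuidle/current_governor_ro`; `pm_drivers`, `FILES` and the `root` parameter are made-up names, and any file missing on a given kernel is reported as `None`:

```python
from pathlib import Path

# Label -> path relative to the CPU sysfs root (assumed layout).
FILES = {
    "cpufreq driver": "cpu0/cpufreq/scaling_driver",
    "cpufreq governor": "cpu0/cpufreq/scaling_governor",
    "cpuidle driver": "cpuidle/current_driver",
    "cpuidle governor": "cpuidle/current_governor_ro",
}


def pm_drivers(root="/sys/devices/system/cpu"):
    """Return the active cpufreq/cpuidle driver and governor names."""
    result = {}
    for label, relative in FILES.items():
        try:
            result[label] = (Path(root) / relative).read_text().strip()
        except OSError:
            result[label] = None  # file not present on this system
    return result

# Usage on a live system:
#     for label, name in pm_drivers().items():
#         print(f"{label}: {name}")
```

The `cpupower` tool reports the same information; the sketch only shows where those values come from.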


* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
  2014-03-08 11:08       ` [lm-sensors] " Jean Delvare
  2014-03-08 12:36         ` Rafael J. Wysocki
@ 2014-03-08 15:59         ` Guenter Roeck
  2014-03-09  0:10           ` Manuel Krause
  1 sibling, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-08 15:59 UTC
  To: Jean Delvare, Manuel Krause; +Cc: lm-sensors, linux-kernel, linux-pm

On 03/08/2014 03:08 AM, Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>> Hi, and thanks for the quick response!
>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>> running.
>>> Vanilla kernels 3.11.* and 3.12.* had been working on here without
>>> any extra work.
>>> --
>>> # sensors
>>> acpitz-virtual-0
>>> Adapter: Virtual device
>>> temp1:        +71.0°C  (crit = +256.0°C)
>>> temp2:        +69.0°C  (crit = +110.0°C)
>>> temp3:        +52.0°C  (crit = +105.0°C)
>>> temp4:        +25.0°C  (crit = +110.0°C)
>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>
>>> coretemp-isa-0000
>>> Adapter: ISA adapter
>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>> --
>>> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
>>> This is with 3.12.13 with my normal workload.
>>>
>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
>>> casing.
>>
>> Understood. Unfortunately, we'll need to get information
>> from the new kernel to be able to track down the problem.
>
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
>
>>> But I'd do to test any improvement-patch.
>>
>> So far I have no idea what is going on. I don't see anything in the
>> drivers providing above data that would explain the behavior,
>> but I might be missing something.
>
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
>

I would agree. I don't think we have enough information to be sure,
though. There might be some unintended interaction or interference.

gpu is a good hint ... for example, look at commit b9ed919f1c8
(drm/nouveau/drm/pm: remove everything except the hwmon interfaces
to THERM). nouveau does export pwm and fan control information,
so any change in that code may have unintended side effects.
Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
use devm_hwmon_register_with_groups) could have the observed impact,
as it is purely passive, but I'd rather be safe than sorry.

This problem has now been submitted into bugzilla as
https://bugzilla.kernel.org/show_bug.cgi?id=71711.

Guenter



* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-08 15:59         ` Guenter Roeck
@ 2014-03-09  0:10           ` Manuel Krause
  2014-03-09 17:28             ` Guenter Roeck
  2014-03-09 17:58             ` Rafael J. Wysocki
  0 siblings, 2 replies; 22+ messages in thread
From: Manuel Krause @ 2014-03-09  0:10 UTC
  To: Guenter Roeck, linux-kernel, linux-pm; +Cc: Rafael J. Wysocki, lm-sensors

On 2014-03-08 16:59, Guenter Roeck wrote:
> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>> Hi, and thanks for the quick response!
>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>> running.
>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>> without
>>>> any extra work.
>>>> --
>>>> # sensors
>>>> acpitz-virtual-0
>>>> Adapter: Virtual device
>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>
>>>> coretemp-isa-0000
>>>> Adapter: ISA adapter
>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>> --
>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>> sensor.
>>>> This is with 3.12.13 with my normal workload.
>>>>
>>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>> notebook's
>>>> casing.
>>>
>>> Understood. Unfortunately, we'll need to get information
>>> from the new kernel to be able to track down the problem.
>>
>> Indeed. Not only the run-time temperatures, but also the high
>> and crit
>> limits.
>>
>>>> But I'd do to test any improvement-patch.
>>>
>>> So far I have no idea what is going on. I don't see anything
>>> in the
>>> drivers providing above data that would explain the behavior,
>>> but I might be missing something.
>>
>> Looks like a regression in the acpi subsystem or in power
>> management,
>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>> responsible for the actual temperatures.
>>
>
> I would agree. I don't think we have enough information to be sure,
> though. There might be some unintended interaction or interference.
>
> gpu is a good hint ... for example, look at commit b9ed919f1c8
> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> to THERM). nouveau does export pwm and fan control information,
> so any change in that code may have unintended side effects.
> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> use devm_hwmon_register_with_groups) could have the observed impact,
> as it is purely passive, but I'd rather be safe than sorry.
>
> This problem has now been submitted into bugzilla as
> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>
> Guenter
>

Sorry for being late; I had to search for and accumulate a lot of info 
for you...
I hope you don't mind that I put it into one answer, CCing you all.

My GFX is a GM45 Intel (mobile), shared memory, running the 
opensource Mesa drivers/extensions.
kernel-module: i915

According to the output of 'cpupower': I have
CPUidle driver: acpi_idle
CPUidle governor: menu

CPUfreq:
   driver: acpi-cpufreq
   available cpufreq governors: ondemand, performance
-
And "ondemand" is running.
--

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +41.0°C  (crit = +256.0°C)
temp2:        +92.0°C  (crit = +110.0°C)
temp3:        +71.0°C  (crit = +105.0°C)
temp4:        +26.5°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)

This is from a critical "smelly" situation today: kernel compilation, 
fan at 100%.
--

Additional findings:

Identification from bootup ACPI initialisation vs. sensors:
temp1 = DTSZ
temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
temp3 = SKNZ
temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan 
(25 - 45 - 58 - max?)
Core 0 & Core 1 are the internal CPU T sensors.

With the 3.13.x (.5+) kernels, the cooling settings first gathered at 
bootup stay in place forever. That means rebooting a hot system gives 
an FDTZ of 45°C or more and causes no problems, as it cools enough 
(even for kernel compiling here). If FDTZ reads 25°C at bootup, the 
system goes into emergency cooling at some point. The same happens 
after a suspend/resume.

Kernel 3.12.13 adjusts the cooling on its own, and does so appropriately.


Thank you all for your engagement, best regards,
Manuel Krause.
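
The two `sensors` readings in this thread can be put side by side using the zone names identified above. A minimal sketch (not part of the original mail); the numbers are copied from the two outputs posted in the thread, while `ZONES`, `GOOD_3_12_13`, `BAD_3_13` and `deltas` are names made up for illustration. Note that the two readings were taken under different workloads (normal use vs. kernel compilation), so the differences are only indicative:

```python
# sensors label -> ACPI thermal zone, as identified from the boot messages.
ZONES = {"temp1": "DTSZ", "temp2": "CPUZ", "temp3": "SKNZ",
         "temp4": "BATZ", "temp5": "FDTZ"}

# acpitz readings in °C from the 3.12.13 output (normal workload).
GOOD_3_12_13 = {"temp1": 71.0, "temp2": 69.0, "temp3": 52.0,
                "temp4": 25.0, "temp5": 58.0}

# acpitz readings in °C from the "smelly" situation (kernel compilation).
BAD_3_13 = {"temp1": 41.0, "temp2": 92.0, "temp3": 71.0,
            "temp4": 26.5, "temp5": 25.0}


def deltas(good, bad):
    """Return {zone name: bad reading minus good reading} in °C."""
    return {ZONES[label]: bad[label] - good[label] for label in good}


for zone, delta in deltas(GOOD_3_12_13, BAD_3_13).items():
    print(f"{zone}: {delta:+.1f} °C")
```

In this comparison CPUZ is 23 °C higher while FDTZ is 33 °C lower, which fits the observation that FDTZ stays at its bootup value of 25°C on the affected kernel.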





* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-09  0:10           ` Manuel Krause
@ 2014-03-09 17:28             ` Guenter Roeck
  2014-03-09 17:58             ` Rafael J. Wysocki
  1 sibling, 0 replies; 22+ messages in thread
From: Guenter Roeck @ 2014-03-09 17:28 UTC
  To: Manuel Krause, linux-kernel, linux-pm
  Cc: Jean Delvare, lm-sensors, Rafael J. Wysocki

On 03/08/2014 04:10 PM, Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>> Hi, and thanks for the quick response!
>>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>>> running.
>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>> without
>>>>> any extra work.
>>>>> --
>>>>> # sensors
>>>>> acpitz-virtual-0
>>>>> Adapter: Virtual device
>>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>>
>>>>> coretemp-isa-0000
>>>>> Adapter: ISA adapter
>>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>>> --
>>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>>> sensor.
>>>>> This is with 3.12.13 with my normal workload.
>>>>>
>>>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>> notebook's
>>>>> casing.
>>>>
>>>> Understood. Unfortunately, we'll need to get information
>>>> from the new kernel to be able to track down the problem.
>>>
>>> Indeed. Not only the run-time temperatures, but also the high
>>> and crit
>>> limits.
>>>
>>>>> But I'd do to test any improvement-patch.
>>>>
>>>> So far I have no idea what is going on. I don't see anything
>>>> in the
>>>> drivers providing above data that would explain the behavior,
>>>> but I might be missing something.
>>>
>>> Looks like a regression in the acpi subsystem or in power
>>> management,
>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>> responsible for the actual temperatures.
>>>
>>
>> I would agree. I don't think we have enough information to be sure,
>> though. There might be some unintended interaction or interference.
>>
>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
>> to THERM). nouveau does export pwm and fan control information,
>> so any change in that code may have unintended side effects.
>> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>> use devm_hwmon_register_with_groups) could have the observed impact,
>> as it is purely passive, but I'd rather be safe than sorry.
>>
>> This problem has now been submitted into bugzilla as
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>
>> Guenter
>>
>
> Sorry for being late; I had to search for and accumulate a lot of info for you...
> I hope you don't mind that I put it into one answer, CCing you all.
>
> My GFX is a GM45 Intel (mobile), shared memory, running the opensource Mesa drivers/extensions.
> kernel-module: i915
>
> According to the output of 'cpupower': I have
> CPUidle driver: acpi_idle
> CPUidle governor: menu
>
> CPUfreq:
>    driver: acpi-cpufreq
>    available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
>
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +41.0°C  (crit = +256.0°C)
> temp2:        +92.0°C  (crit = +110.0°C)
> temp3:        +71.0°C  (crit = +105.0°C)
> temp4:        +26.5°C  (crit = +110.0°C)
> temp5:        +25.0°C  (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
>
> This is from a critical "smelly" situation today: kernel compilation, fan at 100%.
> --
>
> Additional findings:
>
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
>
> With the 3.13.x (.5+) kernels, the cooling settings first gathered at bootup stay in place forever. That means rebooting a hot system gives an FDTZ of 45°C or more and causes no problems, as it cools enough (even for kernel compiling here). If FDTZ reads 25°C at bootup, the system goes into emergency cooling at some point. The same happens after a suspend/resume.
>
> Kernel 3.12.13 adjusts the cooling on its own, and does so appropriately.
>

Hi Manuel,

thanks a lot for the additional information.

I added this exchange to bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=71711).
This is pretty much all I can do at this point; I have no idea what
is going on. Some change in ACPI would be my guess, but I did not see
anything catching my eye when looking through the ACPI code.

Guenter
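
Since the report says the cooling settings seem to stay at their bootup values, the state of the kernel's cooling devices may be worth recording alongside the temperatures. A minimal sketch (not part of the original mail), assuming the usual `/sys/class/thermal/cooling_device*/` layout with `type`, `cur_state` and `max_state` files; `cooling_states` and the `root` parameter are illustrative names:

```python
from pathlib import Path


def cooling_states(root="/sys/class/thermal"):
    """Return type, current state and maximum state of each cooling device."""
    states = []
    for device in sorted(Path(root).glob("cooling_device*")):
        try:
            states.append({
                "device": device.name,
                "type": (device / "type").read_text().strip(),
                "cur_state": int((device / "cur_state").read_text()),
                "max_state": int((device / "max_state").read_text()),
            })
        except (OSError, ValueError):
            continue  # unreadable or malformed entry; skip it in this sketch
    return states

# Usage on a live system:
#     for s in cooling_states():
#         print(s["device"], s["type"], s["cur_state"], "/", s["max_state"])
```

Sampling this a few times while the machine heats up, on both a good and a bad kernel, would show whether the fan-related devices ever change state.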


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-09  0:10           ` Manuel Krause
  2014-03-09 17:28             ` Guenter Roeck
@ 2014-03-09 17:58             ` Rafael J. Wysocki
  2014-03-10  1:49               ` Manuel Krause
  1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2014-03-09 17:58 UTC
  To: Manuel Krause
  Cc: Guenter Roeck, linux-kernel, linux-pm, Jean Delvare, lm-sensors,
	rui.zhang

On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
> > On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> >>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> >>>> Hi, and thanks for the quick response!
> >>>> No special fancy "fan control policy". 'fancontrol' isn't up or
> >>>> running.
> >>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
> >>>> without
> >>>> any extra work.
> >>>> --
> >>>> # sensors
> >>>> acpitz-virtual-0
> >>>> Adapter: Virtual device
> >>>> temp1:        +71.0°C  (crit = +256.0°C)
> >>>> temp2:        +69.0°C  (crit = +110.0°C)
> >>>> temp3:        +52.0°C  (crit = +105.0°C)
> >>>> temp4:        +25.0°C  (crit = +110.0°C)
> >>>> temp5:        +58.0°C  (crit = +110.0°C)
> >>>>
> >>>> coretemp-isa-0000
> >>>> Adapter: ISA adapter
> >>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
> >>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
> >>>> --
> >>>> My notebook (HP/Compaq 6730b) does not have a separate fan
> >>>> sensor.
> >>>> This is with 3.12.13 with my normal workload.
> >>>>
> >>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
> >>>> don't like to boot 3.13.6 anymore, to avoid harm to the
> >>>> notebook's
> >>>> casing.
> >>>
> >>> Understood. Unfortunately, we'll need to get information
> >>> from the new kernel to be able to track down the problem.
> >>
> >> Indeed. Not only the run-time temperatures, but also the high
> >> and crit
> >> limits.
> >>
> >>>> But I'd be willing to test any improvement patch.
> >>>
> >>> So far I have no idea what is going on. I don't see anything
> >>> in the
> >>> drivers providing above data that would explain the behavior,
> >>> but I might be missing something.
> >>
> >> Looks like a regression in the acpi subsystem or in power
> >> management,
> >> not hwmon. Hwmon is merely reporting the temperatures, it's not
> >> responsible for the actual temperatures.
> >>
> >
> > I would agree. I don't think we have enough information to be sure,
> > though. There might be some unintended interaction or interference.
> >
> > gpu is a good hint ... for example, look at commit b9ed919f1c8
> > (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> > to THERM). nouveau does export pwm and fan control information,
> > so any change in that code may have unintended side effects.
> > Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> > use devm_hwmon_register_with_groups) could have the observed impact,
> > as it is purely passive, but I'd rather be safe than sorry.
> >
> > This problem has now been submitted into bugzilla as
> > https://bugzilla.kernel.org/show_bug.cgi?id=71711.
> >
> > Guenter
> >
> 
> Sorry for being late; I had to search for and gather a lot of info
> for you...
> I hope you don't mind that I put it all into one reply, CCing all of you.
> 
> My GFX is a GM45 Intel (mobile), shared memory, running the 
> opensource Mesa drivers/extensions.
> kernel-module: i915
> 
> According to the output of 'cpupower': I have
> CPUidle driver: acpi_idle
> CPUidle governor: menu
> 
> CPUfreq:
>    driver: acpi-cpufreq
>    available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
> 
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1:        +41.0°C  (crit = +256.0°C)
> temp2:        +92.0°C  (crit = +110.0°C)
> temp3:        +71.0°C  (crit = +105.0°C)
> temp4:        +26.5°C  (crit = +110.0°C)
> temp5:        +25.0°C  (crit = +110.0°C)
> 
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
> Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)
> 
> FROM a critical "smelly" situation today, kernel-compilation, fan 
> @100%.
> --
> 
> Additional findings:
> 
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan 
> (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
> 
> With the 3.13.x (.5+) kernels, the cooling settings first gathered
> at bootup stay in place forever. That means rebooting a hot
> system gets an FDTZ of 45°C or more and causes no problems, as it
> cools enough (even for kernel compiling here). If it gets
> 25°C at bootup, the system goes into emergency cooling at some point.
> The same happens with suspend/resume.
> 
> Kernel 3.12.13 adjusts the cooling on its own, and appropriately.

This almost certainly is an ACPI regression, but I'm not sure whether
thermal management or CPU power management is broken on your system.

Can you compare the contents of /sys/class/thermal/ from working and
not working kernels, please?

Rafael



* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-09 17:58             ` Rafael J. Wysocki
@ 2014-03-10  1:49               ` Manuel Krause
  2014-03-11 21:59                 ` Manuel Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-10  1:49 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm
  Cc: Guenter Roeck, Jean Delvare, lm-sensors, rui.zhang

On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> [SNIP]
>
> This almost certainly is an ACPI regression, but I'm not sure whether
> thermal management or CPU power management is broken on your system.
>
> Can you compare the contents of /sys/class/thermal/ from working and
> not working kernels, please?
>
> Rafael
>

Hi again,
unfortunately you didn't specify how deeply I should dig into
/sys/class/thermal, so you get the lines from # BOF # to # EOF #
below. I hope they're readable without further comments.

The most remarkable changes, in my eyes, happened within
"thermal_zone1".

Best regards,
Manuel Krause


# BOF #
The following are all from /sys/class/thermal/, whose entries are
links to -> ../../devices/virtual/thermal/

I've listed the directories in separate sections for cooling_devices
and thermal_zones, for each bad/good kernel. This layout is for
emailing purposes only; you can merge them into a spreadsheet for your
own evaluation. I've left out some subdirs and subdir values that
_really_ didn't seem to need attention.

I've also collected the 'sensors' output for each readout,
having reproduced nearly the same workload, as represented by the
"Fan speed" (thermal_zone4==FDTZ).

And I've done my very best not to produce typos or c&p errors.


  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir             |-
                  /type       /cur_state  /max_state
cooling_device0  Processor    0          10
cooling_device1  Processor    0          10
cooling_device2  Fan          0           1
cooling_device3  Fan          1           1
cooling_device4  Fan          0           1
cooling_device5  Fan          0           1
cooling_device6  Fan          0           1
cooling_device7  LCD          0          24

  3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir             |-
                  /type       /cur_state  /max_state
cooling_device0  Processor    0          10
cooling_device1  Processor    0          10
cooling_device2  Fan          0           1
cooling_device3  Fan          1           1
cooling_device4  Fan          1           1
cooling_device5  Fan          1           1
cooling_device6  Fan          1           1
cooling_device7  LCD          0          24


  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
thermal_zone0  0        68000   ?=0    n.a.   256000   critical
thermal_zone1   n.a.    70000 |-
                                 ?=0   6       110000   critical
                                 ?=1   5       107000   passive
                                 ?=2   4        90000   active
                                 ?=3   3        75000   active
                                 ?=4   2        55000   active
                                 ?=5   1        45000   active
                                 ?=6   1        30000   active
thermal_zone2   n.a.    54000 |-
                                 ?=0   1       105000   critical
                                 ?=1   1        95000   passive
thermal_zone3   n.a.    25800 |-
                                 ?=0   1       110000   critical
                                 ?=1   1        60000   passive
thermal_zone4  0        58000   ?=0    n.a.   110000   critical


  3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
thermal_zone0  0        50000   ?=0    n.a.   256000   critical
thermal_zone1   n.a.    70000 |-
                                 ?=0   1       110000   critical
                                 ?=1   1       107000   passive
                                 ?=2   2        90000   active
                                 ?=3   3        67000   active
                                 ?=4   4        55000   active
                                 ?=5   5        45000   active
                                 ?=6   6        30000   active
thermal_zone2   n.a.    53000 |-
                                 ?=0   1       105000   critical
                                 ?=1   1        95000   passive
thermal_zone3   n.a.    25600 |-
                                 ?=0   1       110000   critical
                                 ?=1   1        60000   passive
thermal_zone4  0        58000   ?=0    n.a.   110000   critical

---
Legend here:
        /type  is always  acpitz
        /mode             enabled
        /policy           step_wise

       - from kernel ACPI initialisation: thermal_zone0==DTSZ,
          thermal_zone1==CPUZ, thermal_zone2==SKNZ,
          thermal_zone3==BATZ, thermal_zone4==FDTZ
       - n.a. means      file or value is not available
___
Legend in general:
              /power/control          is always  auto
              /power/runtime_status              unsupported
              /uevent                            ''==empty
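
To make the thermal_zone1 (CPUZ) differences between the two kernels easier to spot, the two columns that changed can be compared cell by cell. The following is only a minimal sketch: the numbers are copied from the tables above, while the names (BAD_3_13_5, GOOD_3_12_13, diff_columns) are made up for illustration and are not part of any kernel or tool.

```python
# Values copied from the thermal_zone1 (CPUZ) rows of the tables above.
# The dict keys are illustrative labels for the /cdev?_trip_point and
# /trip_point_?_temp columns, indexed by "?".
BAD_3_13_5 = {
    "cdev_trip_point": [6, 5, 4, 3, 2, 1, 1],
    "trip_point_temp": [110000, 107000, 90000, 75000, 55000, 45000, 30000],
}
GOOD_3_12_13 = {
    "cdev_trip_point": [1, 1, 2, 3, 4, 5, 6],
    "trip_point_temp": [110000, 107000, 90000, 67000, 55000, 45000, 30000],
}


def diff_columns(good, bad):
    """Return {column: [(index, good_value, bad_value), ...]} for differing cells."""
    changes = {}
    for column in good:
        rows = [
            (i, g, b)
            for i, (g, b) in enumerate(zip(good[column], bad[column]))
            if g != b
        ]
        if rows:
            changes[column] = rows
    return changes


if __name__ == "__main__":
    for column, rows in diff_columns(GOOD_3_12_13, BAD_3_13_5).items():
        for index, good_value, bad_value in rows:
            print(f"{column}[{index}]: good={good_value} bad={bad_value}")
```

With the values above, this reports that the cdev-to-trip-point numbering differs everywhere except index 3, and that only trip point 3 changed its temperature (67000 vs. 75000).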

----------------------------------------------------------------

  3.13.5 -- 20140309 -- 20:52 -- bad
=============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +68.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +54.0°C  (crit = +105.0°C)
temp4:        +25.8°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +66.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +63.0°C  (high = +105.0°C, crit = +105.0°C)


  3.12.13 -- 20140310 -- 00:26 -- good
==============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +53.0°C  (crit = +105.0°C)
temp4:        +25.6°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +65.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +61.0°C  (high = +105.0°C, crit = +105.0°C)

# EOF #
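
For anyone wanting to collect the same kind of readout on another machine, here is a rough sketch of how the tables between # BOF # and # EOF # could be gathered automatically. It assumes only the sysfs attribute names that appear in the tables (type, cur_state, max_state, temp, trip_point_N_temp, trip_point_N_type); the function names read_attr and snapshot_thermal are hypothetical, and this is not the method actually used to produce the tables.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or 'n.a.' if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "n.a."


def snapshot_thermal(root="/sys/class/thermal"):
    """Collect cooling-device states and thermal-zone trip points into a dict."""
    root = Path(root)
    snapshot = {"cooling_devices": {}, "thermal_zones": {}}
    for dev in sorted(root.glob("cooling_device*")):
        snapshot["cooling_devices"][dev.name] = {
            name: read_attr(dev / name)
            for name in ("type", "cur_state", "max_state")
        }
    for zone in sorted(root.glob("thermal_zone*")):
        trips = []
        index = 0
        # Trip points are numbered consecutively from 0.
        while (zone / f"trip_point_{index}_temp").exists():
            trips.append({
                "temp": read_attr(zone / f"trip_point_{index}_temp"),
                "type": read_attr(zone / f"trip_point_{index}_type"),
            })
            index += 1
        snapshot["thermal_zones"][zone.name] = {
            "type": read_attr(zone / "type"),
            "temp": read_attr(zone / "temp"),
            "trips": trips,
        }
    return snapshot


if __name__ == "__main__":
    import json
    print(json.dumps(snapshot_thermal(), indent=2))
```

Running it once under each kernel and saving the JSON output would give two files that can be compared with any diff tool.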




* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-10  1:49               ` Manuel Krause
@ 2014-03-11 21:59                 ` Manuel Krause
       [not found]                   ` <532B4DC5.4010705@netscape.net>
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-11 21:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
  Cc: Guenter Roeck, Jean Delvare, lm-sensors

On 2014-03-10 02:49, Manuel Krause wrote:
> [SNIP]

Hi, and thank you for your attention ^^

At the bottom of this email you'll find the current values for the
new 3.12.14 kernel at two different levels of usage and ambient
temperature.
As you'll see, in kernel 3.12.14 the /cdev?_trip_point enumeration
has changed to match that of 3.13.?, and so has one /trip_point_?_temp.
But 3.12.14 works as well as 3.12.13. (So the first thing that
caught my eye didn't lead anywhere useful.)
I'm not capable of finding or understanding the related code,
but please let me present an idea of what MAY be going on:

In 3.12.13+, on my system, the effective cooling fan speed seems
to be an accumulation, maybe bitwise, of
cooling_device[2-6]/cur_state. Each of them gets activated (=1) at a
certain temperature value or level, and each stays at 1 as long as
the temperature does not drop below its reference temperature. On my
system this reference temperature would most likely be compared
against temp2 == thermal_zone1/temp [CPUZ].

In 3.13.? only one of cooling_device[2-6]/cur_state seems to get
set to 1, with the others left at and/or rewritten to 0. The fan
speed algorithm then accumulates only a single 1, without taking into
account the [_LEVEL_] number of cooling_device[2-6]... or
re-requesting the related trigger temperature.
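
The idea above can be written down as a toy model. This is purely an illustration of the hypothesis, not a description of what the kernel or the firmware actually does: it assumes the effective fan level is simply the number of Fan cooling devices whose cur_state is 1. The cur_state lists are copied from the 3.12.13 and 3.13.5 tables in the earlier message of this thread (both taken at FDTZ == 58°C); the function name accumulated_fan_level is made up.

```python
# cur_state of cooling_device2..cooling_device6, copied from the tables
# in the earlier message of this thread (same workload, FDTZ == 58 C).
GOOD_3_12_13 = [0, 1, 1, 1, 1]
BAD_3_13_5 = [0, 1, 0, 0, 0]


def accumulated_fan_level(cur_states):
    """Hypothetical model: fan level = number of fan devices switched on."""
    return sum(1 for state in cur_states if state == 1)


if __name__ == "__main__":
    print("3.12.13 (good):", accumulated_fan_level(GOOD_3_12_13))
    print("3.13.5  (bad): ", accumulated_fan_level(BAD_3_13_5))
```

Under this assumption the good kernel would request level 4 and the bad kernel only level 1 for the same workload, which would be consistent with the fan running too slowly on 3.13.?.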

I hope this brings you developers closer to a conclusion on how to
fix it.
Best regards, Manuel Krause

_____________________________
3.12.14 -- 20140311 -- 19:07 -- changed, not broken -- normal use
=============================
/sys/class/thermal/*  which
are links to -> ../../devices/virtual/thermal/*

dir             |-
                  /type       /cur_state  /max_state  Maybe
                                                       trigger
                                                       /PWM
...
cooling_device2  Fan          0           1          not yet
                                                       observed
cooling_device3  Fan          0           1          FDTZ==58°C
cooling_device4  Fan          1           1          FDTZ==45°C
cooling_device5  Fan          1           1          FDTZ==34°C
cooling_device6  Fan          1           1          FDTZ==25°C
...

dir          |-
               /passive /temp  |-     /cdev?_  /trip_   /trip_
                                       trip_    point_   point_
                                       point    ?_temp   ?_type
...
thermal_zone1   n.a.    73000 |-  (CPUZ)
                                 ?=0   6       110000   critical
                                 ?=1   5       107000   passive
                                 ?=2   4        90000   active
                                 ?=3   3        75000   active
                                 ?=4   2        55000   active
                                 ?=5   1        45000   active
                                 ?=6   1        30000   active
...
thermal_zone4   n.a.    45000   ?=0    n.a.   110000   critical (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  (crit = +256.0°C)
temp2:        +73.0°C  (crit = +110.0°C)
temp3:        +57.0°C  (crit = +105.0°C)
temp4:        +26.3°C  (crit = +110.0°C)
temp5:        +45.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +68.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +66.0°C  (high = +105.0°C, crit = +105.0°C)


_____________________________
3.12.14 -- 20140311 -- 21:09 -- changed, not broken -- idle state
=============================

dir             |-
                  /type       /cur_state  /max_state  Maybe
                                                       trigger
                                                       /PWM
...
cooling_device2  Fan          0           1          not yet
                                                       observed
cooling_device3  Fan          0           1          FDTZ==58°C
cooling_device4  Fan          0           1          FDTZ==45°C
cooling_device5  Fan          0           1          FDTZ==34°C
cooling_device6  Fan          1           1          FDTZ==25°C
...

dir          |-
               /passive /temp
thermal_zone1   n.a.    46000 ... (CPUZ)
...
thermal_zone4   n.a.    25000 ... (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +46.0°C  (crit = +110.0°C)
temp3:        +44.0°C  (crit = +105.0°C)
temp4:        +25.7°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
_____________________________




* Re: 3.13.?: Strange / dangerous fan policy...
       [not found]                   ` <532B4DC5.4010705@netscape.net>
@ 2014-03-31 23:37                     ` Manuel Krause
  2014-03-31 23:47                       ` Guenter Roeck
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-03-31 23:37 UTC (permalink / raw)
  To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang
  Cc: Guenter Roeck, Jean Delvare, lm-sensors

On 2014-03-20 21:21, Manuel Krause wrote:
> On 2014-03-11 22:59, Manuel Krause wrote:
>> On 2014-03-10 02:49, Manuel Krause wrote:
>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>> wrote:
> [SNIP]
>
> Long time no reply from you... Have I overlooked an unwritten
> convention? Or were my charts that unusable for your analysis/work?
>
> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
> persists. "Strange / dangerous fan policy..."
>
> Since kernel 3.13.6 I've managed to 'fix' the potential
> overheating problem by manually issuing a:
> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> _before_ obviously critical temperatures occur. Remind: This
> particular setting may only work for my system! ...and keeps
> working for 3.14-rc.
>
> In the following I'd like to present you a modified output of my
> /sys/class/thermal, that I've written a script for (for my
> system), that shows the results in the way of
> linux/Documentation/thermal/sysfs-api.txt, point 3:
> {I've uploaded the files to pastebin, to not swamp you and the
> lists with so many lines of logs.}
>
> For the last good kernel -- 3.12.14 -- in-use:
>   http://pastebin.com/HL1PNcda
> For my first bad kernel revision 3.13 -- at critical temp:
>   http://pastebin.com/98hgf1a9
> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>   http://pastebin.com/MuTwTnjD
> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>   *) command:
>   http://pastebin.com/2peda54z
>
> Please, have a look at them! And maybe, give me hints on how I
> can help you to further debug this issue, as my manual method
> works but it's annoying.
>
> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> Email-thread to someone in charge.
>
> Thank you for your work && best regards,
> Manuel Krause
>

This is still BUG 71711
https://bugzilla.kernel.org/show_bug.cgi?id=71711

3.12.15 works very well
3.13.7 fails
3.14.0-rc8 fails

I've now tried the tmon tool, too. Nice eye candy, and good for monitoring!

I've tried reverting all "thermal"-related patches between 3.12.14
and 3.13.7 from 3.13.7, but they don't seem to matter. (The same
holds if I apply those patches in the forward direction to 3.12.15.)

So "thermal" is out?

For the failing kernels: not a single reached (active) trip point
triggers any fan action!
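
This kind of check can be repeated from a shell. A minimal sketch, assuming the standard sysfs attribute names described in Documentation/thermal/sysfs-api.txt (type, temp, trip_point_N_type, trip_point_N_temp, cur_state, max_state). The function name dump_thermal is made up for illustration and is not the script mentioned in the quoted message:

```shell
#!/bin/sh
# Sketch: list thermal zones with their trip points, then cooling devices.
# The base directory is a parameter (hypothetical helper, not a kernel tool)
# so the function can also be pointed at a saved copy of the tree.
dump_thermal() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -d "$zone" ] || continue
        printf '%s type=%s temp=%s\n' "${zone##*/}" \
            "$(cat "$zone/type")" "$(cat "$zone/temp")"
        # Each trip point has a _type file and a matching _temp file.
        for trip in "$zone"/trip_point_*_type; do
            [ -f "$trip" ] || continue
            printf '  %s=%s at %s\n' "${trip##*/}" "$(cat "$trip")" \
                "$(cat "${trip%_type}_temp")"
        done
    done
    for dev in "$base"/cooling_device*; do
        [ -d "$dev" ] || continue
        printf '%s type=%s cur_state=%s max_state=%s\n' "${dev##*/}" \
            "$(cat "$dev/type")" "$(cat "$dev/cur_state")" \
            "$(cat "$dev/max_state")"
    done
}
```

Called without an argument it reads /sys/class/thermal. Temperatures are printed as the kernel reports them, in millidegrees Celsius. On an affected kernel one would expect a zone temperature above an active trip point while the matching Fan device still shows cur_state 0.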

ACPI would be the next thing to investigate.

THX for this audience,
Manuel Krause


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-31 23:37                     ` Manuel Krause
@ 2014-03-31 23:47                       ` Guenter Roeck
  2014-04-06  2:37                         ` Manuel Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-03-31 23:47 UTC (permalink / raw)
  To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang
  Cc: Jean Delvare, lm-sensors

On 03/31/2014 04:37 PM, Manuel Krause wrote:
> On 2014-03-20 21:21, Manuel Krause wrote:
>> On 2014-03-11 22:59, Manuel Krause wrote:
>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>> wrote:
>> [SNIP]
>>
>> Long time no reply from you... Have I overlooked an unwritten
>> convention? Or were my charts that unusable for your analysis/work?
>>
>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>> persists. "Strange / dangerous fan policy..."
>>
>> Since kernel 3.13.6 I've managed to 'fix' the potential
>> overheating problem by manually issuing a:
>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>> _before_ obviously critical temperatures occur. Remind: This
>> particular setting may only work for my system! ...and keeps
>> working for 3.14-rc.
>>
>> In the following I'd like to present you a modified output of my
>> /sys/class/thermal, that I've written a script for (for my
>> system), that shows the results in the way of
>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>> {I've uploaded the files to pastebin, to not swamp you and the
>> lists with so many lines of logs.}
>>
>> For the last good kernel -- 3.12.14 -- in-use:
>>   http://pastebin.com/HL1PNcda
>> For my first bad kernel revision 3.13 -- at critical temp:
>>   http://pastebin.com/98hgf1a9
>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>   http://pastebin.com/MuTwTnjD
>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>   *) command:
>>   http://pastebin.com/2peda54z
>>
>> Please, have a look at them! And maybe, give me hints on how I
>> can help you to further debug this issue, as my manual method
>> works but it's annoying.
>>
>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>> Email-thread to someone in charge.
>>
>> Thank you for your work && best regards,
>> Manuel Krause
>>
>
> This is still BUG 71711
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> 3.12.15 works very well
> 3.13.7 fails
> 3.14.0-rc8 fails
>

Best you can do would really be to bisect the problem.
Unfortunately only you (or someone else with an affected system)
can do that. Once the culprit is known it would be much easier
to get it fixed.
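
A bisection between a known-good and a known-bad revision could be started roughly like this. This is a sketch, not a recipe from the thread: start_bisect is a hypothetical helper, and the revisions passed to it are up to the tester (the thread names 3.12.15 as good and 3.13.7 as bad; in a mainline-only tree the v3.12 and v3.13 tags would be the assumed equivalents).

```shell
#!/bin/sh
# Sketch: start a bisection between a known-good and a known-bad revision.
# start_bisect is a hypothetical wrapper around the real git subcommands.
start_bisect() {
    good="$1"
    bad="$2"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}
# git then checks out a revision in the middle. After building and booting
# that kernel, report the outcome by hand and repeat until git names the
# first bad commit:
#   git bisect good    # fan reacts to the trip points
#   git bisect bad     # fan stays off while the temperature climbs
# When finished, return to the original checkout with:
#   git bisect reset
```

Each round halves the remaining range, so even a range of several thousand commits needs only a dozen or so builds.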

To answer your earlier question: I don't think you did anything wrong.
I guess everyone else is just as clueless as I am (if not, speak up
and help ;-).

Guenter



* Re: 3.13.?: Strange / dangerous fan policy...
  2014-03-31 23:47                       ` Guenter Roeck
@ 2014-04-06  2:37                         ` Manuel Krause
  2014-04-06  2:43                           ` Guenter Roeck
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-06  2:37 UTC (permalink / raw)
  To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang, Jean Delvare, lm-sensors

On 2014-04-01 01:47, Guenter Roeck wrote:
> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>> On 2014-03-20 21:21, Manuel Krause wrote:
>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>> wrote:
>>> [SNIP]
>>>
>>> Long time no reply from you... Have I overlooked an unwritten
>>> convention? Or were my charts that unusable for your
>>> analysis/work?
>>>
>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>> persists. "Strange / dangerous fan policy..."
>>>
>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>> overheating problem by manually issuing a:
>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>> _before_ obviously critical temperatures occur. Remind: This
>>> particular setting may only work for my system! ...and keeps
>>> working for 3.14-rc.
>>>
>>> In the following I'd like to present you a modified output of my
>>> /sys/class/thermal, that I've written a script for (for my
>>> system), that shows the results in the way of
>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>> {I've uploaded the files to pastebin, to not swamp you and the
>>> lists with so many lines of logs.}
>>>
>>> For the last good kernel -- 3.12.14 -- in-use:
>>>   http://pastebin.com/HL1PNcda
>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>   http://pastebin.com/98hgf1a9
>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>   http://pastebin.com/MuTwTnjD
>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>   *) command:
>>>   http://pastebin.com/2peda54z
>>>
>>> Please, have a look at them! And maybe, give me hints on how I
>>> can help you to further debug this issue, as my manual method
>>> works but it's annoying.
>>>
>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>> Email-thread to someone in charge.
>>>
>>> Thank you for your work && best regards,
>>> Manuel Krause
>>>
>>
>> This is still BUG 71711
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>
>> 3.12.15 works very well
>> 3.13.7 fails
>> 3.14.0-rc8 fails
>>
>
> Best you can do would really be to bisect the problem.
> Unfortunately only you (or someone else with an affected system)
> can do that. Once the culprit is known it would be much easier
> to get it fixed.
>
> To answer your earlier question: I don't think you did anything
> wrong.
> I guess everyone else is just as clueless as I am (if not, speak up
> and help ;-).
>
> Guenter
>

I've now bisected twice, from two different kernel starting
points, just to be sure, as I'm new to this stupid-and-lengthy
method, and to make sure I didn't give a false positive in
between out of boredom.

In the end it says each time:
# git bisect bad | tee -a /var/log/bisect.log
cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
commit cc8ef52707341e67a12067d6ead991d56ea017ca
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Wed Sep 25 20:39:45 2013 +0800

     ACPI / AC: convert ACPI ac driver to platform bus

     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

:040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers


Please tell me how I can help debug this further, and please
also read the latest comments at
https://bugzilla.kernel.org/show_bug.cgi?id=71711

Manuel Krause



* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-06  2:37                         ` Manuel Krause
@ 2014-04-06  2:43                           ` Guenter Roeck
  2014-04-06 23:17                             ` Manuel Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-04-06  2:43 UTC (permalink / raw)
  To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang, Jean Delvare, lm-sensors

On 04/05/2014 07:37 PM, Manuel Krause wrote:
> On 2014-04-01 01:47, Guenter Roeck wrote:
>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>> wrote:
>>>> [SNIP]
>>>>
>>>> Long time no reply from you... Have I overlooked an unwritten
>>>> convention? Or were my charts that unusable for your
>>>> analysis/work?
>>>>
>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>> persists. "Strange / dangerous fan policy..."
>>>>
>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>> overheating problem by manually issuing a:
>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>> _before_ obviously critical temperatures occur. Remind: This
>>>> particular setting may only work for my system! ...and keeps
>>>> working for 3.14-rc.
>>>>
>>>> In the following I'd like to present you a modified output of my
>>>> /sys/class/thermal, that I've written a script for (for my
>>>> system), that shows the results in the way of
>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>> {I've uploaded the files to pastebin, to not swamp you and the
>>>> lists with so many lines of logs.}
>>>>
>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>   http://pastebin.com/HL1PNcda
>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>   http://pastebin.com/98hgf1a9
>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>   http://pastebin.com/MuTwTnjD
>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>   *) command:
>>>>   http://pastebin.com/2peda54z
>>>>
>>>> Please, have a look at them! And maybe, give me hints on how I
>>>> can help you to further debug this issue, as my manual method
>>>> works but it's annoying.
>>>>
>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>> Email-thread to someone in charge.
>>>>
>>>> Thank you for your work && best regards,
>>>> Manuel Krause
>>>>
>>>
>>> This is still BUG 71711
>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>
>>> 3.12.15 works very well
>>> 3.13.7 fails
>>> 3.14.0-rc8 fails
>>>
>>
>> Best you can do would really be to bisect the problem.
>> Unfortunately only you (or someone else with an affected system)
>> can do that. Once the culprit is known it would be much easier
>> to get it fixed.
>>
>> To answer your earlier question: I don't think you did anything
>> wrong.
>> I guess everyone else is just as clueless as I am (if not, speak up
>> and help ;-).
>>
>> Guenter
>>
>
> I've now bisected two times. From two different kernel origins, just to be sure, as I'm new to this stupid-and-lengthy method, and, to be sure, I haven't given a false positive inbetween due to boredom.
>

Not really. Keep in mind that you were able to track down the bad commit
among more than 10,000 commits in a reasonably short period of time.

> In the end it says each time:
> # git bisect bad | tee -a /var/log/bisect.log
> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> Author: Zhang Rui <rui.zhang@intel.com>
> Date:   Wed Sep 25 20:39:45 2013 +0800
>
>      ACPI / AC: convert ACPI ac driver to platform bus
>
>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
Over to the two of you...

Guenter

> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>
>
> Please help me, on how I can help debug this more, and please also read the newest from
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> Manuel Krause
>
>
>


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-06  2:43                           ` Guenter Roeck
@ 2014-04-06 23:17                             ` Manuel Krause
  2014-04-07 11:45                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-06 23:17 UTC (permalink / raw)
  To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm,
	rui.zhang, Jean Delvare, lm-sensors

On 2014-04-06 04:43, Guenter Roeck wrote:
> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>>> wrote:
>>>>> [SNIP]
>>>>>
>>>>> Long time no reply from you... Have I overlooked an unwritten
>>>>> convention? Or were my charts that unusable for your
>>>>> analysis/work?
>>>>>
>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>>> persists. "Strange / dangerous fan policy..."
>>>>>
>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>> overheating problem by manually issuing a:
>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>> particular setting may only work for my system! ...and keeps
>>>>> working for 3.14-rc.
>>>>>
>>>>> In the following I'd like to present you a modified output
>>>>> of my
>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>> system), that shows the results in the way of
>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>> {I've uploaded the files to pastebin, to not swamp you and the
>>>>> lists with so many lines of logs.}
>>>>>
>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>   http://pastebin.com/HL1PNcda
>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>   http://pastebin.com/98hgf1a9
>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>   http://pastebin.com/MuTwTnjD
>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>   *) command:
>>>>>   http://pastebin.com/2peda54z
>>>>>
>>>>> Please, have a look at them! And maybe, give me hints on how I
>>>>> can help you to further debug this issue, as my manual method
>>>>> works but it's annoying.
>>>>>
>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>> Email-thread to someone in charge.
>>>>>
>>>>> Thank you for your work && best regards,
>>>>> Manuel Krause
>>>>>
>>>>
>>>> This is still BUG 71711
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>
>>>> 3.12.15 works very well
>>>> 3.13.7 fails
>>>> 3.14.0-rc8 fails
>>>>
>>>
>>> Best you can do would really be to bisect the problem.
>>> Unfortunately only you (or someone else with an affected system)
>>> can do that. Once the culprit is known it would be much easier
>>> to get it fixed.
>>>
>>> To answer your earlier question: I don't think you did anything
>>> wrong.
>>> I guess everyone else is just as clueless as I am (if not,
>>> speak up
>>> and help ;-).
>>>
>>> Guenter
>>>
>>
>> I've now bisected two times. From two different kernel origins,
>> just to be sure, as I'm new to this stupid-and-lengthy method,
>> and, to be sure, I haven't given a false positive inbetween due
>> to boredom.
>>
>
> Not really. Keep in mind that you were able to track down the bad
> commit
> among more than 10,000 commits in a reasonably short period of time.
>
>> In the end it says each time:
>> # git bisect bad | tee -a /var/log/bisect.log
>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>> Author: Zhang Rui <rui.zhang@intel.com>
>> Date:   Wed Sep 25 20:39:45 2013 +0800
>>
>>      ACPI / AC: convert ACPI ac driver to platform bus
>>
>>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
> Off to the two of you...
>
> Guenter
>
>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>>
>>
>> Please help me, on how I can help debug this more, and please
>> also read the newest from
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>
>> Manuel Krause
>>
>>
>>
>

Sorry that I forgot to add the following last night: after the
first bisection round, I was so glad to have a result that I
reverted the mentioned patch from the 3.13.8 kernel, but this
didn't fix it. It must be something that came later, but you all
understand more of what you've coded.

Best regards, Manuel Krause


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-06 23:17                             ` Manuel Krause
@ 2014-04-07 11:45                               ` Rafael J. Wysocki
  2014-04-10 22:51                                 ` Manuel Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2014-04-07 11:45 UTC (permalink / raw)
  To: Manuel Krause
  Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare,
	lm-sensors

On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
> On 2014-04-06 04:43, Guenter Roeck wrote:
> > On 04/05/2014 07:37 PM, Manuel Krause wrote:
> >> On 2014-04-01 01:47, Guenter Roeck wrote:
> >>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
> >>>> On 2014-03-20 21:21, Manuel Krause wrote:
> >>>>> On 2014-03-11 22:59, Manuel Krause wrote:
> >>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
> >>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
> >>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
> >>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
> >>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
> >>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> >>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
> >>>>>>>>>>>> wrote:
> >>>>> [SNIP]
> >>>>>
> >>>>> Long time no reply from you... Have I overlooked an unwritten
> >>>>> convention? Or were my charts that unusable for your
> >>>>> analysis/work?
> >>>>>
> >>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
> >>>>> persists. "Strange / dangerous fan policy..."
> >>>>>
> >>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
> >>>>> overheating problem by manually issuing a:
> >>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
> >>>>> _before_ obviously critical temperatures occur. Remind: This
> >>>>> particular setting may only work for my system! ...and keeps
> >>>>> working for 3.14-rc.
> >>>>>
> >>>>> In the following I'd like to present you a modified output
> >>>>> of my
> >>>>> /sys/class/thermal, that I've written a script for (for my
> >>>>> system), that shows the results in the way of
> >>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
> >>>>> {I've uploaded the files to pastebin, to not swamp you and the
> >>>>> lists with so many lines of logs.}
> >>>>>
> >>>>> For the last good kernel -- 3.12.14 -- in-use:
> >>>>>   http://pastebin.com/HL1PNcda
> >>>>> For my first bad kernel revision 3.13 -- at critical temp:
> >>>>>   http://pastebin.com/98hgf1a9
> >>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
> >>>>>   http://pastebin.com/MuTwTnjD
> >>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
> >>>>>   *) command:
> >>>>>   http://pastebin.com/2peda54z
> >>>>>
> >>>>> Please, have a look at them! And maybe, give me hints on how I
> >>>>> can help you to further debug this issue, as my manual method
> >>>>> works but it's annoying.
> >>>>>
> >>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
> >>>>> Email-thread to someone in charge.
> >>>>>
> >>>>> Thank you for your work && best regards,
> >>>>> Manuel Krause
> >>>>>
> >>>>
> >>>> This is still BUG 71711
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>>>
> >>>> 3.12.15 works very well
> >>>> 3.13.7 fails
> >>>> 3.14.0-rc8 fails
> >>>>
> >>>
> >>> Best you can do would really be to bisect the problem.
> >>> Unfortunately only you (or someone else with an affected system)
> >>> can do that. Once the culprit is known it would be much easier
> >>> to get it fixed.
> >>>
> >>> To answer your earlier question: I don't think you did anything
> >>> wrong.
> >>> I guess everyone else is just as clueless as I am (if not,
> >>> speak up
> >>> and help ;-).
> >>>
> >>> Guenter
> >>>
> >>
> >> I've now bisected two times. From two different kernel origins,
> >> just to be sure, as I'm new to this stupid-and-lengthy method,
> >> and, to be sure, I haven't given a false positive inbetween due
> >> to boredom.
> >>
> >
> > Not really. Keep in mind that you were able to track down the bad
> > commit
> > among more than 10,000 commits in a reasonably short period of time.
> >
> >> In the end it says each time:
> >> # git bisect bad | tee -a /var/log/bisect.log
> >> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> >> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> >> Author: Zhang Rui <rui.zhang@intel.com>
> >> Date:   Wed Sep 25 20:39:45 2013 +0800
> >>
> >>      ACPI / AC: convert ACPI ac driver to platform bus
> >>
> >>      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> > Off to the two of you...
> >
> > Guenter
> >
> >> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
> >> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
> >>
> >>
> >> Please help me, on how I can help debug this more, and please
> >> also read the newest from
> >> https://bugzilla.kernel.org/show_bug.cgi?id=71711
> >>
> >> Manuel Krause
> >>
> >>
> >>
> >
> 
> Sorry, that I've forgotten to add the following last night: After
> the first bisection round, I was so glad about a result that 
> time, that I reverted this mentioned patch from the 3.13.8 
> kernel, but this didn't fix it.

This means that the commit in question didn't introduce the problem
you're seeing.

Please check out commit 7f2dc5c4bcbf (Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
build a kernel from that and see if you can reproduce the problem with it.
If so, it can be used as your new "first known bad" kernel for bisection.
Otherwise, you can use it as the "first good" one and commit cc8ef52707341
as "first known bad".
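
The decision described above can be written down as a small helper. This is only an illustration: choose_range is a hypothetical function, and v3.12 stands in for whatever revision was previously known to be good (the thread does not name one for this step).

```shell
#!/bin/sh
# Sketch: pick the next bisection range after testing a midpoint build.
# choose_range RESULT MID OLD_GOOD OLD_BAD prints "GOOD BAD" endpoints.
choose_range() {
    case "$1" in
        bad)  printf '%s %s\n' "$3" "$2" ;;  # MID is the new known-bad end
        good) printf '%s %s\n' "$2" "$4" ;;  # MID is the new known-good end
        *)    echo "usage: choose_range good|bad MID OLD_GOOD OLD_BAD" >&2
              return 2 ;;
    esac
}

# Example with the commits named above (v3.12 is an assumed old good end):
choose_range bad 7f2dc5c4bcbf v3.12 cc8ef52707341
choose_range good 7f2dc5c4bcbf v3.12 cc8ef52707341
```

The two endpoints printed can then be handed to "git bisect good" and "git bisect bad" after "git bisect start".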

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-07 11:45                               ` Rafael J. Wysocki
@ 2014-04-10 22:51                                 ` Manuel Krause
  2014-04-13  0:05                                   ` Manuel Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-10 22:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare,
	lm-sensors

On 2014-04-07 13:45, Rafael J. Wysocki wrote:
> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote:
>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause
>>>>>>>>>>>>>> wrote:
>>>>>>> [SNIP]
>>>>>>>
>>>>>>> Long time no reply from you... Have I overlooked an unwritten
>>>>>>> convention? Or were my charts that unusable for your
>>>>>>> analysis/work?
>>>>>>>
>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem
>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>
>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>> overheating problem by manually issuing a:
>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>> _before_ obviously critical temperatures occur. Remind: This
>>>>>>> particular setting may only work for my system! ...and keeps
>>>>>>> working for 3.14-rc.
>>>>>>>
>>>>>>> In the following I'd like to present you a modified output
>>>>>>> of my
>>>>>>> /sys/class/thermal, that I've written a script for (for my
>>>>>>> system), that shows the results in the way of
>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>> {I've uploaded the files to pastebin, to not swamp you and the
>>>>>>> lists with so many lines of logs.}
>>>>>>>
>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>    http://pastebin.com/HL1PNcda
>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>    http://pastebin.com/98hgf1a9
>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>    http://pastebin.com/MuTwTnjD
>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>    *) command:
>>>>>>>    http://pastebin.com/2peda54z
>>>>>>>
>>>>>>> Please, have a look at them! And maybe, give me hints on how I
>>>>>>> can help you to further debug this issue, as my manual method
>>>>>>> works but it's annoying.
>>>>>>>
>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>> Email-thread to someone in charge.
>>>>>>>
>>>>>>> Thank you for your work && best regards,
>>>>>>> Manuel Krause
>>>>>>>
>>>>>>
>>>>>> This is still BUG 71711
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>
>>>>>> 3.12.15 works very well
>>>>>> 3.13.7 fails
>>>>>> 3.14.0-rc8 fails
>>>>>>
>>>>>
>>>>> Best you can do would really be to bisect the problem.
>>>>> Unfortunately only you (or someone else with an affected system)
>>>>> can do that. Once the culprit is known it would be much easier
>>>>> to get it fixed.
>>>>>
>>>>> To answer your earlier question: I don't think you did anything
>>>>> wrong.
>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>> speak up
>>>>> and help ;-).
>>>>>
>>>>> Guenter
>>>>>
>>>>
>>>> I've now bisected two times. From two different kernel origins,
>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>> and, to be sure, I haven't given a false positive inbetween due
>>>> to boredom.
>>>>
>>>
>>> Not really. Keep in mind that you were able to track down the bad
>>> commit
>>> among more than 10,000 commits in a reasonably short period of time.
>>>
>>>> In the end it says each time:
>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>> Date:   Wed Sep 25 20:39:45 2013 +0800
>>>>
>>>>       ACPI / AC: convert ACPI ac driver to platform bus
>>>>
>>>>       Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>       Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>
>>> Off to the two of you...
>>>
>>> Guenter
>>>
>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>>>>
>>>>
>>>> Please help me, on how I can help debug this more, and please
>>>> also read the newest from
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>
>>>> Manuel Krause
>>>>
>>>>
>>>>
>>>
>>
>> Sorry, that I've forgotten to add the following last night: After
>> the first bisection round, I was so glad about a result that
>> time, that I reverted this mentioned patch from the 3.13.8
>> kernel, but this didn't fix it.
>
> This means that the commit in question didn't introduce the problem
> you're seeing.
>
> Please check out commit 7f2dc5c4bcbf (Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
> build a kernel from that and see if you can reproduce the problem with it.
> If so, it can be used as your new "first known bad" kernel for bisection.
> Otherwise, you can use it as the "first good" one and commit cc8ef52707341
> as "first known bad".
>
> Thanks!
>

Sorry for any inconvenience, but please disregard what I wrote
about reverting the patch in question from 3.13.x not fixing it.
Of course it didn't fix it, as the patch doesn't revert cleanly
from release kernels at all. My mistake!
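
A failed revert of this kind can be detected before building. A minimal sketch, assuming the commit has been saved as a patch file and the command is run from the top of a git-managed source tree; reverts_cleanly and the file name in the usage note are hypothetical, while --reverse and --check are regular git apply options:

```shell
#!/bin/sh
# Sketch: succeed only if the patch can be un-applied from the current tree.
# --check makes git apply verify the patch without modifying any file.
reverts_cleanly() {
    git apply --reverse --check "$1"
}
```

For example, "reverts_cleanly ac-platform.patch && echo ok" prints "ok" only when every hunk would revert. A non-zero exit status means the tree has diverged from what the patch expects, so a kernel built afterwards still contains the change.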

I' ve been guided by Guenter Roeck through two more bisecting 
sessions/ways on this, that always pointed to the commit in question.

Some citation:
Me:
>>> O.k. I've now followed your latest directions:
>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was BAD =>
>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was GOOD
>>>
[ ... ]
>>> Reverting that commit in question from this very git tree makes the
>>> kernel work as expected.
[ ... ]
Guenter:
>> Report the results you have above. That should show without question
>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>> and it should be easy to reproduce.

That seems to be all I can do for you for now. Please let me know 
of any preliminary patches to test!
And I want to add special thanks to Guenter Roeck for his 
always-just-in-time assistance over so many days,

Manuel Krause
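
A note on the range selection Rafael suggests in the quoted text
above: which bisect endpoints to use depends on whether a kernel
built from merge commit 7f2dc5c4bcbf shows the problem. The
following is only a sketch of that decision, not a command
sequence anyone in this thread ran. The function name bisect_plan
and the older known-good revision passed to it (v3.12 in the usage
note) are assumptions made for illustration. The function only
prints the "git bisect start <bad> <good>" line, so nothing
touches the tree until the output has been reviewed.

```shell
# Print the bisect start command for either outcome of testing the
# merge commit. Hypothetical helper, not from the thread.
# Usage: bisect_plan <yes|no> [<older-known-good-rev>]
#   yes = the problem reproduces on the merge commit
#   no  = the merge commit behaves well
bisect_plan() {
    merge=7f2dc5c4bcbf        # merge commit named by Rafael
    suspect=cc8ef52707341     # commit the earlier bisections pointed at
    case "$1" in
        yes)
            # Merge commit is the new "first known bad"; an older
            # known-good revision must be supplied (assumption).
            echo "git bisect start $merge ${2:?known-good revision required}"
            ;;
        no)
            # Merge commit is "first good", suspect is "first known bad".
            echo "git bisect start $suspect $merge"
            ;;
        *)
            echo "usage: bisect_plan yes|no [known-good-rev]" >&2
            return 1
            ;;
    esac
}
```

For example, "bisect_plan no v3.12" prints
"git bisect start cc8ef52707341 7f2dc5c4bcbf".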

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-10 22:51                                 ` Manuel Krause
@ 2014-04-13  0:05                                   ` Manuel Krause
  2014-04-16 18:32                                     ` Zhang Rui
  0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-13  0:05 UTC (permalink / raw)
  To: rui.zhang
  Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
	Jean Delvare, lm-sensors

On 2014-04-11 00:51, Manuel Krause wrote:
> On 2014-04-07 13:45, Rafael J. Wysocki wrote:
>> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
>>> On 2014-04-06 04:43, Guenter Roeck wrote:
>>>> On 04/05/2014 07:37 PM, Manuel Krause wrote:
>>>>> On 2014-04-01 01:47, Guenter Roeck wrote:
>>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote:
>>>>>>> On 2014-03-20 21:21, Manuel Krause wrote:
>>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote:
>>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote:
>>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote:
>>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote:
>>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel
>>>>>>>>>>>>>>> Krause
>>>>>>>>>>>>>>> wrote:
>>>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> Long time no reply from you... Have I overlooked an unwritten
>>>>>>>> convention? Or were my charts that unusable for your
>>>>>>>> analysis/work?
>>>>>>>>
>>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the
>>>>>>>> problem
>>>>>>>> persists. "Strange / dangerous fan policy..."
>>>>>>>>
>>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential
>>>>>>>> overheating problem by manually issuing a:
>>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
>>>>>>>> _before_ obviously critical temperatures occur. Note: this
>>>>>>>> particular setting may only work for my system! ...and it
>>>>>>>> keeps working for 3.14-rc.
>>>>>>>>
>>>>>>>> Below is a modified dump of my /sys/class/thermal, produced
>>>>>>>> by a script I've written (for my system) that lays out the
>>>>>>>> results in the style of
>>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3:
>>>>>>>> {I've uploaded the files to pastebin so as not to swamp you
>>>>>>>> and the lists with so many lines of logs.}
>>>>>>>>
>>>>>>>> For the last good kernel -- 3.12.14 -- in-use:
>>>>>>>>    http://pastebin.com/HL1PNcda
>>>>>>>> For my first bad kernel revision 3.13 -- at critical temp:
>>>>>>>>    http://pastebin.com/98hgf1a9
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
>>>>>>>>    http://pastebin.com/MuTwTnjD
>>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the
>>>>>>>>    *) command:
>>>>>>>>    http://pastebin.com/2peda54z
>>>>>>>>
>>>>>>>> Please, have a look at them! And maybe, give me hints on
>>>>>>>> how I
>>>>>>>> can help you to further debug this issue, as my manual
>>>>>>>> method
>>>>>>>> works but it's annoying.
>>>>>>>>
>>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or lead this
>>>>>>>> Email-thread to someone in charge.
>>>>>>>>
>>>>>>>> Thank you for your work && best regards,
>>>>>>>> Manuel Krause
>>>>>>>>
>>>>>>>
>>>>>>> This is still BUG 71711
>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>>>
>>>>>>> 3.12.15 works very well
>>>>>>> 3.13.7 fails
>>>>>>> 3.14.0-rc8 fails
>>>>>>>
>>>>>>
>>>>>> Best you can do would really be to bisect the problem.
>>>>>> Unfortunately only you (or someone else with an affected
>>>>>> system)
>>>>>> can do that. Once the culprit is known it would be much easier
>>>>>> to get it fixed.
>>>>>>
>>>>>> To answer your earlier question: I don't think you did
>>>>>> anything
>>>>>> wrong.
>>>>>> I guess everyone else is just as clueless as I am (if not,
>>>>>> speak up
>>>>>> and help ;-).
>>>>>>
>>>>>> Guenter
>>>>>>
>>>>>
>>>>> I've now bisected twice, from two different kernel origins,
>>>>> just to be sure, as I'm new to this stupid-and-lengthy method,
>>>>> and to make sure I hadn't given a false positive in between
>>>>> due to boredom.
>>>>>
>>>>
>>>> Not really. Keep in mind that you were able to track down the
>>>> bad
>>>> commit
>>>> among more than 10,000 commits in a reasonably short period
>>>> of time.
>>>>
>>>>> In the end it says each time:
>>>>> # git bisect bad | tee -a /var/log/bisect.log
>>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad
>>>>> commit
>>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca
>>>>> Author: Zhang Rui <rui.zhang@intel.com>
>>>>> Date:   Wed Sep 25 20:39:45 2013 +0800
>>>>>
>>>>>       ACPI / AC: convert ACPI ac driver to platform bus
>>>>>
>>>>>       Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>>       Signed-off-by: Rafael J. Wysocki
>>>>> <rafael.j.wysocki@intel.com>
>>>>>
>>>> Off to the two of you...
>>>>
>>>> Guenter
>>>>
>>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84
>>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M      drivers
>>>>>
>>>>>
>>>>> Please help me, on how I can help debug this more, and please
>>>>> also read the newest from
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>>>>>
>>>>> Manuel Krause
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> Sorry that I forgot to add the following last night: After
>>> the first bisection round, I was so glad to have a result
>>> that I reverted the mentioned patch from the 3.13.8
>>> kernel, but this didn't fix it.
>>
>> This means that the commit in question didn't introduce the
>> problem
>> you're seeing.
>>
>> Please check out commit 7f2dc5c4bcbf (Merge tag
>> 'dm-3.13-changes' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
>>
>> build a kernel from that and see if you can reproduce the
>> problem with it.
>> If so, it can be used as your new "first known bad" kernel for
>> bisection.
>> Otherwise, you can use it as the "first good" one and commit
>> cc8ef52707341
>> as "first known bad".
>>
>> Thanks!
>>
>
> Sorry for any inconvenience, but please disregard what I wrote
> about reverting the patch in question from 3.13.x not fixing it.
> Of course it didn't fix it, as the patch doesn't revert cleanly
> from the release kernels at all. My mistake!
>
> Guenter Roeck has guided me through two more bisecting sessions
> on this, and both pointed to the commit in question.
>
> Some quotes from that exchange:
> Me:
>>>> O.k. I've now followed your latest directions:
>>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was BAD =>
>>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>>> => result after rebuild was GOOD
>>>>
> [ ... ]
>>>> Reverting that commit in question from this very git tree
>>>> makes the
>>>> kernel work as expected.
> [ ... ]
> Guenter:
>>> Report the results you have above. That should show without
>>> question
>>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>>> and it should be easy to reproduce.
>
> That seems to be all I can do for you for now. Please let me know
> of any preliminary patches to test!
> And I want to add special thanks to Guenter Roeck for his
> always-just-in-time assistance over so many days,
>
> Manuel Krause
>

BTW -- applying the patch in question to a 3.12.17 kernel, which
worked optimally WITHOUT it, makes it FAIL as described for the
3.13.x kernels. (And, yes, the patch applied cleanly, compiled
fine and boots nicely.)

Manuel Krause
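
For anyone comparing their own machine against the quoted
/sys/class/thermal dumps: the snippet below is a minimal sketch of
such a dump, not the script referenced above (that one is only
available via the pastebin links). It assumes the standard thermal
sysfs attributes type, cur_state, max_state and temp. The function
name dump_thermal and its optional root-directory argument are
additions made here so it can be tried on a copy of the tree.

```shell
# Print one summary line per cooling device and thermal zone.
# $1: root directory, defaults to /sys/class/thermal (the override
#     is an assumption added for testing, not part of sysfs).
dump_thermal() {
    root=${1:-/sys/class/thermal}
    for dev in "$root"/cooling_device* "$root"/thermal_zone*; do
        [ -d "$dev" ] || continue          # glob matched nothing
        name=$(basename "$dev")
        kind=$(cat "$dev/type" 2>/dev/null)
        case "$name" in
            cooling_device*)
                cur=$(cat "$dev/cur_state" 2>/dev/null)
                max=$(cat "$dev/max_state" 2>/dev/null)
                echo "$name type=$kind cur_state=$cur max_state=$max"
                ;;
            thermal_zone*)
                # temp is reported in millidegrees Celsius
                temp=$(cat "$dev/temp" 2>/dev/null)
                echo "$name type=$kind temp=$temp"
                ;;
        esac
    done
}
```

Calling dump_thermal without arguments reads the live tree. Which
cooling_device index belongs to the fan differs between machines;
cooling_device3 in the quoted workaround is specific to the
reporter's system, so check the type line before writing to any
cur_state file.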


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-13  0:05                                   ` Manuel Krause
@ 2014-04-16 18:32                                     ` Zhang Rui
  2014-04-16 22:17                                       ` Manuel Krause
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang Rui @ 2014-04-16 18:32 UTC (permalink / raw)
  To: Manuel Krause
  Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
	Jean Delvare, lm-sensors

On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote:
> BTW -- applying the patch in question to a 3.12.17 kernel, which
> worked optimally WITHOUT it, makes it FAIL as described for the
> 3.13.x kernels. (And, yes, the patch applied cleanly, compiled
> fine and boots nicely.)
> 
Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb
on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check if
the problem still exists in the 3.12.17 kernel?

thanks,
rui
> Manuel Krause
> 
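
The request above amounts to stacking two commits on a 3.12.17
tree in a fixed order. As a rough sketch only: the helper below
prints one possible command sequence using git cherry-pick. The
function name backport_plan, the branch name ac-test and the tag
name v3.12.17 are assumptions (the tag exists only in a tree that
carries the stable releases), and the thread does not say how the
patches were actually applied. The second cherry-pick may need
manual conflict resolution.

```shell
# Print the commands for testing both commits on top of a base
# revision. Hypothetical helper; it prints and does not run git.
# $1: base revision, defaults to v3.12.17 (assumed tag name).
backport_plan() {
    base=${1:-v3.12.17}
    echo "git checkout -b ac-test $base"
    # First the commit the bisection pointed at ...
    echo "git cherry-pick cc8ef52707341e67a12067d6ead991d56ea017ca"
    # ... then the follow-up commit to be tested on top of it.
    echo "git cherry-pick 50a2bc5429f07ec4d53df2d287b03bdbceb281bb"
}
```

Reviewing the printed lines first and then running them one by one
keeps a failed cherry-pick from being followed blindly by the next
step.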



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 3.13.?: Strange / dangerous fan policy...
  2014-04-16 18:32                                     ` Zhang Rui
@ 2014-04-16 22:17                                       ` Manuel Krause
  0 siblings, 0 replies; 22+ messages in thread
From: Manuel Krause @ 2014-04-16 22:17 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm,
	Jean Delvare, lm-sensors

On 2014-04-16 20:32, Zhang Rui wrote:
> On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote:
>> BTW -- applying the patch in question to a 3.12.17 kernel, which
>> worked optimally WITHOUT it, makes it FAIL as described for the
>> 3.13.x kernels. (And, yes, the patch applied cleanly, compiled
>> fine and boots nicely.)
>>
> Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb
> on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check if
> the problem still exists in the 3.12.17 kernel?
>
> thanks,
> rui

I'm so sorry: 3.12.17 + cc8ef52707341e67a12067d6ead991d56ea017ca 
+ 50a2bc5429f07ec4d53df2d287b03bdbceb281bb does NOT improve the 
situation.

Thank you for your work,
Manuel


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-04-16 22:18 UTC | newest]

Thread overview: 22+ messages
2014-03-07 19:33 3.13.?: Strange / dangerous fan policy Manuel Krause
2014-03-07 20:55 ` Guenter Roeck
2014-03-07 22:04   ` Manuel Krause
2014-03-07 22:52     ` Guenter Roeck
2014-03-08 11:08       ` [lm-sensors] " Jean Delvare
2014-03-08 12:36         ` Rafael J. Wysocki
2014-03-08 15:59         ` Guenter Roeck
2014-03-09  0:10           ` Manuel Krause
2014-03-09 17:28             ` Guenter Roeck
2014-03-09 17:58             ` Rafael J. Wysocki
2014-03-10  1:49               ` Manuel Krause
2014-03-11 21:59                 ` Manuel Krause
     [not found]                   ` <532B4DC5.4010705@netscape.net>
2014-03-31 23:37                     ` Manuel Krause
2014-03-31 23:47                       ` Guenter Roeck
2014-04-06  2:37                         ` Manuel Krause
2014-04-06  2:43                           ` Guenter Roeck
2014-04-06 23:17                             ` Manuel Krause
2014-04-07 11:45                               ` Rafael J. Wysocki
2014-04-10 22:51                                 ` Manuel Krause
2014-04-13  0:05                                   ` Manuel Krause
2014-04-16 18:32                                     ` Zhang Rui
2014-04-16 22:17                                       ` Manuel Krause
