* 3.13.?: Strange / dangerous fan policy...
From: Manuel Krause @ 2014-03-07 19:33 UTC
To: linux-kernel, linux-pm

Please have a short look at the following BUG report + the comments
-- this message here is a kind of FWD-ing it:
https://bugs.archlinux.org/task/39005

I came late to test kernel 3.13 with the .5 one, as it was the time
that the related -CK/BFS patch became available.

I'm not using Archlinux, but openSUSE, and my problems are quite the
same. Especially these with smelling melting plastics.

My own reports went to Con Kolivas' Blog first:
"I get weird temperatures and abrupt 100% fan actions with vanilla
3.13.5 with this CK and most recent BFQ at my HP Notebook.
In gkrellm the highest T had been @74°C, so far (3.12.13), and is
now growing to 94°C. Then, the fan goes to 100% for 10~30secs
cooling it to approx. 82°C.
That is not good, if I compare 74 to 94 °C.
Have I missed a .CONFIG option for 3.13, especially?"

I'd get the same without (Con's && BFQ's) patches.

Machine: HP Notebook with Core2Duo CPU (Penryn)
Distro: openSUSE 13.1, 64bit, continuously updated
Desktop: KDE 4.12.3
MESA & drm & Xorg: most recent ones from:
http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/

Current kernel: 3.13.6 vanilla from openSUSE repos, with
-ck1 and BFQ patches
Same behaviour: without these patches

Last good kernel: 3.12.13 vanilla + CK2 + BFQ

Please, _always_CC_me_ -- as I'm not on the linux-kernel / linux-pm
mailing lists.
And please, if you know any person in charge of this -- lead this
message to him/her.

Thank you in advance and best regards,

Manuel Krause
* Re: 3.13.?: Strange / dangerous fan policy...
From: Guenter Roeck @ 2014-03-07 20:55 UTC
To: Manuel Krause
Cc: linux-kernel, linux-pm, lm-sensors

On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> Please have a short look at the following BUG report + the comments
> -- this message here is a kind of FWD-ing it:
> https://bugs.archlinux.org/task/39005
>
> I came late to test kernel 3.13 with the .5 one, as it was the time
> that the related -CK/BFS patch became available.
>
> I'm not using Archlinux, but openSUSE, and my problems are quite the
> same. Especially these with smelling melting plastics.
>
> My own reports went to Con Kolivas' Blog first:
> "I get weird temperatures and abrupt 100% fan actions with vanilla
> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> cooling it to approx. 82°C.
> That is not good, if I compare 74 to 94 °C.
> Have I missed a .CONFIG option for 3.13, especially?"
>
> I'd get the same without (Con's && BFQ's) patches.
>
> Machine: HP Notebook with Core2Duo CPU (Penryn)
> Distro: openSUSE 13.1, 64bit, continuously updated
> Desktop: KDE 4.12.3
> MESA & drm & Xorg: most recent ones from:
> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
>
> Current kernel: 3.13.6 vanilla from openSUSE repos, with
> -ck1 and BFQ patches
> Same behaviour: without these patches
>
> Last good kernel: 3.12.13 vanilla + CK2 + BFQ
>

Can you add more information about your fan control policy ?
Do you rely on the hardware for automatic fan speed control,
or do you run the fancontrol script ?

What is the output from the 'sensors' command ?

Thanks,
Guenter
* Re: 3.13.?: Strange / dangerous fan policy...
From: Manuel Krause @ 2014-03-07 22:04 UTC
To: Guenter Roeck
Cc: linux-kernel, linux-pm, lm-sensors

On 2014-03-07 21:55, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
>> Please have a short look at the following BUG report + the comments
>> -- this message here is a kind of FWD-ing it:
>> https://bugs.archlinux.org/task/39005
>>
>> I came late to test kernel 3.13 with the .5 one, as it was the time
>> that the related -CK/BFS patch became available.
>>
>> I'm not using Archlinux, but openSUSE, and my problems are quite the
>> same. Especially these with smelling melting plastics.
>>
>> My own reports went to Con Kolivas' Blog first:
>> "I get weird temperatures and abrupt 100% fan actions with vanilla
>> 3.13.5 with this CK and most recent BFQ at my HP Notebook.
>> In gkrellm the highest T had been @74°C, so far (3.12.13), and is
>> now growing to 94°C. Then, the fan goes to 100% for 10~30secs
>> cooling it to approx. 82°C.
>> That is not good, if I compare 74 to 94 °C.
>> Have I missed a .CONFIG option for 3.13, especially?"
>>
>> I'd get the same without (Con's && BFQ's) patches.
>>
>> Machine: HP Notebook with Core2Duo CPU (Penryn)
>> Distro: openSUSE 13.1, 64bit, continuously updated
>> Desktop: KDE 4.12.3
>> MESA & drm & Xorg: most recent ones from:
>> http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
>>
>> Current kernel: 3.13.6 vanilla from openSUSE repos, with
>> -ck1 and BFQ patches
>> Same behaviour: without these patches
>>
>> Last good kernel: 3.12.13 vanilla + CK2 + BFQ
>>
>
> Can you add more information about your fan control policy ?
> Do you rely on the hardware for automatic fan speed control,
> or do you run the fancontrol script ?
>
> What is the output from the 'sensors' command ?
>
> Thanks,
> Guenter
>

Hi, and thanks for the quick response!
No special fancy "fan control policy". 'fancontrol' isn't up or
running.
Vanilla kernels 3.11.* and 3.12.* had been working on here without
any extra work.
--
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +71.0°C (crit = +256.0°C)
temp2: +69.0°C (crit = +110.0°C)
temp3: +52.0°C (crit = +105.0°C)
temp4: +25.0°C (crit = +110.0°C)
temp5: +58.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
--
My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
This is with 3.12.13 with my normal workload.

Please, trust my above mentioned values of 94 °C vs. 74°C as I
don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
casing.
But I'd be willing to test any improvement patch.

Manuel Krause
* Re: 3.13.?: Strange / dangerous fan policy...
From: Guenter Roeck @ 2014-03-07 22:52 UTC
To: Manuel Krause
Cc: linux-kernel, linux-pm, lm-sensors

On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> On 2014-03-07 21:55, Guenter Roeck wrote:
> >On Fri, Mar 07, 2014 at 08:33:02PM +0100, Manuel Krause wrote:
> >>Please have a short look at the following BUG report + the comments
> >>-- this message here is a kind of FWD-ing it:
> >>https://bugs.archlinux.org/task/39005
> >>
> >>I came late to test kernel 3.13 with the .5 one, as it was the time
> >>that the related -CK/BFS patch became available.
> >>
> >>I'm not using Archlinux, but openSUSE, and my problems are quite the
> >>same. Especially these with smelling melting plastics.
> >>
> >>My own reports went to Con Kolivas' Blog first:
> >>"I get weird temperatures and abrupt 100% fan actions with vanilla
> >>3.13.5 with this CK and most recent BFQ at my HP Notebook.
> >>In gkrellm the highest T had been @74°C, so far (3.12.13), and is
> >>now growing to 94°C. Then, the fan goes to 100% for 10~30secs
> >>cooling it to approx. 82°C.
> >>That is not good, if I compare 74 to 94 °C.
> >>Have I missed a .CONFIG option for 3.13, especially?"
> >>
> >>I'd get the same without (Con's && BFQ's) patches.
> >>
> >>Machine: HP Notebook with Core2Duo CPU (Penryn)
> >>Distro: openSUSE 13.1, 64bit, continuously updated
> >>Desktop: KDE 4.12.3
> >>MESA & drm & Xorg: most recent ones from:
> >>http://download.opensuse.org/repositories/home:/pontostroy:/X11/openSUSE_13.1/x86_64/
> >>
> >>Current kernel: 3.13.6 vanilla from openSUSE repos, with
> >>-ck1 and BFQ patches
> >>Same behaviour: without these patches
> >>
> >>Last good kernel: 3.12.13 vanilla + CK2 + BFQ
> >>
> >
> >Can you add more information about your fan control policy ?
> >Do you rely on the hardware for automatic fan speed control,
> >or do you run the fancontrol script ?
> >
> >What is the output from the 'sensors' command ?
> >
> >Thanks,
> >Guenter
> >
>
> Hi, and thanks for the quick response!
> No special fancy "fan control policy". 'fancontrol' isn't up or
> running.
> Vanilla kernels 3.11.* and 3.12.* had been working on here without
> any extra work.
> --
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1: +71.0°C (crit = +256.0°C)
> temp2: +69.0°C (crit = +110.0°C)
> temp3: +52.0°C (crit = +105.0°C)
> temp4: +25.0°C (crit = +110.0°C)
> temp5: +58.0°C (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
> --
> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> This is with 3.12.13 with my normal workload.
>
> Please, trust my above mentioned values of 94 °C vs. 74°C as I
> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> casing.
>
Understood. Unfortunately, we'll need to get information
from the new kernel to be able to track down the problem.

> But I'd be willing to test any improvement patch.
>
So far I have no idea what is going on. I don't see anything in the
drivers providing above data that would explain the behavior,
but I might be missing something.

Of course, if output is different in 3.13, that would be important to know.

Maybe someone else can post related information for both kernel
versions on an affected system.

Guenter
* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
From: Jean Delvare @ 2014-03-08 11:08 UTC
To: Manuel Krause
Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm

On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > Hi, and thanks for the quick response!
> > No special fancy "fan control policy". 'fancontrol' isn't up or
> > running.
> > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > any extra work.
> > --
> > # sensors
> > acpitz-virtual-0
> > Adapter: Virtual device
> > temp1: +71.0°C (crit = +256.0°C)
> > temp2: +69.0°C (crit = +110.0°C)
> > temp3: +52.0°C (crit = +105.0°C)
> > temp4: +25.0°C (crit = +110.0°C)
> > temp5: +58.0°C (crit = +110.0°C)
> >
> > coretemp-isa-0000
> > Adapter: ISA adapter
> > Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
> > Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
> > --
> > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > This is with 3.12.13 with my normal workload.
> >
> > Please, trust my above mentioned values of 94 °C vs. 74°C as I
> > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > casing.
>
> Understood. Unfortunately, we'll need to get information
> from the new kernel to be able to track down the problem.

Indeed. Not only the run-time temperatures, but also the high and crit
limits.

> > But I'd be willing to test any improvement patch.
>
> So far I have no idea what is going on. I don't see anything in the
> drivers providing above data that would explain the behavior,
> but I might be missing something.

Looks like a regression in the acpi subsystem or in power management,
not hwmon. Hwmon is merely reporting the temperatures, it's not
responsible for the actual temperatures.

A bisection would certainly help, but of course that would require
booting to a bad kernel half of the time, which I understand Manuel
wouldn't enjoy.

The only two components which I think can reach such high temperatures
in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
74°C" refers to acpitz's temp1? If the temperatures reported by
coretemp remain the same, then I can only suppose that temp1 is the GPU
temperature. Please tell us which GPU is in this laptop, and which
driver you're using.

--
Jean Delvare
SUSE L3 Support
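The run-time values and the high/crit limits Jean asks for can also be read straight from the hwmon sysfs attributes, without going through the `sensors` front end. A minimal sketch, assuming the hwmon sysfs ABI (`name`, `tempN_input`, `tempN_max`, `tempN_crit`, all temperatures in millidegrees Celsius); `hwmon_temps` is a hypothetical helper, not part of lm-sensors, and the fallback to `hwmonN/device/` is an assumption about how some drivers of this kernel era lay out their attributes:

```python
import glob
import os
import re


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def hwmon_temps(root="/sys/class/hwmon"):
    """Map 'hwmonN:chip/tempM' to {'input', 'max', 'crit'} in degrees Celsius.

    Limits that a chip does not expose (acpitz has no tempN_max) are omitted.
    """
    result = {}
    for hwdir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # Assumption: some drivers keep their attributes in hwmonN/device/
        # instead of hwmonN/ itself, so check both locations.
        for base in (hwdir, os.path.join(hwdir, "device")):
            chip = _read(os.path.join(base, "name"))
            if chip is None:
                continue
            for inp in sorted(glob.glob(os.path.join(base, "temp*_input"))):
                m = re.match(r"(temp\d+)_input$", os.path.basename(inp))
                if m is None:
                    continue
                chan = m.group(1)
                entry = {}
                for attr in ("input", "max", "crit"):
                    raw = _read(os.path.join(base, chan + "_" + attr))
                    if raw is not None and raw.lstrip("-").isdigit():
                        entry[attr] = int(raw) / 1000.0  # millidegrees -> °C
                result[os.path.basename(hwdir) + ":" + chip + "/" + chan] = entry
            break  # this directory holds 'name'; do not look any further
    return result
```

Calling `hwmon_temps()` once under 3.12.13 and once under 3.13.x, with the same workload, would give directly comparable readings and limits for every channel that `sensors` showed above.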
* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
From: Rafael J. Wysocki @ 2014-03-08 12:36 UTC
To: Jean Delvare, Manuel Krause
Cc: Guenter Roeck, lm-sensors, linux-kernel, linux-pm

On Saturday, March 08, 2014 12:08:31 PM Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
> > On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
> > > Hi, and thanks for the quick response!
> > > No special fancy "fan control policy". 'fancontrol' isn't up or
> > > running.
> > > Vanilla kernels 3.11.* and 3.12.* had been working on here without
> > > any extra work.
> > > --
> > > # sensors
> > > acpitz-virtual-0
> > > Adapter: Virtual device
> > > temp1: +71.0°C (crit = +256.0°C)
> > > temp2: +69.0°C (crit = +110.0°C)
> > > temp3: +52.0°C (crit = +105.0°C)
> > > temp4: +25.0°C (crit = +110.0°C)
> > > temp5: +58.0°C (crit = +110.0°C)
> > >
> > > coretemp-isa-0000
> > > Adapter: ISA adapter
> > > Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
> > > Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
> > > --
> > > My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
> > > This is with 3.12.13 with my normal workload.
> > >
> > > Please, trust my above mentioned values of 94 °C vs. 74°C as I
> > > don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
> > > casing.
> >
> > Understood. Unfortunately, we'll need to get information
> > from the new kernel to be able to track down the problem.
>
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
>
> > > But I'd be willing to test any improvement patch.
> >
> > So far I have no idea what is going on. I don't see anything in the
> > drivers providing above data that would explain the behavior,
> > but I might be missing something.
>
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
>
> A bisection would certainly help, but of course that would require
> booting to a bad kernel half of the time, which I understand Manuel
> wouldn't enjoy.
>
> The only two components which I think can reach such high temperatures
> in a laptop are the CPU and the GPU. I suppose that the "94 °C vs.
> 74°C" refers to acpitz's temp1? If the temperatures reported by
> coretemp remain the same, then I can only suppose that temp1 is the GPU
> temperature. Please tell us which GPU is in this laptop, and which
> driver you're using.

Also it would be good to know which cpufreq and cpuidle drivers are in use
and whether or not 3.14-rc5 has the problem.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
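Rafael's question about the cpufreq and cpuidle drivers can be answered from sysfs as well as with `cpupower`. A minimal sketch, assuming the standard attributes `cpuidle/current_driver`, `cpuidle/current_governor_ro` and `cpuN/cpufreq/scaling_driver`, `scaling_governor`, `scaling_available_governors` under `/sys/devices/system/cpu`; `cpu_pm_drivers` is a hypothetical helper, and attributes that are missing come back as `None` instead of raising:

```python
import os


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def cpu_pm_drivers(cpu_root="/sys/devices/system/cpu", cpu=0):
    """Collect the active cpuidle/cpufreq drivers and governors for one CPU."""
    idle = os.path.join(cpu_root, "cpuidle")
    freq = os.path.join(cpu_root, "cpu%d" % cpu, "cpufreq")
    return {
        "cpuidle_driver": _read(os.path.join(idle, "current_driver")),
        "cpuidle_governor": _read(os.path.join(idle, "current_governor_ro")),
        "cpufreq_driver": _read(os.path.join(freq, "scaling_driver")),
        "cpufreq_governor": _read(os.path.join(freq, "scaling_governor")),
        "cpufreq_available": _read(os.path.join(freq, "scaling_available_governors")),
    }
```

Reading sysfs directly does not depend on the `cpupower` tool being installed; on the reporter's machine the values should match what `cpupower` showed later in this thread (acpi_idle with menu, acpi-cpufreq with ondemand).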
* Re: [lm-sensors] 3.13.?: Strange / dangerous fan policy...
From: Guenter Roeck @ 2014-03-08 15:59 UTC
To: Jean Delvare, Manuel Krause
Cc: lm-sensors, linux-kernel, linux-pm

On 03/08/2014 03:08 AM, Jean Delvare wrote:
> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>> Hi, and thanks for the quick response!
>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>> running.
>>> Vanilla kernels 3.11.* and 3.12.* had been working on here without
>>> any extra work.
>>> --
>>> # sensors
>>> acpitz-virtual-0
>>> Adapter: Virtual device
>>> temp1: +71.0°C (crit = +256.0°C)
>>> temp2: +69.0°C (crit = +110.0°C)
>>> temp3: +52.0°C (crit = +105.0°C)
>>> temp4: +25.0°C (crit = +110.0°C)
>>> temp5: +58.0°C (crit = +110.0°C)
>>>
>>> coretemp-isa-0000
>>> Adapter: ISA adapter
>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
>>> --
>>> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
>>> This is with 3.12.13 with my normal workload.
>>>
>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
>>> casing.
>>
>> Understood. Unfortunately, we'll need to get information
>> from the new kernel to be able to track down the problem.
>
> Indeed. Not only the run-time temperatures, but also the high and crit
> limits.
>
>>> But I'd be willing to test any improvement patch.
>>
>> So far I have no idea what is going on. I don't see anything in the
>> drivers providing above data that would explain the behavior,
>> but I might be missing something.
>
> Looks like a regression in the acpi subsystem or in power management,
> not hwmon. Hwmon is merely reporting the temperatures, it's not
> responsible for the actual temperatures.
>

I would agree. I don't think we have enough information to be sure,
though. There might be some unintended interaction or interference.

gpu is a good hint ... for example, look at commit b9ed919f1c8
(drm/nouveau/drm/pm: remove everything except the hwmon interfaces
to THERM). nouveau does export pwm and fan control information,
so any change in that code may have unintended side effects.
Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
use devm_hwmon_register_with_groups) could have the observed impact,
as it is purely passive, but I prefer to be rather safe than sorry.

This problem has now been submitted into bugzilla as
https://bugzilla.kernel.org/show_bug.cgi?id=71711.

Guenter
* Re: 3.13.?: Strange / dangerous fan policy...
From: Manuel Krause @ 2014-03-09 0:10 UTC
To: Guenter Roeck, linux-kernel, linux-pm
Cc: Rafael J. Wysocki, lm-sensors

On 2014-03-08 16:59, Guenter Roeck wrote:
> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>> Hi, and thanks for the quick response!
>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>> running.
>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>> without
>>>> any extra work.
>>>> --
>>>> # sensors
>>>> acpitz-virtual-0
>>>> Adapter: Virtual device
>>>> temp1: +71.0°C (crit = +256.0°C)
>>>> temp2: +69.0°C (crit = +110.0°C)
>>>> temp3: +52.0°C (crit = +105.0°C)
>>>> temp4: +25.0°C (crit = +110.0°C)
>>>> temp5: +58.0°C (crit = +110.0°C)
>>>>
>>>> coretemp-isa-0000
>>>> Adapter: ISA adapter
>>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
>>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
>>>> --
>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>> sensor.
>>>> This is with 3.12.13 with my normal workload.
>>>>
>>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>> notebook's
>>>> casing.
>>>
>>> Understood. Unfortunately, we'll need to get information
>>> from the new kernel to be able to track down the problem.
>>
>> Indeed. Not only the run-time temperatures, but also the high
>> and crit
>> limits.
>>
>>>> But I'd be willing to test any improvement patch.
>>>
>>> So far I have no idea what is going on. I don't see anything
>>> in the
>>> drivers providing above data that would explain the behavior,
>>> but I might be missing something.
>>
>> Looks like a regression in the acpi subsystem or in power
>> management,
>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>> responsible for the actual temperatures.
>>
>
> I would agree. I don't think we have enough information to be sure,
> though. There might be some unintended interaction or interference.
>
> gpu is a good hint ... for example, look at commit b9ed919f1c8
> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> to THERM). nouveau does export pwm and fan control information,
> so any change in that code may have unintended side effects.
> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> use devm_hwmon_register_with_groups) could have the observed impact,
> as it is purely passive, but I prefer to be rather safe than sorry.
>
> This problem has now been submitted into bugzilla as
> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>
> Guenter
>

Sorry for being late, I had to search for/accumulate much info for you...
I hope you don't mind me putting it into one answer to you all, CCing you.

My GFX is a GM45 Intel (mobile), shared memory, running the opensource Mesa drivers/extensions.
kernel-module: i915

According to the output of 'cpupower': I have
CPUidle driver: acpi_idle
CPUidle governor: menu

CPUfreq:
driver: acpi-cpufreq
available cpufreq governors: ondemand, performance
-
And "ondemand" is running.
--

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +41.0°C (crit = +256.0°C)
temp2: +92.0°C (crit = +110.0°C)
temp3: +71.0°C (crit = +105.0°C)
temp4: +26.5°C (crit = +110.0°C)
temp5: +25.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +86.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +84.0°C (high = +105.0°C, crit = +105.0°C)

From a critical "smelly" situation today, kernel-compilation, fan @100%.
--

Additional findings:

Identification from bootup ACPI initialisation vs. sensors:
temp1 = DTSZ
temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
temp3 = SKNZ
temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan (25 - 45 - 58 - max?)
Core 0 & Core 1 are the internal CPU T sensors.

With the 3.13.x (.5+) kernels the first gathered cooling settings from bootup do stay forever. That means rebooting a hot system will get a FDTZ @45°C+ and won't cause any problems, as it does cool enough (even for kernel compiling here). If it gets 25°C @bootup, the system goes into emergency cooling at some point. The same happens with a suspend/resume.

Kernel 3.12.13 adjusts the cooling on its own, and appropriately.

Thank you all for your engagement, best regards,
Manuel Krause.

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
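Manuel's observation that on 3.13.x the cooling settings chosen at boot never change afterwards can be checked by sampling the cooling devices while the machine heats up. A sketch, assuming the thermal sysfs layout `/sys/class/thermal/cooling_deviceN/{type,cur_state,max_state}`; `sample_cooling`, `watch` and `stuck_devices` are hypothetical helpers, not an existing tool:

```python
import glob
import os
import time


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def sample_cooling(root="/sys/class/thermal"):
    """One sample: {'cooling_deviceN': (type, cur_state, max_state)}."""
    sample = {}
    for dev in sorted(glob.glob(os.path.join(root, "cooling_device*"))):
        sample[os.path.basename(dev)] = (
            _read(os.path.join(dev, "type")),
            _read(os.path.join(dev, "cur_state")),
            _read(os.path.join(dev, "max_state")),
        )
    return sample


def watch(root="/sys/class/thermal", count=10, interval=5.0):
    """Take `count` samples spaced `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append(sample_cooling(root))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def stuck_devices(samples):
    """Cooling devices present in every sample whose cur_state never changed."""
    if not samples:
        return set()
    common = set(samples[0])
    for s in samples[1:]:
        common &= set(s)
    return {name for name in common if len({s[name][1] for s in samples}) == 1}
```

For example, `stuck_devices(watch(count=60, interval=5))` taken during a kernel build lists the devices whose `cur_state` never moved. A device that simply never needed to change is listed too, so the result only means something when the temperatures are clearly rising.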
* Re: 3.13.?: Strange / dangerous fan policy...
From: Guenter Roeck @ 2014-03-09 17:28 UTC
To: Manuel Krause, linux-kernel, linux-pm
Cc: Jean Delvare, lm-sensors, Rafael J. Wysocki

On 03/08/2014 04:10 PM, Manuel Krause wrote:
> On 2014-03-08 16:59, Guenter Roeck wrote:
>> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>>> Hi, and thanks for the quick response!
>>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>>> running.
>>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here
>>>>> without
>>>>> any extra work.
>>>>> --
>>>>> # sensors
>>>>> acpitz-virtual-0
>>>>> Adapter: Virtual device
>>>>> temp1: +71.0°C (crit = +256.0°C)
>>>>> temp2: +69.0°C (crit = +110.0°C)
>>>>> temp3: +52.0°C (crit = +105.0°C)
>>>>> temp4: +25.0°C (crit = +110.0°C)
>>>>> temp5: +58.0°C (crit = +110.0°C)
>>>>>
>>>>> coretemp-isa-0000
>>>>> Adapter: ISA adapter
>>>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
>>>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
>>>>> --
>>>>> My notebook (HP/Compaq 6730b) does not have a separate fan
>>>>> sensor.
>>>>> This is with 3.12.13 with my normal workload.
>>>>>
>>>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>>>> don't like to boot 3.13.6 anymore, to avoid harm to the
>>>>> notebook's
>>>>> casing.
>>>>
>>>> Understood. Unfortunately, we'll need to get information
>>>> from the new kernel to be able to track down the problem.
>>>
>>> Indeed. Not only the run-time temperatures, but also the high
>>> and crit
>>> limits.
>>>
>>>>> But I'd be willing to test any improvement patch.
>>>>
>>>> So far I have no idea what is going on. I don't see anything
>>>> in the
>>>> drivers providing above data that would explain the behavior,
>>>> but I might be missing something.
>>>
>>> Looks like a regression in the acpi subsystem or in power
>>> management,
>>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>>> responsible for the actual temperatures.
>>>
>>
>> I would agree. I don't think we have enough information to be sure,
>> though. There might be some unintended interaction or interference.
>>
>> gpu is a good hint ... for example, look at commit b9ed919f1c8
>> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
>> to THERM). nouveau does export pwm and fan control information,
>> so any change in that code may have unintended side effects.
>> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
>> use devm_hwmon_register_with_groups) could have the observed impact,
>> as it is purely passive, but I prefer to be rather safe than sorry.
>>
>> This problem has now been submitted into bugzilla as
>> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>>
>> Guenter
>>
>
> Sorry for being late, I had to search for/accumulate much info for you...
> I hope you don't mind me putting it into one answer to you all, CCing you.
>
> My GFX is a GM45 Intel (mobile), shared memory, running the opensource Mesa drivers/extensions.
> kernel-module: i915
>
> According to the output of 'cpupower': I have
> CPUidle driver: acpi_idle
> CPUidle governor: menu
>
> CPUfreq:
> driver: acpi-cpufreq
> available cpufreq governors: ondemand, performance
> -
> And "ondemand" is running.
> --
>
> # sensors
> acpitz-virtual-0
> Adapter: Virtual device
> temp1: +41.0°C (crit = +256.0°C)
> temp2: +92.0°C (crit = +110.0°C)
> temp3: +71.0°C (crit = +105.0°C)
> temp4: +26.5°C (crit = +110.0°C)
> temp5: +25.0°C (crit = +110.0°C)
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0: +86.0°C (high = +105.0°C, crit = +105.0°C)
> Core 1: +84.0°C (high = +105.0°C, crit = +105.0°C)
>
> From a critical "smelly" situation today, kernel-compilation, fan @100%.
> --
>
> Additional findings:
>
> Identification from bootup ACPI initialisation vs. sensors:
> temp1 = DTSZ
> temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
> temp3 = SKNZ
> temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
> temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan (25 - 45 - 58 - max?)
> Core 0 & Core 1 are the internal CPU T sensors.
>
> With the 3.13.x (.5+) kernels the first gathered cooling settings from bootup do stay forever. That means rebooting a hot system will get a FDTZ @45°C+ and won't cause any problems, as it does cool enough (even for kernel compiling here). If it gets 25°C @bootup, the system goes into emergency cooling at some point. The same happens with a suspend/resume.
>
> Kernel 3.12.13 adjusts the cooling on its own, and appropriately.
>

Hi Manuel,

thanks a lot for the additional information. I added this exchange to bugzilla
(https://bugzilla.kernel.org/show_bug.cgi?id=71711).

This is pretty much all I can do at this point; I have no idea what is going on.
Some change in ACPI would be my guess, but I did not see anything catching my eye
when looking through the ACPI code.

Guenter
* Re: 3.13.?: Strange / dangerous fan policy... 2014-03-09 0:10 ` Manuel Krause 2014-03-09 17:28 ` Guenter Roeck @ 2014-03-09 17:58 ` Rafael J. Wysocki 2014-03-10 1:49 ` Manuel Krause 1 sibling, 1 reply; 22+ messages in thread From: Rafael J. Wysocki @ 2014-03-09 17:58 UTC (permalink / raw) To: Manuel Krause Cc: Guenter Roeck, linux-kernel, linux-pm, Jean Delvare, lm-sensors, rui.zhang On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote: > On 2014-03-08 16:59, Guenter Roeck wrote: > > On 03/08/2014 03:08 AM, Jean Delvare wrote: > >> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote: > >>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote: > >>>> Hi, and thanks for the quick response! > >>>> No special fancy "fan control policy". 'fancontrol' isn't up or > >>>> running. > >>>> Vanilla kernels 3.11.* and 3.12.* had been working on here > >>>> without > >>>> any extra work. > >>>> -- > >>>> # sensors > >>>> acpitz-virtual-0 > >>>> Adapter: Virtual device > >>>> temp1: +71.0°C (crit = +256.0°C) > >>>> temp2: +69.0°C (crit = +110.0°C) > >>>> temp3: +52.0°C (crit = +105.0°C) > >>>> temp4: +25.0°C (crit = +110.0°C) > >>>> temp5: +58.0°C (crit = +110.0°C) > >>>> > >>>> coretemp-isa-0000 > >>>> Adapter: ISA adapter > >>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C) > >>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C) > >>>> -- > >>>> My notebook (HP/Compaq 6730b) does not have a seperate fan > >>>> sensor. > >>>> This is with 3.12.13 with my normal workload. > >>>> > >>>> Please, trust my above mentionned values of 94 °C vs. 74°C as I > >>>> don't like to boot 3.13.6 anymore, to avoid harm to the > >>>> notebook's > >>>> casing. > >>> > >>> Understood. Unfortunately, we'll need to get information > >>> from the new kernel to be able to track down the problem. > >> > >> Indeed. Not only the run-time temperatures, but also the high > >> and crit > >> limits. > >> > >>>> But I'd do to test any improvement-patch. 
> >>> > >>> So far I have no idea what is going on. I don't see anything > >>> in the > >>> drivers providing above data that would explain the behavior, > >>> but I might be missing something. > >> > >> Looks like a regression in the acpi subsystem or in power > >> management, > >> not hwmon. Hwmon is merely reporting the temperatures, it's not > >> responsible for the actual temperatures. > >> > > > > I would agree. I don't think we have enough information to be sure, > > though. There might be some unintended interaction or interference. > > > > gpu is a good hint ... for example, look at commit b9ed919f1c8 > > (drm/nouveau/drm/pm: remove everything except the hwmon interfaces > > to THERM). nouveau does export pwm and fan control information, > > so any change in that code may have unintended side effects. > > Similar, I don't know how ec39f64bba (drm/radeon/dpm: Convert to > > use devm_hwmon_register_with_groups) could have the observed impact, > > as it is purely passive, but I prefer to be rather safe than sorry. > > > > This problem has now been submitted into bugzilla as > > https://bugzilla.kernel.org/show_bug.cgi?id=71711. > > > > Guenter > > > > Sorry, for beeing late, had to search for/accumulate much info > for you... > I hope, you like me to put it into one answer to you all CCing you. > > My GFX is a GM45 Intel (mobile), shared memory, running the > opensource Mesa drivers/extensions. > kernel-module: i915 > > According to the output of 'cpupower': I have > CPUidle driver: acpi_idle > CPUidle governor: menu > > CPUfreq: > driver: acpi-cpufreq > available cpufreq governors: ondemand, performance > - > And "ondemand" is running. 
> -- > > # sensors > acpitz-virtual-0 > Adapter: Virtual device > temp1: +41.0°C (crit = +256.0°C) > temp2: +92.0°C (crit = +110.0°C) > temp3: +71.0°C (crit = +105.0°C) > temp4: +26.5°C (crit = +110.0°C) > temp5: +25.0°C (crit = +110.0°C) > > coretemp-isa-0000 > Adapter: ISA adapter > Core 0: +86.0°C (high = +105.0°C, crit = +105.0°C) > Core 1: +84.0°C (high = +105.0°C, crit = +105.0°C) > > From a critical "smelly" situation today: kernel compilation, fan > @100%. > -- > > Additional findings: > > Identification from bootup ACPI initialisation vs. sensors: > temp1 = DTSZ > temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C > temp3 = SKNZ > temp4 = BATZ "Battery Zone", always calm, ~6°C above ambient T > temp5 = FDTZ --- in 3.12.13 a representation of the cooling fan > (25 - 45 - 58 - max?) > Core 0 & Core 1 are the internal CPU T sensors. > > With the 3.13.x (.5+) kernels, the cooling > settings gathered at bootup stay forever. That means rebooting a hot > system gets an FDTZ of 45°C+ and causes no problems, as it > cools enough (even for kernel compiling here). If FDTZ is > 25°C @bootup, the system eventually goes into emergency cooling. > The same happens with suspend/resume. > > Kernel 3.12.13 adjusts the cooling on its own, and appropriately. This almost certainly is an ACPI regression, but I'm not sure whether thermal management or CPU power management is broken on your system. Can you compare the contents of /sys/class/thermal/ from working and not working kernels, please? Rafael ^ permalink raw reply [flat|nested] 22+ messages in thread
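Rafael's request to compare /sys/class/thermal between a working and a non-working kernel can be made mechanical. The following is a minimal POSIX shell sketch, not something posted in this thread: the function name `thermal_snapshot` and its `dir/attr=value` output format are hypothetical, and the attribute names (`type`, `temp`, `mode`, `policy`, `passive`, `trip_point_*`, `cdev*_trip_point`, `cur_state`, `max_state`) are assumed from Documentation/thermal/sysfs-api.txt. It prints one sorted line per attribute, so two snapshots can be diffed directly.

```shell
# Hypothetical helper: dump thermal zone and cooling device attributes as
# sorted "dir/attr=value" lines so snapshots from two kernels can be diffed.
# The root directory is a parameter (default /sys/class/thermal) so the
# function can also be pointed at a copy of the tree.
thermal_snapshot() {
    root=${1:-/sys/class/thermal}
    for dir in "$root"/thermal_zone* "$root"/cooling_device*; do
        [ -d "$dir" ] || continue            # glob did not match anything
        name=${dir##*/}
        for f in "$dir"/type "$dir"/temp "$dir"/mode "$dir"/policy \
                 "$dir"/passive "$dir"/cur_state "$dir"/max_state \
                 "$dir"/trip_point_* "$dir"/cdev*_trip_point; do
            [ -f "$f" ] || continue          # attribute absent on this entry
            # Some attributes may fail to read; record that instead of aborting.
            val=$(cat "$f" 2>/dev/null) || val="<unreadable>"
            printf '%s/%s=%s\n' "$name" "${f##*/}" "$val"
        done
    done | LC_ALL=C sort
}
```

Saving `thermal_snapshot > thermal-$(uname -r).txt` under each kernel at a comparable load, then running `diff` on the two files, would list only the attributes that differ. Temperatures in these files are in millidegrees Celsius.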
* Re: 3.13.?: Strange / dangerous fan policy... 2014-03-09 17:58 ` Rafael J. Wysocki @ 2014-03-10 1:49 ` Manuel Krause 2014-03-11 21:59 ` Manuel Krause 0 siblings, 1 reply; 22+ messages in thread From: Manuel Krause @ 2014-03-10 1:49 UTC (permalink / raw) To: Rafael J. Wysocki, linux-kernel, linux-pm Cc: Guenter Roeck, Jean Delvare, lm-sensors, rui.zhang On 2014-03-09 18:58, Rafael J. Wysocki wrote: > On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote: >> [SNIP] > > This almost certainly is an ACPI regression, but I'm not sure whether > thermal management or CPU power management is broken on your system. > > Can you compare the contents of /sys/class/thermal/ from working and > not working kernels, please? > > Rafael > Hi again, unfortunately you didn't specify how deeply I should dig into /sys/class/thermal, so you get the lines from # BOF # to # EOF # below. I hope they're readable without further comments. The most remarkable changes, in my eyes, happened within "thermal_zone1".
Best regards,
Manuel Krause

# BOF #
The following are all from /sys/class/thermal/, whose entries are links
to -> ../../devices/virtual/thermal/

I've listed the cooling_device and thermal_zone directories in separate
sections for each bad/good kernel. This layout is for email purposes
only; you can merge them into a spreadsheet for your own evaluation.
I've left out some subdirs and subdir values that _really_ didn't seem
to need attention.

I've also collected the `sensors` output for each readout, having
reproduced nearly the same workload, as represented by the "fan speed"
(thermal_zone4==FDTZ).

And I've done my very best to avoid typos and copy-and-paste errors.

3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir              /type      /cur_state  /max_state
cooling_device0  Processor  0           10
cooling_device1  Processor  0           10
cooling_device2  Fan        0           1
cooling_device3  Fan        1           1
cooling_device4  Fan        0           1
cooling_device5  Fan        0           1
cooling_device6  Fan        0           1
cooling_device7  LCD        0           24

3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir              /type      /cur_state  /max_state
cooling_device0  Processor  0           10
cooling_device1  Processor  0           10
cooling_device2  Fan        0           1
cooling_device3  Fan        1           1
cooling_device4  Fan        1           1
cooling_device5  Fan        1           1
cooling_device6  Fan        1           1
cooling_device7  LCD        0           24

3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir            /passive  /temp   ?  /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
thermal_zone0  0         68000   0  n.a.               256000              critical
thermal_zone1  n.a.      70000   0  6                  110000              critical
                                 1  5                  107000              passive
                                 2  4                  90000               active
                                 3  3                  75000               active
                                 4  2                  55000               active
                                 5  1                  45000               active
                                 6  1                  30000               active
thermal_zone2  n.a.      54000   0  1                  105000              critical
                                 1  1                  95000               passive
thermal_zone3  n.a.      25800   0  1                  110000              critical
                                 1  1                  60000               passive
thermal_zone4  0         58000   0  n.a.               110000              critical

3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir            /passive  /temp   ?  /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
thermal_zone0  0         50000   0  n.a.               256000              critical
thermal_zone1  n.a.      70000   0  1                  110000              critical
                                 1  1                  107000              passive
                                 2  2                  90000               active
                                 3  3                  67000               active
                                 4  4                  55000               active
                                 5  5                  45000               active
                                 6  6                  30000               active
thermal_zone2  n.a.      53000   0  1                  105000              critical
                                 1  1                  95000               passive
thermal_zone3  n.a.      25600   0  1                  110000              critical
                                 1  1                  60000               passive
thermal_zone4  0         58000   0  n.a.               110000              critical

---
Legend here:
/type is always acpitz
/mode is always enabled
/policy is always step_wise

- from kernel ACPI initialisation: thermal_zone0==DTSZ,
  thermal_zone1==CPUZ, thermal_zone2==SKNZ,
  thermal_zone3==BATZ, thermal_zone4==FDTZ
- n.a. means file or value is not available
___
Legend in general:
/power/control is always auto
/power/runtime_status unsupported
/uevent ''==empty

----------------------------------------------------------------

3.13.5 -- 20140309 -- 20:52 -- bad
=============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +68.0°C (crit = +256.0°C)
temp2: +70.0°C (crit = +110.0°C)
temp3: +54.0°C (crit = +105.0°C)
temp4: +25.8°C (crit = +110.0°C)
temp5: +58.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +66.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +63.0°C (high = +105.0°C, crit = +105.0°C)

3.12.13 -- 20140310 -- 00:26 -- good
==============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +50.0°C (crit = +256.0°C)
temp2: +70.0°C (crit = +110.0°C)
temp3: +53.0°C (crit = +105.0°C)
temp4: +25.6°C (crit = +110.0°C)
temp5: +58.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +65.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +61.0°C (high = +105.0°C, crit = +105.0°C)

# EOF #
^ permalink raw reply [flat|nested] 22+ messages in thread
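The /cdev?_trip_point column in the listing above says which trip point each cdevN link of a zone is bound to, but not which cooling_device that link points to. A hypothetical helper, `zone_bindings`, could resolve both for one zone. It is a POSIX shell sketch that assumes the sysfs-api.txt layout: cdevN is a symlink to a cooling_device directory, cdevN_trip_point holds the bound trip index, and -1 there means no trip point. It would show, for example, whether cdev0 of thermal_zone1 sits on trip point 6 (as in the 3.13.5 listing) or on trip point 1 (as in 3.12.13), together with the device behind it.

```shell
# Hypothetical helper: for one thermal zone directory, print each cdevN link,
# the cooling device it points to, the trip point index it is bound to, and
# that trip point's temperature (millidegrees Celsius) and type.
zone_bindings() {
    zone=$1
    for link in "$zone"/cdev*; do
        case ${link##*/} in
            *_trip_point) continue ;;   # attribute files; only the links matter here
        esac
        [ -L "$link" ] || continue      # no cdev links in this zone
        base=${link##*/}
        k=${base#cdev}
        cdev=$(readlink "$link")
        cdev=${cdev##*/}                # e.g. ../cooling_device5 -> cooling_device5
        trip=$(cat "${link}_trip_point" 2>/dev/null) || trip=n.a.
        # An unbound link (-1) has no matching trip_point files, so these fall back.
        temp=$(cat "$zone/trip_point_${trip}_temp" 2>/dev/null) || temp=n.a.
        ttype=$(cat "$zone/trip_point_${trip}_type" 2>/dev/null) || ttype=n.a.
        printf 'cdev%s -> %s trip=%s temp=%s type=%s\n' \
            "$k" "$cdev" "$trip" "$temp" "$ttype"
    done
}
```

Running `zone_bindings /sys/class/thermal/thermal_zone1` under each kernel would print one line per binding. `readlink` is assumed to be available, as it is in coreutils and BusyBox.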
* Re: 3.13.?: Strange / dangerous fan policy... 2014-03-10 1:49 ` Manuel Krause @ 2014-03-11 21:59 ` Manuel Krause [not found] ` <532B4DC5.4010705@netscape.net> 0 siblings, 1 reply; 22+ messages in thread From: Manuel Krause @ 2014-03-11 21:59 UTC (permalink / raw) To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang Cc: Guenter Roeck, Jean Delvare, lm-sensors On 2014-03-10 02:49, Manuel Krause wrote: > [SNIP] > Hi, and thank you for your attention ^^

At the bottom of this email you'll find the actual values for the new 3.12.14 kernel at two different levels of usage and ambient temperature. As you'll see, in kernel 3.12.14 the /cdev?_trip_point enumeration has changed to match 3.13.?, and so has one /trip_point_?_temp. But 3.12.14 works as well as 3.12.13. (So my first eye-catcher didn't lead anywhere useful.)

I'm not capable of finding or understanding the related code, but please let me present an idea of what MAY be going on:

In 3.12.13+, on my system, the effective cooling fan speed seems to be an accumulation, maybe bitwise, of cooling_device[2-6]/cur_state, each of which gets activated (=1) at a different temperature value or level; each cooling_device[2-6]/cur_state stays @1 as long as its reference temperature does not drop below that level. On my system, this reference temperature is most likely temp2 == thermal_zone1/temp [CPUZ].

In 3.13.?, only one of cooling_device[2-6]/cur_state seems to get set to 1, while the others are left at and/or rewritten with 0. The fan speed algorithm then accumulates only a single 1, without seeing the [_LEVEL_] number of cooling_device[2-6]... or re-requesting the related trigger temperature.

I hope this leads you developers nearer to a conclusion on how to fix it.

Best regards,
Manuel Krause

_____________________________
3.12.14 -- 20140311 -- 19:07 -- changed, not broken -- normal use
=============================
/sys/class/thermal/* which are links to -> ../../devices/virtual/thermal/*

dir              /type  /cur_state  /max_state  Maybe trigger /PWM
...
cooling_device2  Fan    0           1           not yet observed
cooling_device3  Fan    0           1           FDTZ==58°C
cooling_device4  Fan    1           1           FDTZ==45°C
cooling_device5  Fan    1           1           FDTZ==34°C
cooling_device6  Fan    1           1           FDTZ==25°C
...

dir            /passive  /temp   ?  /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
...
thermal_zone1  n.a.      73000   0  6                  110000              critical   (CPUZ)
                                 1  5                  107000              passive
                                 2  4                  90000               active
                                 3  3                  75000               active
                                 4  2                  55000               active
                                 5  1                  45000               active
                                 6  1                  30000               active
...
thermal_zone4  n.a.      45000   0  n.a.               110000              critical   (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +46.0°C (crit = +256.0°C)
temp2: +73.0°C (crit = +110.0°C)
temp3: +57.0°C (crit = +105.0°C)
temp4: +26.3°C (crit = +110.0°C)
temp5: +45.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +68.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +66.0°C (high = +105.0°C, crit = +105.0°C)
_____________________________
3.12.14 -- 20140311 -- 21:09 -- changed, not broken -- idle state
=============================
dir              /type  /cur_state  /max_state  Maybe trigger /PWM
...
cooling_device2  Fan    0           1           not yet observed
cooling_device3  Fan    0           1           FDTZ==58°C
cooling_device4  Fan    0           1           FDTZ==45°C
cooling_device5  Fan    0           1           FDTZ==34°C
cooling_device6  Fan    1           1           FDTZ==25°C
...

dir            /passive  /temp
thermal_zone1  n.a.      46000   ...   (CPUZ)
...
thermal_zone4  n.a.      25000   ...   (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +50.0°C (crit = +256.0°C)
temp2: +46.0°C (crit = +110.0°C)
temp3: +44.0°C (crit = +105.0°C)
temp4: +25.7°C (crit = +110.0°C)
temp5: +25.0°C (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +41.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +41.0°C (high = +105.0°C, crit = +105.0°C)
_____________________________
^ permalink raw reply [flat|nested] 22+ messages in thread
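Manuel's hypothesis above (the effective fan level is the sum of cur_state over the Fan cooling devices) can be written down as a small check. This models his unconfirmed idea only, not how the ACPI fan driver or the firmware actually sets fan speed, and the function name `fan_level` is hypothetical. Applied by hand to the cooling_device listings in this thread, the model gives 4 for 3.12.13 versus 1 for 3.13.5 at similar load, and 3 (normal use) and 1 (idle) for 3.12.14.

```shell
# Sketch of the hypothesis (unconfirmed): fan level = sum of cur_state over
# all cooling devices whose type is "Fan". Processor and LCD devices are ignored.
fan_level() {
    root=${1:-/sys/class/thermal}
    level=0
    for dev in "$root"/cooling_device*; do
        [ -f "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = Fan ] || continue
        state=$(cat "$dev/cur_state" 2>/dev/null) || continue
        level=$((level + state))
    done
    echo "$level"
}
```

Logging `fan_level` next to thermal_zone1/temp every few seconds under both kernels would show whether the level steps up with CPUZ (as the 3.12.x data suggests) or stays stuck at one active device.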
[parent not found: <532B4DC5.4010705@netscape.net>]
* Re: 3.13.?: Strange / dangerous fan policy... [not found] ` <532B4DC5.4010705@netscape.net> @ 2014-03-31 23:37 ` Manuel Krause 2014-03-31 23:47 ` Guenter Roeck 0 siblings, 1 reply; 22+ messages in thread From: Manuel Krause @ 2014-03-31 23:37 UTC (permalink / raw) To: Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang Cc: Guenter Roeck, Jean Delvare, lm-sensors On 2014-03-20 21:21, Manuel Krause wrote: > On 2014-03-11 22:59, Manuel Krause wrote: >> On 2014-03-10 02:49, Manuel Krause wrote: >>> On 2014-03-09 18:58, Rafael J. Wysocki wrote: >>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause wrote: >>>>> On 2014-03-08 16:59, Guenter Roeck wrote: >>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote: >>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote: >>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause >>>>>>>> wrote: > [SNIP] > > Long time no reply from you... Have I overlooked an unwritten > convention? Or were my charts that unusable for your analysis/work? > > Two days ago, I tried 3.14.0-rc7 vanilla, and the problem > persists. "Strange / dangerous fan policy..." > > Since kernel 3.13.6 I've managed to 'fix' the potential > overheating problem by manually issuing > "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *) > _before_ obviously critical temperatures occur. Note: this > particular setting may only work for my system! ...and it keeps > working for 3.14-rc.
> > In the following I'd like to present a modified view of my > /sys/class/thermal, produced by a script I've written (for my > system) that shows the results in the layout of > linux/Documentation/thermal/sysfs-api.txt, point 3: > {I've uploaded the files to pastebin to avoid swamping you and the > lists with so many lines of logs.} > > For the last good kernel -- 3.12.14 -- in-use: > http://pastebin.com/HL1PNcda > For my first bad kernel revision 3.13 -- at critical temp: > http://pastebin.com/98hgf1a9 > For the last bad kernel -- 3.14.0-rc7 -- at critical temp: > http://pastebin.com/MuTwTnjD > For the last bad kernel -- 3.14.0-rc7 -- after issuing the > *) command: > http://pastebin.com/2peda54z > > Please have a look at them! And maybe give me hints on how I > can help you debug this issue further, as my manual method > works but it's annoying. > > And PLEASE CC ME, as I'm not on the lists. Or forward this > email thread to someone in charge. > > Thank you for your work && best regards, > Manuel Krause > This is still BUG 71711 https://bugzilla.kernel.org/show_bug.cgi?id=71711 3.12.15 works very well 3.13.7 fails 3.14.0-rc8 fails I've now tried the tmon tool, too. Nice eye candy, and useful for monitoring! I've tried reverting all "thermal"-related patches between 3.12.14 and 3.13.7 on 3.13.7, but they don't seem to matter. (Applying them the other way round to 3.12.15 doesn't matter either.) So "thermal" is ruled out? On the failing kernels, not a single reached (active) trip point triggers even ONE fan action! ACPI would be next to investigate. Thanks for your attention, Manuel Krause ^ permalink raw reply [flat|nested] 22+ messages in thread
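Manuel's stop-gap (`echo 1 > /sys/class/thermal/cooling_device3/cur_state` before temperatures become critical) could be automated until the regression is fixed. The sketch below is hypothetical. The zone (thermal_zone1, CPUZ), the device (cooling_device3) and the 74000 threshold (74°C, in millidegrees) are assumptions drawn from the values reported for his HP 6730b, writing cur_state needs root, and the thermal core may overwrite the value on a later update. It only ever switches the device on and never turns it off again.

```shell
# Hypothetical watchdog step automating the manual workaround
#   echo 1 > /sys/class/thermal/cooling_device3/cur_state
# Every default below is an assumption for one specific machine.
# sysfs temperatures are in millidegrees Celsius.
force_fan_if_hot() {
    root=${1:-/sys/class/thermal}
    zone=${2:-thermal_zone1}     # CPUZ on that machine
    cdev=${3:-cooling_device3}   # the device set by hand in this thread
    limit=${4:-74000}            # 74°C, where 3.12.x reportedly started cooling
    temp=$(cat "$root/$zone/temp") || return 1
    state=$(cat "$root/$cdev/cur_state") || return 1
    if [ "$temp" -ge "$limit" ] && [ "$state" -eq 0 ]; then
        echo 1 > "$root/$cdev/cur_state"   # requires root on a real system
        echo "forced $cdev on at $temp"
    fi
}

# Possible use, as root:  while :; do force_fan_if_hot; sleep 5; done
```

Polling every few seconds keeps the sketch simple; it is a workaround for testing, not a replacement for a fix in the kernel.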
* Re: 3.13.?: Strange / dangerous fan policy... 2014-03-31 23:37 ` Manuel Krause @ 2014-03-31 23:47 ` Guenter Roeck 2014-04-06 2:37 ` Manuel Krause 0 siblings, 1 reply; 22+ messages in thread From: Guenter Roeck @ 2014-03-31 23:47 UTC (permalink / raw) To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang Cc: Jean Delvare, lm-sensors On 03/31/2014 04:37 PM, Manuel Krause wrote: > On 2014-03-20 21:21, Manuel Krause wrote: >> [SNIP] > > This is still BUG 71711 > https://bugzilla.kernel.org/show_bug.cgi?id=71711 > > 3.12.15 works very well > 3.13.7 fails > 3.14.0-rc8 fails > Best you can do would really be to bisect the problem. Unfortunately only you (or someone else with an affected system) can do that. Once the culprit is known it would be much easier to get it fixed. To answer your earlier question: I don't think you did anything wrong. I guess everyone else is just as clueless as I am (if not, speak up and help ;-). Guenter ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: 3.13.?: Strange / dangerous fan policy... 2014-03-31 23:47 ` Guenter Roeck @ 2014-04-06 2:37 ` Manuel Krause 2014-04-06 2:43 ` Guenter Roeck 0 siblings, 1 reply; 22+ messages in thread From: Manuel Krause @ 2014-04-06 2:37 UTC (permalink / raw) To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang, Jean Delvare, lm-sensors On 2014-04-01 01:47, Guenter Roeck wrote: > On 03/31/2014 04:37 PM, Manuel Krause wrote: >> [SNIP] > > Best you can do would really be to bisect the problem. > Unfortunately only you (or someone else with an affected system) > can do that. Once the culprit is known it would be much easier > to get it fixed. > > [SNIP] > > Guenter > I've now bisected twice, from two different starting kernels, just to be sure: I'm new to this stupid-and-lengthy method, and I wanted to be certain I hadn't given a false positive in between out of boredom.
In the end it says each time: # git bisect bad | tee -a /var/log/bisect.log cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit commit cc8ef52707341e67a12067d6ead991d56ea017ca Author: Zhang Rui <rui.zhang@intel.com> Date: Wed Sep 25 20:39:45 2013 +0800 ACPI / AC: convert ACPI ac driver to platform bus Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers Please help me, on how I can help debug this more, and please also read the newest from https://bugzilla.kernel.org/show_bug.cgi?id=71711 Manuel Krause ^ permalink raw reply [flat|nested] 22+ messages in thread
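The manual workaround quoted in this message (writing 1 to cooling_device3/cur_state) goes through the generic thermal sysfs interface, where each cooling_deviceN directory exposes `type`, `cur_state` and `max_state`. A minimal shell sketch of inspecting and setting those files follows. The helper names and the THERMAL_ROOT override are hypothetical, writing `cur_state` needs root, and which index maps to which device is machine-specific, as Manuel notes.

```shell
#!/bin/sh
# Sketch only: helper names and THERMAL_ROOT are hypothetical. THERMAL_ROOT
# defaults to the real sysfs location but can point at a copy of the tree.
THERMAL_ROOT=${THERMAL_ROOT:-/sys/class/thermal}

# Print "<device> <type> <cur_state>/<max_state>" for each cooling device.
list_cooling_devices() {
    for dev in "$THERMAL_ROOT"/cooling_device*; do
        [ -d "$dev" ] || continue    # glob matched nothing
        printf '%s %s %s/%s\n' "${dev##*/}" "$(cat "$dev/type")" \
            "$(cat "$dev/cur_state")" "$(cat "$dev/max_state")"
    done
}

# Set cur_state of cooling_device<N>, rejecting values outside 0..max_state.
# Usage: set_cooling_state 3 1  (same effect as the manual echo quoted above)
set_cooling_state() {
    dev="$THERMAL_ROOT/cooling_device$1"
    case $2 in
        ''|*[!0-9]*) echo "state must be a non-negative integer" >&2; return 1 ;;
    esac
    max=$(cat "$dev/max_state") || return 1
    if [ "$2" -gt "$max" ]; then
        echo "state $2 exceeds max_state $max" >&2
        return 1
    fi
    echo "$2" > "$dev/cur_state"
}
```

Running `list_cooling_devices` before and after `set_cooling_state 3 1` shows whether the write took effect. The range check mirrors the sysfs contract that `cur_state` lies between 0 and `max_state`.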
* Re: 3.13.?: Strange / dangerous fan policy... 2014-04-06 2:37 ` Manuel Krause @ 2014-04-06 2:43 ` Guenter Roeck 2014-04-06 23:17 ` Manuel Krause 0 siblings, 1 reply; 22+ messages in thread
From: Guenter Roeck @ 2014-04-06 2:43 UTC
To: Manuel Krause, Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang, Jean Delvare, lm-sensors

On 04/05/2014 07:37 PM, Manuel Krause wrote:
> [...]
> I've now bisected two times, from two different kernel origins, just to be sure, as I'm new to this stupid-and-lengthy method, and to be sure I haven't given a false positive in between due to boredom.
>
Not really. Keep in mind that you were able to track down the bad commit among more than 10,000 commits in a reasonably short period of time.

> In the end it says each time:
> # git bisect bad | tee -a /var/log/bisect.log
> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
> commit cc8ef52707341e67a12067d6ead991d56ea017ca
> Author: Zhang Rui <rui.zhang@intel.com>
> Date:   Wed Sep 25 20:39:45 2013 +0800
>
>     ACPI / AC: convert ACPI ac driver to platform bus
>
>     Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
Off to the two of you...

Guenter
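Guenter's point can be made concrete: each git bisect round roughly halves the remaining range, so isolating one commit among N candidates takes about ⌈log2 N⌉ build-and-test rounds. Since 2^13 = 8192 < 10,000 ≤ 16,384 = 2^14, about 14 rounds suffice for 10,000 commits. The shell function below is a hypothetical helper (not a git feature) that computes this bound; real bisections can differ slightly because merges and skipped commits keep each split from being an exact halving.

```shell
#!/bin/sh
# Hypothetical helper: worst-case number of build-and-test rounds needed to
# isolate one culprit among N candidate commits if every round halves the
# range, i.e. the smallest k with 2^k >= N.
bisect_steps() {
    n=$1
    k=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

bisect_steps 10000    # prints 14
```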
* Re: 3.13.?: Strange / dangerous fan policy... 2014-04-06 2:43 ` Guenter Roeck @ 2014-04-06 23:17 ` Manuel Krause 2014-04-07 11:45 ` Rafael J. Wysocki 0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-06 23:17 UTC
To: Guenter Roeck, Rafael J. Wysocki, linux-kernel, linux-pm, rui.zhang, Jean Delvare, lm-sensors

On 2014-04-06 04:43, Guenter Roeck wrote:
> [...]
> Off to the two of you...
>
> Guenter
>

Sorry that I've forgotten to add the following last night: after the first bisection round, I was so glad about getting a result that I reverted the mentioned patch from the 3.13.8 kernel, but this didn't fix it. It must be something that came later. But you all understand more of what you've coded.

Best regards,
Manuel Krause

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors
* Re: 3.13.?: Strange / dangerous fan policy... 2014-04-06 23:17 ` Manuel Krause @ 2014-04-07 11:45 ` Rafael J. Wysocki 2014-04-10 22:51 ` Manuel Krause 0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2014-04-07 11:45 UTC
To: Manuel Krause
Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare, lm-sensors

On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote:
> [...]
> Sorry that I've forgotten to add the following last night: after
> the first bisection round, I was so glad about getting a result that
> I reverted the mentioned patch from the 3.13.8
> kernel, but this didn't fix it.

This means that the commit in question didn't introduce the problem
you're seeing.

Please check out commit 7f2dc5c4bcbf (Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm),
build a kernel from that and see if you can reproduce the problem with it.
If so, it can be used as your new "first known bad" kernel for bisection.
Otherwise, you can use it as the "first good" one and commit cc8ef52707341
as "first known bad".

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
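Rafael's suggestion amounts to a small decision: the test result for 7f2dc5c4bcbf determines the endpoints of the next bisection. A shell sketch of that decision follows, assuming the mainline v3.12 tag as the last known-good point (an assumption; any ref that tested good would do) and a hypothetical helper name. It prints the endpoints in the order `git bisect start <bad> <good>` expects.

```shell
#!/bin/sh
# Sketch only: the helper name is hypothetical, and LAST_GOOD=v3.12 is an
# assumption standing in for whatever ref last tested good.
CANDIDATE=7f2dc5c4bcbf          # merge commit suggested for a test build
SUSPECT=cc8ef52707341           # first bad commit from the earlier bisection
LAST_GOOD=${LAST_GOOD:-v3.12}

# $1 is the result of testing a kernel built from $CANDIDATE:
#   bad  -> CANDIDATE becomes the new first known bad; bisect down to LAST_GOOD
#   good -> CANDIDATE becomes the first good; SUSPECT stays first known bad
narrowed_bisect_range() {
    case $1 in
        bad)  echo "$CANDIDATE $LAST_GOOD" ;;
        good) echo "$SUSPECT $CANDIDATE" ;;
        *)    echo "usage: narrowed_bisect_range bad|good" >&2; return 1 ;;
    esac
}

# In a kernel tree, after building and testing 7f2dc5c4bcbf:
#   git bisect start $(narrowed_bisect_range bad)    # or: ... good
```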
* Re: 3.13.?: Strange / dangerous fan policy... 2014-04-07 11:45 ` Rafael J. Wysocki @ 2014-04-10 22:51 ` Manuel Krause 2014-04-13 0:05 ` Manuel Krause 0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-10 22:51 UTC
To: Rafael J. Wysocki
Cc: Guenter Roeck, linux-kernel, linux-pm, rui.zhang, Jean Delvare, lm-sensors

On 2014-04-07 13:45, Rafael J. Wysocki wrote:
> [...]
> This means that the commit in question didn't introduce the problem
> you're seeing.
> [...]

Sorry for any inconvenience, but you should forget what I've written about reverting the patch in question from 3.13.x not fixing it. Of course it didn't fix it, as the patch doesn't revert cleanly on release kernels at all. My mistake!

I've been guided by Guenter Roeck through two more bisecting sessions and approaches on this, which always pointed to the commit in question.

Some quotes:
Me:
>>> O.k. I've now followed your latest directions:
>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was BAD =>
>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca
>>> => result after rebuild was GOOD
>>>
[ ...]
>>> Reverting that commit in question from this very git tree makes the
>>> kernel work as expected.
[ ... ]
Guenter:
>> Report the results you have above. That should show without question
>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit,
>> and it should be easy to reproduce.

That seems to be all I can do for you for now. Please let me know of any preliminary patches to test!

And I want to add special thanks to Guenter Roeck for his always-just-in-time assistance over so many days.

Manuel Krause
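The check quoted above (the suspect commit on its own tests BAD, and the same tree with only that commit reverted tests GOOD) is what makes a bisection result convincing. What follows is a minimal shell sketch of that two-step check, assuming a hypothetical helper name and a test command supplied by the caller. For this bug the real test is rebuilding, booting and watching the temperatures, which cannot be scripted here.

```shell
#!/bin/sh
# Sketch only: confirm_culprit is a hypothetical helper. The test command
# stands in for "rebuild, boot, and check temperatures"; it must exit 0
# for a good kernel and non-zero for a bad one.
# Usage: confirm_culprit <commit> <test command...>
confirm_culprit() {
    commit=$1
    shift
    # First step, as quoted above: a branch at the suspect commit.
    git checkout -q -b testing "$commit" || return 1
    if "$@"; then
        echo "$commit tests good on its own: not the culprit" >&2
        return 1
    fi
    # Second step: revert only that commit and test again.
    git revert --no-edit "$commit" >/dev/null || return 1
    if ! "$@"; then
        echo "still bad with $commit reverted" >&2
        return 1
    fi
    echo "$commit confirmed: bad on its own, good once reverted"
}
```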
* Re: 3.13.?: Strange / dangerous fan policy... 2014-04-10 22:51 ` Manuel Krause @ 2014-04-13 0:05 ` Manuel Krause 2014-04-16 18:32 ` Zhang Rui 0 siblings, 1 reply; 22+ messages in thread
From: Manuel Krause @ 2014-04-13 0:05 UTC
To: rui.zhang
Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm, Jean Delvare, lm-sensors

On 2014-04-11 00:51, Manuel Krause wrote:
> [...]
> That seems to be all I can do for you for now. Please let me know
> of any preliminary patches to test!
> [...]

BTW -- applying the patch in question to a 3.12.17 kernel, which worked optimally WITHOUT it, makes it FAIL as described for 3.13.x kernels. (And, yes, the patch applied cleanly, compiled fine and boots nicely.)

Manuel Krause
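Manuel's correction in the previous message (the revert did not apply cleanly to release kernels) and this one (the forward patch applied cleanly to 3.12.17) both depend on whether a commit's diff fits a given tree. A minimal sketch of checking that up front with `git apply --check`, which only tests and modifies nothing, is given here. The helper names are hypothetical, and the usage comment assumes a kernel tree in which the cc8ef52707341 commit object is available.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers. Each extracts the commit's change as a
# plain diff against its parent and asks git apply whether it would apply
# (or reverse-apply) to the current working tree; nothing is modified.
applies_cleanly() {
    git diff "$1^" "$1" | git apply --check
}

reverts_cleanly() {
    git diff "$1^" "$1" | git apply --check --reverse
}

# Example, in a checkout of a 3.12.17 tree:
#   applies_cleanly cc8ef52707341e67a12067d6ead991d56ea017ca &&
#       git cherry-pick cc8ef52707341e67a12067d6ead991d56ea017ca
```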
* Re: 3.13.?: Strange / dangerous fan policy... 2014-04-13 0:05 ` Manuel Krause @ 2014-04-16 18:32 ` Zhang Rui 2014-04-16 22:17 ` Manuel Krause 0 siblings, 1 reply; 22+ messages in thread From: Zhang Rui @ 2014-04-16 18:32 UTC (permalink / raw) To: Manuel Krause Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm, Jean Delvare, lm-sensors On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote: > On 2014-04-11 00:51, Manuel Krause wrote: > > On 2014-04-07 13:45, Rafael J. Wysocki wrote: > >> On Monday, April 07, 2014 01:17:51 AM Manuel Krause wrote: > >>> On 2014-04-06 04:43, Guenter Roeck wrote: > >>>> On 04/05/2014 07:37 PM, Manuel Krause wrote: > >>>>> On 2014-04-01 01:47, Guenter Roeck wrote: > >>>>>> On 03/31/2014 04:37 PM, Manuel Krause wrote: > >>>>>>> On 2014-03-20 21:21, Manuel Krause wrote: > >>>>>>>> On 2014-03-11 22:59, Manuel Krause wrote: > >>>>>>>>> On 2014-03-10 02:49, Manuel Krause wrote: > >>>>>>>>>> On 2014-03-09 18:58, Rafael J. Wysocki wrote: > >>>>>>>>>>> On Sunday, March 09, 2014 01:10:25 AM Manuel Krause > >>>>>>>>>>> wrote: > >>>>>>>>>>>> On 2014-03-08 16:59, Guenter Roeck wrote: > >>>>>>>>>>>>> On 03/08/2014 03:08 AM, Jean Delvare wrote: > >>>>>>>>>>>>>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel > >>>>>>>>>>>>>>> Krause > >>>>>>>>>>>>>>> wrote: > >>>>>>>> [SNIP] > >>>>>>>> > >>>>>>>> Long time no reply from you... Have I overseen a unwritten > >>>>>>>> convention? Or were my charts that unusable for your > >>>>>>>> analysis/work? > >>>>>>>> > >>>>>>>> Two days ago, I tried the 3.14.0-rc7-vanilla. And the > >>>>>>>> problem > >>>>>>>> persists. "Strange / dangerous fan policy..." > >>>>>>>> > >>>>>>>> Since kernel 3.13.6 I've managed to 'fix' the potential > >>>>>>>> overheating problem by manually issuing a: > >>>>>>>> "echo 1 > /sys/class/thermal/cooling_device3/cur_state" *) > >>>>>>>> _before_ obviously critical temperatures occur. 
Note: This > >>>>>>>> particular setting may only work for my system! ...and keeps > >>>>>>>> working for 3.14-rc. > >>>>>>>> > >>>>>>>> In the following I'd like to present you a modified output > >>>>>>>> of my > >>>>>>>> /sys/class/thermal, that I've written a script for (for my > >>>>>>>> system), that shows the results in the way of > >>>>>>>> linux/Documentation/thermal/sysfs-api.txt, point 3: > >>>>>>>> {I've uploaded the files to pastebin, to not swamp you and > >>>>>>>> the > >>>>>>>> lists with so many lines of logs.} > >>>>>>>> > >>>>>>>> For the last good kernel -- 3.12.14 -- in-use: > >>>>>>>> http://pastebin.com/HL1PNcda > >>>>>>>> For my first bad kernel revision 3.13 -- at critical temp: > >>>>>>>> http://pastebin.com/98hgf1a9 > >>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- at critical temp: > >>>>>>>> http://pastebin.com/MuTwTnjD > >>>>>>>> For the last bad kernel -- 3.14.0-rc7 -- after issuing the > >>>>>>>> *) command: > >>>>>>>> http://pastebin.com/2peda54z > >>>>>>>> > >>>>>>>> Please, have a look at them! And maybe, give me hints on > >>>>>>>> how I > >>>>>>>> can help you to further debug this issue, as my manual > >>>>>>>> method > >>>>>>>> works but it's annoying. > >>>>>>>> > >>>>>>>> And, PLEASE CC: ME, as I'm not on the lists. Or forward this > >>>>>>>> Email-thread to someone in charge. > >>>>>>>> > >>>>>>>> Thank you for your work && best regards, > >>>>>>>> Manuel Krause > >>>>>>>> > >>>>>>> > >>>>>>> This is still BUG 71711 > >>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711 > >>>>>>> > >>>>>>> 3.12.15 works very well > >>>>>>> 3.13.7 fails > >>>>>>> 3.14.0-rc8 fails > >>>>>>> > >>>>>> > >>>>>> Best you can do would really be to bisect the problem. > >>>>>> Unfortunately only you (or someone else with an affected > >>>>>> system) > >>>>>> can do that. Once the culprit is known it would be much easier > >>>>>> to get it fixed. 
> >>>>>> > >>>>>> To answer your earlier question: I don't think you did > >>>>>> anything > >>>>>> wrong. > >>>>>> I guess everyone else is just as clueless as I am (if not, > >>>>>> speak up > >>>>>> and help ;-). > >>>>>> > >>>>>> Guenter > >>>>>> > >>>>> > >>>>> I've now bisected two times. From two different kernel origins, > >>>>> just to be sure, as I'm new to this stupid-and-lengthy method, > >>>>> and, to be sure, I haven't given a false positive in between due > >>>>> to boredom. > >>>>> > >>>> > >>>> Not really. Keep in mind that you were able to track down the > >>>> bad > >>>> commit > >>>> among more than 10,000 commits in a reasonably short period > >>>> of time. > >>>> > >>>>> In the end it says each time: > >>>>> # git bisect bad | tee -a /var/log/bisect.log > >>>>> cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad > >>>>> commit > >>>>> commit cc8ef52707341e67a12067d6ead991d56ea017ca > >>>>> Author: Zhang Rui <rui.zhang@intel.com> > >>>>> Date: Wed Sep 25 20:39:45 2013 +0800 > >>>>> > >>>>> ACPI / AC: convert ACPI ac driver to platform bus > >>>>> > >>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com> > >>>>> Signed-off-by: Rafael J. Wysocki > >>>>> <rafael.j.wysocki@intel.com> > >>>>> > >>>> Off to the two of you... > >>>> > >>>> Guenter > >>>> > >>>>> :040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 > >>>>> 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers > >>>>> > >>>>> > >>>>> Please tell me how I can help debug this further, and please > >>>>> also read the newest from > >>>>> https://bugzilla.kernel.org/show_bug.cgi?id=71711 > >>>>> > >>>>> Manuel Krause > >>>>> > >>>>> > >>>>> > >>>> > >>> > >>> Sorry, I forgot to add the following last night: After > >>> the first bisection round, I was so glad about a result that > >>> time, that I reverted this mentioned patch from the 3.13.8 > >>> kernel, but this didn't fix it.
> >> > >> This means that the commit in question didn't introduce the > >> problem > >> you're seeing. > >> > >> Please check out commit 7f2dc5c4bcbf (Merge tag > >> 'dm-3.13-changes' of > >> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm), > >> > >> build a kernel from that and see if you can reproduce the > >> problem with it. > >> If so, it can be used as your new "first known bad" kernel for > >> bisection. > >> Otherwise, you can use it as the "first good" one and commit > >> cc8ef52707341 > >> as "first known bad". > >> > >> Thanks! > >> > > > > Sorry for any inconvenience, but you should forget about what > > I've written, that reverting the patch in question from 3.13.x > > didn't fix it. Of course it didn't fix it, as the patch doesn't > > cleanly revert from release-kernels at all. My mistake! > > > > I've been guided by Guenter Roeck through two more bisecting > > sessions/ways on this, that always pointed to the commit in > > question. > > > > Some quotes: > > Me: > >>>> O.k. I've now followed your latest directions: > >>>> git checkout -b testing cc8ef52707341e67a12067d6ead991d56ea017ca > >>>> => result after rebuild was BAD => > >>>> git revert cc8ef52707341e67a12067d6ead991d56ea017ca > >>>> => result after rebuild was GOOD > >>>> > > [ ...] > >>>> Reverting that commit in question from this very git tree > >>>> makes the > >>>> kernel work as expected. > > [ ... ] > > Guenter: > >>> Report the results you have above. That should show without > >>> question > >>> that cc8ef52707341e67a12067d6ead991d56ea017ca is the bad commit, > >>> and it should be easy to reproduce. > > > > That seems to be all I can do for you for now. Please let me know > > of any preliminary patches to test!
> > And I want to add special thanks to Guenter Roeck for his > > always-just-in-time assistance over so many days, > > > > Manuel Krause > > > > BTW -- applying this patch in question to a 3.12.17 kernel, that > worked optimally WITHOUT it, makes it FAIL as described for 3.13.x > kernels. (And, yes, the patch applied cleanly, compiled fine and > boots nicely.) > Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check if the problem still exists in the 3.12.17 kernel? thanks, rui > Manuel Krause >
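Zhang Rui's request amounts to building 3.12.17 with two extra commits applied in order. A minimal sketch of one way to set that up with git, assuming a kernel tree that already has the stable tag v3.12.17 available locally (stable release tags are published in the linux-stable repository, not in Linus' tree). The function name make_test_branch and the branch name in the example are hypothetical. The function stops at the first cherry-pick that does not apply cleanly and leaves the conflict in place for manual resolution.

```sh
#!/bin/sh
# Hypothetical helper: create NEW_BRANCH at BASE and cherry-pick the given
# commits onto it, in the order given. Run inside a git checkout of the kernel.
#
#   make_test_branch BASE NEW_BRANCH COMMIT...
#
# Returns non-zero if the arguments are missing, or if the checkout or any
# cherry-pick fails (e.g. on a merge conflict, which is left for the user).
make_test_branch() {
    [ $# -ge 2 ] || return 2
    base=$1
    branch=$2
    shift 2
    git checkout -b "$branch" "$base" || return 1
    for commit in "$@"; do
        # -x appends "(cherry picked from commit ...)" to the commit message
        git cherry-pick -x "$commit" || return 1
    done
}

# Example for the test requested above (assumes tag v3.12.17 exists locally):
#   make_test_branch v3.12.17 test-3.12.17-ac \
#       cc8ef52707341e67a12067d6ead991d56ea017ca \
#       50a2bc5429f07ec4d53df2d287b03bdbceb281bb
# then configure and build as usual, for instance:
#   make olddefconfig && make -j"$(nproc)"
```

Using a fresh branch rather than patching a release tree in place keeps the "3.12.17 + patches" kernel reproducible, and the `-x` trailer records which upstream commits each test build contains.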
* Re: 3.13.?: Strange / dangerous fan policy... 2014-04-16 18:32 ` Zhang Rui @ 2014-04-16 22:17 ` Manuel Krause 0 siblings, 0 replies; 22+ messages in thread From: Manuel Krause @ 2014-04-16 22:17 UTC (permalink / raw) To: Zhang Rui Cc: Rafael J. Wysocki, Guenter Roeck, linux-kernel, linux-pm, Jean Delvare, lm-sensors On 2014-04-16 20:32, Zhang Rui wrote: > On Sun, 2014-04-13 at 02:05 +0200, Manuel Krause wrote: >> [SNIP] >> BTW -- applying this patch in question to a 3.12.17 kernel, that >> worked optimally WITHOUT it, makes it FAIL as described for 3.13.x >> kernels. (And, yes, the patch applied cleanly, compiled fine and >> boots nicely.) >> > Could you please apply commit 50a2bc5429f07ec4d53df2d287b03bdbceb281bb > on top of commit cc8ef52707341e67a12067d6ead991d56ea017ca and check if > the problem still exists in the 3.12.17 kernel? > > thanks, > rui I'm so sorry: 3.12.17 + cc8ef52707341e67a12067d6ead991d56ea017ca + 50a2bc5429f07ec4d53df2d287b03bdbceb281bb does NOT improve the situation. Thank you for your work, Manuel
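Manuel mentions a script that prints /sys/class/thermal in the layout of Documentation/thermal/sysfs-api.txt. That script is not reproduced here. The following is a hypothetical, read-only stand-in, not his script. It prints each thermal zone's type and temperature, its trip points, and each cooling device's cur_state and max_state. It assumes the standard thermal sysfs attribute names, where temp and trip_point_N_temp are reported in millidegrees Celsius. Because it only reads files, it changes nothing, unlike the `echo 1 > /sys/class/thermal/cooling_device3/cur_state` workaround.

```sh
#!/bin/sh
# Hypothetical read-only dump of the thermal sysfs class (not Manuel's script).
# Usage: sh thermal-dump.sh [ROOT]   (ROOT defaults to /sys/class/thermal)

# read_attr FILE: print a sysfs attribute, or "n/a" if it cannot be read.
read_attr() {
    cat "$1" 2>/dev/null || echo "n/a"
}

# milli_to_c VALUE: convert millidegrees Celsius to whole degrees (truncated).
# Anything that is not a plain integer (e.g. "n/a") is printed unchanged.
milli_to_c() {
    case $1 in
        -*) digits=${1#-} ;;
        *) digits=$1 ;;
    esac
    case $digits in
        ''|*[!0-9]*) echo "$1" ;;
        *) echo "$(( $1 / 1000 ))C" ;;
    esac
}

dump_thermal() {
    root=${1:-/sys/class/thermal}
    for zone in "$root"/thermal_zone*; do
        [ -d "$zone" ] || continue   # skip the literal pattern if nothing matched
        ztype=$(read_attr "$zone/type")
        ztemp=$(milli_to_c "$(read_attr "$zone/temp")")
        echo "$(basename "$zone") type=$ztype temp=$ztemp"
        for trip in "$zone"/trip_point_*_temp; do
            [ -e "$trip" ] || continue
            tname=$(basename "$trip" _temp)   # e.g. trip_point_0
            ttype=$(read_attr "$zone/${tname}_type")
            ttemp=$(milli_to_c "$(read_attr "$trip")")
            echo "  $tname type=$ttype temp=$ttemp"
        done
    done
    for dev in "$root"/cooling_device*; do
        [ -d "$dev" ] || continue
        echo "$(basename "$dev") type=$(read_attr "$dev/type")" \
             "cur_state=$(read_attr "$dev/cur_state")" \
             "max_state=$(read_attr "$dev/max_state")"
    done
}

dump_thermal "$@"
```

Running it once before and once after the cur_state workaround shows which cooling device that write engages. As Manuel notes, the device number that works is specific to his machine.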
end of thread (2014-04-16 22:18 UTC). Thread overview: 22+ messages: 2014-03-07 19:33 3.13.?: Strange / dangerous fan policy Manuel Krause 2014-03-07 20:55 ` Guenter Roeck 2014-03-07 22:04 ` Manuel Krause 2014-03-07 22:52 ` Guenter Roeck 2014-03-08 11:08 ` [lm-sensors] " Jean Delvare 2014-03-08 12:36 ` Rafael J. Wysocki 2014-03-08 15:59 ` Guenter Roeck 2014-03-09 0:10 ` Manuel Krause 2014-03-09 17:28 ` Guenter Roeck 2014-03-09 17:58 ` Rafael J. Wysocki 2014-03-10 1:49 ` Manuel Krause 2014-03-11 21:59 ` Manuel Krause [not found] ` <532B4DC5.4010705@netscape.net> 2014-03-31 23:37 ` Manuel Krause 2014-03-31 23:47 ` Guenter Roeck 2014-04-06 2:37 ` Manuel Krause 2014-04-06 2:43 ` Guenter Roeck 2014-04-06 23:17 ` Manuel Krause 2014-04-07 11:45 ` Rafael J. Wysocki 2014-04-10 22:51 ` Manuel Krause 2014-04-13 0:05 ` Manuel Krause 2014-04-16 18:32 ` Zhang Rui 2014-04-16 22:17 ` Manuel Krause