* intel ips: CPU TDP doesn't match expected value @ 2010-08-25 0:48 Cesar Eduardo Barros 2010-08-25 1:37 ` Jesse Barnes 2010-08-26 23:29 ` [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages Cesar Eduardo Barros 0 siblings, 2 replies; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-25 0:48 UTC (permalink / raw) To: linux-kernel; +Cc: platform-driver-x86, Jesse Barnes, Matthew Garrett I get the following messages on v2.6.36-rc2-203-g502adf5: intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 25, expected 35) intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 intel ips 0000:00:1f.6: MCP power or thermal limit exceeded intel ips 0000:00:1f.6: CPU power or thermal limit exceeded (The last two messages repeat very often, one of them every five seconds; if I understand it correctly, that is normal and expected.) The code implies the "CPU TDP doesn't match expected value" is a sanity check failing. Should that happen, or is there something wrong? /proc/cpuinfo for the first core: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 37 model name : Intel(R) Core(TM) i3 CPU M 330 @ 2.13GHz stepping : 2 cpu MHz : 933.000 cache size : 3072 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm arat tpr_shadow vnmi flexpriority ept vpid bogomips : 4256.29 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: /sys/devices/virtual/dmi/id/: bios_date: 06/21/2010 bios_vendor: Dell Inc. bios_version: A05 board_asset_tag: board_name: 07NTDG board_vendor: Dell Inc. board_version: A05 chassis_asset_tag: chassis_type: 8 chassis_vendor: Dell Inc. chassis_version: A05 product_name: Inspiron N4010 product_version: A05 sys_vendor: Dell Inc. -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: intel ips: CPU TDP doesn't match expected value 2010-08-25 0:48 intel ips: CPU TDP doesn't match expected value Cesar Eduardo Barros @ 2010-08-25 1:37 ` Jesse Barnes 2010-08-26 23:29 ` [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages Cesar Eduardo Barros 1 sibling, 0 replies; 25+ messages in thread From: Jesse Barnes @ 2010-08-25 1:37 UTC (permalink / raw) To: Cesar Eduardo Barros; +Cc: linux-kernel, platform-driver-x86, Matthew Garrett On Tue, 24 Aug 2010 21:48:45 -0300 Cesar Eduardo Barros <cesarb@cesarb.net> wrote: > I get the following messages on v2.6.36-rc2-203-g502adf5: > > intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value > (found 25, expected 35) > intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 > intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 > intel ips 0000:00:1f.6: MCP power or thermal limit exceeded > intel ips 0000:00:1f.6: CPU power or thermal limit exceeded > > (The last two messages repeat very often, one of them every five > seconds; if I understand it correctly, that is normal and expected.) > > The code implies the "CPU TDP doesn't match expected value" is a sanity > check failing. Should that happen, or is there something wrong? Yeah, it's normal, and should probably be downgraded to an informational message. It just means your BIOS doesn't want you to push the CPU all the way up to its maximum TDP, probably because the platform or chassis wasn't designed for that much heat or power consumption. -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-25 0:48 intel ips: CPU TDP doesn't match expected value Cesar Eduardo Barros 2010-08-25 1:37 ` Jesse Barnes @ 2010-08-26 23:29 ` Cesar Eduardo Barros 2010-08-26 23:33 ` Joe Perches ` (2 more replies) 1 sibling, 3 replies; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-26 23:29 UTC (permalink / raw) To: Jesse Barnes Cc: Matthew Garrett, platform-driver-x86, linux-kernel, Cesar Eduardo Barros On my Dell Inspiron N4010, one of these messages is printed every five seconds. Change both to dev_dbg to quieten them even more. Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> --- drivers/platform/x86/intel_ips.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c index 9024480..7833adc 100644 --- a/drivers/platform/x86/intel_ips.c +++ b/drivers/platform/x86/intel_ips.c @@ -607,7 +607,7 @@ static bool mcp_exceeded(struct ips_driver *ips) spin_unlock_irqrestore(&ips->turbo_status_lock, flags); if (ret) - dev_info(&ips->dev->dev, + dev_dbg(&ips->dev->dev, "MCP power or thermal limit exceeded\n"); return ret; @@ -635,7 +635,7 @@ static bool cpu_exceeded(struct ips_driver *ips, int cpu) spin_unlock_irqrestore(&ips->turbo_status_lock, flags); if (ret) - dev_info(&ips->dev->dev, + dev_dbg(&ips->dev->dev, "CPU power or thermal limit exceeded\n"); return ret; -- 1.7.2.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-26 23:29 ` [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages Cesar Eduardo Barros @ 2010-08-26 23:33 ` Joe Perches 2010-08-27 0:11 ` Cesar Eduardo Barros 2010-08-27 0:18 ` Alan Cox 2010-08-27 1:42 ` Matthew Garrett 2 siblings, 1 reply; 25+ messages in thread From: Joe Perches @ 2010-08-26 23:33 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel On Thu, 26 Aug 2010, Cesar Eduardo Barros wrote: > On my Dell Inspiron N4010, one of these messages is printed every five > seconds. Change both to dev_dbg to quieten them even more. I think you should instead fix your hardware or maybe change your thermal throttling settings. > @@ -607,7 +607,7 @@ static bool mcp_exceeded(struct ips_driver *ips) > - dev_info(&ips->dev->dev, > + dev_dbg(&ips->dev->dev, > "MCP power or thermal limit exceeded\n"); ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-26 23:33 ` Joe Perches @ 2010-08-27 0:11 ` Cesar Eduardo Barros 2010-08-27 0:41 ` Joe Perches 0 siblings, 1 reply; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-27 0:11 UTC (permalink / raw) To: Joe Perches Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 26-08-2010 20:33, Joe Perches escreveu: > On Thu, 26 Aug 2010, Cesar Eduardo Barros wrote: > >> On my Dell Inspiron N4010, one of these messages is printed every five >> seconds. Change both to dev_dbg to quieten them even more. > > I think you should instead fix your hardware or maybe change > your thermal throttling settings. How can I know if this laptop is broken or not? According to Jesse's reply, the BIOS lowered the limit, which could explain why it hits the limit more often: intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 25, expected 35) The thermal throttling seems to be also managed by the BIOS (and there are no settings about it on the BIOS setup that I remember): CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 mce: CPU supports 9 MCE banks CPU0: Thermal monitoring handled by SMI using mwait in idle threads. [...] CPU0: Intel(R) Core(TM) i3 CPU M 330 @ 2.13GHz stepping 02 Booting Node 0, Processors #1 CPU1: Thermal monitoring handled by SMI #2 CPU2: Thermal monitoring handled by SMI #3 Ok. CPU3: Thermal monitoring handled by SMI Brought up 4 CPUs And ACPI is of no help, it shows two thermal zones, one always showing 27 C, the other always showing 0 C (and even though that is below the reported thresholds of 55 C and 71 C for the fans, the fan still speeds up and down on its own, showing that it is just the reporting that is broken). The fan does slow down to almost nothing (or even off) when idle and spins up to a strong hot breeze when compiling the kernel (make -j8), so the thermal monitoring on the BIOS seems to be working fine. I would expect broken hardware to run the fan on high speed all the time (not cooling enough) or low speed all the time (thermal sensor broken). The coretemp module (which is not autoloaded) seems to be more helpful; it shows between 43000 and 45000 for temp1_input on both cores when idle, going up to 50000-56000 when lightly loaded (I have not looked at it yet while doing a heavy compile). Is there a way to know if all this is just an oddness of this model, or if there is something which is not working quite right? (All the output above is from 2.6.35.3; I am not running 2.3.36-rc2+ right now because it hangs on resume, and I have not yet had the time to look at it.) > >> @@ -607,7 +607,7 @@ static bool mcp_exceeded(struct ips_driver *ips) >> - dev_info(&ips->dev->dev, >> + dev_dbg(&ips->dev->dev, >> "MCP power or thermal limit exceeded\n"); > > -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-27 0:11 ` Cesar Eduardo Barros @ 2010-08-27 0:41 ` Joe Perches 2010-08-27 1:38 ` Cesar Eduardo Barros 0 siblings, 1 reply; 25+ messages in thread From: Joe Perches @ 2010-08-27 0:41 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel On Thu, 2010-08-26 at 21:11 -0300, Cesar Eduardo Barros wrote: > Em 26-08-2010 20:33, Joe Perches escreveu: > > On Thu, 26 Aug 2010, Cesar Eduardo Barros wrote: > >> On my Dell Inspiron N4010, one of these messages is printed every five > >> seconds. Change both to dev_dbg to quieten them even more. > > I think you should instead fix your hardware or maybe change > > your thermal throttling settings. I was probably a bit hasty in writing that. > Is there a way to know if all this is just an oddness of this model, or > if there is something which is not working quite right? > > (All the output above is from 2.6.35.3; I am not running 2.3.36-rc2+ > right now because it hangs on resume, and I have not yet had the time to > look at it.) Perhaps you might try this patch and get a bit more information. It seems a sensible patch and perhaps should be applied anyway. Signed-off-by: Joe Perches <joe@perches.com> --- drivers/platform/x86/intel_ips.c | 58 +++++++++++++++++++++++++++++-------- 1 files changed, 45 insertions(+), 13 deletions(-) diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c index 9024480..73f9ad1 100644 --- a/drivers/platform/x86/intel_ips.c +++ b/drivers/platform/x86/intel_ips.c @@ -598,17 +598,36 @@ static bool mcp_exceeded(struct ips_driver *ips) { unsigned long flags; bool ret = false; + u16 mcp_avg_temp; + u16 mcp_temp_limit; + u16 mcp_power_limit; + u32 cpu_avg_power; + u32 mch_avg_power; spin_lock_irqsave(&ips->turbo_status_lock, flags); - if (ips->mcp_avg_temp > (ips->mcp_temp_limit * 100)) - ret = true; - if (ips->cpu_avg_power + ips->mch_avg_power > ips->mcp_power_limit) - ret = true; + + mcp_avg_temp = ips->mcp_avg_temp; + mcp_temp_limit = ips->mcp_temp_limit; + mcp_power_limit = ips->mcp_power_limit; + cpu_avg_power = ips->cpu_avg_power; + mch_avg_power = ips->mch_avg_power; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); - if (ret) + if ((cpu_avg_power + mch_avg_power) > mcp_power_limit) { dev_info(&ips->dev->dev, - "MCP power or thermal limit exceeded\n"); + "MCP power limit %d exceeded: %d\n", + mcp_power_limit, + cpu_avg_power + mch_avg_power); + ret = true; + } + if (mcp_avg_temp > (mcp_temp_limit * 100)) { + dev_info(&ips->dev->dev, + "MCP thermal limit %d exceeded: %d\n", + mcp_temp_limit * 100, + mcp_avg_temp); + ret = true; + } return ret; } @@ -623,20 +642,33 @@ static bool mcp_exceeded(struct ips_driver *ips) static bool cpu_exceeded(struct ips_driver *ips, int cpu) { unsigned long flags; - int avg; bool ret = false; + int avg; + int core_temp_limit; + u16 core_power_limit; + u32 cpu_avg_power; spin_lock_irqsave(&ips->turbo_status_lock, flags); + avg = cpu ? ips->ctv2_avg_temp : ips->ctv1_avg_temp; - if (avg > (ips->limits->core_temp_limit * 100)) - ret = true; - if (ips->cpu_avg_power > ips->core_power_limit * 100) - ret = true; + core_temp_limit = ips->limits->core_temp_limit; + core_power_limit = ips->core_power_limit; + cpu_avg_power = ips->cpu_avg_power; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); - if (ret) + if (cpu_avg_power > (core_power_limit * 100)) { + dev_info(&ips->dev->dev, + "CPU power limit %d exceeded: %d\n", + cpu_avg_power, core_power_limit * 100); + ret = true; + } + if (avg > (core_temp_limit * 100)) { dev_info(&ips->dev->dev, - "CPU power or thermal limit exceeded\n"); + "CPU thermal limit %d exceeded: %d\n", + core_temp_limit * 100, avg); + ret = true; + } return ret; } ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-27 0:41 ` Joe Perches @ 2010-08-27 1:38 ` Cesar Eduardo Barros 2010-08-27 7:39 ` Joe Perches 0 siblings, 1 reply; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-27 1:38 UTC (permalink / raw) To: Joe Perches Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 26-08-2010 21:41, Joe Perches escreveu: > On Thu, 2010-08-26 at 21:11 -0300, Cesar Eduardo Barros wrote: >> Em 26-08-2010 20:33, Joe Perches escreveu: >>> On Thu, 26 Aug 2010, Cesar Eduardo Barros wrote: >>>> On my Dell Inspiron N4010, one of these messages is printed every five >>>> seconds. Change both to dev_dbg to quieten them even more. >>> I think you should instead fix your hardware or maybe change >>> your thermal throttling settings. > > I was probably a bit hasty in writing that. > >> Is there a way to know if all this is just an oddness of this model, or >> if there is something which is not working quite right? >> >> (All the output above is from 2.6.35.3; I am not running 2.3.36-rc2+ >> right now because it hangs on resume, and I have not yet had the time to >> look at it.) > > Perhaps you might try this patch and get a bit more information. Running with it right now. Unless I missed one, the messages do happen exactly every five seconds. Here are the first few lines of the dmesg (grepping for 'intel ips'), and it does seem a bit strange: $ dmesg | fgrep intel\ ips intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 25, expected 35) intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: 25615533 intel ips 0000:00:1f.6: CPU power limit 8183 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6276 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 7952 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 7155 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5928 exceeded: 0 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: 70848 intel ips 0000:00:1f.6: CPU power limit 6430 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6474 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5508 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6569 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5250 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5023 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6209 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 7276 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 9027 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 7008 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5478 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6658 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5192 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6347 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5506 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4447 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4462 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4382 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4862 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5218 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4865 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4131 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5331 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6012 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5323 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4727 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4949 exceeded: 0 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: 69539 intel ips 0000:00:1f.6: CPU power limit 5045 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5070 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5113 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4262 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5158 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4865 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4302 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4430 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4841 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4532 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4814 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4803 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5139 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4050 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 6176 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4746 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 3739 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4863 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4917 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4099 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 4956 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 5018 exceeded: 0 intel ips 0000:00:1f.6: CPU power limit 3863 exceeded: 0 I think you put the parameters in the wrong order on the dev_info() call for the CPU power limit; it is the limit that is 0. Two bogus things I can see: - The first "MCP power limit exceeded" seems very bogus. - What do you mean, core_power_limit is zero? I will go back to 2.6.35; not feeling like debugging my (unrelated) suspend/resume problem right now. If you have another debugging patch, I will probably test it only tomorrow night. -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-27 1:38 ` Cesar Eduardo Barros @ 2010-08-27 7:39 ` Joe Perches 2010-08-27 23:12 ` Cesar Eduardo Barros 0 siblings, 1 reply; 25+ messages in thread From: Joe Perches @ 2010-08-27 7:39 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel On Thu, 2010-08-26 at 22:38 -0300, Cesar Eduardo Barros wrote: > intel ips 0000:00:1f.6: CPU power limit 3863 exceeded: 0 > > I think you put the parameters in the wrong order on the dev_info() call > for the CPU power limit; it is the limit that is 0. Yes. Fixed. > Two bogus things I can see: > > - The first "MCP power limit exceeded" seems very bogus. > - What do you mean, core_power_limit is zero? I added a logging message whenever the turbo limits change and logging messages for power/temp on MCH for completeness. Maybe this will show something useful like when/how CPU power limit gets set to 0. drivers/platform/x86/intel_ips.c | 105 +++++++++++++++++++++++++++++++------- 1 files changed, 87 insertions(+), 18 deletions(-) diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c index 9024480..9c65e66 100644 --- a/drivers/platform/x86/intel_ips.c +++ b/drivers/platform/x86/intel_ips.c @@ -598,17 +598,36 @@ static bool mcp_exceeded(struct ips_driver *ips) { unsigned long flags; bool ret = false; + u16 mcp_avg_temp; + u16 mcp_temp_limit; + u16 mcp_power_limit; + u32 cpu_avg_power; + u32 mch_avg_power; spin_lock_irqsave(&ips->turbo_status_lock, flags); - if (ips->mcp_avg_temp > (ips->mcp_temp_limit * 100)) - ret = true; - if (ips->cpu_avg_power + ips->mch_avg_power > ips->mcp_power_limit) - ret = true; + + mcp_avg_temp = ips->mcp_avg_temp; + mcp_temp_limit = ips->mcp_temp_limit; + mcp_power_limit = ips->mcp_power_limit; + cpu_avg_power = ips->cpu_avg_power; + mch_avg_power = ips->mch_avg_power; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); - if (ret) + if ((cpu_avg_power + mch_avg_power) > mcp_power_limit) { + dev_info(&ips->dev->dev, + "MCP power limit %u exceeded: cpu:%u + mch:%u\n", + mcp_power_limit, + cpu_avg_power, mch_avg_power); + ret = true; + } + if (mcp_avg_temp > (mcp_temp_limit * 100)) { dev_info(&ips->dev->dev, - "MCP power or thermal limit exceeded\n"); + "MCP thermal limit %d exceeded: %d\n", + mcp_temp_limit * 100, + mcp_avg_temp); + ret = true; + } return ret; } @@ -623,20 +642,33 @@ static bool mcp_exceeded(struct ips_driver *ips) static bool cpu_exceeded(struct ips_driver *ips, int cpu) { unsigned long flags; - int avg; bool ret = false; + int avg; + int core_temp_limit; + u16 core_power_limit; + u32 cpu_avg_power; spin_lock_irqsave(&ips->turbo_status_lock, flags); + avg = cpu ? ips->ctv2_avg_temp : ips->ctv1_avg_temp; - if (avg > (ips->limits->core_temp_limit * 100)) - ret = true; - if (ips->cpu_avg_power > ips->core_power_limit * 100) - ret = true; + core_temp_limit = ips->limits->core_temp_limit; + core_power_limit = ips->core_power_limit; + cpu_avg_power = ips->cpu_avg_power; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); - if (ret) + if (cpu_avg_power > (core_power_limit * 100)) { + dev_info(&ips->dev->dev, + "CPU power limit %d exceeded: %d\n", + core_power_limit * 100, cpu_avg_power); + ret = true; + } + if (avg > (core_temp_limit * 100)) { dev_info(&ips->dev->dev, - "CPU power or thermal limit exceeded\n"); + "CPU thermal limit %d exceeded: %d\n", + core_temp_limit * 100, avg); + ret = true; + } return ret; } @@ -652,13 +684,32 @@ static bool mch_exceeded(struct ips_driver *ips) unsigned long flags; bool ret = false; + u16 mch_avg_temp; + int mch_temp_limit; + u32 mch_avg_power; + u16 mch_power_limit; + spin_lock_irqsave(&ips->turbo_status_lock, flags); - if (ips->mch_avg_temp > (ips->limits->mch_temp_limit * 100)) - ret = true; - if (ips->mch_avg_power > ips->mch_power_limit) - ret = true; + + mch_avg_temp = ips->mch_avg_temp; + mch_temp_limit = ips->limits->mch_temp_limit; + mch_avg_power = ips->mch_avg_power; + mch_power_limit = ips->mch_power_limit; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); + if (mch_avg_power > mch_power_limit) { + dev_info(&ips->dev->dev, + "MCH power limit %d exceeded: %d\n", + mch_power_limit, mch_avg_power); + ret = true; + } + if (mch_avg_temp > (mch_temp_limit * 100)) { + dev_info(&ips->dev->dev, + "MCH thermal limit %d exceeded: %d\n", + mch_temp_limit * 100, mch_avg_temp); + ret = true; + } return ret; } @@ -689,6 +740,16 @@ static void update_turbo_limits(struct ips_driver *ips) /* Ignore BIOS CPU vs GPU pref */ } +static void show_turbo_limits(const struct ips_driver *ips, const char *caller) +{ + dev_info(&ips->dev->dev, + "%s:%s cte:%u gte:%u cpt:%u mpl:%u mtl:%u mpl:%u\n", + __func__, caller, + ips->cpu_turbo_enabled, ips->gpu_turbo_enabled, + ips->core_power_limit, ips->mch_power_limit, + ips->mcp_temp_limit, ips->mcp_power_limit); +} + /** * ips_adjust - adjust power clamp based on thermal state * @data: ips driver structure @@ -734,12 +795,18 @@ static int ips_adjust(void *data) do { bool cpu_busy = ips_cpu_busy(ips); bool gpu_busy = ips_gpu_busy(ips); + bool updated = false; spin_lock_irqsave(&ips->turbo_status_lock, flags); - if (ips->poll_turbo_status) + if (ips->poll_turbo_status) { update_turbo_limits(ips); + updated = true; + } spin_unlock_irqrestore(&ips->turbo_status_lock, flags); + if (updated) + show_turbo_limits(ips, __func__); + /* Update turbo status if necessary */ if (ips->cpu_turbo_enabled) ips_enable_cpu_turbo(ips); @@ -1158,6 +1225,8 @@ static irqreturn_t ips_irq_handler(int irq, void *arg) spin_unlock(&ips->turbo_status_lock); thm_writeb(THM_SEC, SEC_ACK); + + show_turbo_limits(ips, __func__); } thm_writeb(THM_TES, tes); } ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-27 7:39 ` Joe Perches @ 2010-08-27 23:12 ` Cesar Eduardo Barros 2010-08-28 2:21 ` Joe Perches 0 siblings, 1 reply; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-27 23:12 UTC (permalink / raw) To: Joe Perches Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 27-08-2010 04:39, Joe Perches escreveu: > On Thu, 2010-08-26 at 22:38 -0300, Cesar Eduardo Barros wrote: >> intel ips 0000:00:1f.6: CPU power limit 3863 exceeded: 0 >> >> I think you put the parameters in the wrong order on the dev_info() call >> for the CPU power limit; it is the limit that is 0. > > Yes. Fixed. > >> Two bogus things I can see: >> >> - The first "MCP power limit exceeded" seems very bogus. >> - What do you mean, core_power_limit is zero? > > I added a logging message whenever the turbo limits change > and logging messages for power/temp on MCH for completeness. > > Maybe this will show something useful like when/how > CPU power limit gets set to 0. Running with it right now, did not help much: $ dmesg | fgrep 'intel ips' intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 25, expected 35) intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:8058 + mch:23392829 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5675 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6369 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:5095 + mch:65379 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 7387 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 8326 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5943 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6428 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5775 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 7061 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5153 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5098 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5208 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 7500 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 9144 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6722 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 7156 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:5693 + mch:64598 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5856 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4209 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4726 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5259 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5212 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4862 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5281 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4235 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4897 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5257 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:5111 + mch:64134 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4843 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4909 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5904 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6059 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5579 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5970 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5213 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6388 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4444 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6545 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4439 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4682 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4337 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4499 Just for fun, here is what debugfs has to say: $ ls /sys/kernel/debug/ips/ cpu_clamp cpu_power cpu_temp mch_power mch_temp $ cat /sys/kernel/debug/ips/* 23.0W 21.0A 4415mW 46.99 59670mW 0.00 All numbers except the last vary a lot. A few more examples: 21.0W 21.0A 4379mW 51.09 6493mW 0.00 23.0W 21.0A 4681mW 45.13 34062mW 0.00 -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-27 23:12 ` Cesar Eduardo Barros @ 2010-08-28 2:21 ` Joe Perches 2010-08-28 10:46 ` Cesar Eduardo Barros 2010-09-23 20:31 ` Jesse Barnes 0 siblings, 2 replies; 25+ messages in thread From: Joe Perches @ 2010-08-28 2:21 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel On Fri, 2010-08-27 at 20:12 -0300, Cesar Eduardo Barros wrote: > Em 27-08-2010 04:39, Joe Perches escreveu: > > On Thu, 2010-08-26 at 22:38 -0300, Cesar Eduardo Barros wrote: > >> - The first "MCP power limit exceeded" seems very bogus. > >> - What do you mean, core_power_limit is zero? > > I added a logging message whenever the turbo limits change > > and logging messages for power/temp on MCH for completeness. > > Maybe this will show something useful like when/how > > CPU power limit gets set to 0. > Running with it right now, did not help much: > > $ dmesg | fgrep 'intel ips' > intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value > (found 25, expected 35) > intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 > intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 > intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:8058 + > mch:23392829 > intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5675 > intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6369 I believe all these limits should always have non-zero values. So I still think you've hardware problems, but I suppose it could be the driver not reading the right registers or some such. It seems odd that the driver never printed a logging message for either of the polling or irq methods to read the device cpu and thermal limits. Jesse or any Intel folk, can you verify or suggest anything better? If cpu_power_limit, or any _limit, is not set perhaps changing the test style to verify limit and adding a printed_once alert for each 0 value limit. At least that'd shut up the continuous logging but at least give a notification message. if (limit) { if (measured_val > limit) dev_info(foo) } else dev_alert_once() Maybe something like this: drivers/platform/x86/intel_ips.c | 160 +++++++++++++++++++++++++++++++++----- 1 files changed, 140 insertions(+), 20 deletions(-) diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c index 9024480..10eba04 100644 --- a/drivers/platform/x86/intel_ips.c +++ b/drivers/platform/x86/intel_ips.c @@ -598,17 +598,53 @@ static bool mcp_exceeded(struct ips_driver *ips) { unsigned long flags; bool ret = false; + static bool printed_zero_mcp_power_limit; + static bool printed_zero_mcp_temp_limit; + u16 mcp_avg_temp; + u16 mcp_temp_limit; + u16 mcp_power_limit; + u32 cpu_avg_power; + u32 mch_avg_power; spin_lock_irqsave(&ips->turbo_status_lock, flags); - if (ips->mcp_avg_temp > (ips->mcp_temp_limit * 100)) - ret = true; - if (ips->cpu_avg_power + ips->mch_avg_power > ips->mcp_power_limit) - ret = true; + + mcp_avg_temp = ips->mcp_avg_temp; + mcp_temp_limit = ips->mcp_temp_limit; + mcp_power_limit = ips->mcp_power_limit; + cpu_avg_power = ips->cpu_avg_power; + mch_avg_power = ips->mch_avg_power; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); - if (ret) - dev_info(&ips->dev->dev, - "MCP power or thermal limit exceeded\n"); + if (mcp_power_limit) { + if ((cpu_avg_power + mch_avg_power) > mcp_power_limit) { + dev_info(&ips->dev->dev, + "MCP power limit %u exceeded: cpu:%u+mch:%u\n", + mcp_power_limit, + cpu_avg_power, mch_avg_power); + ret = true; + } + } else { + if (!printed_zero_mcp_power_limit) { + printed_zero_mcp_power_limit = true; + dev_alert(&ips->dev->dev, "MCP power limit is 0!\n"); + } + } + + if (mcp_temp_limit) { + if (mcp_avg_temp > (mcp_temp_limit * 100)) { + dev_info(&ips->dev->dev, + "MCP thermal limit %d exceeded: %d\n", + mcp_temp_limit * 100, + mcp_avg_temp); + ret = true; + } + } else { + if (!printed_zero_mcp_temp_limit) { + printed_zero_mcp_temp_limit = true; + dev_alert(&ips->dev->dev, "MCP thermal limit is 0!\n"); + } + } return ret; } @@ -623,20 +659,50 @@ static bool mcp_exceeded(struct ips_driver *ips) static bool cpu_exceeded(struct ips_driver *ips, int cpu) { unsigned long flags; - int avg; bool ret = false; + static bool printed_zero_core_power_limit; + static bool printed_zero_core_temp_limit; + int avg; + int core_temp_limit; + u16 core_power_limit; + u32 cpu_avg_power; spin_lock_irqsave(&ips->turbo_status_lock, flags); + avg = cpu ? ips->ctv2_avg_temp : ips->ctv1_avg_temp; - if (avg > (ips->limits->core_temp_limit * 100)) - ret = true; - if (ips->cpu_avg_power > ips->core_power_limit * 100) - ret = true; + core_temp_limit = ips->limits->core_temp_limit; + core_power_limit = ips->core_power_limit; + cpu_avg_power = ips->cpu_avg_power; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); - if (ret) - dev_info(&ips->dev->dev, - "CPU power or thermal limit exceeded\n"); + if (core_power_limit) { + if (cpu_avg_power > (core_power_limit * 100)) { + dev_info(&ips->dev->dev, + "CPU power limit %d exceeded: %d\n", + core_power_limit * 100, cpu_avg_power); + ret = true; + } + } else { + if (!printed_zero_core_power_limit) { + printed_zero_core_power_limit = true; + dev_alert(&ips->dev->dev, "CPU power limit is 0!\n"); + } + } + + if (core_temp_limit) { + if (avg > (core_temp_limit * 100)) { + dev_info(&ips->dev->dev, + "CPU thermal limit %d exceeded: %d\n", + core_temp_limit * 100, avg); + ret = true; + } + } else { + if (!printed_zero_core_temp_limit) { + printed_zero_core_temp_limit = true; + dev_alert(&ips->dev->dev, "CPU thermal limit is 0!\n"); + } + } return ret; } @@ -651,14 +717,50 @@ static bool mch_exceeded(struct ips_driver *ips) { unsigned long flags; bool ret = false; + static bool printed_zero_mch_power_limit; + static bool printed_zero_mch_temp_limit; + u16 mch_avg_temp; + int mch_temp_limit; + u32 mch_avg_power; + u16 mch_power_limit; spin_lock_irqsave(&ips->turbo_status_lock, flags); - if (ips->mch_avg_temp > (ips->limits->mch_temp_limit * 100)) - ret = true; - if (ips->mch_avg_power > ips->mch_power_limit) - ret = true; + + mch_avg_temp = ips->mch_avg_temp; + mch_temp_limit = ips->limits->mch_temp_limit; + mch_avg_power = ips->mch_avg_power; + mch_power_limit = ips->mch_power_limit; + spin_unlock_irqrestore(&ips->turbo_status_lock, flags); + if (mch_power_limit) { + if (mch_avg_power > mch_power_limit) { + dev_info(&ips->dev->dev, + "MCH power limit %d exceeded: %d\n", + mch_power_limit, mch_avg_power); + ret = true; + } + } else { + if (!printed_zero_mch_power_limit) { + printed_zero_mch_power_limit = true; + dev_alert(&ips->dev->dev, "MCH power limit is 0!\n"); + } + } + + if (mch_temp_limit) { + if (mch_avg_temp > (mch_temp_limit * 100)) { + dev_info(&ips->dev->dev, + "MCH thermal limit %d exceeded: %d\n", + mch_temp_limit * 100, mch_avg_temp); + ret = true; + } + } else { + if (!printed_zero_mch_temp_limit) { + printed_zero_mch_temp_limit = true; + dev_alert(&ips->dev->dev, "MCH thermal limit is 0!\n"); + } + } + return ret; } @@ -689,6 +791,16 @@ static void update_turbo_limits(struct ips_driver *ips) /* Ignore BIOS CPU vs GPU pref */ } +static void show_turbo_limits(const struct ips_driver *ips, const char *caller) +{ + dev_info(&ips->dev->dev, + "%s:%s cte:%u gte:%u cpt:%u mpl:%u mtl:%u mpl:%u\n", + __func__, caller, + ips->cpu_turbo_enabled, ips->gpu_turbo_enabled, + ips->core_power_limit, ips->mch_power_limit, + ips->mcp_temp_limit, ips->mcp_power_limit); +} + /** * ips_adjust - adjust power clamp based on thermal state * @data: ips driver structure @@ -734,12 +846,18 @@ static int ips_adjust(void *data) do { bool cpu_busy = ips_cpu_busy(ips); bool gpu_busy = ips_gpu_busy(ips); + bool updated = false; spin_lock_irqsave(&ips->turbo_status_lock, flags); - if (ips->poll_turbo_status) + if (ips->poll_turbo_status) { update_turbo_limits(ips); + updated = true; + } spin_unlock_irqrestore(&ips->turbo_status_lock, flags); + if (updated) + show_turbo_limits(ips, __func__); + /* Update turbo status if necessary */ if (ips->cpu_turbo_enabled) ips_enable_cpu_turbo(ips); @@ -1158,6 +1276,8 @@ static irqreturn_t ips_irq_handler(int irq, void *arg) spin_unlock(&ips->turbo_status_lock); thm_writeb(THM_SEC, SEC_ACK); + + show_turbo_limits(ips, __func__); } thm_writeb(THM_TES, tes); } ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 2:21 ` Joe Perches @ 2010-08-28 10:46 ` Cesar Eduardo Barros [not found] ` <1282994116.1946.226.camel@Joe-Laptop> 2010-09-23 20:31 ` Jesse Barnes 1 sibling, 1 reply; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-28 10:46 UTC (permalink / raw) To: Joe Perches Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 27-08-2010 23:21, Joe Perches escreveu: > On Fri, 2010-08-27 at 20:12 -0300, Cesar Eduardo Barros wrote: >> Em 27-08-2010 04:39, Joe Perches escreveu: >>> On Thu, 2010-08-26 at 22:38 -0300, Cesar Eduardo Barros wrote: >>>> - The first "MCP power limit exceeded" seems very bogus. >>>> - What do you mean, core_power_limit is zero? >>> I added a logging message whenever the turbo limits change >>> and logging messages for power/temp on MCH for completeness. >>> Maybe this will show something useful like when/how >>> CPU power limit gets set to 0. > >> Running with it right now, did not help much: >> >> $ dmesg | fgrep 'intel ips' >> intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value >> (found 25, expected 35) >> intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 >> intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 >> intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:8058 + >> mch:23392829 >> intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5675 >> intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6369 > > I believe all these limits should always have non-zero values. > So I still think you've hardware problems, but I suppose it > could be the driver not reading the right registers or some > such. It seems odd that the driver never printed a logging > message for either of the polling or irq methods to read the > device cpu and thermal limits. Come on, no blaming the BIOS? ;-) If I read the code with your previous patch correctly, show_turbo_limits will never be called if poll_turbo_status is false but no interrupt happens. And we know no interrupt happened (at least not with nonzero register values), because the interrupt handler does two dev_info() right at the beginning. So the limits could still be the ones initially set at ips_probe(). I will try to enable dev_dbg() later and see what it prints. > > Jesse or any Intel folk, can you verify or suggest anything > better? > > If cpu_power_limit, or any _limit, is not set perhaps changing > the test style to verify limit and adding a printed_once alert > for each 0 value limit. At least that'd shut up the continuous > logging but at least give a notification message. > > if (limit) { > if (measured_val> limit) > dev_info(foo) > } else > dev_alert_once() Wouldn't it make more sense to do the alert when the limit is set, instead of when it is used? Also, it should still treat it as limit exceeded (better safe than sorry). Something like: if (measured_val > limit) { if (limit) dev_info(...); ret = true; } -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <1282994116.1946.226.camel@Joe-Laptop>]
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages [not found] ` <1282994116.1946.226.camel@Joe-Laptop> @ 2010-08-28 12:52 ` Cesar Eduardo Barros 2010-08-28 13:01 ` Cesar Eduardo Barros 2010-08-28 13:29 ` Joe Perches 0 siblings, 2 replies; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-28 12:52 UTC (permalink / raw) To: Joe Perches; +Cc: Jesse Barnes, Matthew, mjg, platform-driver-x86, linux-kernel You seem to have dropped the CC list by accident, adding it back. Em 28-08-2010 08:15, Joe Perches escreveu: > On Sat, 2010-08-28 at 07:46 -0300, Cesar Eduardo Barros wrote: >> If I read the code with your previous patch correctly, show_turbo_limits >> will never be called if poll_turbo_status is false but no interrupt >> happens. And we know no interrupt happened (at least not with nonzero >> register values), because the interrupt handler does two dev_info() >> right at the beginning. So the limits could still be the ones initially >> set at ips_probe(). >> >> I will try to enable dev_dbg() later and see what it prints. > > or add: > > diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c > index 9024480..450ea44 100644 > --- a/drivers/platform/x86/intel_ips.c > +++ b/drivers/platform/x86/intel_ips.c > @@ -1600,6 +1600,7 @@ static int ips_probe(struct pci_dev *dev, const struct pci_device_id *id) > ips->second_cpu = true; > > update_turbo_limits(ips); > + show_turbo_limits(ips, __func__); > dev_dbg(&dev->dev, "max cpu power clamp: %dW\n", > ips->mcp_power_limit / 10); > dev_dbg(&dev->dev, "max core power clamp: %dW\n", Here it is: intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value (found 25, expected 35) intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 intel ips 0000:00:1f.6: show_turbo_limits:ips_probe cte:1 gte:1 cpt:0 mpl:65535 mtl:65535 mpl:65535 intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:8004 + mch:25353039 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4841 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5283 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5586 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6077 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:5871 + mch:64538 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5466 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 8589 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5744 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4859 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:4834 + mch:62385 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4874 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5356 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6557 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:7589 + mch:59343 intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:5536 + mch:60020 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6676 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4401 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5634 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4038 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4700 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5086 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4930 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4697 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5034 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5381 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4417 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6839 intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 4787 -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 12:52 ` Cesar Eduardo Barros @ 2010-08-28 13:01 ` Cesar Eduardo Barros 2010-08-28 13:29 ` Joe Perches 1 sibling, 0 replies; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-28 13:01 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Joe Perches, Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 28-08-2010 09:52, Cesar Eduardo Barros escreveu: > You seem to have dropped the CC list by accident, adding it back. It seems I made some mistake when copying and pasting the CC: list, I fixed it on this message. -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 12:52 ` Cesar Eduardo Barros 2010-08-28 13:01 ` Cesar Eduardo Barros @ 2010-08-28 13:29 ` Joe Perches 2010-08-28 14:18 ` Cesar Eduardo Barros 1 sibling, 1 reply; 25+ messages in thread From: Joe Perches @ 2010-08-28 13:29 UTC (permalink / raw) To: Cesar Eduardo Barros; +Cc: Jesse Barnes, mjg, platform-driver-x86, linux-kernel On Sat, 2010-08-28 at 09:52 -0300, Cesar Eduardo Barros wrote: > Em 28-08-2010 08:15, Joe Perches escreveu: > >> I will try to enable dev_dbg() later and see what it prints. > > or add: [] > Here it is: > > intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value > (found 25, expected 35) > intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 > intel ips 0000:00:1f.6: show_turbo_limits:ips_probe cte:1 gte:1 cpt:0 > mpl:65535 mtl:65535 mpl:65535 So ips->core_power_limit is still 0. Is it bad bios or driver? I expect the bios/hw is faulty but fingers are easy to point. Your choice on how to minimize your current logging problem. Changing the message level to dev_dbg probably isn't the right thing to do overall, but it may suit you right now. Thanks for testing. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 13:29 ` Joe Perches @ 2010-08-28 14:18 ` Cesar Eduardo Barros 2010-08-28 15:23 ` Henrique de Moraes Holschuh 0 siblings, 1 reply; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-28 14:18 UTC (permalink / raw) To: Joe Perches Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 28-08-2010 10:29, Joe Perches escreveu: > On Sat, 2010-08-28 at 09:52 -0300, Cesar Eduardo Barros wrote: >> Em 28-08-2010 08:15, Joe Perches escreveu: >>>> I will try to enable dev_dbg() later and see what it prints. >>> or add: > [] >> Here it is: >> >> intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value >> (found 25, expected 35) >> intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 >> intel ips 0000:00:1f.6: show_turbo_limits:ips_probe cte:1 gte:1 cpt:0 >> mpl:65535 mtl:65535 mpl:65535 > > So ips->core_power_limit is still 0. > > Is it bad bios or driver? > > I expect the bios/hw is faulty but fingers are easy to point. > > Your choice on how to minimize your current logging problem. > > Changing the message level to dev_dbg probably isn't the > right thing to do overall, but it may suit you right now. The solution here probably is not less logging. The best solution IMO would be to do some sanity checking when loading the module, and if the values do not make sense, print something to the log and return -ENODEV. I expect that, when it works as it should, the first read while loading the module already returns sane values, so a sanity check there should not have many false positives. OTOH, it is best to not load the module when you think things are strange. -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 14:18 ` Cesar Eduardo Barros @ 2010-08-28 15:23 ` Henrique de Moraes Holschuh 2010-08-28 19:07 ` Cesar Eduardo Barros 0 siblings, 1 reply; 25+ messages in thread From: Henrique de Moraes Holschuh @ 2010-08-28 15:23 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Joe Perches, Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel On Sat, 28 Aug 2010, Cesar Eduardo Barros wrote: > The solution here probably is not less logging. The best solution > IMO would be to do some sanity checking when loading the module, and > if the values do not make sense, print something to the log and > return -ENODEV. As long as your sanity checking won't make the module fail to load in the following scenario: 1. environment temperature control fails, room starts to heat up 2. things go south, server reboots due to exceeded temperature limits 3. OS boots in an overheat situation 4. module refuse to load because it expects to never start in a overheating situation. If the sanity checks will cause (4), then don't add them. rate-limit the thermal alarms (issue them only once every T, and only if temperature has increased more than, say, 5°C from the last alarm). If a given platform is buggy crap (or just el-cheapo trash that overheats all the time) to the point that the module is useless, blacklist it by DMI and inform the user. > I expect that, when it works as it should, the first read while > loading the module already returns sane values, so a sanity check well, as long as "sane" does include server-is-too-hot situations... > there should not have many false positives. OTOH, it is best to not > load the module when you think things are strange. What good is an alarm module that refuses to load when there is an alarm condition happening already? -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 15:23 ` Henrique de Moraes Holschuh @ 2010-08-28 19:07 ` Cesar Eduardo Barros 2010-08-30 16:29 ` Jesse Barnes 0 siblings, 1 reply; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-28 19:07 UTC (permalink / raw) To: Henrique de Moraes Holschuh Cc: Joe Perches, Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 28-08-2010 12:23, Henrique de Moraes Holschuh escreveu: > On Sat, 28 Aug 2010, Cesar Eduardo Barros wrote: >> The solution here probably is not less logging. The best solution >> IMO would be to do some sanity checking when loading the module, and >> if the values do not make sense, print something to the log and >> return -ENODEV. > > As long as your sanity checking won't make the module fail to load in the > following scenario: > > 1. environment temperature control fails, room starts to heat up > 2. things go south, server reboots due to exceeded temperature limits > 3. OS boots in an overheat situation > 4. module refuse to load because it expects to never start in a overheating > situation. > > If the sanity checks will cause (4), then don't add them. rate-limit the > thermal alarms (issue them only once every T, and only if temperature has > increased more than, say, 5°C from the last alarm). I have not read the datasheet (I do not even know if it is available to the public; I have not looked), but I would not expect to see a power limit of 0 even if the CPU is on fire. Of course, you have to be more cautious when validating the current temperature (and even then, if it says the CPU is encased in a block of ice, something odd is going on). > If a given platform is buggy crap (or just el-cheapo trash that overheats > all the time) to the point that the module is useless, blacklist it by DMI > and inform the user. > >> I expect that, when it works as it should, the first read while >> loading the module already returns sane values, so a sanity check > > well, as long as "sane" does include server-is-too-hot situations... Of course. (But you most probably will want to s/server/laptop/ here.) >> there should not have many false positives. OTOH, it is best to not >> load the module when you think things are strange. > > What good is an alarm module that refuses to load when there is an alarm > condition happening already? This is not an alarm module; AFAIK it is a module for the feature in recent Intel CPU/GPU chips which allow you to overclock it a bit as long as the thermal and power limit has not been exceeded: config INTEL_IPS tristate "Intel Intelligent Power Sharing" depends on ACPI ---help--- Intel Calpella platforms support dynamic power sharing between the CPU and GPU, maximizing performance in a given TDP. This driver, along with the CPU frequency and i915 drivers, provides that functionality. If in doubt, say Y here; it will only load on supported platforms. If the module is not loaded, it simply will not be able to go above its nominal clock, so refusing to load it is not that much of a problem. -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 19:07 ` Cesar Eduardo Barros @ 2010-08-30 16:29 ` Jesse Barnes 2010-08-30 21:42 ` Cesar Eduardo Barros 0 siblings, 1 reply; 25+ messages in thread From: Jesse Barnes @ 2010-08-30 16:29 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Henrique de Moraes Holschuh, Joe Perches, Matthew Garrett, platform-driver-x86, linux-kernel On Sat, 28 Aug 2010 16:07:08 -0300 Cesar Eduardo Barros <cesarb@cesarb.net> wrote: > Em 28-08-2010 12:23, Henrique de Moraes Holschuh escreveu: > > On Sat, 28 Aug 2010, Cesar Eduardo Barros wrote: > >> The solution here probably is not less logging. The best solution > >> IMO would be to do some sanity checking when loading the module, and > >> if the values do not make sense, print something to the log and > >> return -ENODEV. > > > > As long as your sanity checking won't make the module fail to load in the > > following scenario: > > > > 1. environment temperature control fails, room starts to heat up > > 2. things go south, server reboots due to exceeded temperature limits > > 3. OS boots in an overheat situation > > 4. module refuse to load because it expects to never start in a overheating > > situation. > > > > If the sanity checks will cause (4), then don't add them. rate-limit the > > thermal alarms (issue them only once every T, and only if temperature has > > increased more than, say, 5°C from the last alarm). > > I have not read the datasheet (I do not even know if it is available to > the public; I have not looked), but I would not expect to see a power > limit of 0 even if the CPU is on fire. Of course, you have to be more > cautious when validating the current temperature (and even then, if it > says the CPU is encased in a block of ice, something odd is going on). > > > If a given platform is buggy crap (or just el-cheapo trash that overheats > > all the time) to the point that the module is useless, blacklist it by DMI > > and inform the user. > > > >> I expect that, when it works as it should, the first read while > >> loading the module already returns sane values, so a sanity check > > > > well, as long as "sane" does include server-is-too-hot situations... > > Of course. (But you most probably will want to s/server/laptop/ here.) > > >> there should not have many false positives. OTOH, it is best to not > >> load the module when you think things are strange. > > > > What good is an alarm module that refuses to load when there is an alarm > > condition happening already? > > This is not an alarm module; AFAIK it is a module for the feature in > recent Intel CPU/GPU chips which allow you to overclock it a bit as long > as the thermal and power limit has not been exceeded: > > config INTEL_IPS > tristate "Intel Intelligent Power Sharing" > depends on ACPI > ---help--- > Intel Calpella platforms support dynamic power sharing between the > CPU and GPU, maximizing performance in a given TDP. This driver, > along with the CPU frequency and i915 drivers, provides that > functionality. If in doubt, say Y here; it will only load on > supported platforms. > > If the module is not loaded, it simply will not be able to go above its > nominal clock, so refusing to load it is not that much of a problem. It sounds like either your BIOS values are wrong or the driver isn't fetching them correctly on your platform. It's possible that the main issue here is bad thermal limits. There's obviously a relationship between power and thermal output, but the driver tries to monitor both. However it's up to the BIOS to provide the driver with accurate thermal limits, as well as accurate power limits. The power limits sound reasonable at 25W, thus the informational output about 35W vs 25W (35W is what the MCP can handle, but some platforms are designed to handle less, so they clamp it down a bit). But the temp limits look all wrong. I'll see if I can find info on getting better data into the driver... Thanks, -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-30 16:29 ` Jesse Barnes @ 2010-08-30 21:42 ` Cesar Eduardo Barros 0 siblings, 0 replies; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-30 21:42 UTC (permalink / raw) To: Jesse Barnes Cc: Henrique de Moraes Holschuh, Joe Perches, Matthew Garrett, platform-driver-x86, linux-kernel Em 30-08-2010 13:29, Jesse Barnes escreveu: > It's possible that the main issue here is bad thermal limits. There's > obviously a relationship between power and thermal output, but the > driver tries to monitor both. However it's up to the BIOS to provide > the driver with accurate thermal limits, as well as accurate power > limits. The power limits sound reasonable at 25W, thus the > informational output about 35W vs 25W (35W is what the MCP can handle, > but some platforms are designed to handle less, so they clamp it down a > bit). But the temp limits look all wrong. I'll see if I can find info > on getting better data into the driver... If you need more information from this laptop (dmidecode, ACPI AML, etc), just ask. -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-28 2:21 ` Joe Perches 2010-08-28 10:46 ` Cesar Eduardo Barros @ 2010-09-23 20:31 ` Jesse Barnes 2010-09-23 20:47 ` Joe Perches 1 sibling, 1 reply; 25+ messages in thread From: Jesse Barnes @ 2010-09-23 20:31 UTC (permalink / raw) To: Joe Perches Cc: Cesar Eduardo Barros, Matthew Garrett, platform-driver-x86, linux-kernel On Fri, 27 Aug 2010 19:21:44 -0700 Joe Perches <joe@perches.com> wrote: > On Fri, 2010-08-27 at 20:12 -0300, Cesar Eduardo Barros wrote: > > Em 27-08-2010 04:39, Joe Perches escreveu: > > > On Thu, 2010-08-26 at 22:38 -0300, Cesar Eduardo Barros wrote: > > >> - The first "MCP power limit exceeded" seems very bogus. > > >> - What do you mean, core_power_limit is zero? > > > I added a logging message whenever the turbo limits change > > > and logging messages for power/temp on MCH for completeness. > > > Maybe this will show something useful like when/how > > > CPU power limit gets set to 0. > > > Running with it right now, did not help much: > > > > $ dmesg | fgrep 'intel ips' > > intel ips 0000:00:1f.6: Warning: CPU TDP doesn't match expected value > > (found 25, expected 35) > > intel ips 0000:00:1f.6: PCI INT C -> GSI 18 (level, low) -> IRQ 18 > > intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 65535 > > intel ips 0000:00:1f.6: MCP power limit 65535 exceeded: cpu:8058 + > > mch:23392829 > > intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 5675 > > intel ips 0000:00:1f.6: CPU power limit 0 exceeded: 6369 > > I believe all these limits should always have non-zero values. > So I still think you've hardware problems, but I suppose it > could be the driver not reading the right registers or some > such. It seems odd that the driver never printed a logging > message for either of the polling or irq methods to read the > device cpu and thermal limits. > > Jesse or any Intel folk, can you verify or suggest anything > better? > > If cpu_power_limit, or any _limit, is not set perhaps changing > the test style to verify limit and adding a printed_once alert > for each 0 value limit. At least that'd shut up the continuous > logging but at least give a notification message. > > if (limit) { > if (measured_val > limit) > dev_info(foo) > } else > dev_alert_once() > > Maybe something like this: > > drivers/platform/x86/intel_ips.c | 160 +++++++++++++++++++++++++++++++++----- > 1 files changed, 140 insertions(+), 20 deletions(-) Yes, we do need something like this. It turns out the BIOS can optionally program several of these values, and we need to sanity check them. If they're not valid (e.g. a core power limit of 0 or MCP temp limit of 0xffff) we need to use the default values in the limit structs. I think the programmed limits are valid if they're nonzero and less than one of the available default limits. If they're not valid, we should just use the default values. I was thinking something like the below for MCP, but Joe you may want to just update your patch instead since it's more complete. Thanks, -- Jesse Barnes, Intel Open Source Technology Center diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c index 9024480..ec72e80 100644 --- a/drivers/platform/x86/intel_ips.c +++ b/drivers/platform/x86/intel_ips.c @@ -662,6 +662,17 @@ static bool mch_exceeded(struct ips_driver *ips) return ret; } +static void clamp_mcp_temp_limit(struct ips_driver *ips) +{ + /* + * BIOS may or may not program an MCP limit. Clamp it to the + * lowest available value. + */ + if (ips->mcp_temp_limit < ips->core_temp_limit || + ips->mcp_temp_limit < ips->mch_temp_limit) + ips->mcp_temp_limit = min(ips->core_temp_limit, ips->mch_temp_limit); +} + /** * update_turbo_limits - get various limits & settings from regs * @ips: IPS driver struct @@ -686,6 +697,7 @@ static void update_turbo_limits(struct ips_driver *ips) ips->mcp_temp_limit = thm_readw(THM_PTL); ips->mcp_power_limit = thm_readw(THM_MPPC); + clamp_mcp_temp_limit(ips); /* Ignore BIOS CPU vs GPU pref */ } @@ -1155,6 +1167,7 @@ static irqreturn_t ips_irq_handler(int irq, void *arg) STS_PTL_SHIFT; ips->mcp_power_limit = (tc1 & STS_PPL_MASK) >> STS_PPL_SHIFT; + clamp_mcp_temp_limit(ips); spin_unlock(&ips->turbo_status_lock); thm_writeb(THM_SEC, SEC_ACK); ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-09-23 20:31 ` Jesse Barnes @ 2010-09-23 20:47 ` Joe Perches 2010-09-23 20:50 ` Jesse Barnes 0 siblings, 1 reply; 25+ messages in thread From: Joe Perches @ 2010-09-23 20:47 UTC (permalink / raw) To: Jesse Barnes Cc: Cesar Eduardo Barros, Matthew Garrett, platform-driver-x86, linux-kernel On Thu, 2010-09-23 at 13:31 -0700, Jesse Barnes wrote: > I think the programmed limits are valid if they're nonzero and less > than one of the available default limits. If they're not valid, we > should just use the default values. I was thinking something like the > below for MCP, but Joe you may want to just update your patch instead > since it's more complete. You have my patch yes? You can probably integrate and validate it better than I can with your proposed change. I don't have any of the hardware to test. If I can otherwise help somehow, let me know. cheers, Joe ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-09-23 20:47 ` Joe Perches @ 2010-09-23 20:50 ` Jesse Barnes 0 siblings, 0 replies; 25+ messages in thread From: Jesse Barnes @ 2010-09-23 20:50 UTC (permalink / raw) To: Joe Perches Cc: Cesar Eduardo Barros, Matthew Garrett, platform-driver-x86, linux-kernel On Thu, 23 Sep 2010 13:47:05 -0700 Joe Perches <joe@perches.com> wrote: > On Thu, 2010-09-23 at 13:31 -0700, Jesse Barnes wrote: > > I think the programmed limits are valid if they're nonzero and less > > than one of the available default limits. If they're not valid, we > > should just use the default values. I was thinking something like the > > below for MCP, but Joe you may want to just update your patch instead > > since it's more complete. > > You have my patch yes? > > You can probably integrate and validate it > better than I can with your proposed change. > > I don't have any of the hardware to test. > If I can otherwise help somehow, let me know. Ah ok, I'll fix it up and respin then, thanks. -- Jesse Barnes, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-26 23:29 ` [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages Cesar Eduardo Barros 2010-08-26 23:33 ` Joe Perches @ 2010-08-27 0:18 ` Alan Cox 2010-08-27 0:22 ` Cesar Eduardo Barros 2010-08-27 1:42 ` Matthew Garrett 2 siblings, 1 reply; 25+ messages in thread From: Alan Cox @ 2010-08-27 0:18 UTC (permalink / raw) To: Cesar Eduardo Barros Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel On Thu, 26 Aug 2010 20:29:01 -0300 Cesar Eduardo Barros <cesarb@cesarb.net> wrote: > On my Dell Inspiron N4010, one of these messages is printed every five > seconds. Change both to dev_dbg to quieten them even more. It's not the kind of thing you want to quieten since in most cases it is stuff you want to know about. > > Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> > --- > drivers/platform/x86/intel_ips.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c > index 9024480..7833adc 100644 > --- a/drivers/platform/x86/intel_ips.c > +++ b/drivers/platform/x86/intel_ips.c > @@ -607,7 +607,7 @@ static bool mcp_exceeded(struct ips_driver *ips) > spin_unlock_irqrestore(&ips->turbo_status_lock, flags); > > if (ret) > - dev_info(&ips->dev->dev, > + dev_dbg(&ips->dev->dev, > "MCP power or thermal limit exceeded\n"); Probably better to ratelimit it. Being told it occurs is important, being told every five seconds is indeed excessive. Alan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-27 0:18 ` Alan Cox @ 2010-08-27 0:22 ` Cesar Eduardo Barros 0 siblings, 0 replies; 25+ messages in thread From: Cesar Eduardo Barros @ 2010-08-27 0:22 UTC (permalink / raw) To: Alan Cox; +Cc: Jesse Barnes, Matthew Garrett, platform-driver-x86, linux-kernel Em 26-08-2010 21:18, Alan Cox escreveu: > On Thu, 26 Aug 2010 20:29:01 -0300 > Cesar Eduardo Barros<cesarb@cesarb.net> wrote: > >> On my Dell Inspiron N4010, one of these messages is printed every five >> seconds. Change both to dev_dbg to quieten them even more. > > It's not the kind of thing you want to quieten since in most cases it is > stuff you want to know about. Isn't it a normal thing, however? If I understand it correctly, this driver allows you to use a bit more power but throttles back if you are using too much or it gets too hot. Thus, there should be nothing wrong or even unusual with the situations where these messages appear. >> if (ret) >> - dev_info(&ips->dev->dev, >> + dev_dbg(&ips->dev->dev, >> "MCP power or thermal limit exceeded\n"); > > Probably better to ratelimit it. Being told it occurs is important, being > told every five seconds is indeed excessive. Agreed. Even if it does not happen when idle, it should happen a lot while playing a CPU-heavy and GPU-heavy 3D game. -- Cesar Eduardo Barros cesarb@cesarb.net cesar.barros@gmail.com ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages 2010-08-26 23:29 ` [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages Cesar Eduardo Barros 2010-08-26 23:33 ` Joe Perches 2010-08-27 0:18 ` Alan Cox @ 2010-08-27 1:42 ` Matthew Garrett 2 siblings, 0 replies; 25+ messages in thread From: Matthew Garrett @ 2010-08-27 1:42 UTC (permalink / raw) To: Cesar Eduardo Barros; +Cc: Jesse Barnes, platform-driver-x86, linux-kernel I agree that something like this is needed, but I'm happy to defer to Jesse in terms of the implementation (he has the specs, I don't...). I'm also about to head offline for a week, so feel free to push the patch once you've got a solution and if I disagree with it I'll write whiny emails once I get back. -- Matthew Garrett | mjg59@srcf.ucam.org ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2010-09-23 20:50 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-08-25 0:48 intel ips: CPU TDP doesn't match expected value Cesar Eduardo Barros 2010-08-25 1:37 ` Jesse Barnes 2010-08-26 23:29 ` [PATCH] intel_ips: quieten "power or thermal limit exceeded" messages Cesar Eduardo Barros 2010-08-26 23:33 ` Joe Perches 2010-08-27 0:11 ` Cesar Eduardo Barros 2010-08-27 0:41 ` Joe Perches 2010-08-27 1:38 ` Cesar Eduardo Barros 2010-08-27 7:39 ` Joe Perches 2010-08-27 23:12 ` Cesar Eduardo Barros 2010-08-28 2:21 ` Joe Perches 2010-08-28 10:46 ` Cesar Eduardo Barros [not found] ` <1282994116.1946.226.camel@Joe-Laptop> 2010-08-28 12:52 ` Cesar Eduardo Barros 2010-08-28 13:01 ` Cesar Eduardo Barros 2010-08-28 13:29 ` Joe Perches 2010-08-28 14:18 ` Cesar Eduardo Barros 2010-08-28 15:23 ` Henrique de Moraes Holschuh 2010-08-28 19:07 ` Cesar Eduardo Barros 2010-08-30 16:29 ` Jesse Barnes 2010-08-30 21:42 ` Cesar Eduardo Barros 2010-09-23 20:31 ` Jesse Barnes 2010-09-23 20:47 ` Joe Perches 2010-09-23 20:50 ` Jesse Barnes 2010-08-27 0:18 ` Alan Cox 2010-08-27 0:22 ` Cesar Eduardo Barros 2010-08-27 1:42 ` Matthew Garrett
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).