From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Srivatsa S. Bhat"
Subject: Re: 3.15-rc2: longhaul cpufreq stalls tasks for 120s+
Date: Fri, 25 Apr 2014 13:45:11 +0530
Message-ID: <535A198F.3040009@linux.vnet.ibm.com>
References: <5358EBB9.3090809@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Return-path:
Received: from e28smtp02.in.ibm.com ([122.248.162.2]:58489 "EHLO e28smtp02.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751780AbaDYIQD (ORCPT ); Fri, 25 Apr 2014 04:16:03 -0400
Received: from /spool/local by e28smtp02.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 25 Apr 2014 13:46:00 +0530
In-Reply-To:
Sender: linux-pm-owner@vger.kernel.org
List-Id: linux-pm@vger.kernel.org
To: Viresh Kumar
Cc: Meelis Roos, "Rafael J. Wysocki", "cpufreq@vger.kernel.org", "linux-pm@vger.kernel.org", Linux Kernel list

On 04/25/2014 10:11 AM, Viresh Kumar wrote:
> On 25 April 2014 00:33, Meelis Roos wrote:
>
>> [ 240.140176] INFO: task kworker/0:1:116 blocked for more than 120 seconds.
>> [ 240.140353] Not tainted 3.15.0-rc2-dirty #37
>> [ 240.140485] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 240.140687] kworker/0:1 D cf6afd50 0 116 2 0x00000000
>> [ 240.140938] Workqueue: events od_dbs_timer
>> [ 240.141103] cf6afd98 00000082 00000002 cf6afd50 c1040d91 cf6affec cf6ad310 cf6ad310
>> [ 240.142479] c1286dcb 00000002 cf6afd70 c1040f14 00000000 ce460b30 00000282 00000046
>> [ 240.143011] 00000282 ce460b30 cf6afd78 c1040f39 cf6afd88 00000282 cf6afdb0 ce460b30
>> [ 240.143544] Call Trace:
>> [ 240.143706] [] ? mark_held_locks+0x4b/0x61
>> [ 240.143883] [] ? _raw_spin_unlock_irqrestore+0x33/0x3f
>> [ 240.144043] [] ? trace_hardirqs_on_caller+0x16d/0x187
>> [ 240.144203] [] ? trace_hardirqs_on+0xb/0xd
>> [ 240.144358] [] schedule+0x5d/0x5f
>> [ 240.144527] [] cpufreq_freq_transition_begin+0x4a/0x9d
>> [ 240.144687] [] ? __wake_up_sync+0x14/0x14
>> [ 240.144860] [] longhaul_setstate+0x88/0x2f1 [longhaul]
>> [ 240.145023] [] ? srcu_notifier_call_chain+0x1a/0x1c
>> [ 240.145186] [] ? cpufreq_freq_transition_begin+0x95/0x9d
>> [ 240.145350] [] longhaul_target+0x7c/0x8b [longhaul]
>> [ 240.145511] [] __cpufreq_driver_target+0xfe/0x148
>
> Am I reading it correctly? It looks like we are starting another transition
> from notifier chain, but I couldn't figure out how from code.
>

Indeed, it's a case of double invocation of the _begin() and _end()
notifiers. I developed a patchset to fix this in the longhaul, powernow-k6
and powernow-k7 drivers, before seeing your patchset that does the same
thing.

However, looking closer, I don't completely agree with the approach you
used to fix the issue, so I'll post my patches as well (which have a
different design).

Regards,
Srivatsa S. Bhat
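
For readers following the thread, here is a rough userspace sketch of why a double _begin() invocation can stall a task. It assumes that _begin() waits until no transition is marked as ongoing, which is what the schedule() frame directly under cpufreq_freq_transition_begin in the trace suggests. Every model_* name is hypothetical and made up for this illustration; this is not the kernel code, and the blocking wait is replaced by a -1 return so the example terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-policy state; NOT the kernel's struct. */
struct model_policy {
    bool transition_ongoing;
};

/*
 * Model of _begin(): returns 0 on success, or -1 at the point where
 * (by assumption) the real function would sleep until the flag clears.
 */
int model_transition_begin(struct model_policy *p)
{
    if (p->transition_ongoing)
        return -1;              /* would block here */
    p->transition_ongoing = true;
    return 0;
}

/* Model of _end(): clears the flag; calling it unpaired is a bug. */
void model_transition_end(struct model_policy *p)
{
    assert(p->transition_ongoing);
    p->transition_ongoing = false;
}

/* A driver callback that brackets the hardware change on its own. */
int model_driver_target(struct model_policy *p)
{
    if (model_transition_begin(p) != 0)
        return -1;
    /* the hardware frequency switch would happen here */
    model_transition_end(p);
    return 0;
}

/*
 * A caller that may ALSO bracket the driver call. With core_notifies
 * set, the driver's own begin runs while the flag is already held.
 */
int model_core_target(struct model_policy *p, bool core_notifies)
{
    int ret;

    if (core_notifies && model_transition_begin(p) != 0)
        return -1;
    ret = model_driver_target(p);
    if (core_notifies)
        model_transition_end(p);
    return ret;
}
```

Calling model_core_target() with core_notifies set to true returns -1, standing in for the hang, because the nested begin finds the flag already set by its own caller and nothing else will ever clear it. With core_notifies set to false, only one begin/end pair runs and the call returns 0.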