cpufreq.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
@ 2014-05-05  7:22 Srivatsa S. Bhat
  2014-05-05  8:53 ` Meelis Roos
  0 siblings, 1 reply; 6+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-05  7:22 UTC (permalink / raw)
  To: rjw, mroos, viresh.kumar
  Cc: ego, cpufreq, linux-pm, linux-kernel, srivatsa.bhat

Some cpufreq drivers were redundantly invoking the _begin() and _end()
APIs around frequency transitions, and this double invocation (one from
the cpufreq core and the other from the cpufreq driver) used to result
in a self-deadlock, leading to system hangs during boot. (The _begin()
API makes contending callers wait until the previous invocation is
complete. Hence, the cpufreq driver would end up waiting on itself!).

Now all such drivers have been fixed, but debugging this issue was not
very straight-forward (even lockdep didn't catch this). So let us add a
debug infrastructure to the cpufreq core to catch such issues more easily
in the future.

We add a new field called 'transition_task' to the policy structure, to keep
track of the task which is performing the frequency transition. Using this
field, we make note of this task during _begin() and print a warning if we
find a case where the same task is calling _begin() again, before completing
the previous frequency transition using the corresponding _end().

We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure
for 2 reasons:

1. At the moment, we have no way to avoid a particular scenario where this
   debug infrastructure can emit false-positive warnings for such drivers.
   The scenario is depicted below:


         Task A						Task B

    /* 1st freq transition */
    Invoke _begin() {
            ...
            ...
    }

    Change the frequency

    /* 2nd freq transition */
    Invoke _begin() {
	    ...	//waiting for B to
            ... //finish _end() for
	    ... //the 1st transition
	    ...	      |				Got interrupt for successful
	    ...	      |				change of frequency (1st one).
	    ...       |
	    ...	      |				/* 1st freq transition */
	    ...	      |				Invoke _end() {
	    ...	      |					...
	    ...	      V				}
	    ...
	    ...
    }

   This scenario is actually deadlock-free because, once Task A changes the
   frequency, it is Task B's responsibility to invoke the corresponding
   _end() for the 1st frequency transition. Hence it is perfectly legal for
   Task A to go ahead and attempt another frequency transition in the meantime.
   (Of course it won't be able to proceed until Task B finishes the 1st _end(),
   but this doesn't cause a deadlock or a hang).

   The debug infrastructure cannot handle this scenario and will treat it as
   a deadlock and print a warning. To avoid this, we exclude such drivers
   from the purview of this code.

2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers
   at all! The cpufreq core does not automatically invoke the _begin() and
   _end() APIs during frequency transitions in such drivers. Thus, the driver
   alone is responsible for invoking _begin()/_end() and hence there shouldn't
   be any conflicts which lead to double invocations. So, we can skip these
   drivers, since the probability that such drivers will hit this problem is
   extremely low, as outlined above.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
---

Changes in v3:
Expanded the comment in the code to briefly mention why ASYNC_NOTIFICATION
drivers are left out from the check, as suggested by Gautham R. Shenoy.
No code changes in this version.

v2: https://lkml.org/lkml/2014/4/29/283
v1: https://lkml.org/lkml/2014/4/28/469

 drivers/cpufreq/cpufreq.c |   14 ++++++++++++++
 include/linux/cpufreq.h   |    1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index abda660..a05c921 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -354,6 +354,18 @@ static void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs)
 {
+
+	/*
+	 * Catch double invocations of _begin() which lead to self-deadlock.
+	 * ASYNC_NOTIFICATION drivers are left out because the cpufreq core
+	 * doesn't invoke _begin() on their behalf, and hence the chances of
+	 * double invocations are very low. Moreover, there are scenarios
+	 * where these checks can emit false-positive warnings in these
+	 * drivers; so we avoid that by skipping them altogether.
+	 */
+	WARN_ON(!(cpufreq_driver->flags & CPUFREQ_ASYNC_NOTIFICATION)
+				&& current == policy->transition_task);
+
 wait:
 	wait_event(policy->transition_wait, !policy->transition_ongoing);
 
@@ -365,6 +377,7 @@ wait:
 	}
 
 	policy->transition_ongoing = true;
+	policy->transition_task = current;
 
 	spin_unlock(&policy->transition_lock);
 
@@ -381,6 +394,7 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
 	cpufreq_notify_post_transition(policy, freqs, transition_failed);
 
 	policy->transition_ongoing = false;
+	policy->transition_task = NULL;
 
 	wake_up(&policy->transition_wait);
 }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5ae5100..8f44d79 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -110,6 +110,7 @@ struct cpufreq_policy {
 	bool			transition_ongoing; /* Tracks transition status */
 	spinlock_t		transition_lock;
 	wait_queue_head_t	transition_wait;
+	struct task_struct	*transition_task; /* Task which is doing the transition */
 };
 
 /* Only for ACPI */


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
  2014-05-05  7:22 [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end Srivatsa S. Bhat
@ 2014-05-05  8:53 ` Meelis Roos
  2014-05-05  9:09   ` Srivatsa S. Bhat
  0 siblings, 1 reply; 6+ messages in thread
From: Meelis Roos @ 2014-05-05  8:53 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: rjw, viresh.kumar, ego, cpufreq, linux-pm, linux-kernel

> 
> Changes in v3:
> Expanded the comment in the code to briefly mention why ASYNC_NOTIFICATION
> drivers are left out from the check, as suggested by Gautham R. Shenoy.
> No code changes in this version.
> 
> v2: https://lkml.org/lkml/2014/4/29/283
> v1: https://lkml.org/lkml/2014/4/28/469

Seems to work on VIA EPIA with 3.15-rc2 (no other cpufreq fixes):

[    8.250959] ------------[ cut here ]------------
[    8.251179] WARNING: CPU: 0 PID: 116 at drivers/cpufreq/cpufreq.c:367 cpufreq_freq_transition_begin+0xd3/0xd8()
[    8.251400] Modules linked in: snd_via82xx snd_mpu401_uart longhaul snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device sg snd_pcm snd_timer vt8231 via_rhine uhci_hcd hwmon ehci_hcd tulip i2c_viapro snd evdev soundcore serio_raw sr_mod i2c_core mii usbcore gameport usb_common cdrom parport_pc fan parport via_agp button agpgart processor
[    8.253465] CPU: 0 PID: 116 Comm: kworker/0:1 Not tainted 3.15.0-rc2-dirty #40
[    8.253664] Hardware name: VIA TECHNOLOGIES, INC. EPIA/EPIA, BIOS 6.00 PG 11/02/2004
[    8.253873] Workqueue: events od_dbs_timer
[    8.254038]  00000000 00000000 cf6afd60 c1283096 cf6afd90 c101ff15 c135a90c 00000000
[    8.254574]  00000074 c138ccdc 0000016f c11eaaf0 c11eaaf0 cf542000 00061698 cf582c00
[    8.255118]  cf6afda0 c101ffa8 00000009 00000000 cf6afdd0 c11eaaf0 cf6afdd8 cf6afdf4
[    8.255661] Call Trace:
[    8.255815]  [<c1283096>] dump_stack+0x16/0x18
[    8.255988]  [<c101ff15>] warn_slowpath_common+0x70/0x87
[    8.256151]  [<c11eaaf0>] ? cpufreq_freq_transition_begin+0xd3/0xd8
[    8.256316]  [<c11eaaf0>] ? cpufreq_freq_transition_begin+0xd3/0xd8
[    8.256482]  [<c101ffa8>] warn_slowpath_null+0x1d/0x1f
[    8.256641]  [<c11eaaf0>] cpufreq_freq_transition_begin+0xd3/0xd8
[    8.256809]  [<c1036cba>] ? __srcu_notifier_call_chain+0x24/0x93
[    8.256989]  [<d06c52d5>] longhaul_setstate+0x88/0x2f1 [longhaul]
[    8.257149]  [<c1036d43>] ? srcu_notifier_call_chain+0x1a/0x1c
[    8.257315]  [<c11eaad9>] ? cpufreq_freq_transition_begin+0xbc/0xd8
[    8.257481]  [<d06c55ba>] longhaul_target+0x7c/0x8b [longhaul]
[    8.257647]  [<c11eaebd>] __cpufreq_driver_target+0xfe/0x148
[    8.257824]  [<c10540c6>] ? get_cpu_iowait_time_us+0x84/0xa3
[    8.257981]  [<c11ebe27>] od_check_cpu+0x75/0x79
[    8.258135]  [<c11ec7a1>] dbs_check_cpu+0xbd/0xc5
[    8.258286]  [<c11ec813>] ? need_load_eval+0x18/0x79
[    8.258439]  [<c11ec004>] od_dbs_timer+0x7a/0xd9
[    8.258612]  [<c102f808>] process_one_work+0x205/0x363
[    8.258770]  [<c102f7d7>] ? process_one_work+0x1d4/0x363
[    8.258927]  [<c10300b3>] ? worker_thread+0x27/0x26f
[    8.259086]  [<c103022c>] worker_thread+0x1a0/0x26f
[    8.259244]  [<c103008c>] ? rescuer_thread+0x204/0x204
[    8.259405]  [<c10339e9>] kthread+0xa3/0xa8
[    8.259567]  [<c1287bc0>] ret_from_kernel_thread+0x20/0x30
[    8.259726]  [<c1033946>] ? kthread_create_on_node+0x112/0x112
[    8.259874] ---[ end trace fe16863c964a8bbe ]---

-- 
Meelis Roos <mroos@linux.ee>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
  2014-05-05  8:53 ` Meelis Roos
@ 2014-05-05  9:09   ` Srivatsa S. Bhat
  2014-05-21 12:32     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 6+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-05  9:09 UTC (permalink / raw)
  To: Meelis Roos, rjw, viresh.kumar; +Cc: ego, cpufreq, linux-pm, linux-kernel

On 05/05/2014 02:23 PM, Meelis Roos wrote:
>>
>> Changes in v3:
>> Expanded the comment in the code to briefly mention why ASYNC_NOTIFICATION
>> drivers are left out from the check, as suggested by Gautham R. Shenoy.
>> No code changes in this version.
>>
>> v2: https://lkml.org/lkml/2014/4/29/283
>> v1: https://lkml.org/lkml/2014/4/28/469
> 
> Seems to work on VIA EPIA with 3.15-rc2 (no other cpufreq fixes):
>

Great! Thanks a lot for testing this!

Rafael/Viresh, can you please add Meelis' Tested-by while taking
this patch?

Thank you!

Regards,
Srivatsa S. Bhat
 
> [    8.250959] ------------[ cut here ]------------
> [    8.251179] WARNING: CPU: 0 PID: 116 at drivers/cpufreq/cpufreq.c:367 cpufreq_freq_transition_begin+0xd3/0xd8()
> [    8.251400] Modules linked in: snd_via82xx snd_mpu401_uart longhaul snd_ac97_codec ac97_bus snd_rawmidi snd_seq_device sg snd_pcm snd_timer vt8231 via_rhine uhci_hcd hwmon ehci_hcd tulip i2c_viapro snd evdev soundcore serio_raw sr_mod i2c_core mii usbcore gameport usb_common cdrom parport_pc fan parport via_agp button agpgart processor
> [    8.253465] CPU: 0 PID: 116 Comm: kworker/0:1 Not tainted 3.15.0-rc2-dirty #40
> [    8.253664] Hardware name: VIA TECHNOLOGIES, INC. EPIA/EPIA, BIOS 6.00 PG 11/02/2004
> [    8.253873] Workqueue: events od_dbs_timer
> [    8.254038]  00000000 00000000 cf6afd60 c1283096 cf6afd90 c101ff15 c135a90c 00000000
> [    8.254574]  00000074 c138ccdc 0000016f c11eaaf0 c11eaaf0 cf542000 00061698 cf582c00
> [    8.255118]  cf6afda0 c101ffa8 00000009 00000000 cf6afdd0 c11eaaf0 cf6afdd8 cf6afdf4
> [    8.255661] Call Trace:
> [    8.255815]  [<c1283096>] dump_stack+0x16/0x18
> [    8.255988]  [<c101ff15>] warn_slowpath_common+0x70/0x87
> [    8.256151]  [<c11eaaf0>] ? cpufreq_freq_transition_begin+0xd3/0xd8
> [    8.256316]  [<c11eaaf0>] ? cpufreq_freq_transition_begin+0xd3/0xd8
> [    8.256482]  [<c101ffa8>] warn_slowpath_null+0x1d/0x1f
> [    8.256641]  [<c11eaaf0>] cpufreq_freq_transition_begin+0xd3/0xd8
> [    8.256809]  [<c1036cba>] ? __srcu_notifier_call_chain+0x24/0x93
> [    8.256989]  [<d06c52d5>] longhaul_setstate+0x88/0x2f1 [longhaul]
> [    8.257149]  [<c1036d43>] ? srcu_notifier_call_chain+0x1a/0x1c
> [    8.257315]  [<c11eaad9>] ? cpufreq_freq_transition_begin+0xbc/0xd8
> [    8.257481]  [<d06c55ba>] longhaul_target+0x7c/0x8b [longhaul]
> [    8.257647]  [<c11eaebd>] __cpufreq_driver_target+0xfe/0x148
> [    8.257824]  [<c10540c6>] ? get_cpu_iowait_time_us+0x84/0xa3
> [    8.257981]  [<c11ebe27>] od_check_cpu+0x75/0x79
> [    8.258135]  [<c11ec7a1>] dbs_check_cpu+0xbd/0xc5
> [    8.258286]  [<c11ec813>] ? need_load_eval+0x18/0x79
> [    8.258439]  [<c11ec004>] od_dbs_timer+0x7a/0xd9
> [    8.258612]  [<c102f808>] process_one_work+0x205/0x363
> [    8.258770]  [<c102f7d7>] ? process_one_work+0x1d4/0x363
> [    8.258927]  [<c10300b3>] ? worker_thread+0x27/0x26f
> [    8.259086]  [<c103022c>] worker_thread+0x1a0/0x26f
> [    8.259244]  [<c103008c>] ? rescuer_thread+0x204/0x204
> [    8.259405]  [<c10339e9>] kthread+0xa3/0xa8
> [    8.259567]  [<c1287bc0>] ret_from_kernel_thread+0x20/0x30
> [    8.259726]  [<c1033946>] ? kthread_create_on_node+0x112/0x112
> [    8.259874] ---[ end trace fe16863c964a8bbe ]---
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
  2014-05-05  9:09   ` Srivatsa S. Bhat
@ 2014-05-21 12:32     ` Srivatsa S. Bhat
  2014-05-21 13:01       ` Rafael J. Wysocki
  0 siblings, 1 reply; 6+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-21 12:32 UTC (permalink / raw)
  To: Meelis Roos, rjw, viresh.kumar; +Cc: ego, cpufreq, linux-pm, linux-kernel

On 05/05/2014 02:39 PM, Srivatsa S. Bhat wrote:
> On 05/05/2014 02:23 PM, Meelis Roos wrote:
>>>
>>> Changes in v3:
>>> Expanded the comment in the code to briefly mention why ASYNC_NOTIFICATION
>>> drivers are left out from the check, as suggested by Gautham R. Shenoy.
>>> No code changes in this version.
>>>
>>> v2: https://lkml.org/lkml/2014/4/29/283
>>> v1: https://lkml.org/lkml/2014/4/28/469
>>
>> Seems to work on VIA EPIA with 3.15-rc2 (no other cpufreq fixes):
>>
> 
> Great! Thanks a lot for testing this!
> 
> Rafael/Viresh, can you please add Meelis' Tested-by while taking
> this patch?

Rafael, any thoughts on this patch?

It has been Acked by Viresh and tested by Meelis. Hope you'll consider
picking up this patch for 3.16.

Thank you!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
  2014-05-21 13:01       ` Rafael J. Wysocki
@ 2014-05-21 12:48         ` Srivatsa S. Bhat
  0 siblings, 0 replies; 6+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-21 12:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Meelis Roos, viresh.kumar, ego, cpufreq, linux-pm, linux-kernel

On 05/21/2014 06:31 PM, Rafael J. Wysocki wrote:
> On Wednesday, May 21, 2014 06:02:51 PM Srivatsa S. Bhat wrote:
>> On 05/05/2014 02:39 PM, Srivatsa S. Bhat wrote:
>>> On 05/05/2014 02:23 PM, Meelis Roos wrote:
>>>>>
>>>>> Changes in v3:
>>>>> Expanded the comment in the code to briefly mention why ASYNC_NOTIFICATION
>>>>> drivers are left out from the check, as suggested by Gautham R. Shenoy.
>>>>> No code changes in this version.
>>>>>
>>>>> v2: https://lkml.org/lkml/2014/4/29/283
>>>>> v1: https://lkml.org/lkml/2014/4/28/469
>>>>
>>>> Seems to work on VIA EPIA with 3.15-rc2 (no other cpufreq fixes):
>>>>
>>>
>>> Great! Thanks a lot for testing this!
>>>
>>> Rafael/Viresh, can you please add Meelis' Tested-by while taking
>>> this patch?
>>
>> Rafael, any thoughts on this patch?
>>
>> It has been Acked by Viresh and tested by Meelis. Hope you'll consider
>> picking up this patch for 3.16.
> 
> It should be in linux-next today AFAICS.
> 

Oops, I did check your tree before asking, but I'm not sure how I missed
this. Now I'm able to see the patch in your bleeding-edge branch. Sorry
for the noise!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
  2014-05-21 12:32     ` Srivatsa S. Bhat
@ 2014-05-21 13:01       ` Rafael J. Wysocki
  2014-05-21 12:48         ` Srivatsa S. Bhat
  0 siblings, 1 reply; 6+ messages in thread
From: Rafael J. Wysocki @ 2014-05-21 13:01 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Meelis Roos, viresh.kumar, ego, cpufreq, linux-pm, linux-kernel

On Wednesday, May 21, 2014 06:02:51 PM Srivatsa S. Bhat wrote:
> On 05/05/2014 02:39 PM, Srivatsa S. Bhat wrote:
> > On 05/05/2014 02:23 PM, Meelis Roos wrote:
> >>>
> >>> Changes in v3:
> >>> Expanded the comment in the code to briefly mention why ASYNC_NOTIFICATION
> >>> drivers are left out from the check, as suggested by Gautham R. Shenoy.
> >>> No code changes in this version.
> >>>
> >>> v2: https://lkml.org/lkml/2014/4/29/283
> >>> v1: https://lkml.org/lkml/2014/4/28/469
> >>
> >> Seems to work on VIA EPIA with 3.15-rc2 (no other cpufreq fixes):
> >>
> > 
> > Great! Thanks a lot for testing this!
> > 
> > Rafael/Viresh, can you please add Meelis' Tested-by while taking
> > this patch?
> 
> Rafael, any thoughts on this patch?
> 
> It has been Acked by Viresh and tested by Meelis. Hope you'll consider
> picking up this patch for 3.16.

It should be in linux-next today AFAICS.

Rafael


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-05-21 13:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-05  7:22 [PATCH v3] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end Srivatsa S. Bhat
2014-05-05  8:53 ` Meelis Roos
2014-05-05  9:09   ` Srivatsa S. Bhat
2014-05-21 12:32     ` Srivatsa S. Bhat
2014-05-21 13:01       ` Rafael J. Wysocki
2014-05-21 12:48         ` Srivatsa S. Bhat

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).