lockdep warnings: cpufreq ondemand gorvernor possibly circular locking

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* lockdep warnings: cpufreq ondemand gorvernor possibly circular  locking
@ 2009-05-10 15:22 KOSAKI Motohiro
  2009-05-10 18:04 ` Andrew Morton
  2009-05-10 19:12 ` [RFC patch] cpufreq: fix circular locking in teardown Mathieu Desnoyers
  0 siblings, 2 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 15:22 UTC (permalink / raw)
  To: LKML, Mathieu Desnoyers, Greg KH, Ingo Molnar, Rafael J. Wysocki,
	Ben Slusky, Dave Jones, Chris Wright, Andrew Morton

Hi

my box output following warnings.
it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.

A: work -> do_dbs_timer()  -> cpu_policy_rwsem
B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work



=======================================================
nd cpu frequency[ INFO: possible circular locking dependency detected ]
 scaling: 2.6.30-rc4-mm1 #26
-------------------------------------------------------
K99cpuspeed/227488 is trying to acquire lock:
 (&(&dbs_info->work)->work){+.+...}, at: [<ffffffff81055bfd>]
__cancel_work_timer+0xde/0x227

but task is already holding lock:
 (dbs_mutex){+.+.+.}, at: [<ffffffffa0081af3>]
cpufreq_governor_dbs+0x241/0x2d2 [cpufreq_ondemand]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (dbs_mutex){+.+.+.}:
       [<ffffffff8106a15a>] __lock_acquire+0xa9d/0xc33
       [<ffffffff8106a3b1>] lock_acquire+0xc1/0xe5
       [<ffffffff81303aa5>] __mutex_lock_common+0x4d/0x34c
       [<ffffffff81303e5c>] mutex_lock_nested+0x3a/0x3f
       [<ffffffffa008193d>] cpufreq_governor_dbs+0x8b/0x2d2 [cpufreq_ondemand]	
       [<ffffffff812678bd>] __cpufreq_governor+0xa7/0xe4
       [<ffffffff81267acf>] __cpufreq_set_policy+0x19a/0x216
       [<ffffffff81268572>] store_scaling_governor+0x1ec/0x228
       [<ffffffff812691fb>] store+0x67/0x8c					
       [<ffffffff81129091>] sysfs_write_file+0xe9/0x11e				
       [<ffffffff810e64d8>] vfs_write+0xb0/0x10a
       [<ffffffff810e6600>] sys_write+0x4c/0x74
       [<ffffffff8100bc1b>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
       [<ffffffff8106a15a>] __lock_acquire+0xa9d/0xc33
       [<ffffffff8106a3b1>] lock_acquire+0xc1/0xe5
       [<ffffffff813040d8>] down_write+0x4d/0x81
       [<ffffffff81268ad3>] lock_policy_rwsem_write+0x4d/0x7d			
       [<ffffffffa0081692>] do_dbs_timer+0x64/0x284 [cpufreq_ondemand]
       [<ffffffff81055395>] worker_thread+0x205/0x318				
       [<ffffffff8105933f>] kthread+0x8d/0x95
       [<ffffffff8100ccfa>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&(&dbs_info->work)->work){+.+...}:
       [<ffffffff8106a04e>] __lock_acquire+0x991/0xc33
       [<ffffffff8106a3b1>] lock_acquire+0xc1/0xe5
       [<ffffffff81055c36>] __cancel_work_timer+0x117/0x227
       [<ffffffff81055d58>] cancel_delayed_work_sync+0x12/0x14				
       [<ffffffffa0081b06>] cpufreq_governor_dbs+0x254/0x2d2
[cpufreq_ondemand]
       [<ffffffff812678bd>] __cpufreq_governor+0xa7/0xe4
       [<ffffffff81267ab9>] __cpufreq_set_policy+0x184/0x216
       [<ffffffff81268572>] store_scaling_governor+0x1ec/0x228
       [<ffffffff812691fb>] store+0x67/0x8c						
       [<ffffffff81129091>] sysfs_write_file+0xe9/0x11e				
       [<ffffffff810e64d8>] vfs_write+0xb0/0x10a
       [<ffffffff810e6600>] sys_write+0x4c/0x74
       [<ffffffff8100bc1b>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

3 locks held by K99cpuspeed/227488:
 #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff81128fe5>]
sysfs_write_file+0x3d/0x11e
 #1:  (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at:
[<ffffffff81268ad3>] lock_policy_rwsem_write+0x4d/0x7d
 #2:  (dbs_mutex){+.+.+.}, at: [<ffffffffa0081af3>]
cpufreq_governor_dbs+0x241/0x2d2 [cpufreq_ondemand]

stack backtrace:
Pid: 227488, comm: K99cpuspeed Not tainted 2.6.30-rc4-mm1 #26
Call Trace:
 [<ffffffff81069347>] print_circular_bug_tail+0x71/0x7c
 [<ffffffff8106a04e>] __lock_acquire+0x991/0xc33
 [<ffffffff8104e2f6>] ? lock_timer_base+0x2b/0x4f
 [<ffffffff8106a3b1>] lock_acquire+0xc1/0xe5
 [<ffffffff81055bfd>] ? __cancel_work_timer+0xde/0x227
 [<ffffffff81055c36>] __cancel_work_timer+0x117/0x227		
 [<ffffffff81055bfd>] ? __cancel_work_timer+0xde/0x227
 [<ffffffff81068a22>] ? mark_held_locks+0x4d/0x6b
 [<ffffffff81303d5a>] ? __mutex_lock_common+0x302/0x34c
 [<ffffffffa0081af3>] ? cpufreq_governor_dbs+0x241/0x2d2 [cpufreq_ondemand]	
 [<ffffffff81068a22>] ? mark_held_locks+0x4d/0x6b
 [<ffffffffa0081af3>] ? cpufreq_governor_dbs+0x241/0x2d2 [cpufreq_ondemand]
 [<ffffffff8100b956>] ? ftrace_call+0x5/0x2b
 [<ffffffff81055d58>] cancel_delayed_work_sync+0x12/0x14
 [<ffffffffa0081b06>] cpufreq_governor_dbs+0x254/0x2d2 [cpufreq_ondemand]
 [<ffffffff8105ce4d>] ? up_read+0x2b/0x2f
 [<ffffffff812678bd>] __cpufreq_governor+0xa7/0xe4
 [<ffffffff81267ab9>] __cpufreq_set_policy+0x184/0x216
 [<ffffffff81268572>] store_scaling_governor+0x1ec/0x228
 [<ffffffff81269354>] ? handle_update+0x0/0x39
 [<ffffffff812691fb>] store+0x67/0x8c
 [<ffffffff81129091>] sysfs_write_file+0xe9/0x11e
 [<ffffffff810e64d8>] vfs_write+0xb0/0x10a
 [<ffffffff810e6600>] sys_write+0x4c/0x74
 [<ffffffff8100bc1b>] system_call_fastpath+0x16/0x1b
[  OK  ]
Sending all processes th

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: lockdep warnings: cpufreq ondemand gorvernor possibly circular locking
  2009-05-10 15:22 lockdep warnings: cpufreq ondemand gorvernor possibly circular locking KOSAKI Motohiro
@ 2009-05-10 18:04 ` Andrew Morton
  2009-05-10 23:13   ` KOSAKI Motohiro
  2009-05-10 19:12 ` [RFC patch] cpufreq: fix circular locking in teardown Mathieu Desnoyers
  1 sibling, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2009-05-10 18:04 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Mathieu Desnoyers, Greg KH, Ingo Molnar, Rafael J. Wysocki,
	Ben Slusky, Dave Jones, Chris Wright

On Mon, 11 May 2009 00:22:26 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> my box output following warnings.
> it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
> 
> A: work -> do_dbs_timer()  -> cpu_policy_rwsem
> B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work

I can't find any commit which matches
7ccc7608b836e58fbacf65ee4f8eefa288e86fac


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC patch] cpufreq: fix circular locking in teardown
  2009-05-10 15:22 lockdep warnings: cpufreq ondemand gorvernor possibly circular locking KOSAKI Motohiro
  2009-05-10 18:04 ` Andrew Morton
@ 2009-05-10 19:12 ` Mathieu Desnoyers
  1 sibling, 0 replies; 4+ messages in thread
From: Mathieu Desnoyers @ 2009-05-10 19:12 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Greg KH, Ingo Molnar, Rafael J. Wysocki, Ben Slusky,
	Dave Jones, Chris Wright, Andrew Morton

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
> 
> my box output following warnings.
> it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
> 
> A: work -> do_dbs_timer()  -> cpu_policy_rwsem
> B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
> 
> 

Hrm, I think it must be due to my attempt to fix the timer teardown race
in ondemand governor mixed with new locking behavior in 2.6.30-rc.

The rwlock seems to be taken around the whole call to
cpufreq_governor_dbs(), when it should be only taken around accesses to
the locked data, and especially *not* around the call to
dbs_timer_exit().

Reverting my fix attempt would put the teardown race back in place
(replacing the cancel_delayed_work_sync by cancel_delayed_work).
Instead, a proper fix would imply modifying this critical section :

cpufreq.c: __cpufreq_remove_dev()
...
        if (cpufreq_driver->target)
                __cpufreq_governor(data, CPUFREQ_GOV_STOP);

        unlock_policy_rwsem_write(cpu);

To make sure the __cpufreq_governor() callback is not called with rwsem
held. This would allow execution of cancel_delayed_work_sync() without
being nested within the rwsem.

Here is a first cut at a RFC patch for cpufreq.c locking. This is 
currently untested.

Applies on top of the 2.6.30-rc5 tree with

cpufreq-fix-timer-teardown-in-conservative-governor.patch
cpufreq-fix-timer-teardown-in-ondemand-governor.patch

already applied. Should fix circular dep in teardown of both conservative and
ondemande governors. At a first glance, CPUFREQ_GOV_STOP does not seem to modify
the policy, therefore this locking seemed unneeded.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Greg KH <greg@kroah.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Rafael J. Wysocki" <rjw@sisk.pl>
CC: Ben Slusky <sluskyb@paranoiacs.org>
CC: Dave Jones <davej@redhat.com>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 drivers/cpufreq/cpufreq.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-lttng/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6-lttng.orig/drivers/cpufreq/cpufreq.c	2009-05-10 14:41:53.000000000 -0400
+++ linux-2.6-lttng/drivers/cpufreq/cpufreq.c	2009-05-10 14:42:29.000000000 -0400
@@ -1070,11 +1070,11 @@ static int __cpufreq_remove_dev(struct s
 	spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
 #endif
 
+	unlock_policy_rwsem_write(cpu);
+
 	if (cpufreq_driver->target)
 		__cpufreq_governor(data, CPUFREQ_GOV_STOP);
 
-	unlock_policy_rwsem_write(cpu);
-
 	kobject_put(&data->kobj);
 
 	/* we need to make sure that the underlying kobj is actually

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: lockdep warnings: cpufreq ondemand gorvernor possibly circular locking
  2009-05-10 18:04 ` Andrew Morton
@ 2009-05-10 23:13   ` KOSAKI Motohiro
  0 siblings, 0 replies; 4+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 23:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, LKML, Mathieu Desnoyers, Greg KH, Ingo Molnar,
	Rafael J. Wysocki, Ben Slusky, Dave Jones, Chris Wright

> On Mon, 11 May 2009 00:22:26 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > my box output following warnings.
> > it seems regression by commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac.
> > 
> > A: work -> do_dbs_timer()  -> cpu_policy_rwsem
> > B: store() -> cpu_policy_rwsem -> cpufreq_governor_dbs() -> work
> 
> I can't find any commit which matches
> 7ccc7608b836e58fbacf65ee4f8eefa288e86fac

Grr, sorry. it's mmotm only commit.

filename is cpufreq-fix-timer-teardown-in-ondemand-governor.patch.

commit 7ccc7608b836e58fbacf65ee4f8eefa288e86fac
Author: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Date:   Tue Apr 28 20:50:21 2009 +0200

    The problem is that dbs_timer_exit() uses cancel_delayed_work() when it
    should use cancel_delayed_work_sync().  cancel_delayed_work() does not
    wait for the workqueue handler to exit.

    The ondemand governor does not "seem" to be affected (read : race
    condition occurs very rarely) because the "if (!dbs_info->enable)" check
    at the beginning of the workqueue handler returns immediately without
    rescheduling the work.  The conservative governor in 2.6.30-rc has the
    same check as the ondemand governor, which makes things usually run
    smoothly.  However, if the governor is quickly stopped and then started,
    this could lead to the following race :

    dbs_enable could be reenabled and multiple do_dbs_timer handlers would
    run.  This is why a synchronized teardown is required.

    The patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-05-10 23:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-10 15:22 lockdep warnings: cpufreq ondemand gorvernor possibly circular locking KOSAKI Motohiro
2009-05-10 18:04 ` Andrew Morton
2009-05-10 23:13   ` KOSAKI Motohiro
2009-05-10 19:12 ` [RFC patch] cpufreq: fix circular locking in teardown Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox