From: "Pallipadi, Venkatesh"
Subject: Re: [Bug #13475] suspend/hibernate lockdep warning
Date: Tue, 16 Jun 2009 17:39:25 -0700
Message-ID: <20090617003925.GA3900@linux-os.sc.intel.com>
References: <84144f020906070621r1f480eaeief026d23662df380@mail.gmail.com> <1244447366.13471.4.camel@penberg-laptop> <20090608124844.GA17588@Krystal> <20090608143220.GC2516@redhat.com> <1244727561.5350.32.camel@odie.local> <20090611152329.GB28099@Krystal>
In-Reply-To: <20090611152329.GB28099@Krystal>
To: Mathieu Desnoyers
Cc: Simon Holm Thøgersen, Dave Jones, Pekka Enberg, Dave Young, "Rafael J. Wysocki", Linux Kernel Mailing List, Kernel Testers List, "cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org", Rusty Russell, "trenn-l3A5Bk7waGM@public.gmane.org", "sven.wegener-sQQoR7IzGU7R7s880joybQ@public.gmane.org", "Pallipadi, Venkatesh"

On Thu, Jun 11, 2009 at 08:23:29AM -0700, Mathieu Desnoyers wrote:
> * Simon Holm Thøgersen (odie-t5LvXY1cjzpaa/9Udqfwiw@public.gmane.org) wrote:
> > Mon, 08 06 2009 at 10:32 -0400, Dave Jones wrote:
> > > On Mon, Jun 08, 2009 at 08:48:45AM -0400, Mathieu Desnoyers wrote:
> > >
> > > > > > >> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13475
> > > > > > >> Subject : suspend/hibernate lockdep warning
> > > > > > >> References : http://marc.info/?l=linux-kernel&m=124393723321241&w=4
> > > > > >
> > > > > > I suspect the following commit; after reverting this patch I tested 5 times
> > > > > > without lockdep warnings.
> > > > > >
> > > > > > commit b14893a62c73af0eca414cfed505b8c09efc613c
> > > > > > Author: Mathieu Desnoyers
> > > > > > Date: Sun May 17 10:30:45 2009 -0400
> > > > > >
> > > > > > [CPUFREQ] fix timer teardown in ondemand governor
> > > > >
> > > > > The patch is probably not at fault here. I suspect it's some latent bug
> > > > > that simply got exposed by the change to cancel_delayed_work_sync(). In
> > > > > any case, Mathieu, can you take a look at this please?
> > > >
> > > > Yes, it's been looked at and discussed on the cpufreq ML. The short
> > > > answer is that they plan to re-engineer cpufreq and remove the policy
> > > > rwlock taken around almost every operation at the cpufreq level.
> > > >
> > > > The short-term solution, which is recognised as ugly, would be to do the
> > > > following before doing the cancel_delayed_work_sync():
> > > >
> > > > unlock policy rwlock write lock
> > > >
> > > > lock policy rwlock write lock
> > > >
> > > > It basically works because this rwlock is unneeded for teardown, hence
> > > > the future re-work planned.
> > > >
> > > > I'm sorry I cannot prepare a patch currently... I've got quite a few pages
> > > > of Ph.D. thesis due for the beginning of July.
> > >
> > > I'm kinda scared to touch this code at all for .30 due to the number of
> > > unexpected gotchas we seem to run into every time we touch something
> > > locking related. So I'm inclined to just live with the lockdep warning
> > > for .30, and see how the real fixes look for .31, and push them back
> > > as -stable updates if they work out.
> >
> > Unfortunately I don't think it is just theoretical; I've actually hit
> > the following (which hasn't got anything to do with suspend/hibernate):
> >
> > INFO: task cpufreqd:4676 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > cpufreqd D eee2ac60 0 4676 1
> >  ee01bd68 00000086 eee2aad0 eee2ac60 00000533 eee2aad0 eee2ac60 0002b16f
> >  00000000 eee2ac60 7fffffff 7fffffff eee2ac60 7fffffff 7fffffff 00000000
> >  ee01bd70 c03117ee ee01bdbc c0311c0c eee2aad0 eecf6900 eee2aad0 eecf6900
> > Call Trace:
> >  [] schedule+0x12/0x24
> >  [] schedule_timeout+0x17/0x170
> >  [] ? __wake_up+0x2b/0x51
> >  [] wait_for_common+0xc4/0x135
> >  [] ? default_wake_function+0x0/0xd
> >  [] wait_for_completion+0x12/0x14
> >  [] __cancel_work_timer+0xfe/0x129
> >  [] ? wq_barrier_func+0x0/0xd
> >  [] cancel_delayed_work_sync+0xb/0xd
> >  [] cpufreq_governor_dbs+0x22e/0x291 [cpufreq_ondemand]
> >  [] __cpufreq_governor+0x65/0x9d
> >  [] __cpufreq_set_policy+0xd1/0x11f
> >  [] store_scaling_governor+0x18a/0x1b2
> >  [] ? handle_update+0x0/0xd
> >  [] ? store_scaling_governor+0x0/0x1b2
> >  [] store+0x48/0x61
> >  [] sysfs_write_file+0xb4/0xdf
> >  [] ? sysfs_write_file+0x0/0xdf
> >  [] vfs_write+0x8a/0x104
> >  [] sys_write+0x3b/0x60
> >  [] sysenter_do_call+0x12/0x2c
> > INFO: task kondemand/0:4956 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > kondemand/0 D 00000533 0 4956 2
> >  ee1d9efc 00000046 c011815f 00000533 071148de ee1e0080 ee1e0210 00000000
> >  c03ff478 9189e633 00000082 c03ff478 ee1e0210 c04159f4 c04159f0 00000000
> >  ee1d9f04 c03117ee ee1d9f28 c0313104 ee1d9f30 c04159f4 ee1e0080 c01183be
> > Call Trace:
> >  [] ? update_curr+0x6c/0x14b
> >  [] schedule+0x12/0x24
> >  [] rwsem_down_failed_common+0x150/0x16e
> >  [] ? dequeue_task_fair+0x51/0x56
> >  [] rwsem_down_write_failed+0x1b/0x23
> >  [] call_rwsem_down_write_failed+0x6/0x8
> >  [] ? down_write+0x14/0x16
> >  [] lock_policy_rwsem_write+0x1d/0x33
> >  [] do_dbs_timer+0x45/0x266 [cpufreq_ondemand]
> >  [] worker_thread+0x165/0x212
> >  [] ? do_dbs_timer+0x0/0x266 [cpufreq_ondemand]
> >  [] ? autoremove_wake_function+0x0/0x33
> >  [] ? worker_thread+0x0/0x212
> >  [] kthread+0x42/0x67
> >  [] ? kthread+0x0/0x67
> >  [] kernel_thread_helper+0x7/0x10
> >
> > I've only seen it once in 5 boots, and CONFIG_PROVE_LOCKING does not give any
> > warnings about this, though it does yell when switching governor as reported
> > by others in bug #13493.
> >
> > Let's hope Mathieu nails it, though I know he's busy with his thesis.
> >
>
> Thanks for the lockdep reports,
>
> I'm currently looking into it, and it's not pretty. Basically we have:
>
> A
>   B
> (means B nested in A)
>
> work
>   read rwlock policy
>
> dbs_mutex
>   work
>     read rwlock policy
>
> write rwlock policy
>   dbs_mutex
>
> So the added dbs_mutex <- work <- rwlock policy dependency (for proper
> teardown) is firing the reverse dependency between policy rwlock and
> dbs_mutex.
>
> The real way to fix this is to not take the rwlock policy around
> non-policy-related actions, like governor START/STOP doing worker
> creation/teardown.
>
> One simple short-term solution would be to take a mutex outside of the
> policy rwlock write lock in cpufreq.c. This mutex would be the
> equivalent of dbs_mutex "lifted" outside of the rwlock write lock. For
> teardown, we only need to hold this mutex, not the rwlock write lock.
> Then we can remove the dbs_mutex from the governors.
>
> But looking at cpufreq.c's cpufreq_add_dev() is very much like kicking a
> wasp nest: a lot of error paths are not handled properly, and I fear
> someone will have to go through the code, fix the currently incorrect
> code paths, and then add the lifted mutex.
>
> I currently have no time for implementation due to my thesis, but I'll
> be happy to review a patch.
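Mathieu's A/B nesting notation is a lock-order graph, and the lockdep complaint amounts to a cycle in it. The following is a hypothetical userspace model, not lockdep's real implementation: the three lock classes are named after the diagram, read and write acquisitions of the policy rwlock are folded into one class as a simplification, and `lock_order_init()`, `record_nesting()` and `would_invert()` are illustrative helpers rather than kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes named after the nesting diagram. Read and write
 * acquisitions of the policy rwlock are treated as one class here. */
enum lock_class { DBS_MUTEX, WORK, POLICY_RWSEM, NR_CLASSES };

/* order[a][b] becomes true once some code path acquired b while holding a. */
struct lock_order {
	bool order[NR_CLASSES][NR_CLASSES];
};

static void lock_order_init(struct lock_order *lo)
{
	memset(lo, 0, sizeof(*lo));
}

/* Record "inner nested in outer", i.e. one edge of the diagram. */
static void record_nesting(struct lock_order *lo, enum lock_class outer,
			   enum lock_class inner)
{
	assert(outer < NR_CLASSES && inner < NR_CLASSES);
	lo->order[outer][inner] = true;
}

/* Depth-first search over recorded nestings: is `to` reachable from `from`? */
static bool reachable(const struct lock_order *lo, int from, int to,
		      bool visited[NR_CLASSES])
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (lo->order[from][next] && !visited[next] &&
		    reachable(lo, next, to, visited))
			return true;
	}
	return false;
}

/* Taking `inner` while holding `outer` inverts an existing order if `outer`
 * is already reachable from `inner`: the two paths can deadlock (ABBA). */
static bool would_invert(const struct lock_order *lo, enum lock_class outer,
			 enum lock_class inner)
{
	bool visited[NR_CLASSES] = { false };

	return reachable(lo, inner, outer, visited);
}
```

With both "work → policy" and "dbs_mutex → work" recorded, `would_invert(&lo, POLICY_RWSEM, DBS_MUTEX)` is true, which is the reverse dependency Mathieu describes. If only "work → policy" is recorded, the same query is false, so in this model it is enough to stop waiting for the work while dbs_mutex is held.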
>

How about the patch below, on top of Mathieu's patch here:
http://marc.info/?l=linux-kernel&m=124448150529838&w=2

[PATCH] cpufreq: Eliminate lockdep issue with dbs_mutex and policy_rwsem

This removes the unneeded dependency of

write rwlock policy
  dbs_mutex

dbs_mutex does not have anything to do with timer_init and timer_exit. It
only protects the dbs tunables in sysfs cpufreq/ondemand, and it does not
need to be held during timer init and exit or during governor limit
changes.

Signed-off-by: Venkatesh Pallipadi
---
 drivers/cpufreq/cpufreq_ondemand.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index e741c33..1c94ff5 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -352,8 +352,8 @@ static ssize_t store_powersave_bias(struct cpufreq_policy *unused,
 
 	mutex_lock(&dbs_mutex);
 	dbs_tuners_ins.powersave_bias = input;
-	ondemand_powersave_bias_init();
 	mutex_unlock(&dbs_mutex);
+	ondemand_powersave_bias_init();
 
 	return count;
 }
@@ -626,14 +626,14 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 
 			dbs_tuners_ins.sampling_rate = def_sampling_rate;
 		}
+		mutex_unlock(&dbs_mutex);
 		dbs_timer_init(this_dbs_info);
 
-		mutex_unlock(&dbs_mutex);
 		break;
 
 	case CPUFREQ_GOV_STOP:
-		mutex_lock(&dbs_mutex);
 		dbs_timer_exit(this_dbs_info);
+		mutex_lock(&dbs_mutex);
 		sysfs_remove_group(&policy->kobj, &dbs_attr_group);
 		dbs_enable--;
 		mutex_unlock(&dbs_mutex);
@@ -641,14 +641,12 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		break;
 
 	case CPUFREQ_GOV_LIMITS:
-		mutex_lock(&dbs_mutex);
 		if (policy->max < this_dbs_info->cur_policy->cur)
 			__cpufreq_driver_target(this_dbs_info->cur_policy,
 				policy->max, CPUFREQ_RELATION_H);
 		else if (policy->min > this_dbs_info->cur_policy->cur)
 			__cpufreq_driver_target(this_dbs_info->cur_policy,
 				policy->min, CPUFREQ_RELATION_L);
-		mutex_unlock(&dbs_mutex);
 		break;
 	}
 	return 0;
-- 
1.6.0.6
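The patch narrows dbs_mutex so that worker setup and teardown (dbs_timer_init()/dbs_timer_exit()) run without it. Below is a hypothetical userspace analog of that ordering using POSIX threads, not the ondemand governor itself: `pthread_join()` stands in for `cancel_delayed_work_sync()`, and `tunables_mutex`, `policy_lock`, `sampling_worker()`, `governor_start()` and `governor_stop()` are illustrative names. It needs to be built with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-ins, not kernel code:
 *   tunables_mutex  ~ dbs_mutex
 *   policy_lock     ~ the cpufreq policy rwsem
 *   sampling_worker ~ the do_dbs_timer() work item */
static pthread_mutex_t tunables_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t policy_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_bool stop_requested;
static int governors_enabled;	/* ~ dbs_enable */
static unsigned long samples;	/* only touched by the single worker */

static void *sampling_worker(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_requested)) {
		/* As in the kondemand trace, each sample takes the policy
		 * lock for writing. */
		pthread_rwlock_wrlock(&policy_lock);
		samples++;
		pthread_rwlock_unlock(&policy_lock);
		nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 1000000 }, NULL);
	}
	return NULL;
}

/* Like GOV_START after the patch: the worker is created with
 * tunables_mutex not held; the mutex only covers the bookkeeping. */
static int governor_start(pthread_t *worker)
{
	atomic_store(&stop_requested, false);
	int err = pthread_create(worker, NULL, sampling_worker, NULL);
	if (err != 0)
		return err;

	pthread_mutex_lock(&tunables_mutex);
	governors_enabled++;
	pthread_mutex_unlock(&tunables_mutex);
	return 0;
}

/* Like GOV_STOP after the patch: wait for the worker with no mutex held, so
 * "wait for work" never nests inside tunables_mutex; take it afterwards. */
static int governor_stop(pthread_t worker)
{
	atomic_store(&stop_requested, true);
	int err = pthread_join(worker, NULL);

	pthread_mutex_lock(&tunables_mutex);
	assert(governors_enabled > 0);	/* stop without a matching start */
	governors_enabled--;
	pthread_mutex_unlock(&tunables_mutex);
	return err;
}
```

The sketch only models the dbs_mutex reordering. If `governor_stop()` were called while the caller already held `policy_lock` for writing, it could hang in the same shape as the cpufreqd/kondemand pair in the trace: the join would wait for a worker that is itself blocked waiting for `policy_lock`.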