* cpufreq ondemand governor debugobjects warning [not found] <52BCEA76.1020906@marvell.com> @ 2013-12-27 3:06 ` Jane Li 2013-12-27 5:31 ` Viresh Kumar 2013-12-27 8:01 ` Viresh Kumar 0 siblings, 2 replies; 5+ messages in thread From: Jane Li @ 2013-12-27 3:06 UTC (permalink / raw) To: rjw, viresh.kumar; +Cc: cpufreq Hi All, When a CPU is hot removed we'll cancel all the delayed work items via gov_cancel_work(). Sometimes the delayed work function determines that it should adjust the delay for all other CPUs that the policy is managing. If this scenario occurs, the canceling CPU will cancel its own work but queue up the other CPUs works to run. Commit a11a35(cpufreq: Fix timer/workqueue corruption due to double queueing) has tried to fix this, but reading governor_enabled is not protected by cpufreq_governor_lock. Even though od_dbs_timer() checks governor_enabled before gov_queue_work(), this scenario may occur. For example CPU0 CPU1 ---- ---- cpu_down() ... <work runs> __cpufreq_remove_dev() od_dbs_timer() __cpufreq_governor() policy->governor_enabled policy->governor_enabled = false; cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: gov_cancel_work(dbs_data, policy); cpu0 work is canceled timer is canceled cpu1 work is canceled <waits for cpu1> gov_queue_work(*, *, true); cpu0 work queued cpu1 work queued cpu2 work queued ... cpu1 work is canceled cpu2 work is canceled ... If this occurs, debugobjects will split out a warning: WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc() ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14 Modules linked in: galcore CPU: 1 PID: 1205 Comm: sh Tainted: G W 3.10.0 #200 [<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14) [<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68) [<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40) [<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc) [<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0) [<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104) [<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0) [<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84) [<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48) [<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258) [<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c) [<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74) [<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24) [<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180) [<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184) [<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68) [<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48) When gov_queue_work(), governor_enabled may be modified. Following patch can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think this is a good idea? diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index e6be635..a27246c 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -118,9 +118,11 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, unsigned int delay, bool all_cpus) { int i; - - if (!policy->governor_enabled) + mutex_lock(&cpufreq_governor_lock); + if (!policy->governor_enabled) { + mutex_unlock(&cpufreq_governor_lock); return; + } if (!all_cpus) { /* @@ -135,6 +137,7 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy, for_each_cpu(i, policy->cpus) __gov_queue_work(i, dbs_data, delay); } + mutex_unlock(&cpufreq_governor_lock); } EXPORT_SYMBOL_GPL(gov_queue_work); ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: cpufreq ondemand governor debugobjects warning 2013-12-27 3:06 ` cpufreq ondemand governor debugobjects warning Jane Li @ 2013-12-27 5:31 ` Viresh Kumar 2013-12-27 6:18 ` Jane Li 2013-12-27 8:01 ` Viresh Kumar 1 sibling, 1 reply; 5+ messages in thread From: Viresh Kumar @ 2013-12-27 5:31 UTC (permalink / raw) To: Jane Li; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote: > When gov_queue_work(), governor_enabled may be modified. Following patch > can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this > way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think > this is a good idea? gov_queue_work() is called with timer_mutex on. Can you try placing this lock around gov_cancel_work() and see if it resolves issues you have reported ? -- viresh ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cpufreq ondemand governor debugobjects warning 2013-12-27 5:31 ` Viresh Kumar @ 2013-12-27 6:18 ` Jane Li 0 siblings, 0 replies; 5+ messages in thread From: Jane Li @ 2013-12-27 6:18 UTC (permalink / raw) To: Viresh Kumar; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org On 12/27/2013 01:31 PM, Viresh Kumar wrote: > On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote: >> When gov_queue_work(), governor_enabled may be modified. Following patch >> can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this >> way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think >> this is a good idea? > gov_queue_work() is called with timer_mutex on. Can you try placing this > lock around gov_cancel_work() and see if it resolves issues you have reported ? > > -- > viresh There may be deadlock in this way. The existing dependency chain (in reverse order) is: -> #1 (&j_cdbs->timer_mutex){+.+.+.}: [<c017ba0c>] lock_acquire+0x9c/0x150 [<c069d6d0>] mutex_lock_nested+0x50/0x3d8 [<c04849c0>] od_dbs_timer+0x3c/0x130 [<c0143b84>] process_one_work+0x1b8/0x518 [<c0144024>] worker_thread+0x140/0x3f0 [<c014a708>] kthread+0xa4/0xb0 [<c010e2c8>] ret_from_fork+0x14/0x2c -> #0 ((&(&j_cdbs->work)->work)){+.+...}: [<c017af18>] __lock_acquire+0x171c/0x1c64 [<c017ba0c>] lock_acquire+0x9c/0x150 [<c01432e8>] flush_work+0x3c/0x2a0 [<c0144b90>] __cancel_work_timer+0x90/0x138 [<c0485e90>] cpufreq_governor_dbs+0x528/0x6a4 [<c0481c40>] __cpufreq_governor+0x80/0x1b0 [<c0481dd8>] __cpufreq_remove_dev.isra.12+0x68/0x380 [<c0696eac>] cpufreq_cpu_callback+0x7c/0x90 [<c014fff8>] notifier_call_chain+0x44/0x84 [<c012b09c>] __cpu_notify+0x2c/0x48 [<c0691c98>] _cpu_down+0x80/0x258 [<c0691e98>] cpu_down+0x28/0x3c [<c0692418>] store_online+0x30/0x74 [<c03a4400>] dev_attr_store+0x18/0x24 [<c0257498>] sysfs_write_file+0x100/0x180 [<c01ff154>] vfs_write+0xbc/0x184 [<c01ff4ec>] SyS_write+0x40/0x68 [<c010e200>] ret_fast_syscall+0x0/0x48 CPU0 CPU1 ---- ---- lock(&j_cdbs->timer_mutex); lock((&(&j_cdbs->work)->work)); lock(&j_cdbs->timer_mutex); lock((&(&j_cdbs->work)->work)); diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index e6be635..ff14647 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -351,7 +351,9 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, if (dbs_data->cdata->governor == GOV_CONSERVATIVE) cs_dbs_info->enable = 0; + mutex_lock(&cpu_cdbs->timer_mutex); gov_cancel_work(dbs_data, policy); + mutex_unlock(&cpu_cdbs->timer_mutex); mutex_lock(&dbs_data->mutex); mutex_destroy(&cpu_cdbs->timer_mutex); ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cpufreq ondemand governor debugobjects warning 2013-12-27 3:06 ` cpufreq ondemand governor debugobjects warning Jane Li 2013-12-27 5:31 ` Viresh Kumar @ 2013-12-27 8:01 ` Viresh Kumar 2013-12-27 9:35 ` Jane Li 1 sibling, 1 reply; 5+ messages in thread From: Viresh Kumar @ 2013-12-27 8:01 UTC (permalink / raw) To: Jane Li; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote: > When gov_queue_work(), governor_enabled may be modified. Following patch > can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this > way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think > this is a good idea? I don't see a alternative than this solution after the deadlock issue with timer lock. > diff --git a/drivers/cpufreq/cpufreq_governor.c > b/drivers/cpufreq/cpufreq_governor.c > index e6be635..a27246c 100644 > --- a/drivers/cpufreq/cpufreq_governor.c > +++ b/drivers/cpufreq/cpufreq_governor.c > @@ -118,9 +118,11 @@ void gov_queue_work(struct dbs_data *dbs_data, > struct cpufreq_policy *policy, > unsigned int delay, bool all_cpus) > { > int i; > - don't remove this. > - if (!policy->governor_enabled) > + mutex_lock(&cpufreq_governor_lock); > + if (!policy->governor_enabled) { > + mutex_unlock(&cpufreq_governor_lock); > return; > + } > > if (!all_cpus) { > /* > @@ -135,6 +137,7 @@ void gov_queue_work(struct dbs_data *dbs_data, > struct cpufreq_policy *policy, > for_each_cpu(i, policy->cpus) > __gov_queue_work(i, dbs_data, delay); > } > + mutex_unlock(&cpufreq_governor_lock); > } > EXPORT_SYMBOL_GPL(gov_queue_work); You also need to remove 'static' from lock's declaration. -- viresh ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: cpufreq ondemand governor debugobjects warning 2013-12-27 8:01 ` Viresh Kumar @ 2013-12-27 9:35 ` Jane Li 0 siblings, 0 replies; 5+ messages in thread From: Jane Li @ 2013-12-27 9:35 UTC (permalink / raw) To: Viresh Kumar; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org On 12/27/2013 04:01 PM, Viresh Kumar wrote: > On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote: >> When gov_queue_work(), governor_enabled may be modified. Following patch >> can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this >> way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think >> this is a good idea? > I don't see a alternative than this solution after the deadlock issue with timer > lock. > >> diff --git a/drivers/cpufreq/cpufreq_governor.c >> b/drivers/cpufreq/cpufreq_governor.c >> index e6be635..a27246c 100644 >> --- a/drivers/cpufreq/cpufreq_governor.c >> +++ b/drivers/cpufreq/cpufreq_governor.c >> @@ -118,9 +118,11 @@ void gov_queue_work(struct dbs_data *dbs_data, >> struct cpufreq_policy *policy, >> unsigned int delay, bool all_cpus) >> { >> int i; >> - > don't remove this. > >> - if (!policy->governor_enabled) >> + mutex_lock(&cpufreq_governor_lock); >> + if (!policy->governor_enabled) { >> + mutex_unlock(&cpufreq_governor_lock); >> return; >> + } >> >> if (!all_cpus) { >> /* >> @@ -135,6 +137,7 @@ void gov_queue_work(struct dbs_data *dbs_data, >> struct cpufreq_policy *policy, >> for_each_cpu(i, policy->cpus) >> __gov_queue_work(i, dbs_data, delay); >> } >> + mutex_unlock(&cpufreq_governor_lock); >> } >> EXPORT_SYMBOL_GPL(gov_queue_work); > You also need to remove 'static' from lock's declaration. > > -- > viresh Thanks. I have pushed one patch named "[PATCH] cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled". ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-12-27 9:35 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <52BCEA76.1020906@marvell.com>
2013-12-27 3:06 ` cpufreq ondemand governor debugobjects warning Jane Li
2013-12-27 5:31 ` Viresh Kumar
2013-12-27 6:18 ` Jane Li
2013-12-27 8:01 ` Viresh Kumar
2013-12-27 9:35 ` Jane Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox