cpufreq ondemand governor debugobjects warning

public inbox for cpufreq@vger.kernel.org
 help / color / mirror / Atom feed

* cpufreq ondemand governor debugobjects warning
       [not found] <52BCEA76.1020906@marvell.com>
@ 2013-12-27  3:06 ` Jane Li
  2013-12-27  5:31   ` Viresh Kumar
  2013-12-27  8:01   ` Viresh Kumar
  0 siblings, 2 replies; 5+ messages in thread
From: Jane Li @ 2013-12-27  3:06 UTC (permalink / raw)
  To: rjw, viresh.kumar; +Cc: cpufreq

Hi All,

When a CPU is hot removed we'll cancel all the delayed work items via gov_cancel_work(). Sometimes the delayed work function determines that it should adjust the delay for all other CPUs that the policy is managing. If this scenario occurs, the canceling CPU will cancel its
own work but queue up the other CPUs works to run.

Commit a11a35(cpufreq: Fix timer/workqueue corruption due to double queueing) has tried to fix this, but reading governor_enabled is not protected by cpufreq_governor_lock. Even though od_dbs_timer() checks governor_enabled before gov_queue_work(), this scenario may occur. For example

   CPU0                                        CPU1
   ----                                        ----
   cpu_down()
    ...                                        <work runs>
    __cpufreq_remove_dev()                     od_dbs_timer()
     __cpufreq_governor()                       policy->governor_enabled
      policy->governor_enabled = false;
      cpufreq_governor_dbs()
       case CPUFREQ_GOV_STOP:
        gov_cancel_work(dbs_data, policy);
         cpu0 work is canceled
          timer is canceled
          cpu1 work is canceled
          <waits for cpu1>
                                                gov_queue_work(*, *, true);
                                                 cpu0 work queued
                                                 cpu1 work queued
                                                 cpu2 work queued
                                                 ...
          cpu1 work is canceled
          cpu2 work is canceled
          ...


If this occurs, debugobjects will split out a warning:

WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
Modules linked in: galcore
CPU: 1 PID: 1205 Comm: sh Tainted: G        W    3.10.0 #200
[<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
[<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
[<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
[<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
[<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
[<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
[<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
[<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
[<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
[<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
[<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
[<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
[<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
[<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
[<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
[<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
[<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
[<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
[<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
[<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)


When gov_queue_work(), governor_enabled may be modified.  Following patch can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think this is a good idea?

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index e6be635..a27246c 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -118,9 +118,11 @@ void gov_queue_work(struct dbs_data *dbs_data,
struct cpufreq_policy *policy,
                   unsigned int delay, bool all_cpus)
    {
           int i;
-
-       if (!policy->governor_enabled)
+       mutex_lock(&cpufreq_governor_lock);
+       if (!policy->governor_enabled) {
+               mutex_unlock(&cpufreq_governor_lock);
                   return;
+       }

           if (!all_cpus) {
                   /*
@@ -135,6 +137,7 @@ void gov_queue_work(struct dbs_data *dbs_data,
struct cpufreq_policy *policy,
                   for_each_cpu(i, policy->cpus)
                           __gov_queue_work(i, dbs_data, delay);
           }
+       mutex_unlock(&cpufreq_governor_lock);
    }
    EXPORT_SYMBOL_GPL(gov_queue_work);










^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: cpufreq ondemand governor debugobjects warning
  2013-12-27  3:06 ` cpufreq ondemand governor debugobjects warning Jane Li
@ 2013-12-27  5:31   ` Viresh Kumar
  2013-12-27  6:18     ` Jane Li
  2013-12-27  8:01   ` Viresh Kumar
  1 sibling, 1 reply; 5+ messages in thread
From: Viresh Kumar @ 2013-12-27  5:31 UTC (permalink / raw)
  To: Jane Li; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org

On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote:
> When gov_queue_work(), governor_enabled may be modified.  Following patch
> can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this
> way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think
> this is a good idea?

gov_queue_work() is called with timer_mutex on. Can you try placing this
lock around gov_cancel_work() and see if it resolves issues you have reported ?

--
viresh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: cpufreq ondemand governor debugobjects warning
  2013-12-27  5:31   ` Viresh Kumar
@ 2013-12-27  6:18     ` Jane Li
  0 siblings, 0 replies; 5+ messages in thread
From: Jane Li @ 2013-12-27  6:18 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org

On 12/27/2013 01:31 PM, Viresh Kumar wrote:

> On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote:
>> When gov_queue_work(), governor_enabled may be modified.  Following patch
>> can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this
>> way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think
>> this is a good idea?
> gov_queue_work() is called with timer_mutex on. Can you try placing this
> lock around gov_cancel_work() and see if it resolves issues you have reported ?
>
> --
> viresh

There may be deadlock in this way.

The existing dependency chain (in reverse order) is:

-> #1 (&j_cdbs->timer_mutex){+.+.+.}:

[<c017ba0c>] lock_acquire+0x9c/0x150

[<c069d6d0>] mutex_lock_nested+0x50/0x3d8

[<c04849c0>] od_dbs_timer+0x3c/0x130

[<c0143b84>] process_one_work+0x1b8/0x518

[<c0144024>] worker_thread+0x140/0x3f0

[<c014a708>] kthread+0xa4/0xb0

[<c010e2c8>] ret_from_fork+0x14/0x2c

-> #0 ((&(&j_cdbs->work)->work)){+.+...}:

[<c017af18>] __lock_acquire+0x171c/0x1c64

[<c017ba0c>] lock_acquire+0x9c/0x150

[<c01432e8>] flush_work+0x3c/0x2a0

[<c0144b90>] __cancel_work_timer+0x90/0x138

[<c0485e90>] cpufreq_governor_dbs+0x528/0x6a4

[<c0481c40>] __cpufreq_governor+0x80/0x1b0

[<c0481dd8>] __cpufreq_remove_dev.isra.12+0x68/0x380

[<c0696eac>] cpufreq_cpu_callback+0x7c/0x90

[<c014fff8>] notifier_call_chain+0x44/0x84

[<c012b09c>] __cpu_notify+0x2c/0x48

[<c0691c98>] _cpu_down+0x80/0x258

[<c0691e98>] cpu_down+0x28/0x3c

[<c0692418>] store_online+0x30/0x74

[<c03a4400>] dev_attr_store+0x18/0x24

[<c0257498>] sysfs_write_file+0x100/0x180

[<c01ff154>] vfs_write+0xbc/0x184

[<c01ff4ec>] SyS_write+0x40/0x68

[<c010e200>] ret_fast_syscall+0x0/0x48


         CPU0                    CPU1

         ----                    ----

   lock(&j_cdbs->timer_mutex);

                                lock((&(&j_cdbs->work)->work));

                                lock(&j_cdbs->timer_mutex);

   lock((&(&j_cdbs->work)->work));


diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c

index e6be635..ff14647 100644

--- a/drivers/cpufreq/cpufreq_governor.c

+++ b/drivers/cpufreq/cpufreq_governor.c

@@ -351,7 +351,9 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,

                 if (dbs_data->cdata->governor == GOV_CONSERVATIVE)

                         cs_dbs_info->enable = 0;

+               mutex_lock(&cpu_cdbs->timer_mutex);

                 gov_cancel_work(dbs_data, policy);

+               mutex_unlock(&cpu_cdbs->timer_mutex);

                 mutex_lock(&dbs_data->mutex);

                 mutex_destroy(&cpu_cdbs->timer_mutex);



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: cpufreq ondemand governor debugobjects warning
  2013-12-27  3:06 ` cpufreq ondemand governor debugobjects warning Jane Li
  2013-12-27  5:31   ` Viresh Kumar
@ 2013-12-27  8:01   ` Viresh Kumar
  2013-12-27  9:35     ` Jane Li
  1 sibling, 1 reply; 5+ messages in thread
From: Viresh Kumar @ 2013-12-27  8:01 UTC (permalink / raw)
  To: Jane Li; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org

On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote:
> When gov_queue_work(), governor_enabled may be modified.  Following patch
> can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this
> way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think
> this is a good idea?

I don't see a alternative than this solution after the deadlock issue with timer
lock.

> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index e6be635..a27246c 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -118,9 +118,11 @@ void gov_queue_work(struct dbs_data *dbs_data,
> struct cpufreq_policy *policy,
>                   unsigned int delay, bool all_cpus)
>    {
>           int i;
> -

don't remove this.

> -       if (!policy->governor_enabled)
> +       mutex_lock(&cpufreq_governor_lock);
> +       if (!policy->governor_enabled) {
> +               mutex_unlock(&cpufreq_governor_lock);
>                   return;
> +       }
>
>           if (!all_cpus) {
>                   /*
> @@ -135,6 +137,7 @@ void gov_queue_work(struct dbs_data *dbs_data,
> struct cpufreq_policy *policy,
>                   for_each_cpu(i, policy->cpus)
>                           __gov_queue_work(i, dbs_data, delay);
>           }
> +       mutex_unlock(&cpufreq_governor_lock);
>    }
>    EXPORT_SYMBOL_GPL(gov_queue_work);

You also need to remove 'static' from lock's declaration.

--
viresh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: cpufreq ondemand governor debugobjects warning
  2013-12-27  8:01   ` Viresh Kumar
@ 2013-12-27  9:35     ` Jane Li
  0 siblings, 0 replies; 5+ messages in thread
From: Jane Li @ 2013-12-27  9:35 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org


On 12/27/2013 04:01 PM, Viresh Kumar wrote:
> On 27 December 2013 08:36, Jane Li <jiel@marvell.com> wrote:
>> When gov_queue_work(), governor_enabled may be modified.  Following patch
>> can fix it by adding cpufreq_governor_lock in gov_queue_work. But in this
>> way, cpufreq_governor_lock also protects __gov_queue_work(). Do you think
>> this is a good idea?
> I don't see a alternative than this solution after the deadlock issue with timer
> lock.
>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>> b/drivers/cpufreq/cpufreq_governor.c
>> index e6be635..a27246c 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -118,9 +118,11 @@ void gov_queue_work(struct dbs_data *dbs_data,
>> struct cpufreq_policy *policy,
>>                    unsigned int delay, bool all_cpus)
>>     {
>>            int i;
>> -
> don't remove this.
>
>> -       if (!policy->governor_enabled)
>> +       mutex_lock(&cpufreq_governor_lock);
>> +       if (!policy->governor_enabled) {
>> +               mutex_unlock(&cpufreq_governor_lock);
>>                    return;
>> +       }
>>
>>            if (!all_cpus) {
>>                    /*
>> @@ -135,6 +137,7 @@ void gov_queue_work(struct dbs_data *dbs_data,
>> struct cpufreq_policy *policy,
>>                    for_each_cpu(i, policy->cpus)
>>                            __gov_queue_work(i, dbs_data, delay);
>>            }
>> +       mutex_unlock(&cpufreq_governor_lock);
>>     }
>>     EXPORT_SYMBOL_GPL(gov_queue_work);
> You also need to remove 'static' from lock's declaration.
>
> --
> viresh

Thanks. I have pushed one patch named "[PATCH] cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled".


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-12-27  9:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <52BCEA76.1020906@marvell.com>
2013-12-27  3:06 ` cpufreq ondemand governor debugobjects warning Jane Li
2013-12-27  5:31   ` Viresh Kumar
2013-12-27  6:18     ` Jane Li
2013-12-27  8:01   ` Viresh Kumar
2013-12-27  9:35     ` Jane Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox