* [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
@ 2015-01-30 1:13 Viresh Kumar
2015-01-30 1:30 ` ethan zhao
2015-01-30 22:57 ` Rafael J. Wysocki
0 siblings, 2 replies; 14+ messages in thread
From: Viresh Kumar @ 2015-01-30 1:13 UTC (permalink / raw)
To: Rafael Wysocki, santosh.shilimkar, ethan.zhao
Cc: linaro-kernel, linux-pm, Viresh Kumar, stable
cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
has missed this. And as a result we get this:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
kobject_get+0x41/0x50()
Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
...
Call Trace:
[<ffffffff81661b14>] dump_stack+0x46/0x58
[<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
[<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
[<ffffffff812e16d1>] kobject_get+0x41/0x50
[<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
[<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
[<ffffffff810b8cb2>] ? up+0x32/0x50
[<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
[<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
[<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
[<ffffffff81360967>] ? acpi_has_method+0x25/0x40
[<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
[<ffffffff81089566>] ? move_linked_works+0x66/0x90
[<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
[<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
[<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
[<ffffffff8108c910>] process_one_work+0x160/0x410
[<ffffffff8108d05b>] worker_thread+0x11b/0x520
[<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
[<ffffffff81092421>] kthread+0xe1/0x100
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
---[ end trace 89e66eb9795efdf7 ]---
And here is the race:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify()
acpi_processor_ppc_has_changed()
cpufreq_update_policy()
cpufreq_cpu_get()
kobject_get()
Thread B: xenbus_thread()
xenbus_thread()
msg->u.watch.handle->callback()
handle_vcpu_hotplug_event()
vcpu_hotplug()
cpu_down()
__cpu_notify(CPU_POST_DEAD..)
cpufreq_cpu_callback()
__cpufreq_remove_dev_finish()
cpufreq_policy_put_kobj()
kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
freed until cpufreq_cpu_put() is called.
But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data later and that too without these locks. And so the first thread
gets a valid policy structure and before it does kobject_get() on it, the second
one does kobject_put(). And so this WARN().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.
Cc: <stable@vger.kernel.org> # 3.12+
Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
@Santosh: I have changed read locks to write locks here and so you need to test
again.
drivers/cpufreq/cpufreq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
unsigned long flags;
struct cpufreq_policy *policy;
- read_lock_irqsave(&cpufreq_driver_lock, flags);
+ write_lock_irqsave(&cpufreq_driver_lock, flags);
policy = per_cpu(cpufreq_cpu_data, cpu);
- read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+ per_cpu(cpufreq_cpu_data, cpu) = NULL;
+ write_unlock_irqrestore(&cpufreq_driver_lock, flags);
if (!policy) {
pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
}
}
- per_cpu(cpufreq_cpu_data, cpu) = NULL;
return 0;
}
--
2.3.0.rc0.44.ga94655d
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 1:13 [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject Viresh Kumar
@ 2015-01-30 1:30 ` ethan zhao
2015-01-30 2:05 ` Viresh Kumar
2015-01-30 22:57 ` Rafael J. Wysocki
1 sibling, 1 reply; 14+ messages in thread
From: ethan zhao @ 2015-01-30 1:30 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, santosh.shilimkar, linaro-kernel, linux-pm,
stable
Viresh,
On 2015/1/30 9:13, Viresh Kumar wrote:
> cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
> has missed this. And as a result we get this:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
> kobject_get+0x41/0x50()
> Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
> lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
> ...
> Call Trace:
> [<ffffffff81661b14>] dump_stack+0x46/0x58
> [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
> [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
> [<ffffffff812e16d1>] kobject_get+0x41/0x50
> [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
> [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
> [<ffffffff810b8cb2>] ? up+0x32/0x50
> [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
> [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
> [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
> [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
> [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
> [<ffffffff81089566>] ? move_linked_works+0x66/0x90
> [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
> [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
> [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
> [<ffffffff8108c910>] process_one_work+0x160/0x410
> [<ffffffff8108d05b>] worker_thread+0x11b/0x520
> [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
> [<ffffffff81092421>] kthread+0xe1/0x100
> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
> [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
> ---[ end trace 89e66eb9795efdf7 ]---
>
> And here is the race:
>
> Thread A: Workqueue: kacpi_notify
>
> acpi_processor_notify()
> acpi_processor_ppc_has_changed()
> cpufreq_update_policy()
> cpufreq_cpu_get()
> kobject_get()
>
> Thread B: xenbus_thread()
>
> xenbus_thread()
> msg->u.watch.handle->callback()
> handle_vcpu_hotplug_event()
> vcpu_hotplug()
> cpu_down()
> __cpu_notify(CPU_POST_DEAD..)
> cpufreq_cpu_callback()
> __cpufreq_remove_dev_finish()
> cpufreq_policy_put_kobj()
> kobject_put()
>
> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
> freed until cpufreq_cpu_put() is called.
It is another bug.
>
> But the race happens when another thread puts the kobject first and updates
> cpufreq_cpu_data later and that too without these locks. And so the first thread
> gets a valid policy structure and before it does kobject_get() on it, the second
> one does kobject_put(). And so this WARN().
>
> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
> too under locks.
>
> Cc: <stable@vger.kernel.org> # 3.12+
> Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> @Santosh: I have changed read locks to write locks here and so you need to test
> again.
>
> drivers/cpufreq/cpufreq.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4473eba1d6b0..e3bf702b5588 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> unsigned long flags;
> struct cpufreq_policy *policy;
>
> - read_lock_irqsave(&cpufreq_driver_lock, flags);
> + write_lock_irqsave(&cpufreq_driver_lock, flags);
> policy = per_cpu(cpufreq_cpu_data, cpu);
> - read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> + per_cpu(cpufreq_cpu_data, cpu) = NULL;
> + write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> if (!policy) {
> pr_debug("%s: No cpu_data found\n", __func__);
> @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> }
> }
>
> - per_cpu(cpufreq_cpu_data, cpu) = NULL;
Yes, here is one bug should be fix. but seems not enough to avoid the
issue completely,
how about the Thread B running here
Thread B: xenbus_thread()
xenbus_thread()
msg->u.watch.handle->callback()
handle_vcpu_hotplug_event()
vcpu_hotplug()
cpu_down()
__cpu_notify(CPU_POST_DEAD..)
cpufreq_cpu_callback()
__cpufreq_remove_dev_prepare
update_policy_cpu(){
...
down_write(&policy->rwsem);
policy->last_cpu = policy->cpu;
policy->cpu = cpu;
up_write(&policy->rwsem);
---->
}
And thread A run to the same position as above, don't ignore my work blindly, that piece of bandage
could save your time :>
Thanks,
Ethan
> return 0;
> }
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 1:30 ` ethan zhao
@ 2015-01-30 2:05 ` Viresh Kumar
2015-01-30 2:10 ` ethan zhao
0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2015-01-30 2:05 UTC (permalink / raw)
To: ethan zhao
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 30 January 2015 at 07:00, ethan zhao <ethan.zhao@oracle.com> wrote:
> It is another bug.
Hmm, lets see..
> Yes, here is one bug should be fix. but seems not enough to avoid the issue
> completely,
Did you test it ?
> how about the Thread B running here
>
> Thread B: xenbus_thread()
>
> xenbus_thread()
> msg->u.watch.handle->callback()
> handle_vcpu_hotplug_event()
> vcpu_hotplug()
> cpu_down()
> __cpu_notify(CPU_POST_DEAD..)
> cpufreq_cpu_callback()
> __cpufreq_remove_dev_prepare
> update_policy_cpu(){
> ...
> down_write(&policy->rwsem);
> policy->last_cpu = policy->cpu;
> policy->cpu = cpu;
> up_write(&policy->rwsem);
> ---->
> }
>
> And thread A run to the same position as above, don't ignore my work
> blindly, that piece of bandage
> could save your time
Oh, please don't misunderstand me. I didn't had any intention of showing
any kind of disrespect.
Okay, why do you say that the above thread you shown has a bug in there?
Its juse updating policy->cpu and that shouldn't make anything unstable.
Please explain again one more time, in details why do you think you hit a
different bug. Also look at kref.h to see which piece of code has hit that
WARN() and it will happen only in the case I have shown. Lets see if I
missed something :)
--
viresh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 2:05 ` Viresh Kumar
@ 2015-01-30 2:10 ` ethan zhao
2015-01-30 2:13 ` Viresh Kumar
0 siblings, 1 reply; 14+ messages in thread
From: ethan zhao @ 2015-01-30 2:10 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
Viresh,
On 2015/1/30 10:05, Viresh Kumar wrote:
> On 30 January 2015 at 07:00, ethan zhao <ethan.zhao@oracle.com> wrote:
>
>> It is another bug.
> Hmm, lets see..
>
>> Yes, here is one bug should be fix. but seems not enough to avoid the issue
>> completely,
> Did you test it ?
For a PPC notification and xen-bus thread race, could you tell me a way how
to reproduce it by trigger the PPC notification and xen-bus events
manually ?
You really want me write some code into a test kernel to flood the PPC
and xen-bus at the same time ? if we could analysis code and get the
issue clearly,
we wouldn't wait the users to yell out.
Thanks,
Ethan
>
>> how about the Thread B running here
>>
>> Thread B: xenbus_thread()
>>
>> xenbus_thread()
>> msg->u.watch.handle->callback()
>> handle_vcpu_hotplug_event()
>> vcpu_hotplug()
>> cpu_down()
>> __cpu_notify(CPU_POST_DEAD..)
>> cpufreq_cpu_callback()
>> __cpufreq_remove_dev_prepare
>> update_policy_cpu(){
>> ...
>> down_write(&policy->rwsem);
>> policy->last_cpu = policy->cpu;
>> policy->cpu = cpu;
>> up_write(&policy->rwsem);
>> ---->
>> }
>>
>> And thread A run to the same position as above, don't ignore my work
>> blindly, that piece of bandage
>> could save your time
> Oh, please don't misunderstand me. I didn't had any intention of showing
> any kind of disrespect.
>
> Okay, why do you say that the above thread you shown has a bug in there?
> Its juse updating policy->cpu and that shouldn't make anything unstable.
>
> Please explain again one more time, in details why do you think you hit a
> different bug. Also look at kref.h to see which piece of code has hit that
> WARN() and it will happen only in the case I have shown. Lets see if I
> missed something :)
>
> --
> viresh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 2:10 ` ethan zhao
@ 2015-01-30 2:13 ` Viresh Kumar
2015-01-30 2:21 ` ethan zhao
0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2015-01-30 2:13 UTC (permalink / raw)
To: ethan zhao
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote:
> For a PPC notification and xen-bus thread race, could you tell me a way how
> to reproduce it by trigger the PPC notification and xen-bus events manually
> ?
> You really want me write some code into a test kernel to flood the PPC and
> xen-bus at the same time ? if we could analysis code and get the issue
> clearly, we wouldn't wait the users to yell out.
I thought you already have a test where you are hitting the issue you originally
reported. Atleast Santosh did confirm that he is hitting 3/5 times in his kernel
during boot..
My reasoning of why your observation doesn't fit here:
Copying from your earlier mail..
Thread A: Workqueue: kacpi_notify
acpi_processor_notify()
acpi_processor_ppc_has_changed()
cpufreq_update_policy()
cpufreq_cpu_get()
kobject_get()
This tries to increment the count and the warning you have mentioned
happen because:
WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
i.e. even after incrementing the count, it is < 2. Which I believe will be
1. Which means that we have tried to do kobject_get() on a kobject
for which kobject_put() is already done.
Thread B: xenbus_thread()
xenbus_thread()
msg->u.watch.handle->callback()
handle_vcpu_hotplug_event()
vcpu_hotplug()
cpu_down()
__cpu_notify(CPU_DOWN_PREPARE..)
cpufreq_cpu_callback()
__cpufreq_remove_dev_prepare()
update_policy_cpu()
kobject_move()
Okay, where is the race or kobject_put() here ? We are just moving
the kobject and it has nothing to do with the refcount of kobject.
Why do you see its a race ?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 2:13 ` Viresh Kumar
@ 2015-01-30 2:21 ` ethan zhao
2015-01-30 3:14 ` Viresh Kumar
0 siblings, 1 reply; 14+ messages in thread
From: ethan zhao @ 2015-01-30 2:21 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 2015/1/30 10:13, Viresh Kumar wrote:
> On 30 January 2015 at 07:40, ethan zhao <ethan.zhao@oracle.com> wrote:
>> For a PPC notification and xen-bus thread race, could you tell me a way how
>> to reproduce it by trigger the PPC notification and xen-bus events manually
>> ?
>> You really want me write some code into a test kernel to flood the PPC and
>> xen-bus at the same time ? if we could analysis code and get the issue
>> clearly, we wouldn't wait the users to yell out.
> I thought you already have a test where you are hitting the issue you originally
> reported. Atleast Santosh did confirm that he is hitting 3/5 times in his kernel
> during boot..
As I know, PPC notification only happens when power capping needed,
maybe the server
over-hot, if the cooling condition recover, you couldn't reproduce it
either !.
>
> My reasoning of why your observation doesn't fit here:
>
> Copying from your earlier mail..
>
> Thread A: Workqueue: kacpi_notify
>
> acpi_processor_notify()
> acpi_processor_ppc_has_changed()
> cpufreq_update_policy()
> cpufreq_cpu_get()
> kobject_get()
>
> This tries to increment the count and the warning you have mentioned
> happen because:
>
> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>
> i.e. even after incrementing the count, it is < 2. Which I believe will be
> 1. Which means that we have tried to do kobject_get() on a kobject
> for which kobject_put() is already done.
>
> Thread B: xenbus_thread()
>
> xenbus_thread()
> msg->u.watch.handle->callback()
> handle_vcpu_hotplug_event()
> vcpu_hotplug()
> cpu_down()
> __cpu_notify(CPU_DOWN_PREPARE..)
> cpufreq_cpu_callback()
> __cpufreq_remove_dev_prepare()
> update_policy_cpu()
> kobject_move()
>
>
> Okay, where is the race or kobject_put() here ? We are just moving
> the kobject and it has nothing to do with the refcount of kobject.
>
> Why do you see its a race ?
I mean the policy->cpu has been changed, that CPU is about to be down,
Thread A continue to get and update the policy for it blindly, that is
what I Say 'race', not the refcount itself.
Thanks,
Ethan
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 2:21 ` ethan zhao
@ 2015-01-30 3:14 ` Viresh Kumar
2015-01-30 3:46 ` ethan zhao
0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2015-01-30 3:14 UTC (permalink / raw)
To: ethan zhao
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 30 January 2015 at 07:51, ethan zhao <ethan.zhao@oracle.com> wrote:
>> My reasoning of why your observation doesn't fit here:
>>
>> Copying from your earlier mail..
>>
>> Thread A: Workqueue: kacpi_notify
>>
>> acpi_processor_notify()
>> acpi_processor_ppc_has_changed()
>> cpufreq_update_policy()
>> cpufreq_cpu_get()
>> kobject_get()
>>
>> This tries to increment the count and the warning you have mentioned
>> happen because:
>>
>> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>
>> i.e. even after incrementing the count, it is < 2. Which I believe will be
>> 1. Which means that we have tried to do kobject_get() on a kobject
>> for which kobject_put() is already done.
>>
>> Thread B: xenbus_thread()
>>
>> xenbus_thread()
>> msg->u.watch.handle->callback()
>> handle_vcpu_hotplug_event()
>> vcpu_hotplug()
>> cpu_down()
>> __cpu_notify(CPU_DOWN_PREPARE..)
>> cpufreq_cpu_callback()
>> __cpufreq_remove_dev_prepare()
>> update_policy_cpu()
>> kobject_move()
>>
>>
>> Okay, where is the race or kobject_put() here ? We are just moving
>> the kobject and it has nothing to do with the refcount of kobject.
>>
>> Why do you see its a race ?
>
> I mean the policy->cpu has been changed, that CPU is about to be down,
> Thread A continue to get and update the policy for it blindly, that is
> what I Say 'race', not the refcount itself.
First of all, the WARN you had in your patch doesn't have anything to do
with the so-called race you just define. Its because of the reason I defined
earlier.
Second, what if policy->cpu is getting updated? A policy manages a group
of CPUs, not a single cpu. And there still are other CPUs online for that
policy and so kobject_get() for that policy->kobj is perfectly valid.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 3:14 ` Viresh Kumar
@ 2015-01-30 3:46 ` ethan zhao
2015-01-30 4:14 ` Viresh Kumar
0 siblings, 1 reply; 14+ messages in thread
From: ethan zhao @ 2015-01-30 3:46 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 2015/1/30 11:14, Viresh Kumar wrote:
> On 30 January 2015 at 07:51, ethan zhao <ethan.zhao@oracle.com> wrote:
>>> My reasoning of why your observation doesn't fit here:
>>>
>>> Copying from your earlier mail..
>>>
>>> Thread A: Workqueue: kacpi_notify
>>>
>>> acpi_processor_notify()
>>> acpi_processor_ppc_has_changed()
>>> cpufreq_update_policy()
>>> cpufreq_cpu_get()
>>> kobject_get()
>>>
>>> This tries to increment the count and the warning you have mentioned
>>> happen because:
>>>
>>> WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
>>>
>>> i.e. even after incrementing the count, it is < 2. Which I believe will be
>>> 1. Which means that we have tried to do kobject_get() on a kobject
>>> for which kobject_put() is already done.
>>>
>>> Thread B: xenbus_thread()
>>>
>>> xenbus_thread()
>>> msg->u.watch.handle->callback()
>>> handle_vcpu_hotplug_event()
>>> vcpu_hotplug()
>>> cpu_down()
>>> __cpu_notify(CPU_DOWN_PREPARE..)
>>> cpufreq_cpu_callback()
>>> __cpufreq_remove_dev_prepare()
>>> update_policy_cpu()
>>> kobject_move()
>>>
>>>
>>> Okay, where is the race or kobject_put() here ? We are just moving
>>> the kobject and it has nothing to do with the refcount of kobject.
>>>
>>> Why do you see its a race ?
>> I mean the policy->cpu has been changed, that CPU is about to be down,
>> Thread A continue to get and update the policy for it blindly, that is
>> what I Say 'race', not the refcount itself.
> First of all, the WARN you had in your patch doesn't have anything to do
> with the so-called race you just define. Its because of the reason I defined
> earlier.
>
> Second, what if policy->cpu is getting updated? A policy manages a group
> of CPUs, not a single cpu. And there still are other CPUs online for that
> policy and so kobject_get() for that policy->kobj is perfectly valid.
You mean the policy is shared by all CPUs, so PPC notification about one
CPU should update all CPU's policy, right ? even the requested CPU is
shutting
down.
Thanks,
Ethan
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 3:46 ` ethan zhao
@ 2015-01-30 4:14 ` Viresh Kumar
2015-02-02 1:54 ` ethan zhao
0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2015-01-30 4:14 UTC (permalink / raw)
To: ethan zhao
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 30 January 2015 at 09:16, ethan zhao <ethan.zhao@oracle.com> wrote:
> You mean the policy is shared by all CPUs, so PPC notification about one
A policy may or maynot be shared by multiple CPUs. It all depends on your
systems configuration. All CPUs which share clock line, share a policy.
You can verify that from /sys/devices/system/cpu/cpu*/related_cpus. This
gives the CPUs that share policy.
> CPU should update all CPU's policy, right ? even the requested CPU is
> shutting down.
CPUs sharing policy have a single policy structure. And so policy would
be updated for all CPUs that share the poilcy. Even if some cpu goes down,
the policy might still be up and running.
--
viresh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 22:57 ` Rafael J. Wysocki
@ 2015-01-30 22:55 ` santosh shilimkar
2015-01-31 0:31 ` Viresh Kumar
1 sibling, 0 replies; 14+ messages in thread
From: santosh shilimkar @ 2015-01-30 22:55 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, ethan.zhao, linaro-kernel, linux-pm, stable
On 1/30/2015 2:57 PM, Rafael J. Wysocki wrote:
> On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote:
>> cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
>> has missed this.
>
> No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to
> protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get()
> that the policy object won't go away from under them (it is used for some other
> purposes too, but they are unrelated).
>
> What technically happens is an ordering problem. per_cpu(cpufreq_cpu_data, cpu)
> needs to be cleared before calling kobject_put(&policy->kobj) *and* under the
> lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel
> with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj)
> was executed.
>
> So the lock is needed not just because it protects cpufreq_cpu_data, but because
> it is supposed to prevent writes to cpufreq_cpu_data from happening between the
> read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get().
>
>> And as a result we get this:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>> kobject_get+0x41/0x50()
>> Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>> lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>> ...
>> Call Trace:
>> [<ffffffff81661b14>] dump_stack+0x46/0x58
>> [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>> [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>> [<ffffffff812e16d1>] kobject_get+0x41/0x50
>> [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>> [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>> [<ffffffff810b8cb2>] ? up+0x32/0x50
>> [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>> [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>> [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>> [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>> [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>> [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>> [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>> [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>> [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>> [<ffffffff8108c910>] process_one_work+0x160/0x410
>> [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>> [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>> [<ffffffff81092421>] kthread+0xe1/0x100
>> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>> [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>> ---[ end trace 89e66eb9795efdf7 ]---
>>
>> And here is the race:
>>
>> Thread A: Workqueue: kacpi_notify
>>
>> acpi_processor_notify()
>> acpi_processor_ppc_has_changed()
>> cpufreq_update_policy()
>> cpufreq_cpu_get()
>> kobject_get()
>>
>> Thread B: xenbus_thread()
>>
>> xenbus_thread()
>> msg->u.watch.handle->callback()
>> handle_vcpu_hotplug_event()
>> vcpu_hotplug()
>> cpu_down()
>> __cpu_notify(CPU_POST_DEAD..)
>> cpufreq_cpu_callback()
>> __cpufreq_remove_dev_finish()
>> cpufreq_policy_put_kobj()
>> kobject_put()
>>
>> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
>> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
>> freed until cpufreq_cpu_put() is called.
>
> Because it does the kobject_get() under the lock too.
>
>> But the race happens when another thread puts the kobject first and updates
>> cpufreq_cpu_data later
>
> This is an ordering problem.
>
>> and that too without these locks.
>
> And this is racy.
>
>> And so the first thread
>> gets a valid policy structure and before it does kobject_get() on it, the second
>> one does kobject_put(). And so this WARN().
>>
>> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
>> too under locks.
>
> That's almost correct. In addition to the above (albeit maybe unintentionally)
> the patch also fixes the possible race between two instances of
> __cpufreq_remove_dev_finish() with the same arguments running in parallel with
> each other. The proof is left as an exercise to the reader. :-)
>
> Please try to improve the changelog ...
>
>> Cc: <stable@vger.kernel.org> # 3.12+
>> Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
>> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>> @Santosh: I have changed read locks to write locks here and so you need to test
>> again.
>>
Right. Tested this version and confirm that it fixes the reported WARN()
Regards,
Santosh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 1:13 [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject Viresh Kumar
2015-01-30 1:30 ` ethan zhao
@ 2015-01-30 22:57 ` Rafael J. Wysocki
2015-01-30 22:55 ` santosh shilimkar
2015-01-31 0:31 ` Viresh Kumar
1 sibling, 2 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2015-01-30 22:57 UTC (permalink / raw)
To: Viresh Kumar
Cc: santosh.shilimkar, ethan.zhao, linaro-kernel, linux-pm, stable
On Friday, January 30, 2015 06:43:12 AM Viresh Kumar wrote:
> cpufreq_cpu_data is protected by cpufreq_driver_lock and one of the instances
> has missed this.
No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to
protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get()
that the policy object won't go away from under them (it is used for some other
purposes too, but they are unrelated).
What technically happens is an ordering problem. per_cpu(cpufreq_cpu_data, cpu)
needs to be cleared before calling kobject_put(&policy->kobj) *and* under the
lock, because otherwise if someone else calls cpufreq_cpu_get() in parallel
with that, they can obtain a non-NULL from it *after* kobject_put(&policy->kobj)
was executed.
So the lock is needed not just because it protects cpufreq_cpu_data, but because
it is supposed to prevent writes to cpufreq_cpu_data from happening between the
read from it and the kobject_get(&policy->kobj) in cpufreq_cpu_get().
> And as a result we get this:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
> kobject_get+0x41/0x50()
> Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
> lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
> ...
> Call Trace:
> [<ffffffff81661b14>] dump_stack+0x46/0x58
> [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
> [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
> [<ffffffff812e16d1>] kobject_get+0x41/0x50
> [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
> [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
> [<ffffffff810b8cb2>] ? up+0x32/0x50
> [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
> [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
> [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
> [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
> [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
> [<ffffffff81089566>] ? move_linked_works+0x66/0x90
> [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
> [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
> [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
> [<ffffffff8108c910>] process_one_work+0x160/0x410
> [<ffffffff8108d05b>] worker_thread+0x11b/0x520
> [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
> [<ffffffff81092421>] kthread+0xe1/0x100
> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
> [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
> [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
> ---[ end trace 89e66eb9795efdf7 ]---
>
> And here is the race:
>
> Thread A: Workqueue: kacpi_notify
>
> acpi_processor_notify()
> acpi_processor_ppc_has_changed()
> cpufreq_update_policy()
> cpufreq_cpu_get()
> kobject_get()
>
> Thread B: xenbus_thread()
>
> xenbus_thread()
> msg->u.watch.handle->callback()
> handle_vcpu_hotplug_event()
> vcpu_hotplug()
> cpu_down()
> __cpu_notify(CPU_POST_DEAD..)
> cpufreq_cpu_callback()
> __cpufreq_remove_dev_finish()
> cpufreq_policy_put_kobj()
> kobject_put()
>
> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
> freed until cpufreq_cpu_put() is called.
Because it does the kobject_get() under the lock too.
> But the race happens when another thread puts the kobject first and updates
> cpufreq_cpu_data later
This is an ordering problem.
> and that too without these locks.
And this is racy.
> And so the first thread
> gets a valid policy structure and before it does kobject_get() on it, the second
> one does kobject_put(). And so this WARN().
>
> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
> too under locks.
That's almost correct. In addition to the above (albeit maybe unintentionally)
the patch also fixes the possible race between two instances of
__cpufreq_remove_dev_finish() with the same arguments running in parallel with
each other. The proof is left as an exercise to the reader. :-)
Please try to improve the changelog ...
> Cc: <stable@vger.kernel.org> # 3.12+
> Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
> Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> @Santosh: I have changed read locks to write locks here and so you need to test
> again.
>
> drivers/cpufreq/cpufreq.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4473eba1d6b0..e3bf702b5588 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1409,9 +1409,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> unsigned long flags;
> struct cpufreq_policy *policy;
>
> - read_lock_irqsave(&cpufreq_driver_lock, flags);
> + write_lock_irqsave(&cpufreq_driver_lock, flags);
> policy = per_cpu(cpufreq_cpu_data, cpu);
> - read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> + per_cpu(cpufreq_cpu_data, cpu) = NULL;
> + write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> if (!policy) {
> pr_debug("%s: No cpu_data found\n", __func__);
> @@ -1466,7 +1467,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> }
> }
>
> - per_cpu(cpufreq_cpu_data, cpu) = NULL;
> return 0;
> }
>
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 22:57 ` Rafael J. Wysocki
2015-01-30 22:55 ` santosh shilimkar
@ 2015-01-31 0:31 ` Viresh Kumar
1 sibling, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2015-01-31 0:31 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: santosh shilimkar, Ethan Zhao, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 31 January 2015 at 04:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> No, it isn't. cpufreq_cpu_data is a pointer and doesn't need any locks to
> protect it. What the lock does is to guarantee the callers of cpufreq_cpu_get()
> that the policy object won't go away from under them (it is used for some other
> purposes too, but they are unrelated).
Yeah, its not just locking. Otherwise the locking only at the bottom
of this routine
should have fixed it.
> That's almost correct. In addition to the above (albeit maybe unintentionally)
> the patch also fixes the possible race between two instances of
> __cpufreq_remove_dev_finish() with the same arguments running in parallel with
> each other. The proof is left as an exercise to the reader. :-)
Two __cpufreq_remove_dev_finish() can't get called in parallel, so it
doesn't fix
any race there :).
> Please try to improve the changelog ...
Yes.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-01-30 4:14 ` Viresh Kumar
@ 2015-02-02 1:54 ` ethan zhao
2015-02-02 3:20 ` Viresh Kumar
0 siblings, 1 reply; 14+ messages in thread
From: ethan zhao @ 2015-02-02 1:54 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
Viresh,
On 2015/1/30 12:14, Viresh Kumar wrote:
> On 30 January 2015 at 09:16, ethan zhao <ethan.zhao@oracle.com> wrote:
>> You mean the policy is shared by all CPUs, so PPC notification about one
> A policy may or maynot be shared by multiple CPUs. It all depends on your
> systems configuration. All CPUs which share clock line, share a policy.
>
> You can verify that from /sys/devices/system/cpu/cpu*/related_cpus. This
> gives the CPUs that share policy.
>
>> CPU should update all CPU's policy, right ? even the requested CPU is
>> shutting down.
> CPUs sharing policy have a single policy structure. And so policy would
> be updated for all CPUs that share the poilcy. Even if some cpu goes down,
> the policy might still be up and running.
Let' me check ACPI spec and BIOS to match your implementation.
About that patch you suggested, there is another question left pending:
Policy will be freed only when that's last CPU shutting down, how does it
happen when system booting ?
The description of the patch would be wrong (the Xen_bus call stack) --
Did the xen_bus shut down every CPU till the last during booting ?
Thanks,
Ethan
>
> --
> viresh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
2015-02-02 1:54 ` ethan zhao
@ 2015-02-02 3:20 ` Viresh Kumar
0 siblings, 0 replies; 14+ messages in thread
From: Viresh Kumar @ 2015-02-02 3:20 UTC (permalink / raw)
To: ethan zhao
Cc: Rafael Wysocki, santosh shilimkar, Linaro Kernel Mailman List,
linux-pm@vger.kernel.org, # 3.13.x
On 2 February 2015 at 07:24, ethan zhao <ethan.zhao@oracle.com> wrote:
> Policy will be freed only when that's last CPU shutting down, how does it
> happen when system booting ?
I couldn't understand what you are asking here. Though policy will be allocated
for the first CPU of every policy during boot..
> The description of the patch would be wrong (the Xen_bus call stack) --
> Did the xen_bus shut down every CPU till the last during booting ?
I don't know about that..
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-02-02 3:20 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-30 1:13 [PATCH] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject Viresh Kumar
2015-01-30 1:30 ` ethan zhao
2015-01-30 2:05 ` Viresh Kumar
2015-01-30 2:10 ` ethan zhao
2015-01-30 2:13 ` Viresh Kumar
2015-01-30 2:21 ` ethan zhao
2015-01-30 3:14 ` Viresh Kumar
2015-01-30 3:46 ` ethan zhao
2015-01-30 4:14 ` Viresh Kumar
2015-02-02 1:54 ` ethan zhao
2015-02-02 3:20 ` Viresh Kumar
2015-01-30 22:57 ` Rafael J. Wysocki
2015-01-30 22:55 ` santosh shilimkar
2015-01-31 0:31 ` Viresh Kumar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox