* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-17 20:17 [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU Tejun Heo
@ 2012-09-17 20:36 ` Borislav Petkov
2012-09-17 20:53 ` Tejun Heo
2012-09-17 20:38 ` Rafael J. Wysocki
2012-09-18 20:12 ` Tejun Heo
2 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2012-09-17 20:36 UTC (permalink / raw)
To: Tejun Heo
Cc: Linus Torvalds, Rafael J. Wysocki, linux-kernel, cpufreq,
linux-pm, Duncan, Andreas Herrmann
On Mon, Sep 17, 2012 at 01:17:21PM -0700, Tejun Heo wrote:
> powernowk8_target() runs off a per-cpu work item and if the
> cpufreq_policy->cpu is different from the current one, it migrates the
> kworker to the target CPU by manipulating current->cpus_allowed. The
> function migrates the kworker back to the original CPU but this is
> still broken. Workqueue concurrency management requires the kworkers
> to stay on the same CPU and powernowk8_target() ends up triggerring
> BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
> fidvid_mutex and sleeps.
>
> It is unclear why this bug is being reported now. Duncan says it
> appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
> 3.5. Bisection seemed to point to 63d95a91 "workqueue: use @pool
> instead of @gcwq or @cpu where applicable" which is an non-functional
> change. Given that the reproduce case sometimes took upto days to
> trigger, it's easy to be misled while bisecting. Maybe something made
> contention on fidvid_mutex more likely? I don't know.
>
> This patch fixes the bug by punting to another per-cpu work item on
> the target CPU if it isn't the same as the current one. The code
> assumes that cpufreq_policy->cpu is kept online by the caller, which
> Rafael tells me is the case.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Duncan <1i5t5.duncan@cox.net>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
> Cc: stable@kernel.org
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
> ---
>
> While it's very late in the merge cycle, the fix is limited in scope
> and fairly safe, so it wouldn't be too crazy to merge but then again
> this can go through the next -rc1 and then -stable. Linus, Rafael,
> what do you guys think?
Wouldn't it be much simpler to carve out the piece after
set_cpus_allowed_ptr(), put it in a sub-function called
__powernowk8_target() and call it with smp_call_function_single instead
of defining another work item?
Would the workqueue code handle that or are there any other issues?
> drivers/cpufreq/powernow-k8.c | 89 +++++++++++++++++++++++++-----------------
> 1 file changed, 53 insertions(+), 36 deletions(-)
If it can, the diffstat should look much slimmer.
Thanks.
--
Regards/Gruss,
Boris.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-17 20:36 ` Borislav Petkov
@ 2012-09-17 20:53 ` Tejun Heo
2012-09-17 21:22 ` Borislav Petkov
0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2012-09-17 20:53 UTC (permalink / raw)
To: Borislav Petkov, Linus Torvalds, Rafael J. Wysocki, linux-kernel,
cpufreq, linux-pm, Duncan, Andreas Herrmann
On Mon, Sep 17, 2012 at 10:36:54PM +0200, Borislav Petkov wrote:
> Wouldn't it be much simpler to carve out the piece after
> set_cpus_allowed_ptr(), put it in a sub-function called
> __powernowk8_target() and call it with smp_call_function_single instead
> of defining another work item?
>
> Would the workqueue code handle that or are there any other issues?
The function grabs a mutex. smp_call_function wouldn't be too happy
about that.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-17 20:53 ` Tejun Heo
@ 2012-09-17 21:22 ` Borislav Petkov
0 siblings, 0 replies; 11+ messages in thread
From: Borislav Petkov @ 2012-09-17 21:22 UTC (permalink / raw)
To: Tejun Heo
Cc: Linus Torvalds, Rafael J. Wysocki, linux-kernel, cpufreq,
linux-pm, Duncan, Andreas Herrmann
On Mon, Sep 17, 2012 at 01:53:55PM -0700, Tejun Heo wrote:
> On Mon, Sep 17, 2012 at 10:36:54PM +0200, Borislav Petkov wrote:
> > Wouldn't it be much simpler to carve out the piece after
> > set_cpus_allowed_ptr(), put it in a sub-function called
> > __powernowk8_target() and call it with smp_call_function_single instead
> > of defining another work item?
> >
> > Would the workqueue code handle that or are there any other issues?
>
> The function grabs a mutex. smp_call_function wouldn't be too happy
> about that.
Yes indeed.
Thanks.
--
Regards/Gruss,
Boris.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-17 20:17 [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU Tejun Heo
2012-09-17 20:36 ` Borislav Petkov
@ 2012-09-17 20:38 ` Rafael J. Wysocki
2012-09-23 1:53 ` Thomas Renninger
2012-09-18 20:12 ` Tejun Heo
2 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2012-09-17 20:38 UTC (permalink / raw)
To: Tejun Heo, Thomas Renninger, Andre Przywara
Cc: Linus Torvalds, linux-kernel, cpufreq, linux-pm, Duncan,
Andreas Herrmann
On Monday, September 17, 2012, Tejun Heo wrote:
> powernowk8_target() runs off a per-cpu work item and if the
> cpufreq_policy->cpu is different from the current one, it migrates the
> kworker to the target CPU by manipulating current->cpus_allowed. The
> function migrates the kworker back to the original CPU but this is
> still broken. Workqueue concurrency management requires the kworkers
> to stay on the same CPU and powernowk8_target() ends up triggerring
> BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
> fidvid_mutex and sleeps.
>
> It is unclear why this bug is being reported now. Duncan says it
> appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
> 3.5. Bisection seemed to point to 63d95a91 "workqueue: use @pool
> instead of @gcwq or @cpu where applicable" which is an non-functional
> change. Given that the reproduce case sometimes took upto days to
> trigger, it's easy to be misled while bisecting. Maybe something made
> contention on fidvid_mutex more likely? I don't know.
>
> This patch fixes the bug by punting to another per-cpu work item on
> the target CPU if it isn't the same as the current one. The code
> assumes that cpufreq_policy->cpu is kept online by the caller, which
> Rafael tells me is the case.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Duncan <1i5t5.duncan@cox.net>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
> Cc: stable@kernel.org
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
> ---
>
> While it's very late in the merge cycle, the fix is limited in scope
> and fairly safe, so it wouldn't be too crazy to merge but then again
> this can go through the next -rc1 and then -stable. Linus, Rafael,
> what do you guys think?
Well, I don't see much reason to wait with this, although I'd like some
more people to check it.
Andre, Thomas, can you please have a look at it?
Rafael
> drivers/cpufreq/powernow-k8.c | 89 +++++++++++++++++++++++++-----------------
> 1 file changed, 53 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
> index c0e8164..53db9de 100644
> --- a/drivers/cpufreq/powernow-k8.c
> +++ b/drivers/cpufreq/powernow-k8.c
> @@ -35,7 +35,6 @@
> #include <linux/slab.h>
> #include <linux/string.h>
> #include <linux/cpumask.h>
> -#include <linux/sched.h> /* for current / set_cpus_allowed() */
> #include <linux/io.h>
> #include <linux/delay.h>
>
> @@ -1139,46 +1138,43 @@ static int transition_frequency_pstate(struct powernow_k8_data *data,
> return res;
> }
>
> -/* Driver entry point to switch to the target frequency */
> -static int powernowk8_target(struct cpufreq_policy *pol,
> - unsigned targfreq, unsigned relation)
> +struct powernowk8_target_work {
> + struct work_struct work;
> + struct cpufreq_policy *pol;
> + unsigned targfreq;
> + unsigned relation;
> + int ret;
> +};
> +
> +static void powernowk8_target_on_cpu(struct work_struct *work)
> {
> - cpumask_var_t oldmask;
> + struct powernowk8_target_work *tw =
> + container_of(work, struct powernowk8_target_work, work);
> + struct cpufreq_policy *pol = tw->pol;
> struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
> u32 checkfid;
> u32 checkvid;
> unsigned int newstate;
> - int ret = -EIO;
>
> + tw->ret = -EINVAL;
> if (!data)
> - return -EINVAL;
> + return;
> +
> + tw->ret = -EIO;
>
> checkfid = data->currfid;
> checkvid = data->currvid;
>
> - /* only run on specific CPU from here on. */
> - /* This is poor form: use a workqueue or smp_call_function_single */
> - if (!alloc_cpumask_var(&oldmask, GFP_KERNEL))
> - return -ENOMEM;
> -
> - cpumask_copy(oldmask, tsk_cpus_allowed(current));
> - set_cpus_allowed_ptr(current, cpumask_of(pol->cpu));
> -
> - if (smp_processor_id() != pol->cpu) {
> - printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
> - goto err_out;
> - }
> -
> if (pending_bit_stuck()) {
> printk(KERN_ERR PFX "failing targ, change pending bit set\n");
> - goto err_out;
> + return;
> }
>
> pr_debug("targ: cpu %d, %d kHz, min %d, max %d, relation %d\n",
> - pol->cpu, targfreq, pol->min, pol->max, relation);
> + pol->cpu, tw->targfreq, pol->min, pol->max, tw->relation);
>
> if (query_current_values_with_pending_wait(data))
> - goto err_out;
> + return;
>
> if (cpu_family != CPU_HW_PSTATE) {
> pr_debug("targ: curr fid 0x%x, vid 0x%x\n",
> @@ -1195,23 +1191,23 @@ static int powernowk8_target(struct cpufreq_policy *pol,
> }
>
> if (cpufreq_frequency_table_target(pol, data->powernow_table,
> - targfreq, relation, &newstate))
> - goto err_out;
> + tw->targfreq, tw->relation, &newstate))
> + return;
>
> mutex_lock(&fidvid_mutex);
>
> powernow_k8_acpi_pst_values(data, newstate);
>
> if (cpu_family == CPU_HW_PSTATE)
> - ret = transition_frequency_pstate(data,
> - data->powernow_table[newstate].index);
> + tw->ret = transition_frequency_pstate(data,
> + data->powernow_table[newstate].index);
> else
> - ret = transition_frequency_fidvid(data, newstate);
> - if (ret) {
> + tw->ret = transition_frequency_fidvid(data, newstate);
> + if (tw->ret) {
> printk(KERN_ERR PFX "transition frequency failed\n");
> - ret = 1;
> + tw->ret = 1;
> mutex_unlock(&fidvid_mutex);
> - goto err_out;
> + return;
> }
> mutex_unlock(&fidvid_mutex);
>
> @@ -1220,12 +1216,33 @@ static int powernowk8_target(struct cpufreq_policy *pol,
> data->powernow_table[newstate].index);
> else
> pol->cur = find_khz_freq_from_fid(data->currfid);
> - ret = 0;
>
> -err_out:
> - set_cpus_allowed_ptr(current, oldmask);
> - free_cpumask_var(oldmask);
> - return ret;
> + tw->ret = 0;
> +}
> +
> +/* Driver entry point to switch to the target frequency */
> +static int powernowk8_target(struct cpufreq_policy *pol,
> + unsigned targfreq, unsigned relation)
> +{
> + struct powernowk8_target_work tw;
> +
> + /*
> + * Must run on @pol->cpu. Bounce to workqueue if necessary.
> + * cpufreq core is responsible for ensuring the cpu stays online.
> + */
> + INIT_WORK_ONSTACK(&tw.work, powernowk8_target_on_cpu);
> + tw.pol = pol;
> + tw.targfreq = targfreq;
> + tw.relation = relation;
> +
> + if (smp_processor_id() == pol->cpu) {
> + powernowk8_target_on_cpu(&tw.work);
> + } else {
> + schedule_work_on(pol->cpu, &tw.work);
> + flush_work(&tw.work);
> + }
> +
> + return tw.ret;
> }
>
> /* Driver entry point to verify the policy and range of frequencies */
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-17 20:38 ` Rafael J. Wysocki
@ 2012-09-23 1:53 ` Thomas Renninger
0 siblings, 0 replies; 11+ messages in thread
From: Thomas Renninger @ 2012-09-23 1:53 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Tejun Heo, Andre Przywara, Linus Torvalds, linux-kernel, cpufreq,
linux-pm, Duncan, Andreas Herrmann
Hi,
better late than never..
On Monday 17 September 2012 22:38:20 Rafael J. Wysocki wrote:
> On Monday, September 17, 2012, Tejun Heo wrote:
> > powernowk8_target() runs off a per-cpu work item and if the
> > cpufreq_policy->cpu is different from the current one, it migrates the
> > kworker to the target CPU by manipulating current->cpus_allowed. The
> > function migrates the kworker back to the original CPU but this is
> > still broken. Workqueue concurrency management requires the kworkers
> > to stay on the same CPU and powernowk8_target() ends up triggerring
> > BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
> > fidvid_mutex and sleeps.
> >
> > It is unclear why this bug is being reported now. Duncan says it
> > appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
> > 3.5. Bisection seemed to point to 63d95a91 "workqueue: use @pool
> > instead of @gcwq or @cpu where applicable" which is an non-functional
> > change. Given that the reproduce case sometimes took upto days to
> > trigger, it's easy to be misled while bisecting. Maybe something made
> > contention on fidvid_mutex more likely? I don't know.
> >
> > This patch fixes the bug by punting to another per-cpu work item on
> > the target CPU if it isn't the same as the current one. The code
> > assumes that cpufreq_policy->cpu is kept online by the caller, which
> > Rafael tells me is the case.
> >
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > Reported-by: Duncan <1i5t5.duncan@cox.net>
> > Cc: Rafael J. Wysocki <rjw@sisk.pl>
> > Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
> > Cc: stable@kernel.org
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
> > ---
> >
> > While it's very late in the merge cycle, the fix is limited in scope
> > and fairly safe, so it wouldn't be too crazy to merge but then again
> > this can go through the next -rc1 and then -stable. Linus, Rafael,
> > what do you guys think?
>
> Well, I don't see much reason to wait with this, although I'd like some
> more people to check it.
>
> Andre, Thomas, can you please have a look at it?
The cpufreq changes are not really (functional) changes.
I cannot judge the risk of the real change:
> > + INIT_WORK_ONSTACK(&tw.work, powernowk8_target_on_cpu);
instead of using set_cpus_allowed_ptr.
Changing scheduler behavior of powernow-k8
sounds rather intrusive for rc6, but I would fully trust
Tejun's advise on this.
I wonder whether more drivers are affected similarly, grepping for:
set_cpus_allowed_ptr
shows quite some hits.
My 2 cents...,
Thomas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-17 20:17 [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU Tejun Heo
2012-09-17 20:36 ` Borislav Petkov
2012-09-17 20:38 ` Rafael J. Wysocki
@ 2012-09-18 20:12 ` Tejun Heo
2012-09-18 20:27 ` Linus Torvalds
2012-09-18 20:30 ` [PATCH 3.6-rc6] " Rafael J. Wysocki
2 siblings, 2 replies; 11+ messages in thread
From: Tejun Heo @ 2012-09-18 20:12 UTC (permalink / raw)
To: Linus Torvalds, Rafael J. Wysocki
Cc: linux-kernel, cpufreq, linux-pm, Duncan, Andreas Herrmann
So, with work_on_cpu() reimplementation just posted[1], we can do the
following instead. Functionally it's about the same but less ugly.
Ugly as it may be, I think the previous open coded version is better
suited as a fix and for -stable. Thoughts?
Thanks.
[1] https://lkml.org/lkml/2012/9/18/430
drivers/cpufreq/powernow-k8.c | 63 ++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 29 deletions(-)
diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index c0e8164..1a40935 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -35,7 +35,6 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/cpumask.h>
-#include <linux/sched.h> /* for current / set_cpus_allowed() */
#include <linux/io.h>
#include <linux/delay.h>
@@ -1139,16 +1138,23 @@ static int transition_frequency_pstate(struct powernow_k8_data *data,
return res;
}
-/* Driver entry point to switch to the target frequency */
-static int powernowk8_target(struct cpufreq_policy *pol,
- unsigned targfreq, unsigned relation)
+struct powernowk8_target_arg {
+ struct cpufreq_policy *pol;
+ unsigned targfreq;
+ unsigned relation;
+};
+
+static long powernowk8_target_fn(void *arg)
{
- cpumask_var_t oldmask;
+ struct powernowk8_target_arg *pta = arg;
+ struct cpufreq_policy *pol = pta->pol;
+ unsigned targfreq = pta->targfreq;
+ unsigned relation = pta->relation;
struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
u32 checkfid;
u32 checkvid;
unsigned int newstate;
- int ret = -EIO;
+ int ret;
if (!data)
return -EINVAL;
@@ -1156,29 +1162,16 @@ static int powernowk8_target(struct cpufreq_policy *pol,
checkfid = data->currfid;
checkvid = data->currvid;
- /* only run on specific CPU from here on. */
- /* This is poor form: use a workqueue or smp_call_function_single */
- if (!alloc_cpumask_var(&oldmask, GFP_KERNEL))
- return -ENOMEM;
-
- cpumask_copy(oldmask, tsk_cpus_allowed(current));
- set_cpus_allowed_ptr(current, cpumask_of(pol->cpu));
-
- if (smp_processor_id() != pol->cpu) {
- printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
- goto err_out;
- }
-
if (pending_bit_stuck()) {
printk(KERN_ERR PFX "failing targ, change pending bit set\n");
- goto err_out;
+ return -EIO;
}
pr_debug("targ: cpu %d, %d kHz, min %d, max %d, relation %d\n",
pol->cpu, targfreq, pol->min, pol->max, relation);
if (query_current_values_with_pending_wait(data))
- goto err_out;
+ return -EIO;
if (cpu_family != CPU_HW_PSTATE) {
pr_debug("targ: curr fid 0x%x, vid 0x%x\n",
@@ -1196,7 +1189,7 @@ static int powernowk8_target(struct cpufreq_policy *pol,
if (cpufreq_frequency_table_target(pol, data->powernow_table,
targfreq, relation, &newstate))
- goto err_out;
+ return -EIO;
mutex_lock(&fidvid_mutex);
@@ -1209,9 +1202,8 @@ static int powernowk8_target(struct cpufreq_policy *pol,
ret = transition_frequency_fidvid(data, newstate);
if (ret) {
printk(KERN_ERR PFX "transition frequency failed\n");
- ret = 1;
mutex_unlock(&fidvid_mutex);
- goto err_out;
+ return 1;
}
mutex_unlock(&fidvid_mutex);
@@ -1220,12 +1212,25 @@ static int powernowk8_target(struct cpufreq_policy *pol,
data->powernow_table[newstate].index);
else
pol->cur = find_khz_freq_from_fid(data->currfid);
- ret = 0;
-err_out:
- set_cpus_allowed_ptr(current, oldmask);
- free_cpumask_var(oldmask);
- return ret;
+ return 0;
+}
+
+/* Driver entry point to switch to the target frequency */
+static int powernowk8_target(struct cpufreq_policy *pol,
+ unsigned targfreq, unsigned relation)
+{
+ struct powernowk8_target_arg pta = { .pol = pol, .targfreq = targfreq,
+ .relation = relation };
+
+ /*
+ * Must run on @pol->cpu. cpufreq core is responsible for ensuring
+ * that we're bound to the current CPU and pol->cpu stays online.
+ */
+ if (smp_processor_id() == pol->cpu)
+ return powernowk8_target_fn(&pta);
+ else
+ return work_on_cpu(pol->cpu, powernowk8_target_fn, &pta);
}
/* Driver entry point to verify the policy and range of frequencies */
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-18 20:12 ` Tejun Heo
@ 2012-09-18 20:27 ` Linus Torvalds
2012-09-18 20:49 ` [PATCH 3.6-rc6 1/2] workqueue: reimplement work_on_cpu() using system_wq Tejun Heo
2012-09-18 20:30 ` [PATCH 3.6-rc6] " Rafael J. Wysocki
1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-09-18 20:27 UTC (permalink / raw)
To: Tejun Heo
Cc: Rafael J. Wysocki, linux-kernel, cpufreq, linux-pm, Duncan,
Andreas Herrmann
On Tue, Sep 18, 2012 at 1:12 PM, Tejun Heo <tj@kernel.org> wrote:
> So, with work_on_cpu() reimplementation just posted[1], we can do the
> following instead. Functionally it's about the same but less ugly.
> Ugly as it may be, I think the previous open coded version is better
> suited as a fix and for -stable. Thoughts?
I have to say, since the work_on_cpu() reimplementation seems to
seriously simplify the code, and removes more lines than it adds, and
makes this patch smaller than your original patch, I would personally
prefer this approach instead anyway.
It's what we want long-range, isn't it? And it's smaller and simpler.
Sure, it might be a *conceptually* bigger change, but since it's both
prettier and *practically* smaller, I do like it more. Even at this
stage of -rc (and even for backporting to -stable).
Can we get some quick testing of this two-patch series from the people
who saw the original K8 cpufreq issue? Duncan?
Linus
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH 3.6-rc6 1/2] workqueue: reimplement work_on_cpu() using system_wq
2012-09-18 20:27 ` Linus Torvalds
@ 2012-09-18 20:49 ` Tejun Heo
2012-09-18 20:51 ` [PATCH 3.6-rc6 2/2] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU Tejun Heo
0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2012-09-18 20:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, linux-kernel, cpufreq, linux-pm, Duncan,
Andreas Herrmann, Jiri Kosina, Bjorn Helgaas, Len Brown
The existing work_on_cpu() implementation is hugely inefficient. It
creates a new kthread, execute that single function and then let the
kthread die on each invocation.
Now that system_wq can handle concurrent executions, there's no
advantage of doing this. Reimplement work_on_cpu() using system_wq
which makes it simpler and way more efficient.
stable: While this isn't a fix in itself, it's needed to fix a
workqueue related bug in cpufreq/powernow-k8. AFAICS, this
shouldn't break other existing users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
---
So, here are the two patches which can be applied to 3.6-rc6. I'll
post a combined patch to the bugzilla so that Duncan can test it.
The only worrying thing is that this might affect the existing
work_on_cpu() users in some crazy subtle way. I can't see how it
would break anything but it's when I think like that when something
bites me. That said, pci-driver is probably the most common use case
and it seems to work fine here.
Thanks.
kernel/workqueue.c | 27 +++++++++------------------
1 file changed, 9 insertions(+), 18 deletions(-)
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3576,18 +3576,17 @@ static int __devinit workqueue_cpu_down_
#ifdef CONFIG_SMP
struct work_for_cpu {
- struct completion completion;
+ struct work_struct work;
long (*fn)(void *);
void *arg;
long ret;
};
-static int do_work_for_cpu(void *_wfc)
+static void work_for_cpu_fn(struct work_struct *work)
{
- struct work_for_cpu *wfc = _wfc;
+ struct work_for_cpu *wfc = container_of(work, struct work_for_cpu, work);
+
wfc->ret = wfc->fn(wfc->arg);
- complete(&wfc->completion);
- return 0;
}
/**
@@ -3602,19 +3601,11 @@ static int do_work_for_cpu(void *_wfc)
*/
long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
{
- struct task_struct *sub_thread;
- struct work_for_cpu wfc = {
- .completion = COMPLETION_INITIALIZER_ONSTACK(wfc.completion),
- .fn = fn,
- .arg = arg,
- };
-
- sub_thread = kthread_create(do_work_for_cpu, &wfc, "work_for_cpu");
- if (IS_ERR(sub_thread))
- return PTR_ERR(sub_thread);
- kthread_bind(sub_thread, cpu);
- wake_up_process(sub_thread);
- wait_for_completion(&wfc.completion);
+ struct work_for_cpu wfc = { .fn = fn, .arg = arg };
+
+ INIT_WORK_ONSTACK(&wfc.work, work_for_cpu_fn);
+ schedule_work_on(cpu, &wfc.work);
+ flush_work(&wfc.work);
return wfc.ret;
}
EXPORT_SYMBOL_GPL(work_on_cpu);
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH 3.6-rc6 2/2] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-18 20:49 ` [PATCH 3.6-rc6 1/2] workqueue: reimplement work_on_cpu() using system_wq Tejun Heo
@ 2012-09-18 20:51 ` Tejun Heo
0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2012-09-18 20:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, linux-kernel, cpufreq, linux-pm, Duncan,
Andreas Herrmann, Jiri Kosina, Bjorn Helgaas, Len Brown
powernowk8_target() runs off a per-cpu work item and if the
cpufreq_policy->cpu is different from the current one, it migrates the
kworker to the target CPU by manipulating current->cpus_allowed. The
function migrates the kworker back to the original CPU but this is
still broken. Workqueue concurrency management requires the kworkers
to stay on the same CPU and powernowk8_target() ends up triggerring
BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
fidvid_mutex and sleeps.
It is unclear why this bug is being reported now. Duncan says it
appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
3.5. Bisection seemed to point to 63d95a91 "workqueue: use @pool
instead of @gcwq or @cpu where applicable" which is an non-functional
change. Given that the reproduce case sometimes took upto days to
trigger, it's easy to be misled while bisecting. Maybe something made
contention on fidvid_mutex more likely? I don't know.
This patch fixes the bug by using work_on_cpu() instead if @pol->cpu
isn't the same as the current one. The code assumes that
cpufreq_policy->cpu is kept online by the caller, which Rafael tells
me is the case.
stable: This needs the patch "workqueue: reimplement work_on_cpu()
using system_wq"; otherwise, the behavior could be horrible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@vger.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
---
drivers/cpufreq/powernow-k8.c | 63 ++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 29 deletions(-)
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -35,7 +35,6 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/cpumask.h>
-#include <linux/sched.h> /* for current / set_cpus_allowed() */
#include <linux/io.h>
#include <linux/delay.h>
@@ -1139,16 +1138,23 @@ static int transition_frequency_pstate(s
return res;
}
-/* Driver entry point to switch to the target frequency */
-static int powernowk8_target(struct cpufreq_policy *pol,
- unsigned targfreq, unsigned relation)
+struct powernowk8_target_arg {
+ struct cpufreq_policy *pol;
+ unsigned targfreq;
+ unsigned relation;
+};
+
+static long powernowk8_target_fn(void *arg)
{
- cpumask_var_t oldmask;
+ struct powernowk8_target_arg *pta = arg;
+ struct cpufreq_policy *pol = pta->pol;
+ unsigned targfreq = pta->targfreq;
+ unsigned relation = pta->relation;
struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
u32 checkfid;
u32 checkvid;
unsigned int newstate;
- int ret = -EIO;
+ int ret;
if (!data)
return -EINVAL;
@@ -1156,29 +1162,16 @@ static int powernowk8_target(struct cpuf
checkfid = data->currfid;
checkvid = data->currvid;
- /* only run on specific CPU from here on. */
- /* This is poor form: use a workqueue or smp_call_function_single */
- if (!alloc_cpumask_var(&oldmask, GFP_KERNEL))
- return -ENOMEM;
-
- cpumask_copy(oldmask, tsk_cpus_allowed(current));
- set_cpus_allowed_ptr(current, cpumask_of(pol->cpu));
-
- if (smp_processor_id() != pol->cpu) {
- printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
- goto err_out;
- }
-
if (pending_bit_stuck()) {
printk(KERN_ERR PFX "failing targ, change pending bit set\n");
- goto err_out;
+ return -EIO;
}
pr_debug("targ: cpu %d, %d kHz, min %d, max %d, relation %d\n",
pol->cpu, targfreq, pol->min, pol->max, relation);
if (query_current_values_with_pending_wait(data))
- goto err_out;
+ return -EIO;
if (cpu_family != CPU_HW_PSTATE) {
pr_debug("targ: curr fid 0x%x, vid 0x%x\n",
@@ -1196,7 +1189,7 @@ static int powernowk8_target(struct cpuf
if (cpufreq_frequency_table_target(pol, data->powernow_table,
targfreq, relation, &newstate))
- goto err_out;
+ return -EIO;
mutex_lock(&fidvid_mutex);
@@ -1209,9 +1202,8 @@ static int powernowk8_target(struct cpuf
ret = transition_frequency_fidvid(data, newstate);
if (ret) {
printk(KERN_ERR PFX "transition frequency failed\n");
- ret = 1;
mutex_unlock(&fidvid_mutex);
- goto err_out;
+ return 1;
}
mutex_unlock(&fidvid_mutex);
@@ -1220,12 +1212,25 @@ static int powernowk8_target(struct cpuf
data->powernow_table[newstate].index);
else
pol->cur = find_khz_freq_from_fid(data->currfid);
- ret = 0;
-err_out:
- set_cpus_allowed_ptr(current, oldmask);
- free_cpumask_var(oldmask);
- return ret;
+ return 0;
+}
+
+/* Driver entry point to switch to the target frequency */
+static int powernowk8_target(struct cpufreq_policy *pol,
+ unsigned targfreq, unsigned relation)
+{
+ struct powernowk8_target_arg pta = { .pol = pol, .targfreq = targfreq,
+ .relation = relation };
+
+ /*
+ * Must run on @pol->cpu. cpufreq core is responsible for ensuring
+ * that we're bound to the current CPU and pol->cpu stays online.
+ */
+ if (smp_processor_id() == pol->cpu)
+ return powernowk8_target_fn(&pta);
+ else
+ return work_on_cpu(pol->cpu, powernowk8_target_fn, &pta);
}
/* Driver entry point to verify the policy and range of frequencies */
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3.6-rc6] cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
2012-09-18 20:12 ` Tejun Heo
2012-09-18 20:27 ` Linus Torvalds
@ 2012-09-18 20:30 ` Rafael J. Wysocki
1 sibling, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2012-09-18 20:30 UTC (permalink / raw)
To: Tejun Heo
Cc: Linus Torvalds, linux-kernel, cpufreq, linux-pm, Duncan,
Andreas Herrmann
On Tuesday, September 18, 2012, Tejun Heo wrote:
> So, with work_on_cpu() reimplementation just posted[1], we can do the
> following instead. Functionally it's about the same but less ugly.
> Ugly as it may be, I think the previous open coded version is better
> suited as a fix and for -stable. Thoughts?
I'd prefer this one to go into v3.7 along with the work_on_cpu() patch
and then the open-coded version to go to -stable after the mainline one
has got some testing coverage.
Thanks,
Rafael
> [1] https://lkml.org/lkml/2012/9/18/430
>
> drivers/cpufreq/powernow-k8.c | 63 ++++++++++++++++++++++--------------------
> 1 file changed, 34 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
> index c0e8164..1a40935 100644
> --- a/drivers/cpufreq/powernow-k8.c
> +++ b/drivers/cpufreq/powernow-k8.c
> @@ -35,7 +35,6 @@
> #include <linux/slab.h>
> #include <linux/string.h>
> #include <linux/cpumask.h>
> -#include <linux/sched.h> /* for current / set_cpus_allowed() */
> #include <linux/io.h>
> #include <linux/delay.h>
>
> @@ -1139,16 +1138,23 @@ static int transition_frequency_pstate(struct powernow_k8_data *data,
> return res;
> }
>
> -/* Driver entry point to switch to the target frequency */
> -static int powernowk8_target(struct cpufreq_policy *pol,
> - unsigned targfreq, unsigned relation)
> +struct powernowk8_target_arg {
> + struct cpufreq_policy *pol;
> + unsigned targfreq;
> + unsigned relation;
> +};
> +
> +static long powernowk8_target_fn(void *arg)
> {
> - cpumask_var_t oldmask;
> + struct powernowk8_target_arg *pta = arg;
> + struct cpufreq_policy *pol = pta->pol;
> + unsigned targfreq = pta->targfreq;
> + unsigned relation = pta->relation;
> struct powernow_k8_data *data = per_cpu(powernow_data, pol->cpu);
> u32 checkfid;
> u32 checkvid;
> unsigned int newstate;
> - int ret = -EIO;
> + int ret;
>
> if (!data)
> return -EINVAL;
> @@ -1156,29 +1162,16 @@ static int powernowk8_target(struct cpufreq_policy *pol,
> checkfid = data->currfid;
> checkvid = data->currvid;
>
> - /* only run on specific CPU from here on. */
> - /* This is poor form: use a workqueue or smp_call_function_single */
> - if (!alloc_cpumask_var(&oldmask, GFP_KERNEL))
> - return -ENOMEM;
> -
> - cpumask_copy(oldmask, tsk_cpus_allowed(current));
> - set_cpus_allowed_ptr(current, cpumask_of(pol->cpu));
> -
> - if (smp_processor_id() != pol->cpu) {
> - printk(KERN_ERR PFX "limiting to cpu %u failed\n", pol->cpu);
> - goto err_out;
> - }
> -
> if (pending_bit_stuck()) {
> printk(KERN_ERR PFX "failing targ, change pending bit set\n");
> - goto err_out;
> + return -EIO;
> }
>
> pr_debug("targ: cpu %d, %d kHz, min %d, max %d, relation %d\n",
> pol->cpu, targfreq, pol->min, pol->max, relation);
>
> if (query_current_values_with_pending_wait(data))
> - goto err_out;
> + return -EIO;
>
> if (cpu_family != CPU_HW_PSTATE) {
> pr_debug("targ: curr fid 0x%x, vid 0x%x\n",
> @@ -1196,7 +1189,7 @@ static int powernowk8_target(struct cpufreq_policy *pol,
>
> if (cpufreq_frequency_table_target(pol, data->powernow_table,
> targfreq, relation, &newstate))
> - goto err_out;
> + return -EIO;
>
> mutex_lock(&fidvid_mutex);
>
> @@ -1209,9 +1202,8 @@ static int powernowk8_target(struct cpufreq_policy *pol,
> ret = transition_frequency_fidvid(data, newstate);
> if (ret) {
> printk(KERN_ERR PFX "transition frequency failed\n");
> - ret = 1;
> mutex_unlock(&fidvid_mutex);
> - goto err_out;
> + return 1;
> }
> mutex_unlock(&fidvid_mutex);
>
> @@ -1220,12 +1212,25 @@ static int powernowk8_target(struct cpufreq_policy *pol,
> data->powernow_table[newstate].index);
> else
> pol->cur = find_khz_freq_from_fid(data->currfid);
> - ret = 0;
>
> -err_out:
> - set_cpus_allowed_ptr(current, oldmask);
> - free_cpumask_var(oldmask);
> - return ret;
> + return 0;
> +}
> +
> +/* Driver entry point to switch to the target frequency */
> +static int powernowk8_target(struct cpufreq_policy *pol,
> + unsigned targfreq, unsigned relation)
> +{
> + struct powernowk8_target_arg pta = { .pol = pol, .targfreq = targfreq,
> + .relation = relation };
> +
> + /*
> + * Must run on @pol->cpu. cpufreq core is responsible for ensuring
> + * that we're bound to the current CPU and pol->cpu stays online.
> + */
> + if (smp_processor_id() == pol->cpu)
> + return powernowk8_target_fn(&pta);
> + else
> + return work_on_cpu(pol->cpu, powernowk8_target_fn, &pta);
> }
>
> /* Driver entry point to verify the policy and range of frequencies */
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread