From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>
Subject: Re: Subject: Warning in workqueue.c
Date: Mon, 10 Mar 2014 10:37:42 -0400 [thread overview]
Message-ID: <531DCE36.8010906@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140225103726.GJ9987@twins.programming.kicks-ass.net>
On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>
>> That's a bummer but it at least isn't a very new regression. Peter,
>> any ideas on debugging this? I can make workqueue to play block /
>> unblock dance to try to work around the issue but that'd be very
>> yucky. It'd be great to root cause where the cpu selection anomaly is
>> coming from.
>
> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
> CPUs; the below adds some error paths to that code. In particular we
> propagate the __migrate_task() fail (returns the number of tasks
> migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().
>
> This way we can see if there was a problem with the migration.
>
> You should be able to now reliably use the return value of
> set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
> allowed mask.
>
> I've also included an #if 0 retry loop for the fail case; but I suspect
> that that might end up deadlocking your machine if you hit that just
> wrong, something like the waking CPU endlessly trying to migrate the
> task over while the wakee CPU is waiting for completion of something
> from the waking CPU.
>
> But its worth a prod I suppose.
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 84b23cec0aeb..4c384efac8b3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>
> do_set_cpus_allowed(p, new_mask);
>
> +again: __maybe_unused
> +
> /* Can the task run on the task's current CPU? If so, we're done */
> - if (cpumask_test_cpu(task_cpu(p), new_mask))
> + if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
> goto out;
>
> - dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> + dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
> if (p->on_rq) {
> struct migration_arg arg = { p, dest_cpu };
> +
> /* Need help from migration thread: drop lock and wait. */
> task_rq_unlock(rq, p, &flags);
> - stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> + ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> +#if 0
> + if (ret) {
> + rq = task_rq_lock(p, &flags);
> + goto again;
> + }
> +#endif
> tlb_migrate_finish(p->mm);
> - return 0;
> +
> + return ret;
> }
> out:
> task_rq_unlock(rq, p, &flags);
> @@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int nid)
> static int migration_cpu_stop(void *data)
> {
> struct migration_arg *arg = data;
> + int ret = 0;
>
> /*
> * The original target cpu might have gone down and we might
> * be on another cpu but it doesn't matter.
> */
> local_irq_disable();
> - __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
> + if (!__migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu))
> + ret = -EAGAIN;
> local_irq_enable();
> - return 0;
> +
> + return ret;
> }
>
> #ifdef CONFIG_HOTPLUG_CPU
>
Peter,
Did you intend for me to run with this patch or was it posted for
discussion only? If you want it run, please tell me what to look for.
Also, if I should run this, should I include any other patches, either
the last one you posted in this thread or any of Tejun's?
Thanks.
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
next prev parent reply other threads:[~2014-03-10 14:37 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-07 14:39 Subject: Warning in workqueue.c Jason J. Herne
2014-02-07 16:51 ` Tejun Heo
2014-02-07 17:55 ` Jason J. Herne
2014-02-07 19:36 ` Tejun Heo
2014-02-10 15:32 ` Jason J. Herne
2014-02-10 23:17 ` Tejun Heo
2014-02-12 15:18 ` Jason J. Herne
2014-02-13 3:02 ` Lai Jiangshan
2014-02-13 3:31 ` Lai Jiangshan
2014-02-13 17:58 ` Jason J. Herne
2014-02-13 20:41 ` Tejun Heo
2014-02-14 14:56 ` Jason J. Herne
2014-02-14 14:58 ` Tejun Heo
2014-02-14 16:09 ` Peter Zijlstra
2014-02-14 16:25 ` Tejun Heo
2014-02-14 16:28 ` Peter Zijlstra
2014-02-14 16:38 ` Tejun Heo
2014-02-24 15:01 ` Jason J. Herne
2014-02-24 18:35 ` Tejun Heo
2014-02-25 10:37 ` Peter Zijlstra
2014-03-10 14:37 ` Jason J. Herne [this message]
2014-03-17 14:51 ` Jason J. Herne
2014-03-17 15:16 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=531DCE36.8010906@linux.vnet.ibm.com \
--to=jjherne@linux.vnet.ibm.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox