From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>
Subject: Re: Subject: Warning in workqueue.c
Date: Mon, 17 Mar 2014 10:51:45 -0400 [thread overview]
Message-ID: <53270C01.6090209@linux.vnet.ibm.com> (raw)
In-Reply-To: <531DCE36.8010906@linux.vnet.ibm.com>
On 03/10/2014 10:37 AM, Jason J. Herne wrote:
> On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
>> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>>
>>> That's a bummer but it at least isn't a very new regression. Peter,
>>> any ideas on debugging this? I can make workqueue to play block /
>>> unblock dance to try to work around the issue but that'd be very
>>> yucky. It'd be great to root cause where the cpu selection anomaly is
>>> coming from.
>>
>> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
>> CPUs; the below adds some error paths to that code. In particular we
>> propagate the __migrate_task() fail (returns the number of tasks
>> migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().
>>
>> This way we can see if there was a problem with the migration.
>>
>> You should be able to now reliably use the return value of
>> set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
>> allowed mask.
>>
>> I've also included an #if 0 retry loop for the fail case; but I suspect
>> that that might end up deadlocking your machine if you hit that just
>> wrong, something like the waking CPU endlessly trying to migrate the
>> task over while the wakee CPU is waiting for completion of something
>> from the waking CPU.
>>
>> But its worth a prod I suppose.
>>
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 84b23cec0aeb..4c384efac8b3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct
>> *p, const struct cpumask *new_mask)
>>
>> do_set_cpus_allowed(p, new_mask);
>>
>> +again: __maybe_unused
>> +
>> /* Can the task run on the task's current CPU? If so, we're done */
>> - if (cpumask_test_cpu(task_cpu(p), new_mask))
>> + if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
>> goto out;
>>
>> - dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> + dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
>> if (p->on_rq) {
>> struct migration_arg arg = { p, dest_cpu };
>> +
>> /* Need help from migration thread: drop lock and wait. */
>> task_rq_unlock(rq, p, &flags);
>> - stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> + ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> +#if 0
>> + if (ret) {
>> + rq = task_rq_lock(p, &flags);
>> + goto again;
>> + }
>> +#endif
>> tlb_migrate_finish(p->mm);
>> - return 0;
>> +
>> + return ret;
>> }
>> out:
>> task_rq_unlock(rq, p, &flags);
>> @@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int
>> nid)
>> static int migration_cpu_stop(void *data)
>> {
>> struct migration_arg *arg = data;
>> + int ret = 0;
>>
>> /*
>> * The original target cpu might have gone down and we might
>> * be on another cpu but it doesn't matter.
>> */
>> local_irq_disable();
>> - __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
>> + if (!__migrate_task(arg->task, raw_smp_processor_id(),
>> arg->dest_cpu))
>> + ret = -EAGAIN;
>> local_irq_enable();
>> - return 0;
>> +
>> + return ret;
>> }
>>
>> #ifdef CONFIG_HOTPLUG_CPU
>>
>
> Peter,
>
> Did you intend for me to run with this patch or was it posted for
> discussion only? If you want it run, please tell me what to look for.
> Also, if I should run this, should I include any other patches, either
> the last one you posted in this thread or any of Tejun's?
>
> Thanks.
>
Ping?
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
next prev parent reply other threads:[~2014-03-17 14:51 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-07 14:39 Subject: Warning in workqueue.c Jason J. Herne
2014-02-07 16:51 ` Tejun Heo
2014-02-07 17:55 ` Jason J. Herne
2014-02-07 19:36 ` Tejun Heo
2014-02-10 15:32 ` Jason J. Herne
2014-02-10 23:17 ` Tejun Heo
2014-02-12 15:18 ` Jason J. Herne
2014-02-13 3:02 ` Lai Jiangshan
2014-02-13 3:31 ` Lai Jiangshan
2014-02-13 17:58 ` Jason J. Herne
2014-02-13 20:41 ` Tejun Heo
2014-02-14 14:56 ` Jason J. Herne
2014-02-14 14:58 ` Tejun Heo
2014-02-14 16:09 ` Peter Zijlstra
2014-02-14 16:25 ` Tejun Heo
2014-02-14 16:28 ` Peter Zijlstra
2014-02-14 16:38 ` Tejun Heo
2014-02-24 15:01 ` Jason J. Herne
2014-02-24 18:35 ` Tejun Heo
2014-02-25 10:37 ` Peter Zijlstra
2014-03-10 14:37 ` Jason J. Herne
2014-03-17 14:51 ` Jason J. Herne [this message]
2014-03-17 15:16 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53270C01.6090209@linux.vnet.ibm.com \
--to=jjherne@linux.vnet.ibm.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox