public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>
Subject: Re: Subject: Warning in workqueue.c
Date: Mon, 17 Mar 2014 10:51:45 -0400	[thread overview]
Message-ID: <53270C01.6090209@linux.vnet.ibm.com> (raw)
In-Reply-To: <531DCE36.8010906@linux.vnet.ibm.com>

On 03/10/2014 10:37 AM, Jason J. Herne wrote:
> On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
>> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>>
>>> That's a bummer but it at least isn't a very new regression.  Peter,
>>> any ideas on debugging this?  I can make workqueue to play block /
>>> unblock dance to try to work around the issue but that'd be very
>>> yucky.  It'd be great to root cause where the cpu selection anomaly is
>>> coming from.
>>
>> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
>> CPUs; the below adds some error paths to that code. In particular we
>> propagate the __migrate_task() fail (returns the number of tasks
>> migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().
>>
>> This way we can see if there was a problem with the migration.
>>
>> You should be able to now reliably use the return value of
>> set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
>> allowed mask.
>>
>> I've also included an #if 0 retry loop for the fail case; but I suspect
>> that that might end up deadlocking your machine if you hit that just
>> wrong, something like the waking CPU endlessly trying to migrate the
>> task over while the wakee CPU is waiting for completion of something
>> from the waking CPU.
>>
>> But its worth a prod I suppose.
>>
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 84b23cec0aeb..4c384efac8b3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct
>> *p, const struct cpumask *new_mask)
>>
>>       do_set_cpus_allowed(p, new_mask);
>>
>> +again: __maybe_unused
>> +
>>       /* Can the task run on the task's current CPU? If so, we're done */
>> -    if (cpumask_test_cpu(task_cpu(p), new_mask))
>> +    if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
>>           goto out;
>>
>> -    dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> +    dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
>>       if (p->on_rq) {
>>           struct migration_arg arg = { p, dest_cpu };
>> +
>>           /* Need help from migration thread: drop lock and wait. */
>>           task_rq_unlock(rq, p, &flags);
>> -        stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> +        ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> +#if 0
>> +        if (ret) {
>> +            rq = task_rq_lock(p, &flags);
>> +            goto again;
>> +        }
>> +#endif
>>           tlb_migrate_finish(p->mm);
>> -        return 0;
>> +
>> +        return ret;
>>       }
>>   out:
>>       task_rq_unlock(rq, p, &flags);
>> @@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int
>> nid)
>>   static int migration_cpu_stop(void *data)
>>   {
>>       struct migration_arg *arg = data;
>> +    int ret = 0;
>>
>>       /*
>>        * The original target cpu might have gone down and we might
>>        * be on another cpu but it doesn't matter.
>>        */
>>       local_irq_disable();
>> -    __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
>> +    if (!__migrate_task(arg->task, raw_smp_processor_id(),
>> arg->dest_cpu))
>> +        ret = -EAGAIN;
>>       local_irq_enable();
>> -    return 0;
>> +
>> +    return ret;
>>   }
>>
>>   #ifdef CONFIG_HOTPLUG_CPU
>>
>
> Peter,
>
> Did you intend for me to run with this patch or was it posted for
> discussion only? If you want it run, please tell me what to look for.
> Also, if I should run this, should I include any other patches, either
> the last one you posted in this thread or any of Tejun's?
>
> Thanks.
>

Ping?

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


  reply	other threads:[~2014-03-17 14:51 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-07 14:39 Subject: Warning in workqueue.c Jason J. Herne
2014-02-07 16:51 ` Tejun Heo
2014-02-07 17:55   ` Jason J. Herne
2014-02-07 19:36     ` Tejun Heo
2014-02-10 15:32       ` Jason J. Herne
2014-02-10 23:17         ` Tejun Heo
2014-02-12 15:18           ` Jason J. Herne
2014-02-13  3:02             ` Lai Jiangshan
2014-02-13  3:31             ` Lai Jiangshan
2014-02-13 17:58               ` Jason J. Herne
2014-02-13 20:41                 ` Tejun Heo
2014-02-14 14:56                   ` Jason J. Herne
2014-02-14 14:58                     ` Tejun Heo
2014-02-14 16:09                   ` Peter Zijlstra
2014-02-14 16:25                     ` Tejun Heo
2014-02-14 16:28                       ` Peter Zijlstra
2014-02-14 16:38                         ` Tejun Heo
2014-02-24 15:01                       ` Jason J. Herne
2014-02-24 18:35                         ` Tejun Heo
2014-02-25 10:37                           ` Peter Zijlstra
2014-03-10 14:37                             ` Jason J. Herne
2014-03-17 14:51                               ` Jason J. Herne [this message]
2014-03-17 15:16                               ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53270C01.6090209@linux.vnet.ibm.com \
    --to=jjherne@linux.vnet.ibm.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox