From: Balbir Singh <bsingharora@gmail.com>
To: Michael Ellerman <mpe@ellerman.id.au>, Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	jiangshanlai@gmail.com, akpm@linux-foundation.org,
	kernel-team@fb.com,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue available early during boot)
Date: Mon, 17 Oct 2016 23:51:30 +1100
Message-ID: <cdbc6901-9183-c2ff-1690-e909381c7956@gmail.com>
In-Reply-To: <87eg3fcge5.fsf@concordia.ellerman.id.au>



On 17/10/16 23:24, Michael Ellerman wrote:
> Tejun Heo <tj@kernel.org> writes:
> 
>> Hello, Michael.
>>
>> On Tue, Oct 11, 2016 at 10:22:13PM +1100, Michael Ellerman wrote:
>>> The oops happens because we're in enqueue_task_fair() and p->se->cfs_rq
>>> is NULL.
>>>
>>> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is
>>> NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in
>>> set_task_rq() and happen to get NULL.
>>>
>>> We never should have done set_task_rq(p, 2048), because 2048 is >=
>>> nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq()
>>> doesn't cope with that.
>>
>> Hmm... it doesn't reproduce it here and can't see how the commit would
>> affect this given that it doesn't really change when the kworker
>> kthreads are being created.
> 
> It changes when the pool attributes are created, which is the source of
> the bug.
> 
> The original crash happens because we have a task with an empty cpus_allowed
> mask. That mask originally comes from pool->attrs->cpumask.
> 
> The attrs for the pool are created early via workqueue_init_early() in
> apply_wqattrs_prepare():
> 
>   start_here_common
>   -> start_kernel
>      -> workqueue_init_early
>         -> __alloc_workqueue_key
>            -> apply_workqueue_attrs
>               -> apply_workqueue_attrs_locked
>                  -> apply_wqattrs_prepare
>
> In there we do:
> 
> 	copy_workqueue_attrs(new_attrs, attrs);
> 	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
> 	if (unlikely(cpumask_empty(new_attrs->cpumask)))
> 		cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask);
> 	...
> 	copy_workqueue_attrs(tmp_attrs, new_attrs);
> 	...
> 	for_each_node(node) {
> 		if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
> +			BUG_ON(cpumask_empty(tmp_attrs->cpumask));
> 			ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
> 
> 
> The bad case (where we hit the BUG_ON I added above) is where we are
> creating a wq for node 1.
> 
> In wq_calc_node_cpumask() we do:
> 
> 	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
> 	return !cpumask_equal(cpumask, attrs->cpumask);
> 
> Which with the arguments inserted is:
> 
> 	cpumask_and(tmp_attrs->cpumask, new_attrs->cpumask, wq_numa_possible_cpumask[1]);
> 	return !cpumask_equal(tmp_attrs->cpumask, new_attrs->cpumask);
> 
> And that results in tmp_attrs->cpumask being empty, because
> wq_numa_possible_cpumask[1] is an empty cpumask.
> 
> The reason wq_numa_possible_cpumask[1] is an empty mask is because in
> wq_numa_init() we did:
> 
> 	for_each_possible_cpu(cpu) {
> 		node = cpu_to_node(cpu);
> 		if (WARN_ON(node == NUMA_NO_NODE)) {
> 			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
> 			/* happens iff arch is bonkers, let's just proceed */
> 			return;
> 		}
> 		cpumask_set_cpu(cpu, tbl[node]);
> 	}
> 
> And cpu_to_node() returned node 0 for every CPU in the system, despite there
> being multiple nodes.
> 
> That happened because we haven't yet called set_cpu_numa_node() for the non-boot
> cpus, because that happens in smp_prepare_cpus(), and
> workqueue_init_early() is called much earlier than that.
> 
> This doesn't trigger on x86 because it does set_cpu_numa_node() in
> setup_per_cpu_areas(), which is called prior to workqueue_init_early().
> 
> We can (should) probably do the same on powerpc, I'll look at that
> tomorrow. But other arches may have a similar problem, and at the very
> least we need to document that workqueue_init_early() relies on
> cpu_to_node() working.

Don't we set up the cpu->node mappings in initmem_init()?
Ideally we have setup_arch->initmem_init->numa_setup_cpu
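
To make the failure mode described above concrete, here is a minimal
user-space sketch (not kernel code). It mimics the table-building loop
quoted from wq_numa_init() and the two lines quoted from
wq_calc_node_cpumask(), using a plain bitmask in place of struct cpumask.
Every name in it (the sketch_ and SKETCH_ prefixes, cpu_to_node_early(),
cpu_to_node_late(), build_node_table(), calc_node_mask()) and the
8-CPU/2-node topology are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS  8	/* hypothetical: 8 possible CPUs */
#define SKETCH_NR_NODES 2	/* hypothetical: 2 NUMA nodes */

typedef uint32_t sketch_mask_t;	/* one bit per CPU, stands in for a cpumask */

/* Before the arch has recorded per-CPU nodes: every CPU reports node 0. */
static int cpu_to_node_early(int cpu)
{
	(void)cpu;
	return 0;
}

/* After the mapping is set up (assumed layout: CPUs 0-3 on node 0, 4-7 on node 1). */
static int cpu_to_node_late(int cpu)
{
	return cpu / (SKETCH_NR_CPUS / SKETCH_NR_NODES);
}

/* Mirrors the for_each_possible_cpu() loop: set each CPU's bit in its node's mask. */
static void build_node_table(int (*to_node)(int),
			     sketch_mask_t tbl[SKETCH_NR_NODES])
{
	for (int node = 0; node < SKETCH_NR_NODES; node++)
		tbl[node] = 0;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		int node = to_node(cpu);

		assert(node >= 0 && node < SKETCH_NR_NODES);
		tbl[node] |= (sketch_mask_t)1 << cpu;
	}
}

/*
 * Mirrors the cpumask_and() + !cpumask_equal() pair: intersect the wq's
 * mask with the node's possible mask and report whether the result
 * differs from the wq's mask.
 */
static bool calc_node_mask(sketch_mask_t attrs, sketch_mask_t node_possible,
			   sketch_mask_t *out)
{
	*out = attrs & node_possible;
	return *out != attrs;
}
```

With cpu_to_node_early() the table entry for node 1 stays empty, so
calc_node_mask() returns true with an empty result mask, which is the
case the added BUG_ON catches. With cpu_to_node_late() the node 1 mask
holds CPUs 4-7 and the intersection is non-empty.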

Will look at it tomorrow.
Balbir Singh
