From: Peter Zijlstra <peterz@infradead.org>
To: Sachin Sant <sachinp@in.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Linux/PPC Development <linuxppc-dev@ozlabs.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>,
linux-next@vger.kernel.org
Subject: Re: [Next] CPU Hotplug test failures on powerpc
Date: Tue, 15 Dec 2009 11:43:47 +0100
Message-ID: <1260873827.4165.362.camel@twins>
In-Reply-To: <4B275A6B.9030200@in.ibm.com>

On Tue, 2009-12-15 at 15:14 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> >> static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> >> {
> >>         int dest_cpu;
> >>         const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(dead_cpu));
> >>
> >> again:
> >>         /* Look for allowed, online CPU in same node. */
> >>         for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
> >>                 if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
> >>                         goto move;
> >>
> >>         /* Any allowed, online CPU? */
> >>         dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
> >>         if (dest_cpu < nr_cpu_ids)
> >>                 goto move;
> >>
> >>         /* No more Mr. Nice Guy. */
> >>         if (dest_cpu >= nr_cpu_ids) {
> >>                 cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
> >> ====>           dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
> >>
> >>                 /*
> >>                  * Don't tell them about moving exiting tasks or
> >>                  * kernel threads (both mm NULL), since they never
> >>                  * leave kernel.
> >>                  */
> >>                 if (p->mm && printk_ratelimit()) {
> >>                         pr_info("process %d (%s) no longer affine to cpu%d\n",
> >>                                 task_pid_nr(p), p->comm, dead_cpu);
> >>                 }
> >>         }
> >>
> >> move:
> >>         /* It can have affinity changed while we were choosing. */
> >>         if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
> >>                 goto again;
> >> }
> >>
> >> Both masks, p->cpus_allowed and cpu_active_mask are stable in that p
> >> won't go away since we hold the tasklist_lock (in migrate_list_tasks),
> >> and cpu_active_mask is static storage, so WTH is it going funny on?
> >>
> I added some debug statements within the above code.
> This is a 2 cpu machine.
>
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 1024
> XMON dest_cpu = 1024 . dead_cpu = 1
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 1024
> XMON dest_cpu = 1024 . dead_cpu = 1
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 1024
> XMON dest_cpu = 1024 . dead_cpu = 1
>
> It seems to me that control is stuck in an infinite loop and hence the
> machine appears to be hung. The dest_cpu value is always 1024
> and never changes, which results in an infinite loop.
>
> In the working scenario the output is along the following lines:
>
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 0
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 0
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 0
>
> Let me know if I should try to record any specific value.

Could you possibly print the two masks themselves? cpumask_scnprintf()
and friends come in handy for this.

The dest_cpu=1024 thing seems to suggest the intersection between
p->cpus_allowed and cpu_active_mask is empty for some reason, even
though we forcefully reset p->cpus_allowed to the full set using
cpuset_cpus_allowed_locked().

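
To illustrate why an empty intersection would show up as dest_cpu=1024,
here is a rough userspace sketch. It is not the kernel's cpumask code:
struct model_mask, model_set_cpu() and model_any_and() are made-up names,
and MODEL_NR_CPUS = 1024 is an assumption that the 1024 in the debug
output is the compile-time mask width. The idea is only that a
"first CPU in both masks" search returns the mask width when nothing
matches, which is >= nr_cpu_ids (2 on this machine).

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_NR_CPUS 1024 /* assumed mask width, taken from the 1024 above */
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define MASK_WORDS ((MODEL_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct model_mask {
        unsigned long bits[MASK_WORDS];
};

/* Mark one CPU as present in the mask. */
static void model_set_cpu(struct model_mask *m, unsigned int cpu)
{
        assert(cpu < MODEL_NR_CPUS);
        m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/*
 * Return the first CPU set in both masks, or MODEL_NR_CPUS when the
 * intersection is empty (the "no CPU found" value in this model).
 */
static unsigned int model_any_and(const struct model_mask *a,
                                  const struct model_mask *b)
{
        unsigned int cpu;

        for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
                unsigned long bit = 1UL << (cpu % BITS_PER_WORD);

                if (a->bits[cpu / BITS_PER_WORD] &
                    b->bits[cpu / BITS_PER_WORD] & bit)
                        return cpu;
        }
        return MODEL_NR_CPUS;
}
```

In this model, an allowed mask containing only cpu1 and an active mask
containing only cpu0 give 1024, the value seen in the hung case. Once
cpu0 is in both masks the result is 0, as in the working case.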
/me goes re-read the cpu_active_map code, this really shouldn't happen.