From: Konrad Eisele <konrad@gaisler.com>
To: sparclinux@vger.kernel.org
Subject: Re: failing kthread_create_on_node()
Date: Tue, 17 Apr 2012 06:37:50 +0000
Message-ID: <4F8D0FBE.8090702@gaisler.com>
In-Reply-To: <4F8BED36.7000304@gaisler.com>
>
> Please describe the sequence of events in more detail so we
> can analyze this better.
Here is the state of the kernel that causes the failure:
( <p> is the kthread created by
kernel/stop_machine.c:cpu_stop_cpu_callback(),
2 cpu system, cpu0 and cpu1, cpu0 has run-queue rq0,
cpu1 has run-queue rq1, the run-queue structure has a
field called "stop", "..." means "whatever"
)
rq0[ ... <p> ...] , rq0.stop=...
rq1[ ] , rq1.stop=<p>
1. How to get there
...
CPU0 boots up and
->kernel/smp.c:smp_init()
->kernel/cpu.c:__cpu_up() calls __cpu_notify(CPU_UP_PREPARE..)
...->stop_machine.c:cpu_stop_cpu_callback()
here you have the following code sequence:
...
310: case CPU_UP_PREPARE:
311: BUG_ON(stopper->thread || stopper->enabled ||
312: !list_empty(&stopper->works));
313: p = kthread_create_on_node(cpu_stopper_thread,
314: stopper,
315: cpu_to_node(cpu),
316: "migration/%d", cpu);
317: if (IS_ERR(p))
318: return notifier_from_errno(PTR_ERR(p));
319: get_task_struct(p);
321: kthread_bind(p, cpu);
322: sched_set_stop_task(cpu, p);
323: stopper->thread = p;
324: break;
...
I observe that lines 313-321 create kthread <p>, but it is
in rq0. I'm not sure why kthread_bind() doesn't move <p> to
rq1. Then comes line 322, which calls:
-> kernel/sched/core.c:sched_set_stop_task(1,<p>)
and there you have the line:
...
980: cpu_rq(cpu)->stop = stop;
...
(where "stop" is <p> and cpu is 1).
Now you end up with the state described above: <p> is in
rq0 and is also rq1.stop.
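To make that state concrete, here is a minimal userspace sketch of it. This is not kernel code: the model_* names, the two-element array and the struct layouts are simplified stand-ins that I made up for illustration. The only thing taken from the kernel is the single assignment quoted above from sched_set_stop_task().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: simplified stand-ins, not the real kernel structures. */
#define MODEL_NR_CPUS 2

struct model_task {
    int cpu;    /* stands in for thread_info(<p>)->cpu */
    int on_rq;  /* non-zero while the task is queued on some run-queue */
};

struct model_rq {
    struct model_task *stop;  /* stands in for the run-queue's "stop" field */
};

static struct model_rq model_rqs[MODEL_NR_CPUS];

/* Mirrors only the quoted assignment "cpu_rq(cpu)->stop = stop":
 * the stop pointer changes, the task's own cpu field is left alone. */
static void model_set_stop_task(int cpu, struct model_task *stop)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_rqs[cpu].stop = stop;
}

/* Returns 1 if the stop task of run-queue `cpu` also believes it is on `cpu`
 * (or if there is no stop task at all), 0 otherwise. */
static int model_stop_task_consistent(int cpu)
{
    struct model_task *stop;

    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    stop = model_rqs[cpu].stop;
    return stop == NULL || stop->cpu == cpu;
}
```

With a task whose cpu field is 0, calling model_set_stop_task(1, &p) leaves rq1.stop pointing at a task that still claims to be on cpu0, so model_stop_task_consistent(1) reports the mismatch.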
2. What happens next
CPU1 will boot up and execute schedule().
...
-> kernel/sched/core.c:__schedule()
...
3158: cpu = smp_processor_id();
3159: rq = cpu_rq(cpu);
...
3199: put_prev_task(rq, prev);
3200: next = pick_next_task(rq);
3201: clear_tsk_need_resched(prev);
...
CPU1 is executing, so cpu = 1 and rq = rq1
Line 3200 will call
-> kernel/sched/core.c:pick_next_task()
...
3137: for_each_class(class) {
3138: p = class->pick_next_task(rq);
3139: if (p)
3140: return p;
3141: }
...
(rq is still rq1)
Line 3138 will end up in kernel/sched/stop_task.c:pick_next_task_stop()
...
28: struct task_struct *stop = rq->stop;
29:
30: if (stop && stop->on_rq)
31: return stop;
32:
33: return NULL;
...
With the state of the kernel being:
rq0[ ... <p> ...] , rq0.stop=...
rq1[ ] , rq1.stop=<p>
You get <p> as "next" in __schedule(), even though the CPU is cpu1 and <p> is
in rq0 and has thread_info(<p>)->cpu set to 0.
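The selection step can be sketched in userspace like this. Again this is only a model under my own assumptions: the model_* names and the rq "cpu" field are hypothetical, and the "checked" variant is a diagnostic to show what the quoted test does not look at, not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: simplified stand-ins, not the real kernel structures. */
struct model_task {
    int cpu;    /* stands in for thread_info(<p>)->cpu */
    int on_rq;  /* non-zero while the task is queued on some run-queue */
};

struct model_rq {
    int cpu;                  /* which CPU this run-queue belongs to (model field) */
    struct model_task *stop;  /* stands in for the run-queue's "stop" field */
};

/* Same test as the quoted pick_next_task_stop(): on_rq only says the task
 * is queued somewhere, not that it is queued on this run-queue. */
static struct model_task *model_pick_next_task_stop(struct model_rq *rq)
{
    struct model_task *stop;

    assert(rq != NULL);
    stop = rq->stop;
    if (stop && stop->on_rq)
        return stop;
    return NULL;
}

/* Hypothetical diagnostic variant: additionally requires that the task's
 * own cpu field matches the run-queue it is being picked from. */
static struct model_task *model_pick_next_task_stop_checked(struct model_rq *rq)
{
    struct model_task *stop;

    assert(rq != NULL);
    stop = rq->stop;
    if (stop && stop->on_rq && stop->cpu == rq->cpu)
        return stop;
    return NULL;
}
```

For rq1 with rq1.stop=<p>, on_rq set and the task's cpu field still 0, the first function hands back <p> while the checked variant returns NULL.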
3. Failure
After CPU1 has switched to <p> it will end up in __schedule()
again. However, because of
#define raw_smp_processor_id() (current_thread_info()->cpu)
smp_processor_id() now returns 0 even though you are on cpu1 (because <p> is on rq0)
=> lots of strange things happen and the kernel eventually
crashes.
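The effect of that #define can be modelled in userspace as follows. The model_* names and the explicit hw_cpu argument are assumptions made for the sketch; the only point is that the reported CPU number comes from the current thread, not from the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: simplified stand-ins, not the real kernel structures. */
#define MODEL_NR_CPUS 2

struct model_thread_info {
    int cpu;  /* stands in for thread_info->cpu */
};

/* The thread currently running on each hardware CPU (model only). */
static struct model_thread_info *model_current[MODEL_NR_CPUS];

/* Models a context switch on hardware CPU `hw_cpu` to thread `next`. */
static void model_switch_to(int hw_cpu, struct model_thread_info *next)
{
    assert(hw_cpu >= 0 && hw_cpu < MODEL_NR_CPUS);
    assert(next != NULL);
    model_current[hw_cpu] = next;
}

/* Models raw_smp_processor_id() as quoted above: the answer is read from
 * the current thread's thread_info, so it follows whatever thread runs. */
static int model_smp_processor_id(int hw_cpu)
{
    assert(hw_cpu >= 0 && hw_cpu < MODEL_NR_CPUS);
    assert(model_current[hw_cpu] != NULL);
    return model_current[hw_cpu]->cpu;
}
```

Switching hardware cpu1 to a thread whose cpu field is 0 makes model_smp_processor_id(1) return 0, which is the situation described above.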
4. Question
Where should I look for the solution? Is it
- kthread_bind(p, cpu)
That should move <p> to rq1.
- sched_set_stop_task(cpu, p)
That should force a move of <p> to rq1.
- Should smp_processor_id() maybe be redefined to hard_smp_processor_id()?
- rq1 is empty, so maybe idle_balance(1, rq1) in schedule() should have migrated
<p> to rq1; however, it doesn't do that right now.
-- Greetings Konrad