From: Konrad Eisele <konrad@gaisler.com>
To: sparclinux@vger.kernel.org
Subject: Re: failing kthread_create_on_node()
Date: Tue, 17 Apr 2012 06:37:50 +0000
Message-ID: <4F8D0FBE.8090702@gaisler.com>
In-Reply-To: <4F8BED36.7000304@gaisler.com>


>
> Please describe the sequence of events in more detail so we
> can analyze this better.

Here is the state of the kernel that causes the failure:
( <p> is the kthread created by
   kernel/stop_machine.c:cpu_stop_cpu_callback(),
   2 cpu system, cpu0 and cpu1, cpu0 has run-queue rq0,
   cpu1 has run-queue rq1, the run-queue structure has a
   field called "stop", "..." means "whatever"
)

rq0[ ... <p> ...] , rq0.stop=...
rq1[ ]            , rq1.stop=<p>

1. How to get there
...
CPU0 boots up and
->kernel/smp.c:smp_init()
  ->kernel/cpu.c:__cpu_up() calls __cpu_notify(CPU_UP_PREPARE..)
    ...->stop_machine.c:cpu_stop_cpu_callback()
         here you have the following code sequence:
         ...
310: 	case CPU_UP_PREPARE:
311:		BUG_ON(stopper->thread || stopper->enabled ||
312:		       !list_empty(&stopper->works));
313:		p = kthread_create_on_node(cpu_stopper_thread,
314:					   stopper,
315:					   cpu_to_node(cpu),
316:					   "migration/%d", cpu);
317:		if (IS_ERR(p))
318:			return notifier_from_errno(PTR_ERR(p));
319:		get_task_struct(p);
321:		kthread_bind(p, cpu);
322:		sched_set_stop_task(cpu, p);
323:		stopper->thread = p;
324:		break;
        ...
        I observe that lines 313-321 create kthread <p>, but it is
        on rq0. I'm not sure why kthread_bind() doesn't move <p> to
        rq1. Then comes line 322, which calls:
        -> kernel/sched/core.c:sched_set_stop_task(1,<p>)
        and there you have the line:
        ...
980:   cpu_rq(cpu)->stop = stop;
        ...
        (where "stop" is <p> and cpu is 1).
        Now you end up with the state described above: <p> is queued
        on rq0 and is also referenced by rq1.stop.
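
        The resulting state can be reproduced in a minimal userspace
        sketch. Everything named model_* below is a hypothetical
        stand-in invented for illustration, not a kernel API; the
        sketch only assumes the behaviour observed above (creation
        queues <p> on the creating CPU, and line 980 sets nothing but
        the stop pointer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_task {
	int cpu;	/* models thread_info(<p>)->cpu */
	bool on_rq;	/* models <p>->on_rq */
};

struct model_rq {
	struct model_task *queued;	/* one slot is enough for this model */
	struct model_task *stop;	/* models rq->stop */
};

static struct model_rq model_rqs[MODEL_NR_CPUS];

/* Models the observed effect of lines 313-321: <p> lands on the
 * run-queue of the CPU that created it. */
static void model_create_on(struct model_task *p, int creating_cpu)
{
	assert(creating_cpu >= 0 && creating_cpu < MODEL_NR_CPUS);
	p->cpu = creating_cpu;
	p->on_rq = true;
	model_rqs[creating_cpu].queued = p;
}

/* Models line 980: only the stop pointer of the target rq is set;
 * the task itself is not moved. */
static void model_set_stop_task(int cpu, struct model_task *stop)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_rqs[cpu].stop = stop;
}

/* True when rq[cpu].stop points at a task accounted to another CPU. */
static bool model_stop_is_foreign(int cpu)
{
	const struct model_task *stop = model_rqs[cpu].stop;

	return stop != NULL && stop->cpu != cpu;
}
```

        Calling model_create_on(&p, 0) followed by
        model_set_stop_task(1, &p) leaves <p> queued on rq0 while
        rq1.stop also points at it, so model_stop_is_foreign(1)
        reports the mismatch.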

2. What happens next

    CPU1 will boot up and execute schedule().
    ...
    -> kernel/sched/core.c:__schedule()
         ...
3158:   cpu = smp_processor_id();
3159:	rq = cpu_rq(cpu);
         ...
3199: 	put_prev_task(rq, prev);
3200:	next = pick_next_task(rq);
3201:	clear_tsk_need_resched(prev);
         ...
     CPU1 is executing, so cpu = 1 and rq = rq1
     Line 3200 will call
    -> kernel/sched/core.c:pick_next_task()

         ...
3137:	for_each_class(class) {
3138:		p = class->pick_next_task(rq);
3139:		if (p)
3140:			return p;
3141:	}
         ...
     (rq is still rq1)
     Line 3138 will end up in kernel/sched/stop_task.c:pick_next_task_stop()
         ...
28:	struct task_struct *stop = rq->stop;
29:
30:	if (stop && stop->on_rq)
31:		return stop;
32:
33:	return NULL;
         ...

With the state of the kernel being:

rq0[ ... <p> ...] , rq0.stop=...
rq1[ ]            , rq1.stop=<p>

You get <p> as "next" in __schedule(), even though the CPU is cpu1 and <p> is
on rq0 and has thread_info(<p>)->cpu set to 0.
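
The pick can be checked in isolation with a small userspace sketch.
The sketch_* types and functions are hypothetical simplifications made
up for this example; only the test in sketch_pick_next_task_stop() is
taken from lines 28-33 quoted above. The second helper is just a
diagnostic for the mismatch, not a proposed fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the fields this example needs. */
struct sketch_task {
	int cpu;	/* models thread_info(<p>)->cpu */
	bool on_rq;	/* models <p>->on_rq */
};

struct sketch_rq {
	int cpu;			/* CPU that owns this run-queue */
	struct sketch_task *stop;	/* models rq->stop */
};

/* Same test as lines 28-33: the stop task is picked whenever it is
 * runnable, with no check of which run-queue it is actually on. */
static struct sketch_task *sketch_pick_next_task_stop(struct sketch_rq *rq)
{
	struct sketch_task *stop = rq->stop;

	if (stop && stop->on_rq)
		return stop;

	return NULL;
}

/* Diagnostic only: true when the picked task is accounted to a CPU
 * other than the one that owns rq. */
static bool sketch_picked_foreign_task(struct sketch_rq *rq)
{
	const struct sketch_task *next = sketch_pick_next_task_stop(rq);

	return next != NULL && next->cpu != rq->cpu;
}
```

With rq.cpu = 1, rq.stop = <p>, and <p> having cpu = 0 and on_rq set,
sketch_pick_next_task_stop() returns <p> and
sketch_picked_foreign_task() is true.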

3. Failure

After CPU1 has switched to <p>, it will end up in __schedule()
again. However, because of
#define raw_smp_processor_id()		(current_thread_info()->cpu)
smp_processor_id() now returns 0 even though you are on cpu1 (because <p> is on rq0)
=> lots of strange things happen and the kernel eventually
    crashes.
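
The effect of deriving the CPU number from the current task can be
shown with a tiny userspace sketch. All demo_* names are hypothetical
and exist only in this example; demo_hard_smp_processor_id() merely
assumes that a hardware-derived ID does not depend on which task is
running.

```c
/* Hypothetical stand-ins for thread_info and a physical CPU. */
struct demo_thread_info {
	int cpu;	/* CPU the task is accounted to */
};

struct demo_cpu {
	int hw_id;				/* fixed hardware CPU number */
	struct demo_thread_info *current_ti;	/* task currently running here */
};

/* Models raw_smp_processor_id(): the answer comes from the running
 * task's thread_info, not from the hardware. */
static int demo_smp_processor_id(const struct demo_cpu *c)
{
	return c->current_ti->cpu;
}

/* Models a hardware-derived ID (assumption: independent of the task). */
static int demo_hard_smp_processor_id(const struct demo_cpu *c)
{
	return c->hw_id;
}

/* Models a context switch that does not migrate the task, so
 * thread_info->cpu is left untouched. */
static void demo_switch_to(struct demo_cpu *c, struct demo_thread_info *ti)
{
	c->current_ti = ti;
}
```

If a CPU with hw_id 1 switches to a task whose thread_info has cpu = 0,
demo_smp_processor_id() returns 0 while demo_hard_smp_processor_id()
still returns 1.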


4. Question

Where should I look for the solution? Is it
  -  kthread_bind(p, cpu)
     That should move <p> to rq1.
  -  sched_set_stop_task(cpu, p)
     That should force a move of <p> to rq1.
  -  Should smp_processor_id() maybe be redefined to hard_smp_processor_id()?
  -  rq1 is empty, so maybe idle_balance(1, rq1) in schedule() should have migrated
     <p> to rq1; however, it doesn't do that right now.

-- Greetings Konrad

