public inbox for linux-kernel@vger.kernel.org
From: Hubertus Franke <frankeh@watson.ibm.com>
To: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@transmeta.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Andrew Morton <akpm@zip.com.au>
Subject: Re: [patch] Race between init_idle and reschedule_idle
Date: Tue, 9 Oct 2001 09:12:22 -0400
Message-ID: <20011009091222.A4644@watson.ibm.com>
In-Reply-To: <1076429074.1001803809@[10.10.1.2]>; from Martin J. Bligh on Sat, Sep 29, 2001 at 10:50:09PM -0700

* Martin J. Bligh <Martin.Bligh@us.ibm.com> [20010930 01:50]:
> On an SMP system, if the boot cpu calls reschedule_idle before the
> secondary cpus have all called init_idle, we hit the following code
> in reschedule_idle:
> 
> ...
>     for (i = 0; i < smp_num_cpus; i++) {
>         cpu = cpu_logical_map(i);
>         if (!can_schedule(p, cpu))
>             continue;
>         tsk = cpu_curr(cpu);
> ...
> 
> Because init_idle hasn't run for all cpus yet, the expression
> "cpu_curr(cpu)" gives us NULL. When we dereference an offset of 0x50
> off tsk a few nanoseconds later we panic, complaining that vaddr
> 0x00000050 is invalid.
> 
> This is more likely to happen on larger systems, but could happen on any
> SMP system (especially if I enable serial console, which really slows down
> the secondary procs doing printk ;-) ). If you want to see the panic /
> analysis, I can send it.
> 
> Thanks to Alan Cox & Andrew Morton for showing me how to serialise the
> cpus to make the panic legible. The following patch holds back the boot
> cpu at the end of smp_init until all the secondaries have done init_idle:
> 
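The failure mode described in the quoted text can be modelled in user space. The following is only a sketch under stated assumptions: `struct task`, `cpu_curr_slot`, `model_init_idle` and `first_uninitialised_cpu` are hypothetical names invented for the example, not kernel symbols. It shows that a per-cpu "current task" slot stays NULL until that cpu has run its init step, which is the window in which the scan in reschedule_idle would dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the kernel's task structure. */
struct task {
	int priority;
};

/* Per-cpu "current task" slots; NULL until that cpu has run its init step. */
static struct task *cpu_curr_slot[NR_CPUS];

/* Model of init_idle(): publish the idle task for one cpu. */
static void model_init_idle(int cpu, struct task *idle)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_curr_slot[cpu] = idle;
}

/*
 * Model of the scan over all cpus: return the first cpu whose slot is
 * still NULL, or -1 if every slot is populated. The loop quoted above
 * has no such check and uses the pointer directly.
 */
static int first_uninitialised_cpu(int ncpus)
{
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (cpu_curr_slot[cpu] == NULL)
			return cpu;
	}
	return -1;
}
```

As long as `first_uninitialised_cpu()` returns something other than -1, a scan that dereferences every slot is unsafe, which is why the patch delays the boot cpu rather than adding a check inside the loop.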

Martin, we experienced the same problem way back when we brought up our
Fujitsu-based NUMA machine. We also hit it with our cpu pooling and
load balancing approach. I think ksoftirqd might have similar problems.
We solved it in a similar fashion, using counters.
I strongly suggest merging this code into the mainline kernel.
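
For illustration only, a counter-based variant of the same idea might look roughly like the sketch below. It is a user-space model using C11 atomics, not code from any kernel tree, and the names `cpus_pending`, `boot_expect`, `secondary_checkin` and `all_checked_in` are assumptions made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter of secondary cpus that have not finished init yet. */
static atomic_int cpus_pending;

/* Boot cpu: record how many secondaries must check in before it proceeds. */
static void boot_expect(int nr_secondaries)
{
	assert(nr_secondaries >= 0);
	atomic_store(&cpus_pending, nr_secondaries);
}

/* Secondary cpu: call once, after its per-cpu scheduler data is set up. */
static void secondary_checkin(void)
{
	atomic_fetch_sub(&cpus_pending, 1);
}

/* Boot cpu: returns 1 once every secondary has checked in, else 0. */
static int all_checked_in(void)
{
	return atomic_load(&cpus_pending) == 0;
}
```

The boot cpu would poll `all_checked_in()` in a spin loop. Compared with a bitmask, a counter does not say which cpu is still missing, but it is not limited to the number of bits in a word.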

-- Hubertus

> --- virgin-2.4.10/kernel/sched.c	Mon Sep 17 23:03:09 2001
> +++ linux-2.4.10/kernel/sched.c	Sat Sep 29 19:57:10 2001
> @@ -1309,6 +1309,8 @@
>  	atomic_inc(&current->files->count);
>  }
> 
> +extern volatile unsigned long wait_init_idle;
> +
>  void __init init_idle(void)
>  {
>  	struct schedule_data * sched_data;
> @@ -1321,6 +1323,7 @@
>  	}
>  	sched_data->curr = current;
>  	sched_data->last_schedule = get_cycles();
> +	clear_bit(current->processor, &wait_init_idle);
>  }
> 
>  extern void init_timervecs (void);
> --- virgin-2.4.10/init/main.c	Thu Sep 20 21:02:01 2001
> +++ linux-2.4.10/init/main.c	Sat Sep 29 21:11:52 2001
> @@ -477,6 +477,8 @@
>  extern void setup_arch(char **);
>  extern void cpu_idle(void);
> 
> +volatile unsigned long wait_init_idle = 0UL;
> +
>  #ifndef CONFIG_SMP
> 
>  #ifdef CONFIG_X86_IO_APIC
> @@ -490,13 +492,25 @@
> 
>  #else
> 
> +
>  /* Called by boot processor to activate the rest. */
>  static void __init smp_init(void)
>  {
>  	/* Get other processors into their bootup holding patterns. */
>  	smp_boot_cpus();
> +	wait_init_idle = cpu_online_map;
> +	clear_bit(current->processor, &wait_init_idle); /* Don't wait on me! */
> +	printk("Waiting on wait_init_idle (map = 0x%lx)\n", wait_init_idle);
>  	smp_threads_ready=1;
>  	smp_commence();
> +
> +	/* Wait for the other cpus to set up their idle processes */
> +	while (1) {
> +		if (!wait_init_idle)
> +			break;
> +		rep_nop();
> +	}
> +	printk("All processors have done init_idle\n");
>  }
> 
>  #endif
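
The handshake in the quoted patch can be modelled outside the kernel to make the bit manipulation easier to follow. This is a sketch under stated assumptions: it uses C11 atomics in place of the kernel's clear_bit(), and `wait_mask`, `boot_arm`, `secondary_done` and `still_waiting` are hypothetical names chosen for the example. Only the mask logic is modelled; the spin loop and the real cpu bring-up are left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* One bit per cpu that still has to finish init; plays the role of wait_init_idle. */
static atomic_ulong wait_mask;

/* Boot cpu: wait on every online cpu except itself. */
static void boot_arm(unsigned long online_map, int boot_cpu)
{
	assert(boot_cpu >= 0 && boot_cpu < (int)(8 * sizeof(unsigned long)));
	atomic_store(&wait_mask, online_map & ~(1UL << boot_cpu));
}

/* Secondary cpu: clear its own bit once its idle task is set up. */
static void secondary_done(int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(unsigned long)));
	atomic_fetch_and(&wait_mask, ~(1UL << cpu));
}

/* Boot cpu: a spin loop would keep polling until this returns 0. */
static unsigned long still_waiting(void)
{
	return atomic_load(&wait_mask);
}
```

With four cpus online and cpu 0 as the boot cpu, the mask starts at 0xe and reaches 0 only after cpus 1, 2 and 3 have each cleared their bit, in any order.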

