public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nathan Lynch <ntl@pobox.com>
To: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: i386 cpu hotplug bug - instant reboot when onlining secondary
Date: Tue, 28 Feb 2006 21:28:19 -0600	[thread overview]
Message-ID: <20060301032819.GC2856@localhost.localdomain> (raw)
In-Reply-To: <Pine.LNX.4.64.0602281412450.28074@montezuma.fsmlabs.com>

Zwane Mwaikambo wrote:
> On Tue, 28 Feb 2006, Nathan Lynch wrote:
> 
> > Zwane Mwaikambo wrote:
> > > On Mon, 27 Feb 2006, Nathan Lynch wrote:
> > > 
> > > > Zwane Mwaikambo wrote:
> > > > > On Sun, 19 Feb 2006, Nathan Lynch wrote:
> > > > > 
> > > > > > On a dual P3 Xeon machine, offlining and then onlining a cpu makes the
> > > > > > box instantly reboot.  I've been seeing this throughout the 2.6.16-rc
> > > > > > series, but wasn't able to collect more information until now.  Not
> > > > > > sure when this last worked, unfortunately.
> > > > > > 
> > > > > > With the debugging patch below, I get this on serial console:
> > > > > 
> > > > > Does 2.6.14 work? Also i wonder if it gets out of the trampoline...
> > > > 
> > > > 2.6.14 works (albeit with an APIC error reported).  When retesting
> > > > 2.6.16-rc4 with your patch on top of my debugging patch, I don't see the
> > > > "startup_secondary" line:
> > > 
> > > Hi Nathan,
> > > 
> > > Can you try the following patch? We can start moving the WARM_BOOT_HLT 
> > > down until it triple faults (i'm assuming it at least gets this far).
> > 
> > Here's what I got with this one on top of a day-old -git (all
> > debugging patches still applied):
> 
> Looks good, how about the following

I now get:

[17179687.244000] CPU 1 is now offline
[17179693.164000] Booting processor 1/1 eip 3000
[17179693.216000] CPU 1 irqstacks, hard=7837f000 soft=78377000
[17179693.284000] Setting warm reset code and vector.
[17179693.340000] 1.
[17179693.364000] 2.
[17179693.388000] 3.
[17179693.408000] Asserting INIT.
[17179693.448000] Waiting for send to finish...
[17179693.496000] +<7>Deasserting INIT.
[17179693.552000] Waiting for send to finish...
[17179693.600000] +<7>#startup loops: 2.
[17179693.644000] Sending STARTUP #1.
[17179693.688000] After apic_write.
[17179693.724000] Doing apic_write_around for target chip...
[17179693.788000] Doing apic_write_around to kick the second...


> Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S
> ===================================================================
> RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S,v
> retrieving revision 1.1.1.1
> diff -u -p -B -r1.1.1.1 head.S
> --- linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S	11 Feb 2006 16:55:14 -0000	1.1.1.1
> +++ linux-2.6.16-rc2-mm1/arch/i386/kernel/head.S	28 Feb 2006 22:12:25 -0000
> @@ -146,6 +146,12 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
>   * we know the trampoline has already loaded the boot_gdt_table GDT
>   * for us.
>   */
> +#define warm_boot	tsc_sync_disabled-__PAGE_OFFSET
> +#define WARM_BOOT_HLT		\
> +	cmpl	$0, warm_boot;	\
> +10:				\
> +	jne	10b
> +
>  ENTRY(startup_32_smp)
>  	cld
>  	movl $(__BOOT_DS),%eax
> @@ -324,6 +330,7 @@ is386:	movl $2,%ecx		# set MP
>  	cmpb $0,%cl
>  	je 1f			# the first CPU calls start_kernel
>  				# all other CPUs call initialize_secondary
> +	WARM_BOOT_HLT
>  	call initialize_secondary
>  	jmp L6
>  1:
> Index: linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c
> ===================================================================
> RCS file: /home/cvsroot/linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c,v
> retrieving revision 1.1.1.1
> diff -u -p -B -r1.1.1.1 smpboot.c
> --- linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c	11 Feb 2006 16:55:14 -0000	1.1.1.1
> +++ linux-2.6.16-rc2-mm1/arch/i386/kernel/smpboot.c	28 Feb 2006 15:34:42 -0000
> @@ -102,7 +102,7 @@ static cpumask_t smp_commenced_mask;
>   * is no way to resync one AP against BP. TBD: for prescott and above, we
>   * should use IA64's algorithm
>   */
> -static int __devinitdata tsc_sync_disabled;
> +int __devinitdata tsc_sync_disabled;
>  
>  /* Per CPU bogomips and other parameters */
>  struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;

  reply	other threads:[~2006-03-01  3:28 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-19 23:58 i386 cpu hotplug bug - instant reboot when onlining secondary Nathan Lynch
2006-02-21 16:20 ` Zwane Mwaikambo
2006-02-27  7:50   ` Nathan Lynch
2006-02-28 15:40     ` Zwane Mwaikambo
2006-02-28 21:34       ` Nathan Lynch
2006-02-28 22:13         ` Zwane Mwaikambo
2006-03-01  3:28           ` Nathan Lynch [this message]
2006-03-01  6:31             ` Zwane Mwaikambo
2006-03-06 13:25               ` Nathan Lynch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060301032819.GC2856@localhost.localdomain \
    --to=ntl@pobox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zwane@arm.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox