public inbox for linux-kernel@vger.kernel.org
From: Dave Hansen <haveblue@us.ibm.com>
To: Andrew Theurer <habanero@us.ibm.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	William Lee Irwin III <wli@holomorphy.com>,
	Andrew Morton <akpm@osdl.org>,
	"Martin J. Bligh" <mbligh@aracnet.com>
Subject: Re: CPU boot problem on 2.6.0-test3-bk8
Date: 20 Aug 2003 20:42:09 -0700
Message-ID: <1061437329.15363.92.camel@nighthawk>
In-Reply-To: <200308202013.51702.habanero@us.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

On Wed, 2003-08-20 at 18:13, Andrew Theurer wrote:
> On Wednesday 20 August 2003 20:02, Dave Hansen wrote:
> > On Wed, 2003-08-20 at 14:58, Andrew Theurer wrote:
> > > Maybe this is already known, but just in case:
> > > I cannot fully boot on an x440 system with 2.6.0-test3-bk8.  The kernel
> > > tries to boot more than the 16 logical processors, and after failing (no
> > > response) on cpus 16, 17, and 18, it still thinks it has 19 cpus total. 
> > > It finally gets stuck at "checking TSC synchronization across 19 CPUs:"
> > >
> > > Attached is the boot log.  Any ideas? I'll try -test3-bk7 next
> >
> > Can you see if it works without HT on?  Did it work on plain -test3?
> > My 16-way x440 with no HT boots fine on test3.
> 
> I'll try without HT to see what happens.  FWIW, it boots fine with HT if I set 
> maxcpus=16.  I am wondering if (apicid == BAD_APIC) test is not working in 
> smp_boot_cpus.

Hmmm.  This is looking like fallout from the massive wli-bomb.  Here's
the loop that controls the cpu booting, before and after cpumask_t:

-	for (bit = 0; kicked < NR_CPUS && bit < BITS_PER_LONG; bit++)
+	for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++)
		apicid = cpu_present_to_apicid(bit);

"kicked" only gets incremented for CPUs that were successfully booted,
so it doesn't help terminate the loop much.  MAX_APICS is 256 on summit,
which is *MUCH* bigger than BITS_PER_LONG.
cpu_2_logical_apicid[NR_CPUS], which is referenced from
cpu_present_to_apicid(), is now getting indexed all the way up to
MAX_APICS, which is bigger than NR_CPUS.  Overflow.  Bang.
garbage != BAD_APICID :)

Attached patch fixes it.  We sure do have a lot of duplicate code in the
subarches.  <sigh>
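
For anyone who wants to poke at the failure mode outside the kernel, here
is a minimal userspace sketch of the guard.  It is not the kernel code:
NR_CPUS = 32 and BAD_APICID = 0xFF are illustrative assumptions,
init_table() and count_boot_candidates() are hypothetical helpers, and
only MAX_APICS = 256 comes from the summit value mentioned above.  With
the bounds check in place, walking all MAX_APICS indices finds only the
CPUs that were really set up in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only.  NR_CPUS is a build-time option in the real
 * kernel, so 32 is an assumption; MAX_APICS = 256 is the summit value
 * quoted above; BAD_APICID = 0xFF is assumed for this sketch. */
#define NR_CPUS    32
#define MAX_APICS  256
#define BAD_APICID 0xFF

static uint8_t cpu_2_logical_apicid[NR_CPUS];

/* Hypothetical helper: mark the first `present` slots as valid CPUs
 * and every remaining slot as BAD_APICID. */
static void init_table(int present)
{
	assert(present >= 0 && present <= NR_CPUS);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_2_logical_apicid[cpu] =
			(cpu < present) ? (uint8_t)cpu : BAD_APICID;
}

/* Same guard as in the patch: never index past the end of the table. */
static int cpu_to_logical_apicid(int cpu)
{
	if (cpu >= NR_CPUS)
		return BAD_APICID;
	return (int)cpu_2_logical_apicid[cpu];
}

/* Walk the index range the boot loop uses after the cpumask_t change
 * and count how many indices look like bootable CPUs. */
static int count_boot_candidates(void)
{
	int candidates = 0;

	for (int bit = 0; bit < MAX_APICS; bit++)
		if (cpu_to_logical_apicid(bit) != BAD_APICID)
			candidates++;
	return candidates;
}
```

Calling init_table(16) and then count_boot_candidates() should report 16
candidates.  Without the guard, every index from NR_CPUS upward would
read past the end of the array, and whatever happens to live there is
unlikely to compare equal to BAD_APICID.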
-- 
Dave Hansen
haveblue@us.ibm.com

[-- Attachment #2: cpu_to_logical_apicid-fix-2.6.0-test3-bk8-0.patch --]
[-- Type: text/x-patch, Size: 2325 bytes --]

diff -urp linux-2.6.0-test3-clean/include/asm-i386/mach-bigsmp/mach_apic.h linux-2.6.0-test3-work/include/asm-i386/mach-bigsmp/mach_apic.h
--- linux-2.6.0-test3-clean/include/asm-i386/mach-bigsmp/mach_apic.h	Wed Aug 20 19:54:32 2003
+++ linux-2.6.0-test3-work/include/asm-i386/mach-bigsmp/mach_apic.h	Wed Aug 20 20:23:52 2003
@@ -98,6 +98,8 @@ extern u8 cpu_2_logical_apicid[];
 /* Mapping from cpu number to logical apicid */
 static inline int cpu_to_logical_apicid(int cpu)
 {
+       if (cpu >= NR_CPUS)
+	       return BAD_APICID;
        return (int)cpu_2_logical_apicid[cpu];
  }
 
diff -urp linux-2.6.0-test3-clean/include/asm-i386/mach-es7000/mach_apic.h linux-2.6.0-test3-work/include/asm-i386/mach-es7000/mach_apic.h
--- linux-2.6.0-test3-clean/include/asm-i386/mach-es7000/mach_apic.h	Wed Aug 20 19:54:32 2003
+++ linux-2.6.0-test3-work/include/asm-i386/mach-es7000/mach_apic.h	Wed Aug 20 20:23:56 2003
@@ -123,6 +123,8 @@ extern u8 cpu_2_logical_apicid[];
 /* Mapping from cpu number to logical apicid */
 static inline int cpu_to_logical_apicid(int cpu)
 {
+       if (cpu >= NR_CPUS)
+	       return BAD_APICID;
        return (int)cpu_2_logical_apicid[cpu];
 }
 
diff -urp linux-2.6.0-test3-clean/include/asm-i386/mach-numaq/mach_apic.h linux-2.6.0-test3-work/include/asm-i386/mach-numaq/mach_apic.h
--- linux-2.6.0-test3-clean/include/asm-i386/mach-numaq/mach_apic.h	Wed Aug 20 19:54:32 2003
+++ linux-2.6.0-test3-work/include/asm-i386/mach-numaq/mach_apic.h	Wed Aug 20 20:23:59 2003
@@ -60,6 +60,8 @@ static inline physid_mask_t ioapic_phys_
 extern u8 cpu_2_logical_apicid[];
 static inline int cpu_to_logical_apicid(int cpu)
 {
+       if (cpu >= NR_CPUS)
+	       return BAD_APICID;
 	return (int)cpu_2_logical_apicid[cpu];
 }
 
diff -urp linux-2.6.0-test3-clean/include/asm-i386/mach-summit/mach_apic.h linux-2.6.0-test3-work/include/asm-i386/mach-summit/mach_apic.h
--- linux-2.6.0-test3-clean/include/asm-i386/mach-summit/mach_apic.h	Wed Aug 20 19:54:32 2003
+++ linux-2.6.0-test3-work/include/asm-i386/mach-summit/mach_apic.h	Wed Aug 20 20:24:03 2003
@@ -80,6 +80,8 @@ static inline int apicid_to_node(int log
 extern u8 cpu_2_logical_apicid[];
 static inline int cpu_to_logical_apicid(int cpu)
 {
+       if (cpu >= NR_CPUS)
+	       return BAD_APICID;
 	return (int)cpu_2_logical_apicid[cpu];
 }
 
