public inbox for linux-kernel@vger.kernel.org
From: Mike Travis <travis@sgi.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, mingo@elte.hu, tglx@linutronix.de,
	Christoph Lameter <clameter@sgi.com>,
	Jack Steiner <steiner@sgi.com>
Subject: Re: 2.6.24 git2/mm1: cpu_to_node mapping to non-existant nodes causing boot failure
Date: Fri, 15 Feb 2008 07:46:05 -0800
Message-ID: <47B5B3BD.8050205@sgi.com>
In-Reply-To: <20080215020208.GA6500@csn.ul.ie>

Mel Gorman wrote:
> On (14/02/08 12:41), Mike Travis didst pronounce:
>> Mel Gorman wrote:
>>> On (13/02/08 10:45), Mike Travis didst pronounce:
>>>> Mel Gorman wrote:
>>>>> On (03/02/08 17:16), Andrew Morton didst pronounce:
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24/2.6.24-mm1/
>>>>>>
>>>>> bl6-13 (4-way x86_64 machine) from test.kernel.org is failing to boot recent
>>>>> -mm and mainline trees. I noticed it when testing -mm before rebasing other
>>>>> patches but the oops on mainline looks the same. The full console log is
>>>>> below but the important difference between a working and non-working kernel
>>>>> is the following
>>>>>
>>>>> -PERCPU: Allocating 62512 bytes of per cpu data
>>>>> -Built 1 zonelists in Node order, mobility grouping on.  Total pages: 255875
>>>>> +PERCPU: Allocating 65560 bytes of per cpu data
>>>>> +cpu with no node 2, num_online_nodes 1
>>>>> +cpu with no node 3, num_online_nodes 1
>>>>> +Built 1 zonelists in Node order, mobility grouping on.  Total pages: 251257
>>>>>
>>>>> "cpu with no node 2" is actually saying that cpu 2 has no node and the
>>>>> message is just misleading. The number of online nodes and cpu mappings
>>>>> are not adding up as I got this from a debugging patch
>>>> I'll take a closer look though I've not been able to duplicate your
>>>> error yet.  It does appear from the message text that the code is
>>>> out-of-date.  The latest "setup_per_cpu_areas()" should say:
>>>>
>>>>        "cpu %d has no node, num_online_nodes %d\n",
>>>>         i, num_online_nodes());
>>>>
>>>> There are a number of backed up patches in the queue.  I'm resubmitting
>>>> the whole set re-based on 2.6.25-rc1 shortly.  (I don't know though, that
>>>> any will address this problem.)
>>>>
>>> According to git-bisect, the problem patch is below. It doesn't back out
>>> cleanly so I haven't verified for sure the bisect is correct yet.
>> This might make sense.  This code is in preparation for the extended
>> APICs available on the new processors.  I've tested the code with
>> our simulator (with no errors) and I'm setting up to test on a real
>> machine that has multiple numa nodes.  I wonder if maybe BIOS is not
>> providing correct node data, or the ACPI parsing is in error?  You
>> might try adding "apic=debug" to the boot command line.
>>
> 
> I tried this, but the dmesg complained about a malformed option. I'll
> check out why tomorrow but it didn't appear particularly helpful.
> 
>> For the short term, we can remove this patch if it's causing the
>> problem.  A more complete patch will be available soon that contains
>> the entire set of x2apic changes.
>>
> 
> If you send me patches to apply on top of 2.6.25-rc1, I'll give them a spin
> on the machine in question. Reverting didn't work out very well as there are
> too many collisions with patches that were applied later. I eventually got
> the machine booting but it only succeeds because it only brings up one core
> on each processor.  The patch, which is pretty brain damaged, is below in case
> it helps you guess what the real problem is. dmesg logs are attached of the
> vanilla failure with acpi=debug and the log with the patch applied showing
> "__cpu_up: bad cpu 1" and "__cpu_up: bad cpu 3" (i.e. the second cores of
> each machine).
> 

Thanks Mel.  I'm heading up to MV today to debug on the NUMA machine.

-Mike
> 
> diff -ru linux-2.6/arch/x86/kernel/genapic_64.c linux-2.6-working/arch/x86/kernel/genapic_64.c
> --- linux-2.6/arch/x86/kernel/genapic_64.c	2008-02-14 16:32:55.000000000 -0600
> +++ linux-2.6-working/arch/x86/kernel/genapic_64.c	2008-02-14 15:46:18.000000000 -0600
> @@ -25,10 +25,10 @@
>  #endif
>  
>  /* which logical CPU number maps to which CPU (physical APIC ID) */
> -u16 x86_cpu_to_apicid_init[NR_CPUS] __initdata
> +u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata
>  					= { [0 ... NR_CPUS-1] = BAD_APICID };
>  void *x86_cpu_to_apicid_early_ptr;
> -DEFINE_PER_CPU(u16, x86_cpu_to_apicid) = BAD_APICID;
> +DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;
>  EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
>  
>  struct genapic __read_mostly *genapic = &apic_flat;
> diff -ru linux-2.6/arch/x86/kernel/mpparse_64.c linux-2.6-working/arch/x86/kernel/mpparse_64.c
> --- linux-2.6/arch/x86/kernel/mpparse_64.c	2008-02-14 16:32:55.000000000 -0600
> +++ linux-2.6-working/arch/x86/kernel/mpparse_64.c	2008-02-14 15:45:44.000000000 -0600
> @@ -67,7 +67,7 @@
>  /* Bitmask of physically existing CPUs */
>  physid_mask_t phys_cpu_present_map = PHYSID_MASK_NONE;
>  
> -u16 x86_bios_cpu_apicid_init[NR_CPUS] __initdata
> +u8 x86_bios_cpu_apicid_init[NR_CPUS] __initdata
>  				= { [0 ... NR_CPUS-1] = BAD_APICID };
>  void *x86_bios_cpu_apicid_early_ptr;
>  DEFINE_PER_CPU(u16, x86_bios_cpu_apicid) = BAD_APICID;
> diff -ru linux-2.6/include/asm-x86/smp_64.h linux-2.6-working/include/asm-x86/smp_64.h
> --- linux-2.6/include/asm-x86/smp_64.h	2008-02-14 16:33:04.000000000 -0600
> +++ linux-2.6-working/include/asm-x86/smp_64.h	2008-02-14 15:43:01.000000000 -0600
> @@ -26,15 +26,16 @@
>  extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
>  				  void *info, int wait);
>  
> -extern u16 __initdata x86_cpu_to_apicid_init[];
> -extern u16 __initdata x86_bios_cpu_apicid_init[];
> +extern u8 __initdata x86_cpu_to_apicid_init[];
> +extern u8 __initdata x86_bios_cpu_apicid_init[];
>  extern void *x86_cpu_to_apicid_early_ptr;
>  extern void *x86_bios_cpu_apicid_early_ptr;
> +DECLARE_PER_CPU(u8, x86_cpu_to_apicid); /* physical ID */
> +extern u8 bios_cpu_apicid[];
>  
>  DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
>  DECLARE_PER_CPU(cpumask_t, cpu_core_map);
>  DECLARE_PER_CPU(u16, cpu_llc_id);
> -DECLARE_PER_CPU(u16, x86_cpu_to_apicid);
>  DECLARE_PER_CPU(u16, x86_bios_cpu_apicid);
>  
>  static inline int cpu_present_to_apicid(int mps_cpu)
> 
> 
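
The diff above narrows some of the APIC ID tables from u16 to u8 while
the matching *_early_ptr variables stay untyped (void *). The sketch below
shows, in plain userspace C, why element width matters when a table is
reached through an untyped pointer, and why a 16-bit sentinel does not
survive storage in 8 bits. It is only an illustration: NR_CPUS,
BAD_APICID_16, and the helper functions are hypothetical names and values
chosen for the example, not the kernel's definitions, and nothing in this
thread establishes that this is the cause of the boot failure on bl6-13.

```c
#include <stdint.h>
#include <string.h>

#define NR_CPUS       4        /* illustrative size, not the kernel's value */
#define BAD_APICID_16 0xFFFFu  /* assumed 16-bit "no APIC ID" sentinel */

/* Read entry `cpu` through an untyped pointer, assuming 8-bit elements. */
static uint8_t read_as_u8(const void *table, int cpu)
{
    return ((const uint8_t *)table)[cpu];
}

/*
 * Read entry `cpu` through the same untyped pointer, assuming 16-bit
 * elements. memcpy sidesteps alignment and aliasing problems. If the
 * table really holds 8-bit elements, this fuses two neighbouring
 * entries into one value, and indexes in the upper half of the table
 * would read past its end.
 */
static uint16_t read_as_u16(const void *table, int cpu)
{
    uint16_t v;

    memcpy(&v, (const unsigned char *)table + (size_t)cpu * sizeof(v),
           sizeof(v));
    return v;
}

/* Sentinel check written against the 16-bit sentinel. */
static int is_bad_apicid(uint16_t id)
{
    return id == BAD_APICID_16;
}
```

With a u8 table holding { 0, 1, 2, 3 }, read_as_u8(table, 1) returns 1,
while read_as_u16(table, 1) returns a value built from the bytes 2 and 3.
Likewise, (uint8_t)BAD_APICID_16 is 0xFF, which is_bad_apicid() no longer
recognises as the sentinel.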

