From: Mike Travis <travis@sgi.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, mingo@elte.hu, tglx@linutronix.de,
	Christoph Lameter <clameter@sgi.com>,
	Jack Steiner <steiner@sgi.com>
Subject: Re: 2.6.24 git2/mm1: cpu_to_node mapping to non-existant nodes causing boot failure
Date: Fri, 15 Feb 2008 07:46:05 -0800
Message-ID: <47B5B3BD.8050205@sgi.com>
In-Reply-To: <20080215020208.GA6500@csn.ul.ie>

Mel Gorman wrote:
> On (14/02/08 12:41), Mike Travis didst pronounce:
>> Mel Gorman wrote:
>>> On (13/02/08 10:45), Mike Travis didst pronounce:
>>>> Mel Gorman wrote:
>>>>> On (03/02/08 17:16), Andrew Morton didst pronounce:
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24/2.6.24-mm1/
>>>>>>
>>>>> bl6-13 (4-way x86_64 machine) from test.kernel.org is failing to boot recent
>>>>> -mm and mainline trees. I noticed it when testing -mm before rebasing other
>>>>> patches but the oops on mainline looks the same. The full console log is
>>>>> below but the important difference between a working and non-working kernel
>>>>> is the following
>>>>>
>>>>> -PERCPU: Allocating 62512 bytes of per cpu data
>>>>> -Built 1 zonelists in Node order, mobility grouping on.  Total pages: 255875
>>>>> +PERCPU: Allocating 65560 bytes of per cpu data
>>>>> +cpu with no node 2, num_online_nodes 1
>>>>> +cpu with no node 3, num_online_nodes 1
>>>>> +Built 1 zonelists in Node order, mobility grouping on.  Total pages: 251257
>>>>>
>>>>> "cpu with no node 2" is actually saying that cpu 2 has no node and the
>>>>> message is just misleading. The number of online nodes and cpu mappings
>>>>> are not adding up as I got this from a debugging patch
>>>> I'll take a closer look though I've not been able to duplicate your
>>>> error yet.  It does appear from the message text that the code is
>>>> out-of-date.  The latest "setup_per_cpu_areas()" should say:
>>>>
>>>>        "cpu %d has no node, num_online_nodes %d\n",
>>>>         i, num_online_nodes());
>>>>
>>>> There are a number of backed up patches in the queue.  I'm resubmitting
>>>> the whole set re-based on 2.6.25-rc1 shortly.  (I don't know though, that
>>>> any will address this problem.)
>>>>
>>> According to git-bisect, the problem patch is below. It doesn't back out
>>> cleanly so I haven't verified for sure the bisect is correct yet.
>> This might make sense.  This code is in preparation for the extended
>> APICs available on the new processors.  I've tested the code with
>> our simulator (with no errors) and I'm setting up to test on a real
>> machine that has multiple numa nodes.  I wonder if maybe BIOS is not
>> providing correct node data, or the ACPI parsing is in error?  You
>> might try adding "apic=debug" to the boot command line.
>>
> 
> I tried this, but the dmesg complained about a malformed option. I'll
> check out why tomorrow but it didn't appear particularly helpful.
> 
>> For the short term, we can remove this patch if it's causing the
>> problem.  A more complete patch will be available soon that contains
>> the entire set of x2apic changes.
>>
> 
> If you send me patches to apply on top of 2.6.25-rc1, I'll give them a spin
> on the machine in question. Reverting didn't work out very well as there are
> too many collisions with patches that were applied later. I eventually got
> the machine booting but it only succeeds because it only brings up one core
> on each processor.  The patch, which is pretty brain damaged, is below in case
> it helps you guess what the real problem is. dmesg logs are attached of the
> vanilla failure with acpi=debug and the log with the patch applied showing
> "__cpu_up: bad cpu 1" and "__cpu_up: bad cpu 3" (i.e. the second cores of
> each machine).
> 

Thanks Mel.  I'm heading up to MV today to debug on the NUMA machine.

-Mike
> 
> diff -ru linux-2.6/arch/x86/kernel/genapic_64.c linux-2.6-working/arch/x86/kernel/genapic_64.c
> --- linux-2.6/arch/x86/kernel/genapic_64.c	2008-02-14 16:32:55.000000000 -0600
> +++ linux-2.6-working/arch/x86/kernel/genapic_64.c	2008-02-14 15:46:18.000000000 -0600
> @@ -25,10 +25,10 @@
>  #endif
>  
>  /* which logical CPU number maps to which CPU (physical APIC ID) */
> -u16 x86_cpu_to_apicid_init[NR_CPUS] __initdata
> +u8 x86_cpu_to_apicid_init[NR_CPUS] __initdata
>  					= { [0 ... NR_CPUS-1] = BAD_APICID };
>  void *x86_cpu_to_apicid_early_ptr;
> -DEFINE_PER_CPU(u16, x86_cpu_to_apicid) = BAD_APICID;
> +DEFINE_PER_CPU(u8, x86_cpu_to_apicid) = BAD_APICID;
>  EXPORT_PER_CPU_SYMBOL(x86_cpu_to_apicid);
>  
>  struct genapic __read_mostly *genapic = &apic_flat;
> diff -ru linux-2.6/arch/x86/kernel/mpparse_64.c linux-2.6-working/arch/x86/kernel/mpparse_64.c
> --- linux-2.6/arch/x86/kernel/mpparse_64.c	2008-02-14 16:32:55.000000000 -0600
> +++ linux-2.6-working/arch/x86/kernel/mpparse_64.c	2008-02-14 15:45:44.000000000 -0600
> @@ -67,7 +67,7 @@
>  /* Bitmask of physically existing CPUs */
>  physid_mask_t phys_cpu_present_map = PHYSID_MASK_NONE;
>  
> -u16 x86_bios_cpu_apicid_init[NR_CPUS] __initdata
> +u8 x86_bios_cpu_apicid_init[NR_CPUS] __initdata
>  				= { [0 ... NR_CPUS-1] = BAD_APICID };
>  void *x86_bios_cpu_apicid_early_ptr;
>  DEFINE_PER_CPU(u16, x86_bios_cpu_apicid) = BAD_APICID;
> diff -ru linux-2.6/include/asm-x86/smp_64.h linux-2.6-working/include/asm-x86/smp_64.h
> --- linux-2.6/include/asm-x86/smp_64.h	2008-02-14 16:33:04.000000000 -0600
> +++ linux-2.6-working/include/asm-x86/smp_64.h	2008-02-14 15:43:01.000000000 -0600
> @@ -26,15 +26,16 @@
>  extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
>  				  void *info, int wait);
>  
> -extern u16 __initdata x86_cpu_to_apicid_init[];
> -extern u16 __initdata x86_bios_cpu_apicid_init[];
> +extern u8 __initdata x86_cpu_to_apicid_init[];
> +extern u8 __initdata x86_bios_cpu_apicid_init[];
>  extern void *x86_cpu_to_apicid_early_ptr;
>  extern void *x86_bios_cpu_apicid_early_ptr;
> +DECLARE_PER_CPU(u8, x86_cpu_to_apicid); /* physical ID */
> +extern u8 bios_cpu_apicid[];
>  
>  DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
>  DECLARE_PER_CPU(cpumask_t, cpu_core_map);
>  DECLARE_PER_CPU(u16, cpu_llc_id);
> -DECLARE_PER_CPU(u16, x86_cpu_to_apicid);
>  DECLARE_PER_CPU(u16, x86_bios_cpu_apicid);
>  
>  static inline int cpu_present_to_apicid(int mps_cpu)
> 
> 

