Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP

linux-hotplug.vger.kernel.org archive mirror

* Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
       [not found]       ` <20101113084018.GA23098@localhost>
@ 2010-11-13 10:30         ` Peter Zijlstra
  2010-11-13 12:00           ` Wu Fengguang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2010-11-13 10:30 UTC
  To: Wu Fengguang
  Cc: LKML, Ingo Molnar, Nikanth Karthikesan, Yinghai Lu,
	David Rientjes, Zheng, Shaohui, Andrew Morton, linux-hotplug

On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
> > Will try and figure out how the heck that's happening, Ingo any clue?
> 
> It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
> ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
> 
> The interesting part is, the commit was introduced in 
> 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.

Argh, that commit again..

Does this fix it: http://lkml.org/lkml/2010/11/12/8




* Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
  2010-11-13 10:30         ` [BUG 2.6.27-rc1] find_busiest_group() LOCKUP Peter Zijlstra
@ 2010-11-13 12:00           ` Wu Fengguang
  2010-11-13 12:57             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2010-11-13 12:00 UTC
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Nikanth Karthikesan, Yinghai Lu,
	David Rientjes, Zheng, Shaohui, Andrew Morton,
	linux-hotplug@vger.kernel.org, Eric Dumazet, Bjorn Helgaas,
	Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On Sat, Nov 13, 2010 at 06:30:24PM +0800, Peter Zijlstra wrote:
> On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
> > > Will try and figure out how the heck that's happening, Ingo any clue?
> > 
> > It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
> > ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
> > 
> > The interesting part is, the commit was introduced in 
> > 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.
> 
> Argh, that commit again..
> 
> Does this fix it: http://lkml.org/lkml/2010/11/12/8

No it still panics. Here is the dmesg.

Thanks,
Fengguang
---

[    0.000000] console [ttyS0] enabled, bootconsole disabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
[    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
[    0.000000] ... CHAINHASH_SIZE:          16384
[    0.000000]  memory used by lock dependency info: 6367 kB
[    0.000000]  per task-struct memory footprint: 2688 bytes
[    0.000000] allocated 62914560 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] ODEBUG: 15 of 15 active objects replaced
[    0.000000] hpet clockevent registered
[    0.001000] Fast TSC calibration using PIT
[    0.002000] Detected 2666.733 MHz processor.
[    0.000009] Calibrating delay loop (skipped), value calculated using timer frequency.. 5333.46 BogoMIPS (lpj=2666733)
[    0.010813] pid_max: default: 32768 minimum: 301
[    0.018252] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.028528] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.036421] Mount-cache hash table entries: 256
[    0.041300] Initializing cgroup subsys debug
[    0.045664] Initializing cgroup subsys ns
[    0.049767] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
[    0.058788] Initializing cgroup subsys cpuacct
[    0.063328] Initializing cgroup subsys memory
[    0.067805] Initializing cgroup subsys devices
[    0.072340] Initializing cgroup subsys freezer
[    0.076910] CPU: Physical Processor ID: 0
[    0.081008] CPU: Processor Core ID: 0
[    0.084761] mce: CPU supports 9 MCE banks
[    0.088876] CPU0: Thermal monitoring enabled (TM1)
[    0.093767] using mwait in idle threads.
[    0.097777] Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
[    0.105138] ... version:                3
[    0.109239] ... bit width:              48
[    0.113423] ... generic registers:      4
[    0.117521] ... value mask:             0000ffffffffffff
[    0.122918] ... max period:             000000007fffffff
[    0.128319] ... fixed-purpose events:   3
[    0.132415] ... event mask:             000000070000000f
[    0.138807] ACPI: Core revision 20101013
[    0.162629] ftrace: allocating 24175 entries in 95 pages
[    0.177831] Setting APIC routing to flat
[    0.182351] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.198414] CPU0: Genuine Intel(R) CPU             000  @ 2.67GHz stepping 04
[    0.312081] lockdep: fixing up alternatives.
[    0.317087] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
[    0.416915]  #2lockdep: fixing up alternatives.
[    0.513688]  #3lockdep: fixing up alternatives.
[    0.610394]  #4lockdep: fixing up alternatives.
[    0.707133]  Ok.
[    0.709070] Booting Node   1, Processors  #5lockdep: fixing up alternatives.
[    0.808855]  Ok.
[    0.810787] Booting Node   0, Processors  #6lockdep: fixing up alternatives.
[    0.910602]  Ok.
[    0.912532] Booting Node   1, Processors  #7 Ok.
[    1.007347] Brought up 8 CPUs
[    1.010412] Total of 8 processors activated (42661.40 BogoMIPS).
[    1.016551] Testing NMI watchdog ... OK.
[    1.044508] CPU0 attaching sched-domain:
[    1.048524]  domain 0: span 0-3 level MC
[    1.052578]   groups: 0 1 2 3
[    1.055836]   domain 1: span 0-4,6 level CPU
[    1.060235]    groups: 0-3 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.066875] ERROR: repeated CPUs
[    1.070189]
[    1.071778] ERROR: groups don't span domain->span
[    1.076564]    domain 2: span 0-7 level NODE
[    1.080966]     groups: 0-4,6 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.087884] CPU1 attaching sched-domain:
[    1.091899]  domain 0: span 0-3 level MC
[    1.095957]   groups: 1 2 3 0
[    1.099201]   domain 1: span 0-4,6 level CPU
[    1.103608]    groups: 0-3 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.110273] ERROR: repeated CPUs
[    1.113594]
[    1.115177] ERROR: groups don't span domain->span
[    1.119966]    domain 2: span 0-7 level NODE
[    1.124371]     groups: 0-4,6 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.131280] CPU2 attaching sched-domain:
[    1.135295]  domain 0: span 0-3 level MC
[    1.139353]   groups: 2 3 0 1
[    1.142609]   domain 1: span 0-4,6 level CPU
[    1.147008]    groups: 0-3 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.153664] ERROR: repeated CPUs
[    1.156979]
[    1.158567] ERROR: groups don't span domain->span
[    1.163357]    domain 2: span 0-7 level NODE
[    1.167759]     groups: 0-4,6 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.174681] CPU3 attaching sched-domain:
[    1.178688]  domain 0: span 0-3 level MC
[    1.182746]   groups: 3 0 1 2
[    1.185997]   domain 1: span 0-4,6 level CPU
[    1.190400]    groups: 0-3 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.197059] ERROR: repeated CPUs
[    1.200377]
[    1.201959] ERROR: groups don't span domain->span
[    1.206747]    domain 2: span 0-7 level NODE
[    1.211140]     groups: 0-4,6 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.218050] CPU4 attaching sched-domain:
[    1.222055]  domain 0: span 4-7 level MC
[    1.226112]   groups: 4 5 6 7
[    1.229358] ERROR: parent span is not a superset of domain->span
[    1.235452]   domain 1: span 0-4,6 level CPU
[    1.239858] ERROR: domain->groups does not contain CPU4
[    1.245163]    groups: 5,7 (cpu_power = 4096)
[    1.249742] ERROR: groups don't span domain->span
[    1.254535]    domain 2: span 0-7 level NODE
[    1.258935]     groups: 0-4,6 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.265836] CPU5 attaching sched-domain:
[    1.269841]  domain 0: span 4-7 level MC
[    1.273899]   groups: 5 6 7 4
[    1.277142] ERROR: parent span is not a superset of domain->span
[    1.283227]   domain 1: span 5,7 level CPU
[    1.287458]    groups: 5,7 (cpu_power = 4096)
[    1.292026]    domain 2: span 0-7 level NODE
[    1.296429]     groups: 5,7 (cpu_power = 4096) 0-4,6 (cpu_power = 4096)
[    1.304915] CPU6 attaching sched-domain:
[    1.308922]  domain 0: span 4-7 level MC
[    1.312979]   groups: 6 7 4 5
[    1.316248] ERROR: parent span is not a superset of domain->span
[    1.322344]   domain 1: span 0-4,6 level CPU
[    1.326742] ERROR: domain->groups does not contain CPU6
[    1.332048]    groups: 5,7 (cpu_power = 4096)
[    1.336623] ERROR: groups don't span domain->span
[    1.341437]    domain 2: span 0-7 level NODE
[    1.345841]     groups: 0-4,6 (cpu_power = 4096) 5,7 (cpu_power = 4096)
[    1.352755] CPU7 attaching sched-domain:
[    1.356764]  domain 0: span 4-7 level MC
[    1.360820]   groups: 7 4 5 6
[    1.364078] ERROR: parent span is not a superset of domain->span
[    1.370165]   domain 1: span 5,7 level CPU
[    1.374398]    groups: 5,7 (cpu_power = 4096)
[    1.378964]    domain 2: span 0-7 level NODE
[    1.383372]     groups: 5,7 (cpu_power = 4096) 0-4,6 (cpu_power = 4096)
[    6.526802] BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff810a9dc1, registers:
[    6.534902] CPU 0
[    6.536767] Modules linked in:
[    6.540213]
[    6.541792] Pid: 1, comm: swapper Tainted: G        W   2.6.37-rc1+ #111 X8DTN/X8DTN
[    6.549675] RIP: 0010:[<ffffffff810a9dc1>]  [<ffffffff810a9dc1>] find_busiest_group+0x761/0x1480
[    6.558650] RSP: 0018:ffff8801b966d870  EFLAGS: 00000012
[    6.564039] RAX: 0000000000000000 RBX: ffff8801b966daec RCX: 0000000000000000
[    6.571245] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8800bac0e410
[    6.578455] RBP: ffff8801b966da30 R08: ffff8800bac0e410 R09: ffff8800bac0e400
[    6.585664] R10: 0000000000000003 R11: 0000000000000000 R12: 00000000001d2d00
[    6.592873] R13: 00000000001d2d00 R14: 00000000001d2d00 R15: 0000000000000008
[    6.600083] FS:  0000000000000000(0000) GS:ffff8800ba400000(0000) knlGS:0000000000000000
[    6.608312] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    6.614134] CR2: 0000000000000000 CR3: 0000000001ee1000 CR4: 00000000000006f0
[    6.621348] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.628558] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    6.635767] Process swapper (pid: 1, threadinfo ffff8801b966c000, task ffff8800b3778000)
[    6.643994] Stack:
[    6.646095]  ffff8801b966d890 ffff8801b966d9d0 0000000000000007 ffff8801bfdd2d00
[    6.653793]  0000000000000000 00000000001d2d00 ffff8801b966dae0 00000002b966d910
[    6.661476]  ffff8801b966d801 ffffffff810929ed ffff8800ba40de48 00000000000b306a
[    6.669171] Call Trace:
[    6.671706]  [<ffffffff810929ed>] ? __phys_addr+0x5d/0x120
[    6.677270]  [<ffffffff810b2614>] load_balance+0xe4/0xcb0
[    6.682747]  [<ffffffff810b0b54>] ? dequeue_task_fair+0x1f4/0x250
[    6.688926]  [<ffffffff8199be5d>] schedule+0xb0d/0x14b0
[    6.694235]  [<ffffffff810cc60e>] ? __sysctl_head_next+0x19e/0x1a0
[    6.700499]  [<ffffffff8199d2dd>] schedule_timeout+0x50d/0x570
[    6.706409]  [<ffffffff8110b9bc>] ? print_lock_contention_bug+0x2c/0x110
[    6.713187]  [<ffffffff810af7a1>] ? get_parent_ip+0x11/0x90
[    6.718843]  [<ffffffff819a7cbd>] ? sub_preempt_count+0x12d/0x1f0
[    6.725020]  [<ffffffff8199b10b>] wait_for_common+0x16b/0x290
[    6.730853]  [<ffffffff810b4950>] ? default_wake_function+0x0/0x20
[    6.737113]  [<ffffffff8199b34d>] wait_for_completion+0x1d/0x20
[    6.743112]  [<ffffffff810efdfb>] kthread_create+0x9b/0x150
[    6.748764]  [<ffffffff810e8310>] ? rescuer_thread+0x0/0x2a0
[    6.754506]  [<ffffffff81202078>] ? __kmalloc_node+0x2b8/0x340
[    6.760419]  [<ffffffff810e7d5a>] __alloc_workqueue_key+0x27a/0x830
[    6.766765]  [<ffffffff8263b23f>] cpuset_init_smp+0x56/0x8c
[    6.772417]  [<ffffffff8261d148>] kernel_init+0x17a/0x27c
[    6.777899]  [<ffffffff81051a24>] kernel_thread_helper+0x4/0x10
[    6.783899]  [<ffffffff819a2c14>] ? restore_args+0x0/0x30
[    6.789377]  [<ffffffff8261cfce>] ? kernel_init+0x0/0x27c
[    6.794859]  [<ffffffff81051a20>] ? kernel_thread_helper+0x0/0x10
[    6.801028] Code: ff 8b 42 08 48 05 00 02 00 00 48 c1 f8 0a 48 85 c0 48 89 45 c0 0f 94 c0 0f b6 c0 48 63 d0 48 83 c2 02 48 83 04 d5 58 21 09 82 01 <85> c0 0f 84 07 02 00 00 48 8b bd a8 fe ff ff 31 d2 83 7f 50 01
[    6.822637] ---[ end trace 4eaa2a86a8e2da23 ]---
[    6.827330] Kernel panic - not syncing: Non maskable interrupt
[    6.833236] Pid: 1, comm: swapper Tainted: G      D W   2.6.37-rc1+ #111
[    6.840018] Call Trace:
[    6.842548]  <NMI>  [<ffffffff810a9dc1>] ? find_busiest_group+0x761/0x1480
[    6.849539]  [<ffffffff8199acb0>] panic+0xb1/0x222
[    6.854414]  [<ffffffff810a9dc1>] ? find_busiest_group+0x761/0x1480
[    6.860763]  [<ffffffff819a4403>] die_nmi+0x153/0x180
[    6.865895]  [<ffffffff819a5049>] nmi_watchdog_tick+0x219/0x270
[    6.871902]  [<ffffffff819a38fa>] do_nmi+0x2fa/0x490
[    6.876955]  [<ffffffff819a3170>] nmi+0x20/0x39
[    6.881566]  [<ffffffff810a9dc1>] ? find_busiest_group+0x761/0x1480
[    6.887916]  <<EOE>>  [<ffffffff810929ed>] ? __phys_addr+0x5d/0x120
[    6.894301]  [<ffffffff810b2614>] load_balance+0xe4/0xcb0
[    6.899783]  [<ffffffff810b0b54>] ? dequeue_task_fair+0x1f4/0x250
[    6.905960]  [<ffffffff8199be5d>] schedule+0xb0d/0x14b0
[    6.911271]  [<ffffffff810cc60e>] ? __sysctl_head_next+0x19e/0x1a0
[    6.917533]  [<ffffffff8199d2dd>] schedule_timeout+0x50d/0x570
[    6.923443]  [<ffffffff8110b9bc>] ? print_lock_contention_bug+0x2c/0x110
[    6.930222]  [<ffffffff810af7a1>] ? get_parent_ip+0x11/0x90
[    6.935872]  [<ffffffff819a7cbd>] ? sub_preempt_count+0x12d/0x1f0
[    6.942051]  [<ffffffff8199b10b>] wait_for_common+0x16b/0x290
[    6.947881]  [<ffffffff810b4950>] ? default_wake_function+0x0/0x20
[    6.954140]  [<ffffffff8199b34d>] wait_for_completion+0x1d/0x20
[    6.960140]  [<ffffffff810efdfb>] kthread_create+0x9b/0x150
[    6.965792]  [<ffffffff810e8310>] ? rescuer_thread+0x0/0x2a0
[    6.971533]  [<ffffffff81202078>] ? __kmalloc_node+0x2b8/0x340
[    6.977445]  [<ffffffff810e7d5a>] __alloc_workqueue_key+0x27a/0x830
[    6.983793]  [<ffffffff8263b23f>] cpuset_init_smp+0x56/0x8c
[    6.989443]  [<ffffffff8261d148>] kernel_init+0x17a/0x27c
[    6.994924]  [<ffffffff81051a24>] kernel_thread_helper+0x4/0x10
[    7.000924]  [<ffffffff819a2c14>] ? restore_args+0x0/0x30
[    7.006402]  [<ffffffff8261cfce>] ? kernel_init+0x0/0x27c
[    7.011883]  [<ffffffff81051a20>] ? kernel_thread_helper+0x0/0x10
[    8.097122] Rebooting in 10 seconds..
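
The sched-domain ERROR lines in the dmesg above can be read as violations of a few set invariants: a parent domain's span should contain its child's span, a domain's groups should include the CPU the domain is attached to, and the groups together should cover the span without repeating a CPU. A minimal Python sketch of those checks, assuming the printed span and group lists are taken at face value. The helper names are hypothetical, and this is not the kernel's own checker, which walks the real group list and can therefore report things (such as "repeated CPUs" for CPU0) that the printed text alone does not show:

```python
def parse_cpulist(text):
    """Expand a kernel-style CPU list such as '0-4,6' into a set of ints."""
    cpus = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def check_domain(cpu, span, groups, child_span=None):
    """Return a list of problems for one domain level (hypothetical helper)."""
    span_set = parse_cpulist(span)
    problems = []
    # A parent domain must cover everything its child domain covers.
    if child_span is not None and not parse_cpulist(child_span) <= span_set:
        problems.append("parent span is not a superset of child span")
    covered = set()
    for group in (parse_cpulist(g) for g in groups):
        if covered & group:
            problems.append("repeated CPUs")
        covered |= group
    # The CPU the domain is attached to must sit in one of its groups.
    if cpu not in covered:
        problems.append("groups do not contain CPU%d" % cpu)
    # Taken together, the groups must equal the domain span.
    if covered != span_set:
        problems.append("groups don't span domain span")
    return problems


# CPU4, domain 1 in the log: span 0-4,6, only group 5,7, child span 4-7.
cpu4_problems = check_domain(4, "0-4,6", ["5,7"], child_span="4-7")
# CPU5, domain 2 in the log: span 0-7, groups 5,7 and 0-4,6, child span 5,7.
cpu5_problems = check_domain(5, "0-7", ["5,7", "0-4,6"], child_span="5,7")
print(cpu4_problems)
print(cpu5_problems)
```

For the CPU4 input the sketch flags the same three conditions the log reports for that CPU, while the CPU5 NODE-level domain comes out clean, as it does in the log.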


* Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
  2010-11-13 12:00           ` Wu Fengguang
@ 2010-11-13 12:57             ` Peter Zijlstra
  2010-11-13 13:10               ` Wu Fengguang
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2010-11-13 12:57 UTC
  To: Wu Fengguang
  Cc: LKML, Ingo Molnar, Nikanth Karthikesan, Yinghai Lu,
	David Rientjes, Zheng, Shaohui, Andrew Morton,
	linux-hotplug@vger.kernel.org, Eric Dumazet, Bjorn Helgaas,
	Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On Sat, 2010-11-13 at 20:00 +0800, Wu Fengguang wrote:
> On Sat, Nov 13, 2010 at 06:30:24PM +0800, Peter Zijlstra wrote:
> > On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
> > > > Will try and figure out how the heck that's happening, Ingo any clue?
> > > 
> > > It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
> > > ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
> > > 
> > > The interesting part is, the commit was introduced in 
> > > 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.
> > 
> > Argh, that commit again..
> > 
> > Does this fix it: http://lkml.org/lkml/2010/11/12/8
> 
> No it still panics. Here is the dmesg.

OK, I'll let Nikanth have a look, if all else fails we can always revert
that patch.


* Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
  2010-11-13 12:57             ` Peter Zijlstra
@ 2010-11-13 13:10               ` Wu Fengguang
  2010-11-13 19:12                 ` Yinghai Lu
  0 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2010-11-13 13:10 UTC
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Nikanth Karthikesan, Yinghai Lu,
	David Rientjes, Zheng, Shaohui, Andrew Morton,
	linux-hotplug@vger.kernel.org, Eric Dumazet, Bjorn Helgaas,
	Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On Sat, Nov 13, 2010 at 08:57:58PM +0800, Peter Zijlstra wrote:
> On Sat, 2010-11-13 at 20:00 +0800, Wu Fengguang wrote:
> > On Sat, Nov 13, 2010 at 06:30:24PM +0800, Peter Zijlstra wrote:
> > > On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
> > > > > Will try and figure out how the heck that's happening, Ingo any clue?
> > > > 
> > > > It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
> > > > ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
> > > > 
> > > > The interesting part is, the commit was introduced in 
> > > > 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.
> > > 
> > > Argh, that commit again..
> > > 
> > > Does this fix it: http://lkml.org/lkml/2010/11/12/8
> > 
> > No it still panics. Here is the dmesg.
> 
> OK, I'll let Nikanth have a look, if all else fails we can always
> revert that patch.

It's the same bug.

Just tried another machine, I get the same divide error.  The patch
posted in lkml/2010/11/12/8 does not fix it. But after reverting
commit 50f2d7f682f9, it boots OK.

Thanks,
Fengguang
---
PS. dmesg with divide error

[    0.000000] console [ttyS0] enabled, bootconsole disabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
[    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
[    0.000000] ... CHAINHASH_SIZE:          16384
[    0.000000]  memory used by lock dependency info: 6367 kB
[    0.000000]  per task-struct memory footprint: 2688 bytes
[    0.000000] allocated 167772160 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] ODEBUG: 15 of 15 active objects replaced
[    0.000000] hpet clockevent registered
[    0.001000] Fast TSC calibration using PIT
[    0.002000] Detected 2800.469 MHz processor.
[    0.000010] Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.93 BogoMIPS (lpj=2800469)
[    0.010818] pid_max: default: 32768 minimum: 301
[    0.021745] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[    0.035657] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.044553] Mount-cache hash table entries: 256
[    0.049469] Initializing cgroup subsys debug
[    0.053834] Initializing cgroup subsys ns
[    0.057940] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
[    0.066968] Initializing cgroup subsys cpuacct
[    0.071511] Initializing cgroup subsys memory
[    0.075988] Initializing cgroup subsys devices
[    0.080527] Initializing cgroup subsys freezer
[    0.085107] CPU: Physical Processor ID: 0
[    0.089209] CPU: Processor Core ID: 0
[    0.092974] mce: CPU supports 9 MCE banks
[    0.097095] CPU0: Thermal monitoring enabled (TM1)
[    0.101990] using mwait in idle threads.
[    0.106006] Performance Events: PEBS fmt1+, Westmere events, Intel PMU driver.
[    0.113535] ... version:                3
[    0.117641] ... bit width:              48
[    0.121828] ... generic registers:      4
[    0.125926] ... value mask:             0000ffffffffffff
[    0.131328] ... max period:             000000007fffffff
[    0.136734] ... fixed-purpose events:   3
[    0.140839] ... event mask:             000000070000000f
[    0.147297] ACPI: Core revision 20101013
[    0.175646] ftrace: allocating 24175 entries in 95 pages
[    0.190912] Setting APIC routing to flat
[    0.195562] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.211643] CPU0: Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz stepping 01
[    0.325243] lockdep: fixing up alternatives.
[    0.330242] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
[    0.430140]  #2lockdep: fixing up alternatives.
[    0.526962]  #3lockdep: fixing up alternatives.
[    0.623755]  #4lockdep: fixing up alternatives.
[    0.720588]  Ok.
[    0.722525] Booting Node   1, Processors  #5lockdep: fixing up alternatives.
[    0.822389]  Ok.
[    0.824327] Booting Node   0, Processors  #6
[    0.919089] TSC synchronization [CPU#0 -> CPU#6]:
[    0.924155] Measured 296 cycles TSC warp between CPUs, turning off TSC clock.
[    0.003999] Marking TSC unstable due to check_tsc_sync_source failed
[    0.557048] lockdep: fixing up alternatives.
[    0.558041]  Ok.
[    0.559004] Booting Node   1, Processors  #7 Ok.
[    0.632157] Brought up 8 CPUs
[    0.633006] Total of 8 processors activated (44799.46 BogoMIPS).
[    0.634048] Testing NMI watchdog ... OK.
[    0.658054] divide error: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[    0.658999] last sysfs file:
[    0.658999] CPU 0
[    0.658999] Modules linked in:
[    0.658999]
[    0.658999] Pid: 1, comm: swapper Tainted: G        W   2.6.37-rc1+ #111 X8DTN/X8DTN
[    0.658999] RIP: 0010:[<ffffffff810a9d18>]  [<ffffffff810a9d18>] find_busiest_group+0x6b8/0x1480
[    0.658999] RSP: 0018:ffff88022f965870  EFLAGS: 00010006
[    0.658999] RAX: 0000000000100000 RBX: ffff88022f965aec RCX: 0000000000000000
[    0.658999] RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000008
[    0.658999] RBP: ffff88022f965a30 R08: ffff88022fa00278 R09: ffff88022fa00268
[    0.658999] R10: 0000000000000003 R11: 0000000000000001 R12: 00000000001d2d00
[    0.658999] R13: 00000000001d2d00 R14: 00000000001d2d00 R15: 0000000000000008
[    0.658999] FS:  0000000000000000(0000) GS:ffff8800bc400000(0000) knlGS:0000000000000000
[    0.658999] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.658999] CR2: 0000000000000000 CR3: 0000000001ee1000 CR4: 00000000000006f0
[    0.658999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.658999] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.658999] Process swapper (pid: 1, threadinfo ffff88022f964000, task ffff88042f45c000)
[    0.658999] Stack:
[    0.658999]  ffff88022f965890 ffff88022f9659d0 0000000000000006 ffff8800bcfd2d00
[    0.658999]  0000000000000000 00000000001d2d00 ffff88022f965ae0 000000022f965910
[    0.658999]  ffff88022f965801 ffffffff810929ed ffff8800bc40de48 000000000042f47c
[    0.658999] Call Trace:
[    0.658999]  [<ffffffff810929ed>] ? __phys_addr+0x5d/0x120
[    0.658999]  [<ffffffff810b2614>] load_balance+0xe4/0xcb0
[    0.658999]  [<ffffffff810b0b54>] ? dequeue_task_fair+0x1f4/0x250
[    0.658999]  [<ffffffff8199be5d>] schedule+0xb0d/0x14b0
[    0.658999]  [<ffffffff810cc60e>] ? __sysctl_head_next+0x19e/0x1a0
[    0.658999]  [<ffffffff8199d2dd>] schedule_timeout+0x50d/0x570
[    0.658999]  [<ffffffff8110b9bc>] ? print_lock_contention_bug+0x2c/0x110
[    0.658999]  [<ffffffff810af7a1>] ? get_parent_ip+0x11/0x90
[    0.658999]  [<ffffffff819a7cbd>] ? sub_preempt_count+0x12d/0x1f0
[    0.658999]  [<ffffffff8199b10b>] wait_for_common+0x16b/0x290
[    0.658999]  [<ffffffff810b4950>] ? default_wake_function+0x0/0x20
[    0.658999]  [<ffffffff8199b34d>] wait_for_completion+0x1d/0x20
[    0.658999]  [<ffffffff810efdfb>] kthread_create+0x9b/0x150
[    0.658999]  [<ffffffff810e8310>] ? rescuer_thread+0x0/0x2a0
[    0.658999]  [<ffffffff81202078>] ? __kmalloc_node+0x2b8/0x340
[    0.658999]  [<ffffffff810e7d5a>] __alloc_workqueue_key+0x27a/0x830
[    0.658999]  [<ffffffff8263b23f>] cpuset_init_smp+0x56/0x8c
[    0.658999]  [<ffffffff8261d148>] kernel_init+0x17a/0x27c
[    0.658999]  [<ffffffff81051a24>] kernel_thread_helper+0x4/0x10
[    0.658999]  [<ffffffff819a2c14>] ? restore_args+0x0/0x30
[    0.658999]  [<ffffffff8261cfce>] ? kernel_init+0x0/0x27c
[    0.658999]  [<ffffffff81051a20>] ? kernel_thread_helper+0x0/0x10
[    0.658999] Code: 04 f5 20 87 43 82 48 89 94 07 80 08 00 00 41 89 4f 08 90 4c 8b 8d e0 fe ff ff 48 8b 75 a8 31 d2 41 8b 49 08 48 89 f0 48 c1 e0 0a <48> f7 f1 48 8b 4d b0 31 d2 48 85 c9 0f 95 c2 48 89 45 a0 48 63
[    0.658999] RIP  [<ffffffff810a9d18>] find_busiest_group+0x6b8/0x1480
[    0.658999]  RSP <ffff88022f965870>
[    0.658999] ---[ end trace 4eaa2a86a8e2da23 ]---
[    0.658999] divide error: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
[    0.658999] last sysfs file:
[    0.658999] CPU 1
[    0.658999] Modules linked in:
[    0.658999]
[    0.658999] Pid: 2, comm: kthreadd Tainted: G      D W   2.6.37-rc1+ #111 X8DTN/X8DTN
[    0.658999] RIP: 0010:[<ffffffff810a3321>]  [<ffffffff810a3321>] select_task_rq_fair+0x691/0x9a0
[    0.658999] RSP: 0000:ffff88022f967c30  EFLAGS: 00010002
[    0.658999] RAX: 0000000000100000 RBX: 0000000000000400 RCX: 0000000000000000
[    0.658999] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000008
[    0.658999] RBP: ffff88022f967cf0 R08: ffff88022fa00278 R09: 0000000000000000
[    0.658999] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
[    0.658999] R13: ffff88022fa00278 R14: ffff88022fa00268 R15: 0000000000000003
[    0.658999] FS:  0000000000000000(0000) GS:ffff8800bc600000(0000) knlGS:0000000000000000
[    0.658999] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.658999] CR2: 0000000000000000 CR3: 0000000001ee1000 CR4: 00000000000006e0
[    0.658999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.658999] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.658999] Process kthreadd (pid: 2, threadinfo ffff88022f966000, task ffff88022f968000)
[    0.658999] Stack:
[    0.658999]  000000002f967c50 ffff8800bc7d2d18 ffff880200000006 00000000001d2d00
[    0.658999]  00000000001d2d00 0000000000000000 000000000000007d 0000000000000000
[    0.658999]  0000000800000001 ffff88022faa41b0 00000001810a59ea 0000000000000000
[    0.658999] Call Trace:
[    0.658999]  [<ffffffff810b4a01>] wake_up_new_task+0x51/0x2d0
[    0.658999]  [<ffffffff810eb83c>] ? __task_pid_nr_ns+0x10c/0x130
[    0.658999]  [<ffffffff810eb730>] ? __task_pid_nr_ns+0x0/0x130
[    0.658999]  [<ffffffff810bb0e3>] do_fork+0x693/0x7b0
[    0.658999]  [<ffffffff819a22e8>] ? _raw_spin_unlock_irq+0x68/0x90
[    0.658999]  [<ffffffff810a7a98>] ? finish_task_switch+0x118/0x1d0
[    0.658999]  [<ffffffff810af7a1>] ? get_parent_ip+0x11/0x90
[    0.658999]  [<ffffffff819a7cbd>] ? sub_preempt_count+0x12d/0x1f0
[    0.658999]  [<ffffffff8105c276>] kernel_thread+0x76/0x80
[    0.658999]  [<ffffffff810ef9a0>] ? kthread+0x0/0xd0
[    0.658999]  [<ffffffff81051a20>] ? kernel_thread_helper+0x0/0x10
[    0.658999]  [<ffffffff810f0006>] kthreadd+0x136/0x1a0
[    0.658999]  [<ffffffff8110d629>] ? trace_hardirqs_on_caller+0x29/0x210
[    0.658999]  [<ffffffff81051a24>] kernel_thread_helper+0x4/0x10
[    0.658999]  [<ffffffff819a2c14>] ? restore_args+0x0/0x30
[    0.658999]  [<ffffffff810efed0>] ? kthreadd+0x0/0x1a0
[    0.658999]  [<ffffffff81051a20>] ? kernel_thread_helper+0x0/0x10
[    0.658999] Code: 95 50 ff ff ff eb 99 0f 1f 00 45 89 f4 4c 8b 75 b0 48 89 d8 48 c1 e0 0a 31 d2 49 83 c7 02 41 8b 4e 08 4a 83 04 fd d8 ec 08 82 01 <48> f7 f1 45 85 e4 0f 85 33 01 00 00 31 d2 48 3b 45 a0 0f 92 c2
[    0.658999] RIP  [<ffffffff810a3321>] select_task_rq_fair+0x691/0x9a0
[    0.658999]  RSP <ffff88022f967c30>
[    0.658999] ---[ end trace 4eaa2a86a8e2da24 ]---
[    0.658999] note: kthreadd[2] exited with preempt_count 2
[    0.658999] note: swapper[1] exited with preempt_count 1
[    0.659015] swapper used greatest stack depth: 3680 bytes left
[    0.660011] Kernel panic - not syncing: Attempted to kill init!
[    0.661008] Pid: 1, comm: swapper Tainted: G      D W   2.6.37-rc1+ #111
[    0.662005] Call Trace:
[    0.663012]  [<ffffffff8199acb0>] panic+0xb1/0x222
[    0.664011]  [<ffffffff810c2ff0>] do_exit+0xd10/0xdb0
[    0.665009]  [<ffffffff819a2398>] ? _raw_spin_unlock_irqrestore+0x88/0xd0
[    0.666011]  [<ffffffff819a414c>] oops_end+0x10c/0x150
[    0.667011]  [<ffffffff8105623a>] die+0x8a/0xc0
[    0.668011]  [<ffffffff819a337c>] do_trap+0x11c/0x1c0
[    0.669011]  [<ffffffff81051bee>] do_divide_error+0xbe/0xe0
[    0.670011]  [<ffffffff810a9d18>] ? find_busiest_group+0x6b8/0x1480
[    0.671011]  [<ffffffff8110ae39>] ? trace_hardirqs_off_caller+0x29/0x150
[    0.672009]  [<ffffffff819a1028>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[    0.673017]  [<ffffffff819a2c44>] ? irq_return+0x0/0xc
[    0.674012]  [<ffffffff8105183b>] divide_error+0x1b/0x20
[    0.675013]  [<ffffffff810a9d18>] ? find_busiest_group+0x6b8/0x1480
[    0.676013]  [<ffffffff810929ed>] ? __phys_addr+0x5d/0x120
[    0.677018]  [<ffffffff810b2614>] load_balance+0xe4/0xcb0
[    0.678012]  [<ffffffff810b0b54>] ? dequeue_task_fair+0x1f4/0x250
[    0.679015]  [<ffffffff8199be5d>] schedule+0xb0d/0x14b0
[    0.680009]  [<ffffffff810cc60e>] ? __sysctl_head_next+0x19e/0x1a0
[    0.681015]  [<ffffffff8199d2dd>] schedule_timeout+0x50d/0x570
[    0.682009]  [<ffffffff8110b9bc>] ? print_lock_contention_bug+0x2c/0x110
[    0.683012]  [<ffffffff810af7a1>] ? get_parent_ip+0x11/0x90
[    0.684009]  [<ffffffff819a7cbd>] ? sub_preempt_count+0x12d/0x1f0
[    0.685010]  [<ffffffff8199b10b>] wait_for_common+0x16b/0x290
[    0.686010]  [<ffffffff810b4950>] ? default_wake_function+0x0/0x20
[    0.687012]  [<ffffffff8199b34d>] wait_for_completion+0x1d/0x20
[    0.688009]  [<ffffffff810efdfb>] kthread_create+0x9b/0x150
[    0.689008]  [<ffffffff810e8310>] ? rescuer_thread+0x0/0x2a0
[    0.690012]  [<ffffffff81202078>] ? __kmalloc_node+0x2b8/0x340
[    0.691021]  [<ffffffff810e7d5a>] __alloc_workqueue_key+0x27a/0x830
[    0.692012]  [<ffffffff8263b23f>] cpuset_init_smp+0x56/0x8c
[    0.693010]  [<ffffffff8261d148>] kernel_init+0x17a/0x27c
[    0.694009]  [<ffffffff81051a24>] kernel_thread_helper+0x4/0x10
[    0.695012]  [<ffffffff819a2c14>] ? restore_args+0x0/0x30
[    0.696010]  [<ffffffff8261cfce>] ? kernel_init+0x0/0x27c
[    0.697009]  [<ffffffff81051a20>] ? kernel_thread_helper+0x0/0x10
[    2.074478] Rebooting in 10 seconds..
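
The register dump of the first oops lines up with the bytes around the faulting instruction: `48 89 f0` copies RSI into RAX, `48 c1 e0 0a` shifts RAX left by 10, and the marked `48 f7 f1` divides by RCX, which was loaded from offset 8 of the structure R09 points to and is zero in the dump. A minimal Python sketch of that arithmetic follows. Reading it as a load value scaled by 1024 and divided by a group power of zero is an assumption, not something the log states, and the function name is hypothetical:

```python
SCALE_SHIFT = 10  # 'shl $0xa' in the Code: bytes, i.e. multiply by 1024


def scaled_div(value, divisor):
    """Mimic 'mov %rsi,%rax; shl $10,%rax; div %rcx' (hypothetical name)."""
    return (value << SCALE_SHIFT) // divisor


rsi, rcx = 0x400, 0x0          # RSI and RCX from the first oops
rax = rsi << SCALE_SHIFT       # value held in RAX when the division faults

try:
    scaled_div(rsi, rcx)
    outcome = "ok"
except ZeroDivisionError:
    outcome = "divide error"

print(hex(rax), outcome)
```

The second oops, in select_task_rq_fair(), shows the same pattern with RBX = 0x400 as the input, RAX = 0x100000 and RCX = 0.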



* Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
  2010-11-13 13:10               ` Wu Fengguang
@ 2010-11-13 19:12                 ` Yinghai Lu
  2010-11-13 19:41                   ` Peter Zijlstra
       [not found]                   ` <20101113235746.GA9458@localhost>
  0 siblings, 2 replies; 22+ messages in thread
From: Yinghai Lu @ 2010-11-13 19:12 UTC
  To: Wu Fengguang
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Nikanth Karthikesan,
	David Rientjes, Zheng, Shaohui, Andrew Morton,
	linux-hotplug@vger.kernel.org, Eric Dumazet, Bjorn Helgaas,
	Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 11/13/2010 05:10 AM, Wu Fengguang wrote:
> On Sat, Nov 13, 2010 at 08:57:58PM +0800, Peter Zijlstra wrote:
>> On Sat, 2010-11-13 at 20:00 +0800, Wu Fengguang wrote:
>>> On Sat, Nov 13, 2010 at 06:30:24PM +0800, Peter Zijlstra wrote:
>>>> On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
>>>>>> Will try and figure out how the heck that's happening, Ingo any clue?
>>>>>
>>>>> It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
>>>>> ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
>>>>>
>>>>> The interesting part is, the commit was introduced in 
>>>>> 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.
>>>>
>>>> Argh, that commit again..
>>>>
>>>> Does this fix it: http://lkml.org/lkml/2010/11/12/8
>>>
>>> No it still panics. Here is the dmesg.
>>
>> OK, I'll let Nikanth have a look, if all else fails we can always
>> revert that patch.
> 
> It's the same bug.
> 
> Just tried another machine, I get the same divide error.  The patch
> posted in lkml/2010/11/12/8 does not fix it. But after reverting
> commit 50f2d7f682f9, it boots OK.
> 
> Thanks,
> Fengguang
> ---
> PS. dmesg with divide error
> 
> [    0.000000] console [ttyS0] enabled, bootconsole disabled
> [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
> [    0.000000] ... MAX_LOCK_DEPTH:          48
> [    0.000000] ... MAX_LOCKDEP_KEYS:        8191
> [    0.000000] ... CLASSHASH_SIZE:          4096
> [    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
> [    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
> [    0.000000] ... CHAINHASH_SIZE:          16384
> [    0.000000]  memory used by lock dependency info: 6367 kB
> [    0.000000]  per task-struct memory footprint: 2688 bytes
> [    0.000000] allocated 167772160 bytes of page_cgroup
> [    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
> [    0.000000] ODEBUG: 15 of 15 active objects replaced
> [    0.000000] hpet clockevent registered
> [    0.001000] Fast TSC calibration using PIT
> [    0.002000] Detected 2800.469 MHz processor.
> [    0.000010] Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.93 BogoMIPS (lpj=2800469)
> [    0.010818] pid_max: default: 32768 minimum: 301
> [    0.021745] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
> [    0.035657] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
> [    0.044553] Mount-cache hash table entries: 256
> [    0.049469] Initializing cgroup subsys debug
> [    0.053834] Initializing cgroup subsys ns
> [    0.057940] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
> [    0.066968] Initializing cgroup subsys cpuacct
> [    0.071511] Initializing cgroup subsys memory
> [    0.075988] Initializing cgroup subsys devices
> [    0.080527] Initializing cgroup subsys freezer
> [    0.085107] CPU: Physical Processor ID: 0
> [    0.089209] CPU: Processor Core ID: 0
> [    0.092974] mce: CPU supports 9 MCE banks
> [    0.097095] CPU0: Thermal monitoring enabled (TM1)
> [    0.101990] using mwait in idle threads.
> [    0.106006] Performance Events: PEBS fmt1+, Westmere events, Intel PMU driver.
> [    0.113535] ... version:                3
> [    0.117641] ... bit width:              48
> [    0.121828] ... generic registers:      4
> [    0.125926] ... value mask:             0000ffffffffffff
> [    0.131328] ... max period:             000000007fffffff
> [    0.136734] ... fixed-purpose events:   3
> [    0.140839] ... event mask:             000000070000000f
> [    0.147297] ACPI: Core revision 20101013
> [    0.175646] ftrace: allocating 24175 entries in 95 pages
> [    0.190912] Setting APIC routing to flat
> [    0.195562] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.211643] CPU0: Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz stepping 01
> [    0.325243] lockdep: fixing up alternatives.
> [    0.330242] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
> [    0.430140]  #2lockdep: fixing up alternatives.
> [    0.526962]  #3lockdep: fixing up alternatives.
> [    0.623755]  #4lockdep: fixing up alternatives.
> [    0.720588]  Ok.
> [    0.722525] Booting Node   1, Processors  #5lockdep: fixing up alternatives.
> [    0.822389]  Ok.
> [    0.824327] Booting Node   0, Processors  #6
> [    0.919089] TSC synchronization [CPU#0 -> CPU#6]:
> [    0.924155] Measured 296 cycles TSC warp between CPUs, turning off TSC clock.
> [    0.003999] Marking TSC unstable due to check_tsc_sync_source failed
> [    0.557048] lockdep: fixing up alternatives.
> [    0.558041]  Ok.
> [    0.559004] Booting Node   1, Processors  #7 Ok.
> [    0.632157] Brought up 8 CPUs
> [    0.633006] Total of 8 processors activated (44799.46 BogoMIPS).

assume that when you have 
CONFIG_NR_CPUS=16
instead of
CONFIG_NR_CPUS=8

it will boot ok?

Thanks

	Yinghai

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
  2010-11-13 19:12                 ` Yinghai Lu
@ 2010-11-13 19:41                   ` Peter Zijlstra
       [not found]                   ` <20101113235746.GA9458@localhost>
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2010-11-13 19:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Wu Fengguang, LKML, Ingo Molnar, Nikanth Karthikesan,
	David Rientjes, Zheng, Shaohui, Andrew Morton,
	linux-hotplug@vger.kernel.org, Eric Dumazet, Bjorn Helgaas,
	Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On Sat, 2010-11-13 at 11:12 -0800, Yinghai Lu wrote:
> > [    0.633006] Total of 8 processors activated (44799.46 BogoMIPS).
> 
> assume that when you have 
> CONFIG_NR_CPUS=16
> instead of
> CONFIG_NR_CPUS=8
> 
> it will boot ok? 

If it would that'd still be a bug.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
       [not found]                   ` <20101113235746.GA9458@localhost>
@ 2010-11-14  0:18                     ` Yinghai Lu
  2010-11-14  1:38                     ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu num Yinghai Lu
  1 sibling, 0 replies; 22+ messages in thread
From: Yinghai Lu @ 2010-11-14  0:18 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Nikanth Karthikesan,
	David Rientjes, Zheng, Shaohui, Andrew Morton,
	linux-hotplug@vger.kernel.org, Eric Dumazet, Bjorn Helgaas,
	Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 11/13/2010 03:57 PM, Wu Fengguang wrote:
> On Sun, Nov 14, 2010 at 03:12:20AM +0800, Yinghai Lu wrote:
>> On 11/13/2010 05:10 AM, Wu Fengguang wrote:
>>> On Sat, Nov 13, 2010 at 08:57:58PM +0800, Peter Zijlstra wrote:
>>>> On Sat, 2010-11-13 at 20:00 +0800, Wu Fengguang wrote:
>>>>> On Sat, Nov 13, 2010 at 06:30:24PM +0800, Peter Zijlstra wrote:
>>>>>> On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
>>>>>>>> Will try and figure out how the heck that's happening, Ingo any clue?
>>>>>>>
>>>>>>> It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
>>>>>>> ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
>>>>>>>
>>>>>>> The interesting part is, the commit was introduced in 
>>>>>>> 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.
>>>>>>
>>>>>> Argh, that commit again..
>>>>>>
>>>>>> Does this fix it: http://lkml.org/lkml/2010/11/12/8
>>>>>
>>>>> No it still panics. Here is the dmesg.
>>>>
>>>> OK, I'll let Nikanth have a look, if all else fails we can always
>>>> revert that patch.
>>>
>>> It's the same bug.
>>>
>>> Just tried another machine, I get the same divide error.  The patch
>>> posted in lkml/2010/11/12/8 does not fix it. But after reverting
>>> commit 50f2d7f682f9, it boots OK.
>>>
>>> Thanks,
>>> Fengguang
>>> ---
>>> PS. dmesg with divide error
>>>
>>> [    0.000000] console [ttyS0] enabled, bootconsole disabled
>>> [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
>>> [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
>>> [    0.000000] ... MAX_LOCK_DEPTH:          48
>>> [    0.000000] ... MAX_LOCKDEP_KEYS:        8191
>>> [    0.000000] ... CLASSHASH_SIZE:          4096
>>> [    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
>>> [    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
>>> [    0.000000] ... CHAINHASH_SIZE:          16384
>>> [    0.000000]  memory used by lock dependency info: 6367 kB
>>> [    0.000000]  per task-struct memory footprint: 2688 bytes
>>> [    0.000000] allocated 167772160 bytes of page_cgroup
>>> [    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
>>> [    0.000000] ODEBUG: 15 of 15 active objects replaced
>>> [    0.000000] hpet clockevent registered
>>> [    0.001000] Fast TSC calibration using PIT
>>> [    0.002000] Detected 2800.469 MHz processor.
>>> [    0.000010] Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.93 BogoMIPS (lpj=2800469)
>>> [    0.010818] pid_max: default: 32768 minimum: 301
>>> [    0.021745] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
>>> [    0.035657] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
>>> [    0.044553] Mount-cache hash table entries: 256
>>> [    0.049469] Initializing cgroup subsys debug
>>> [    0.053834] Initializing cgroup subsys ns
>>> [    0.057940] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
>>> [    0.066968] Initializing cgroup subsys cpuacct
>>> [    0.071511] Initializing cgroup subsys memory
>>> [    0.075988] Initializing cgroup subsys devices
>>> [    0.080527] Initializing cgroup subsys freezer
>>> [    0.085107] CPU: Physical Processor ID: 0
>>> [    0.089209] CPU: Processor Core ID: 0
>>> [    0.092974] mce: CPU supports 9 MCE banks
>>> [    0.097095] CPU0: Thermal monitoring enabled (TM1)
>>> [    0.101990] using mwait in idle threads.
>>> [    0.106006] Performance Events: PEBS fmt1+, Westmere events, Intel PMU driver.
>>> [    0.113535] ... version:                3
>>> [    0.117641] ... bit width:              48
>>> [    0.121828] ... generic registers:      4
>>> [    0.125926] ... value mask:             0000ffffffffffff
>>> [    0.131328] ... max period:             000000007fffffff
>>> [    0.136734] ... fixed-purpose events:   3
>>> [    0.140839] ... event mask:             000000070000000f
>>> [    0.147297] ACPI: Core revision 20101013
>>> [    0.175646] ftrace: allocating 24175 entries in 95 pages
>>> [    0.190912] Setting APIC routing to flat
>>> [    0.195562] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>>> [    0.211643] CPU0: Intel(R) Xeon(R) CPU           X5660  @ 2.80GHz stepping 01
>>> [    0.325243] lockdep: fixing up alternatives.
>>> [    0.330242] Booting Node   0, Processors  #1lockdep: fixing up alternatives.
>>> [    0.430140]  #2lockdep: fixing up alternatives.
>>> [    0.526962]  #3lockdep: fixing up alternatives.
>>> [    0.623755]  #4lockdep: fixing up alternatives.
>>> [    0.720588]  Ok.
>>> [    0.722525] Booting Node   1, Processors  #5lockdep: fixing up alternatives.
>>> [    0.822389]  Ok.
>>> [    0.824327] Booting Node   0, Processors  #6
>>> [    0.919089] TSC synchronization [CPU#0 -> CPU#6]:
>>> [    0.924155] Measured 296 cycles TSC warp between CPUs, turning off TSC clock.
>>> [    0.003999] Marking TSC unstable due to check_tsc_sync_source failed
>>> [    0.557048] lockdep: fixing up alternatives.
>>> [    0.558041]  Ok.
>>> [    0.559004] Booting Node   1, Processors  #7 Ok.
>>> [    0.632157] Brought up 8 CPUs
>>> [    0.633006] Total of 8 processors activated (44799.46 BogoMIPS).
>>
>> assume that when you have 
>> CONFIG_NR_CPUS=16
>> instead of
>> CONFIG_NR_CPUS=8
>>
>> it will boot ok?
> 
> No. But it boots OK with CONFIG_NR_CPUS=64: it actually has 24 CPUs, a bit more
> than your expectation :)
> 
> This also boots the other 16 CPU box that used to lockup in find_busiest_group().

please check attached patch, it should fix the problem.

Thanks

	Yinghai

[PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu num limitation

Recent Intel new system have different order in MADT, aka will list all thread0
at first, then all thread1.
But SRAT table still old order, it will list cpus in one socket all together.

If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
to put some cpus apic id to node mapping into apicid_to_node[].

for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...

[    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[    9.235021] divide error: 0000 [#1] SMP 
[    9.235315] last sysfs file: 
[    9.235481] CPU 1 
[    9.235592] Modules linked in:
[    9.245398] 
[    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
[    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[    9.265835] RSP: 0000:ffff88103f8d1c40  EFLAGS: 00010046
[    9.285550] RAX: 0000000000000000 RBX: ffff88103f887de0 RCX: 0000000000000000
[    9.305356] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
[    9.305711] RBP: ffff88103f8d1d10 R08: 0000000000000200 R09: ffff88103f887e38
[    9.325709] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[    9.326038] R13: ffff88107e80dfb0 R14: 0000000000000001 R15: ffff88103f887e40
[    9.345655] FS:  0000000000000000(0000) GS:ffff88107e800000(0000) knlGS:0000000000000000
[    9.365503] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    9.365776] CR2: 0000000000000000 CR3: 0000000002417000 CR4: 00000000000006e0
[    9.385583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    9.405368] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    9.405713] Process kthreadd (pid: 2, threadinfo ffff88103f8d0000, task ffff88305c8aa2d0)
[    9.425563] Stack:
[    9.425668]  ffff88103f8d1cb0 0000000000000046 0000000000000000 0000000200000000
[    9.445509]  0000000000000000 0000000100000000 0000000000000046 ffffffff82bd1ce0
[    9.465350]  000000015c8aa2d0 00000000001d2540 00000000001d2540 0000007d3f8d1d28
[    9.465763] Call Trace:
[    9.465875]  [<ffffffff810747c3>] wake_up_new_task+0x3c/0x10e
[    9.485486]  [<ffffffff8107b2e3>] do_fork+0x28c/0x35f
[    9.485753]  [<ffffffff810ab832>] ? __lock_acquire+0x1801/0x1813
[    9.505474]  [<ffffffff8106f2bd>] ? finish_task_switch+0x80/0xf4
[    9.525264]  [<ffffffff8106f286>] ? finish_task_switch+0x49/0xf4
[    9.525575]  [<ffffffff8109da72>] ? local_clock+0x2b/0x3c
[    9.545281]  [<ffffffff8103da76>] kernel_thread+0x70/0x72
[    9.545544]  [<ffffffff81097c83>] ? kthread+0x0/0xa8
[    9.545797]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
[    9.565519]  [<ffffffff81098099>] kthreadd+0xe8/0x12b
[    9.585185]  [<ffffffff81037994>] kernel_thread_helper+0x4/0x10
[    9.585485]  [<ffffffff81cd793c>] ? restore_args+0x0/0x30
[    9.605192]  [<ffffffff81097fb1>] ? kthreadd+0x0/0x12b
[    9.605479]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
[    9.625295] Code: a0 be 00 02 00 00 ff c2 48 63 d2 e8 f8 67 3b 00 3b 05 86 8e 52 01 48 89 c7 89 45 c8 7c c1 48 8b 45 b0 8b 4b 08 31 d2 48 c1 e0 0a <48> f7 f1 45 85 e4 75 08 48 3b 45 b8 72 08 eb 0d 48 89 45 a8 eb 
[    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[    9.665356]  RSP <ffff88103f8d1c40>
[    9.665568] ---[ end trace 2296156d35fdfc87 ]---

So let just parse all cpu entries in SRAT.

Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
apicid_to_node[].

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/acpi/boot.c |    7 +++++++
 arch/x86/mm/srat_64.c       |    8 ++++++++
 drivers/acpi/numa.c         |   14 ++++++++++++--
 3 files changed, 27 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6/arch/x86/kernel/acpi/boot.c
@@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
 {
 	unsigned int ver = 0;
 
+#ifdef CONFIG_X86_64
+	if (id >= (MAX_APICS-1)) {
+		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
+		return;
+	}
+#endif
+
 	if (!enabled) {
 		++disabled_cpus;
 		return;
Index: linux-2.6/arch/x86/mm/srat_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_64.c
+++ linux-2.6/arch/x86/mm/srat_64.c
@@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
 	}
 
 	apic_id = pa->apic_id;
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped that apicid too big\n", pxm, apic_id, node);
+		return;
+	}
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
@@ -168,6 +172,10 @@ acpi_numa_processor_affinity_init(struct
 		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
 	else
 		apic_id = pa->apic_id;
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+		return;
+	}
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
Index: linux-2.6/drivers/acpi/numa.c
===================================================================
--- linux-2.6.orig/drivers/acpi/numa.c
+++ linux-2.6/drivers/acpi/numa.c
@@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
 int __init acpi_numa_init(void)
 {
 	int ret = 0;
+	int nr_cpu_entries = nr_cpu_ids;
+
+#ifdef CONFIG_X86_64
+	/*
+	 * Should not limit number with cpu num that will handle,
+	 * SRAT cpu entries could have different order with that in MADT.
+	 * So go over all cpu entries in SRAT to get apicid to node mapping.
+	 */
+	nr_cpu_entries = MAX_LOCAL_APIC;
+#endif
 
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				     acpi_parse_x2apic_affinity, nr_cpu_ids);
+				     acpi_parse_x2apic_affinity, nr_cpu_entries);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				     acpi_parse_processor_affinity, nr_cpu_ids);
+				     acpi_parse_processor_affinity, nr_cpu_entries);
 		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 					    acpi_parse_memory_affinity,
 					    NR_NODE_MEMBLKS);
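
The ordering problem described in the changelog above can be illustrated with a small user-space sketch. It is not kernel code: the demo_ names, the 64-entry table and the APIC ID layout in the note below are assumptions made up for the illustration. It shows how capping the number of SRAT CPU entries at the CPU limit can leave CPUs that the MADT order brings up without a node mapping.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_APIC 64	/* hypothetical table size, stands in for MAX_LOCAL_APIC */
#define DEMO_NO_NODE  (-1)	/* marks an APIC ID with no node mapping */

/* One SRAT CPU affinity entry: the node an APIC ID belongs to. */
struct demo_srat_entry {
	unsigned int apic_id;
	int node;
};

/*
 * Reset apicid_to_node[] and fill it from at most max_entries SRAT
 * entries, skipping APIC IDs that do not fit in the table.
 * Returns the number of entries recorded.
 */
size_t demo_parse_srat(const struct demo_srat_entry *srat, size_t count,
		       size_t max_entries, int *apicid_to_node)
{
	size_t i, recorded = 0;

	for (i = 0; i < DEMO_MAX_APIC; i++)
		apicid_to_node[i] = DEMO_NO_NODE;

	for (i = 0; i < count && i < max_entries; i++) {
		if (srat[i].apic_id >= DEMO_MAX_APIC)
			continue;	/* APIC ID too big for the table: skip it */
		apicid_to_node[srat[i].apic_id] = srat[i].node;
		recorded++;
	}
	return recorded;
}

/*
 * Count how many of the first nr_cpus CPUs, taken in MADT order,
 * have no node mapping in apicid_to_node[].
 */
size_t demo_count_unmapped(const unsigned int *madt_order, size_t nr_cpus,
			   const int *apicid_to_node)
{
	size_t i, unmapped = 0;

	for (i = 0; i < nr_cpus; i++) {
		assert(madt_order[i] < DEMO_MAX_APIC);
		if (apicid_to_node[madt_order[i]] == DEMO_NO_NODE)
			unmapped++;
	}
	return unmapped;
}
```

Take a hypothetical box with 2 sockets, 2 cores and 2 threads, where SRAT lists APIC IDs 0..7 socket by socket and MADT lists 0, 2, 4, 6, 1, 3, 5, 7. A limit of 4 CPUs brings up 0, 2, 4, 6 but records only 0..3, so 4 and 6 end up without a node. Passing the table size as max_entries instead maps all of them.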

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu num
       [not found]                   ` <20101113235746.GA9458@localhost>
  2010-11-14  0:18                     ` Yinghai Lu
@ 2010-11-14  1:38                     ` Yinghai Lu
  2010-11-14 17:32                       ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu Wu Fengguang
  2010-12-15 22:01                       ` H. Peter Anvin
  1 sibling, 2 replies; 22+ messages in thread
From: Yinghai Lu @ 2010-11-14  1:38 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin
  Cc: Wu Fengguang, Peter Zijlstra, LKML, Nikanth Karthikesan,
	David Rientjes, Zheng, Shaohui, linux-hotplug@vger.kernel.org,
	Eric Dumazet, Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao,
	Takuya Yoshikawa


Recent Intel new system have different order in MADT, aka will list all thread0
at first, then all thread1.
But SRAT table still old order, it will list cpus in one socket all together.

If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
to put some cpus apic id to node mapping into apicid_to_node[].

for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...

[    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[    9.235021] divide error: 0000 [#1] SMP 
[    9.235315] last sysfs file: 
[    9.235481] CPU 1 
[    9.235592] Modules linked in:
[    9.245398] 
[    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
[    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[    9.265835] RSP: 0000:ffff88103f8d1c40  EFLAGS: 00010046
[    9.285550] RAX: 0000000000000000 RBX: ffff88103f887de0 RCX: 0000000000000000
[    9.305356] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
[    9.305711] RBP: ffff88103f8d1d10 R08: 0000000000000200 R09: ffff88103f887e38
[    9.325709] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[    9.326038] R13: ffff88107e80dfb0 R14: 0000000000000001 R15: ffff88103f887e40
[    9.345655] FS:  0000000000000000(0000) GS:ffff88107e800000(0000) knlGS:0000000000000000
[    9.365503] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    9.365776] CR2: 0000000000000000 CR3: 0000000002417000 CR4: 00000000000006e0
[    9.385583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    9.405368] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    9.405713] Process kthreadd (pid: 2, threadinfo ffff88103f8d0000, task ffff88305c8aa2d0)
[    9.425563] Stack:
[    9.425668]  ffff88103f8d1cb0 0000000000000046 0000000000000000 0000000200000000
[    9.445509]  0000000000000000 0000000100000000 0000000000000046 ffffffff82bd1ce0
[    9.465350]  000000015c8aa2d0 00000000001d2540 00000000001d2540 0000007d3f8d1d28
[    9.465763] Call Trace:
[    9.465875]  [<ffffffff810747c3>] wake_up_new_task+0x3c/0x10e
[    9.485486]  [<ffffffff8107b2e3>] do_fork+0x28c/0x35f
[    9.485753]  [<ffffffff810ab832>] ? __lock_acquire+0x1801/0x1813
[    9.505474]  [<ffffffff8106f2bd>] ? finish_task_switch+0x80/0xf4
[    9.525264]  [<ffffffff8106f286>] ? finish_task_switch+0x49/0xf4
[    9.525575]  [<ffffffff8109da72>] ? local_clock+0x2b/0x3c
[    9.545281]  [<ffffffff8103da76>] kernel_thread+0x70/0x72
[    9.545544]  [<ffffffff81097c83>] ? kthread+0x0/0xa8
[    9.545797]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
[    9.565519]  [<ffffffff81098099>] kthreadd+0xe8/0x12b
[    9.585185]  [<ffffffff81037994>] kernel_thread_helper+0x4/0x10
[    9.585485]  [<ffffffff81cd793c>] ? restore_args+0x0/0x30
[    9.605192]  [<ffffffff81097fb1>] ? kthreadd+0x0/0x12b
[    9.605479]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
[    9.625295] Code: a0 be 00 02 00 00 ff c2 48 63 d2 e8 f8 67 3b 00 3b 05 86 8e 52 01 48 89 c7 89 45 c8 7c c1 48 8b 45 b0 8b 4b 08 31 d2 48 c1 e0 0a <48> f7 f1 45 85 e4 75 08 48 3b 45 b8 72 08 eb 0d 48 89 45 a8 eb 
[    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[    9.665356]  RSP <ffff88103f8d1c40>
[    9.665568] ---[ end trace 2296156d35fdfc87 ]---

So let just parse all cpu entries in SRAT.

Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
apicid_to_node[].

it should fix following bug too.
https://bugzilla.kernel.org/show_bug.cgi?id=22662

Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/acpi/boot.c |    7 +++++++
 arch/x86/mm/srat_64.c       |    8 ++++++++
 drivers/acpi/numa.c         |   14 ++++++++++++--
 3 files changed, 27 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6/arch/x86/kernel/acpi/boot.c
@@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
 {
 	unsigned int ver = 0;
 
+#ifdef CONFIG_X86_64
+	if (id >= (MAX_APICS-1)) {
+		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
+		return;
+	}
+#endif
+
 	if (!enabled) {
 		++disabled_cpus;
 		return;
Index: linux-2.6/arch/x86/mm/srat_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_64.c
+++ linux-2.6/arch/x86/mm/srat_64.c
@@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
 	}
 
 	apic_id = pa->apic_id;
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped that apicid too big\n", pxm, apic_id, node);
+		return;
+	}
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
@@ -168,6 +172,10 @@ acpi_numa_processor_affinity_init(struct
 		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
 	else
 		apic_id = pa->apic_id;
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+		return;
+	}
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
Index: linux-2.6/drivers/acpi/numa.c
===================================================================
--- linux-2.6.orig/drivers/acpi/numa.c
+++ linux-2.6/drivers/acpi/numa.c
@@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
 int __init acpi_numa_init(void)
 {
 	int ret = 0;
+	int nr_cpu_entries = nr_cpu_ids;
+
+#ifdef CONFIG_X86_64
+	/*
+	 * Should not limit number with cpu num that will handle,
+	 * SRAT cpu entries could have different order with that in MADT.
+	 * So go over all cpu entries in SRAT to get apicid to node mapping.
+	 */
+	nr_cpu_entries = MAX_LOCAL_APIC;
+#endif
 
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				     acpi_parse_x2apic_affinity, nr_cpu_ids);
+				     acpi_parse_x2apic_affinity, nr_cpu_entries);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				     acpi_parse_processor_affinity, nr_cpu_ids);
+				     acpi_parse_processor_affinity, nr_cpu_entries);
 		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 					    acpi_parse_memory_affinity,
 					    NR_NODE_MEMBLKS);
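
For readers decoding the oops quoted in the changelog: the marked bytes `<48> f7 f1` in the Code: line are `div %rcx`, the register dump shows RCX as 0, and the instruction just before it is `shl $0xa,%rax` (a multiply by 1024). The following is a minimal user-space sketch of that arithmetic with an explicit zero check. The function name and the error convention are hypothetical, and this is not the fix proposed here, which repairs the apicid to node mapping instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOAD_SCALE_SHIFT 10	/* matches the shl $0xa seen in the oops */

/*
 * Scale load by 1024 and divide by capacity.
 * Returns 0 and stores the result in *out on success, or -1 without
 * touching *out when capacity is zero (the case that traps on x86).
 */
int demo_scaled_load(uint64_t load, uint32_t capacity, uint64_t *out)
{
	assert(out != NULL);

	if (capacity == 0)
		return -1;	/* an unguarded div here raises "divide error" */

	*out = (load << DEMO_LOAD_SCALE_SHIFT) / capacity;
	return 0;
}
```

A guard like this would only hide the symptom. A zero divisor at that point suggests the topology data feeding the scheduler is incomplete, which is what the patch addresses.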

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-11-14  1:38                     ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu num Yinghai Lu
@ 2010-11-14 17:32                       ` Wu Fengguang
  2010-11-14 18:02                         ` Yinghai Lu
  2010-11-14 18:19                         ` Yinghai Lu
  2010-12-15 22:01                       ` H. Peter Anvin
  1 sibling, 2 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-11-14 17:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

[-- Attachment #1: Type: text/plain, Size: 7348 bytes --]

Hi,

I just found another problem. When passing "mem=256" to 2.6.37-rc1,
it dies hard early (not able to print any boot log). With this patch
applied, it's a bit better: it shows a kernel panic, but still dies
hard (not able to reboot with "panic=10").

Attached is the screenshot in kvm (it's not specific to kvm, it dies
hard on two more physical boxes). The screenshot shows that it panics
inside reserve_trampoline_memory().

Thanks,
Fengguang

On Sun, Nov 14, 2010 at 09:38:41AM +0800, Yinghai Lu wrote:
> 
> Recent Intel new system have different order in MADT, aka will list all thread0
> at first, then all thread1.
> But SRAT table still old order, it will list cpus in one socket all together.
> 
> If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
> to put some cpus apic id to node mapping into apicid_to_node[].
> 
> for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...
> 
> [    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
> [    9.235021] divide error: 0000 [#1] SMP 
> [    9.235315] last sysfs file: 
> [    9.235481] CPU 1 
> [    9.235592] Modules linked in:
> [    9.245398] 
> [    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
> [    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> [    9.265835] RSP: 0000:ffff88103f8d1c40  EFLAGS: 00010046
> [    9.285550] RAX: 0000000000000000 RBX: ffff88103f887de0 RCX: 0000000000000000
> [    9.305356] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
> [    9.305711] RBP: ffff88103f8d1d10 R08: 0000000000000200 R09: ffff88103f887e38
> [    9.325709] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> [    9.326038] R13: ffff88107e80dfb0 R14: 0000000000000001 R15: ffff88103f887e40
> [    9.345655] FS:  0000000000000000(0000) GS:ffff88107e800000(0000) knlGS:0000000000000000
> [    9.365503] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    9.365776] CR2: 0000000000000000 CR3: 0000000002417000 CR4: 00000000000006e0
> [    9.385583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    9.405368] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    9.405713] Process kthreadd (pid: 2, threadinfo ffff88103f8d0000, task ffff88305c8aa2d0)
> [    9.425563] Stack:
> [    9.425668]  ffff88103f8d1cb0 0000000000000046 0000000000000000 0000000200000000
> [    9.445509]  0000000000000000 0000000100000000 0000000000000046 ffffffff82bd1ce0
> [    9.465350]  000000015c8aa2d0 00000000001d2540 00000000001d2540 0000007d3f8d1d28
> [    9.465763] Call Trace:
> [    9.465875]  [<ffffffff810747c3>] wake_up_new_task+0x3c/0x10e
> [    9.485486]  [<ffffffff8107b2e3>] do_fork+0x28c/0x35f
> [    9.485753]  [<ffffffff810ab832>] ? __lock_acquire+0x1801/0x1813
> [    9.505474]  [<ffffffff8106f2bd>] ? finish_task_switch+0x80/0xf4
> [    9.525264]  [<ffffffff8106f286>] ? finish_task_switch+0x49/0xf4
> [    9.525575]  [<ffffffff8109da72>] ? local_clock+0x2b/0x3c
> [    9.545281]  [<ffffffff8103da76>] kernel_thread+0x70/0x72
> [    9.545544]  [<ffffffff81097c83>] ? kthread+0x0/0xa8
> [    9.545797]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
> [    9.565519]  [<ffffffff81098099>] kthreadd+0xe8/0x12b
> [    9.585185]  [<ffffffff81037994>] kernel_thread_helper+0x4/0x10
> [    9.585485]  [<ffffffff81cd793c>] ? restore_args+0x0/0x30
> [    9.605192]  [<ffffffff81097fb1>] ? kthreadd+0x0/0x12b
> [    9.605479]  [<ffffffff81037990>] ? kernel_thread_helper+0x0/0x10
> [    9.625295] Code: a0 be 00 02 00 00 ff c2 48 63 d2 e8 f8 67 3b 00 3b 05 86 8e 52 01 48 89 c7 89 45 c8 7c c1 48 8b 45 b0 8b 4b 08 31 d2 48 c1 e0 0a <48> f7 f1 45 85 e4 75 08 48 3b 45 b8 72 08 eb 0d 48 89 45 a8 eb 
> [    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> [    9.665356]  RSP <ffff88103f8d1c40>
> [    9.665568] ---[ end trace 2296156d35fdfc87 ]---
> 
> So let just parse all cpu entries in SRAT.
> 
> Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
> apicid_to_node[].
> 
> it should fix following bug too.
> https://bugzilla.kernel.org/show_bug.cgi?id=22662
> 
> Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
> Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> 
> ---
>  arch/x86/kernel/acpi/boot.c |    7 +++++++
>  arch/x86/mm/srat_64.c       |    8 ++++++++
>  drivers/acpi/numa.c         |   14 ++++++++++++--
>  3 files changed, 27 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
> @@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
>  {
>  	unsigned int ver = 0;
>  
> +#ifdef CONFIG_X86_64
> +	if (id >= (MAX_APICS-1)) {
> +		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
> +		return;
> +	}
> +#endif
> +
>  	if (!enabled) {
>  		++disabled_cpus;
>  		return;
> Index: linux-2.6/arch/x86/mm/srat_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/srat_64.c
> +++ linux-2.6/arch/x86/mm/srat_64.c
> @@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
>  	}
>  
>  	apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped that apicid too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> @@ -168,6 +172,10 @@ acpi_numa_processor_affinity_init(struct
>  		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
>  	else
>  		apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> Index: linux-2.6/drivers/acpi/numa.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/numa.c
> +++ linux-2.6/drivers/acpi/numa.c
> @@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
>  int __init acpi_numa_init(void)
>  {
>  	int ret = 0;
> +	int nr_cpu_entries = nr_cpu_ids;
> +
> +#ifdef CONFIG_X86_64
> +	/*
> +	 * Should not limit number with cpu num that will handle,
> +	 * SRAT cpu entries could have different order with that in MADT.
> +	 * So go over all cpu entries in SRAT to get apicid to node mapping.
> +	 */
> +	nr_cpu_entries = MAX_LOCAL_APIC;
> +#endif
>  
>  	/* SRAT: Static Resource Affinity Table */
>  	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
>  		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
> -				     acpi_parse_x2apic_affinity, nr_cpu_ids);
> +				     acpi_parse_x2apic_affinity, nr_cpu_entries);
>  		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
> -				     acpi_parse_processor_affinity, nr_cpu_ids);
> +				     acpi_parse_processor_affinity, nr_cpu_entries);
>  		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
>  					    acpi_parse_memory_affinity,
>  					    NR_NODE_MEMBLKS);

[-- Attachment #2: panic-reserve_trampoline_memory.png --]
[-- Type: image/png, Size: 18830 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-11-14 17:32                       ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu Wu Fengguang
@ 2010-11-14 18:02                         ` Yinghai Lu
  2010-11-14 18:19                         ` Yinghai Lu
  1 sibling, 0 replies; 22+ messages in thread
From: Yinghai Lu @ 2010-11-14 18:02 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 11/14/2010 09:32 AM, Wu Fengguang wrote:
> Hi,
> 
> I just found another problem. When passing "mem=256" to 2.6.37-rc1,
> it dies hard early (not able to print any boot log). With this patch
> applied, it's a bit better: it shows a kernel panic, but still dies
> hard (not able to reboot with "panic=10").

do you mean mem=256M ?

Thanks
	Yinghai

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-11-14 17:32                       ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu Wu Fengguang
  2010-11-14 18:02                         ` Yinghai Lu
@ 2010-11-14 18:19                         ` Yinghai Lu
  2010-11-15  1:22                           ` Wu Fengguang
  1 sibling, 1 reply; 22+ messages in thread
From: Yinghai Lu @ 2010-11-14 18:19 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 11/14/2010 09:32 AM, Wu Fengguang wrote:
> Hi,
> 
> I just found another problem. When passing "mem=256" to 2.6.37-rc1,
> it dies hard early (not able to print any boot log). With this patch
> applied, it's a bit better: it shows a kernel panic, but still dies
> hard (not able to reboot with "panic=10").
> 

if you did use "mem=256", you should get panic.

early console in setup code
Probing EDD (edd=off to disable)... ok
early console in decompress_kernel
decompress_kernel:
  input: [0x2431269-0x2e7fec6], output: 0x1000000, heap: [0x2e856c0-0x2e8c6bf]

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[    0.000000] bootconsole [uart0] enabled
[    0.000000] Kernel Layout:
[    0.000000]   .text: [0x01000000-0x01ce0e48]
[    0.000000] .rodata: [0x01ce6000-0x0240ffff]
[    0.000000]   .data: [0x02410000-0x025a50bf]
[    0.000000]   .init: [0x025a7000-0x0286afff]
[    0.000000]    .bss: [0x02875000-0x0348a6d7]
[    0.000000]    .brk: [0x0348b000-0x034aafff]
[    0.000000]     memblock_x86_reserve_range: [0x01000000-0x0348a6d7]    TEXT DATA BSS
[    0.000000]     memblock_x86_reserve_range: [0x7c59d000-0x7fffefff]          RAMDISK
[    0.000000]     memblock_x86_reserve_range: [0x0009fc00-0x000fffff]  * BIOS reserved
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.37-rc1-tip-yh-02102-g776a022-dirty (yhlu@linux-siqj.site) (gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) ) #276 SMP Sat Nov 13 15:34:15 PST 2010
[    0.000000] Command line: BOOT_IMAGE=linux debug apic=debug ramdisk_size=262144 root=/dev/ram0 rw ip=dhcp mem=256 console=uart8250,io,0x3f8,115200 initrd=initrd.img
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: [0x00000000000000-0x0000000009fbff] (usable)
[    0.000000]  BIOS-e820: [0x0000000009fc00-0x0000000009ffff] (reserved)
[    0.000000]  BIOS-e820: [0x000000000f0000-0x000000000fffff] (reserved)
[    0.000000]  BIOS-e820: [0x00000000100000-0x0000002ffeffff] (usable)
[    0.000000]  BIOS-e820: [0x0000002fff0000-0x0000002fffffff] (ACPI data)
[    0.000000]  BIOS-e820: [0x000000fffc0000-0x000000ffffffff] (reserved)
[    0.000000] e820 remove range: [0x00000000000100-0xfffffffffffffffe] (usable)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] user-defined physical RAM map:
[    0.000000]  user: [0x00000000000000-0x000000000000ff] (usable)
[    0.000000]  user: [0x0000000009fc00-0x0000000009ffff] (reserved)
[    0.000000]  user: [0x000000000f0000-0x000000000fffff] (reserved)
[    0.000000]  user: [0x0000002fff0000-0x0000002fffffff] (ACPI data)
[    0.000000]  user: [0x000000fffc0000-0x000000ffffffff] (reserved)
[    0.000000] e820 update range: [0x00000000000000-0x000000000000ff] (usable) => (reserved)
[    0.000000] aligned physical RAM map:
[    0.000000]  aligned: [0x00000000000000-0x000000000000ff] (reserved)
[    0.000000]  aligned: [0x0000000009fc00-0x0000000009ffff] (reserved)
[    0.000000]  aligned: [0x000000000f0000-0x000000000fffff] (reserved)
[    0.000000]  aligned: [0x0000002fff0000-0x0000002fffffff] (ACPI data)
[    0.000000]  aligned: [0x000000fffc0000-0x000000ffffffff] (reserved)
[    0.000000] DMI 2.5 present.
[    0.000000] DMI: /VirtualBox, BIOS VirtualBox 12/01/2006
[    0.000000] e820 update range: [0x00000000000000-0x0000000000ffff] (usable) => (reserved)
[    0.000000] e820 remove range: [0x000000000a0000-0x000000000fffff] (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x0 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR variable ranges disabled:
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] CPU MTRRs all blank - virtualized system.
[    0.000000] Scan SMP from ffff880000000000 for 1024 bytes.
[    0.000000] Scan SMP from ffff88000009fc00 for 1024 bytes.
[    0.000000] found SMP MP-table at [ffff88000009fff0] 9fff0
[    0.000000]     memblock_x86_reserve_range: [0x0009fff0-0x0009ffff]   * MP-table mpf
[    0.000000]   mpc: e1160-e1254
[    0.000000]     memblock_x86_reserve_range: [0x000e1160-0x000e1253]   * MP-table mpc
[    0.000000]     memblock_x86_reserve_range: [0x0348b000-0x0348b070]              BRK
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x0
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0x00000000000000-0xffffffffffffffff], 0x0 bytes
[    0.000000]  reserved.cnt  = 0x6
[    0.000000]  reserved[0x0]   [0x0000000009fc00-0x000000000fffff], 0x60400 bytes
[    0.000000]  reserved[0x1]   [0x0000000009fff0-0x0000000009ffff], 0x10 bytes
[    0.000000]  reserved[0x2]   [0x000000000e1160-0x000000000e1253], 0xf4 bytes
[    0.000000]  reserved[0x3]   [0x00000001000000-0x0000000348a6d7], 0x248a6d8 bytes
[    0.000000]  reserved[0x4]   [0x0000000348b000-0x0000000348b070], 0x71 bytes
[    0.000000]  reserved[0x5]   [0x0000007c59d000-0x0000007fffefff], 0x3a62000 bytes
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Kernel panic - not syncing: Cannot allocate trampoline
[    0.000000] 
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc1-tip-yh-02102-g776a022-dirty #276
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81cd3f14>] panic+0x91/0x1a3
[    0.000000]  [<ffffffff82783490>] reserve_trampoline_memory+0x46/0x72
[    0.000000]  [<ffffffff827801fe>] setup_arch+0x5b0/0xae3
[    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
[    0.000000]  [<ffffffff81cd4067>] ? printk+0x41/0x43
[    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
[    0.000000]  [<ffffffff8277bb0d>] start_kernel+0xd7/0x3e8
[    0.000000]  [<ffffffff8277b2cc>] x86_64_start_reservations+0x9c/0xa0
[    0.000000]  [<ffffffff8277b3e4>] x86_64_start_kernel+0x114/0x11b
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at kernel/lockdep.c:2322 trace_hardirqs_on_caller+0xc3/0x178()
[    0.000000] Hardware name: VirtualBox
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc1-tip-yh-02102-g776a022-dirty #276
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff8107bda0>] warn_slowpath_common+0x85/0x9d
[    0.000000]  [<ffffffff81cd3fe1>] ? panic+0x15e/0x1a3
[    0.000000]  [<ffffffff8107bdd2>] warn_slowpath_null+0x1a/0x1c
[    0.000000]  [<ffffffff810a8e59>] trace_hardirqs_on_caller+0xc3/0x178
[    0.000000]  [<ffffffff810a8f1b>] trace_hardirqs_on+0xd/0xf
[    0.000000]  [<ffffffff81cd3fe1>] panic+0x15e/0x1a3
[    0.000000]  [<ffffffff82783490>] reserve_trampoline_memory+0x46/0x72
[    0.000000]  [<ffffffff827801fe>] setup_arch+0x5b0/0xae3
[    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
[    0.000000]  [<ffffffff81cd4067>] ? printk+0x41/0x43
[    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
[    0.000000]  [<ffffffff8277bb0d>] start_kernel+0xd7/0x3e8
[    0.000000]  [<ffffffff8277b2cc>] x86_64_start_reservations+0x9c/0xa0
[    0.000000]  [<ffffffff8277b3e4>] x86_64_start_kernel+0x114/0x11b
[    0.000000] ---[ end trace a7919e7f17c0a725 ]---

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-11-14 18:19                         ` Yinghai Lu
@ 2010-11-15  1:22                           ` Wu Fengguang
  0 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-11-15  1:22 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On Mon, Nov 15, 2010 at 02:19:30AM +0800, Yinghai Lu wrote:
> On 11/14/2010 09:32 AM, Wu Fengguang wrote:
> > Hi,
> > 
> > I just found another problem. When passing "mem=256" to 2.6.37-rc1,
> > it dies hard early (not able to print any boot log). With this patch
> > applied, it's a bit better: it shows a kernel panic, but still dies
> > hard (not able to reboot with "panic=10").
> > 
> 
> if you did use "mem=256", you should get panic.

Oops, "256" is accepted as 256 bytes..
Sorry for the noise! It boots OK with "mem=256M".

Thanks,
Fengguang
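
For reference, a rough user-space sketch of why a bare "256" ends up as
256 bytes. It assumes K/M/G suffix handling similar in spirit to the
kernel's memparse(); parse_size() is a hypothetical name used only for
this illustration and is not the kernel function itself.

```c
#include <stdlib.h>

/*
 * Illustrative sketch only: parse a size with an optional K/M/G suffix.
 * parse_size() is a made-up name, not a kernel interface.
 */
unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long val = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		break;
	default:
		/* no suffix: the number is taken as plain bytes */
		break;
	}
	return val;
}
```

Under these assumptions parse_size("256") is just 256, while
parse_size("256M") is 268435456, which matches the difference between
the panic and the working boot.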

> early console in setup code
> Probing EDD (edd=off to disable)... ok
> early console in decompress_kernel
> decompress_kernel:
>   input: [0x2431269-0x2e7fec6], output: 0x1000000, heap: [0x2e856c0-0x2e8c6bf]
> 
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.
> [    0.000000] bootconsole [uart0] enabled
> [    0.000000] Kernel Layout:
> [    0.000000]   .text: [0x01000000-0x01ce0e48]
> [    0.000000] .rodata: [0x01ce6000-0x0240ffff]
> [    0.000000]   .data: [0x02410000-0x025a50bf]
> [    0.000000]   .init: [0x025a7000-0x0286afff]
> [    0.000000]    .bss: [0x02875000-0x0348a6d7]
> [    0.000000]    .brk: [0x0348b000-0x034aafff]
> [    0.000000]     memblock_x86_reserve_range: [0x01000000-0x0348a6d7]    TEXT DATA BSS
> [    0.000000]     memblock_x86_reserve_range: [0x7c59d000-0x7fffefff]          RAMDISK
> [    0.000000]     memblock_x86_reserve_range: [0x0009fc00-0x000fffff]  * BIOS reserved
> [    0.000000] Initializing cgroup subsys cpuset
> [    0.000000] Initializing cgroup subsys cpu
> [    0.000000] Linux version 2.6.37-rc1-tip-yh-02102-g776a022-dirty (yhlu@linux-siqj.site) (gcc version 4.5.0 20100604 [gcc-4_5-branch revision 160292] (SUSE Linux) ) #276 SMP Sat Nov 13 15:34:15 PST 2010
> [    0.000000] Command line: BOOT_IMAGE=linux debug apic=debug ramdisk_size=262144 root=/dev/ram0 rw ip=dhcp mem=256 console=uart8250,io,0x3f8,115200 initrd=initrd.img
> [    0.000000] KERNEL supported cpus:
> [    0.000000]   Intel GenuineIntel
> [    0.000000]   AMD AuthenticAMD
> [    0.000000]   Centaur CentaurHauls
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  BIOS-e820: [0x00000000000000-0x0000000009fbff] (usable)
> [    0.000000]  BIOS-e820: [0x0000000009fc00-0x0000000009ffff] (reserved)
> [    0.000000]  BIOS-e820: [0x000000000f0000-0x000000000fffff] (reserved)
> [    0.000000]  BIOS-e820: [0x00000000100000-0x0000002ffeffff] (usable)
> [    0.000000]  BIOS-e820: [0x0000002fff0000-0x0000002fffffff] (ACPI data)
> [    0.000000]  BIOS-e820: [0x000000fffc0000-0x000000ffffffff] (reserved)
> [    0.000000] e820 remove range: [0x00000000000100-0xfffffffffffffffe] (usable)
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] user-defined physical RAM map:
> [    0.000000]  user: [0x00000000000000-0x000000000000ff] (usable)
> [    0.000000]  user: [0x0000000009fc00-0x0000000009ffff] (reserved)
> [    0.000000]  user: [0x000000000f0000-0x000000000fffff] (reserved)
> [    0.000000]  user: [0x0000002fff0000-0x0000002fffffff] (ACPI data)
> [    0.000000]  user: [0x000000fffc0000-0x000000ffffffff] (reserved)
> [    0.000000] e820 update range: [0x00000000000000-0x000000000000ff] (usable) => (reserved)
> [    0.000000] aligned physical RAM map:
> [    0.000000]  aligned: [0x00000000000000-0x000000000000ff] (reserved)
> [    0.000000]  aligned: [0x0000000009fc00-0x0000000009ffff] (reserved)
> [    0.000000]  aligned: [0x000000000f0000-0x000000000fffff] (reserved)
> [    0.000000]  aligned: [0x0000002fff0000-0x0000002fffffff] (ACPI data)
> [    0.000000]  aligned: [0x000000fffc0000-0x000000ffffffff] (reserved)
> [    0.000000] DMI 2.5 present.
> [    0.000000] DMI: /VirtualBox, BIOS VirtualBox 12/01/2006
> [    0.000000] e820 update range: [0x00000000000000-0x0000000000ffff] (usable) => (reserved)
> [    0.000000] e820 remove range: [0x000000000a0000-0x000000000fffff] (usable)
> [    0.000000] No AGP bridge found
> [    0.000000] last_pfn = 0x0 max_arch_pfn = 0x400000000
> [    0.000000] MTRR default type: uncachable
> [    0.000000] MTRR variable ranges disabled:
> [    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
> [    0.000000] CPU MTRRs all blank - virtualized system.
> [    0.000000] Scan SMP from ffff880000000000 for 1024 bytes.
> [    0.000000] Scan SMP from ffff88000009fc00 for 1024 bytes.
> [    0.000000] found SMP MP-table at [ffff88000009fff0] 9fff0
> [    0.000000]     memblock_x86_reserve_range: [0x0009fff0-0x0009ffff]   * MP-table mpf
> [    0.000000]   mpc: e1160-e1254
> [    0.000000]     memblock_x86_reserve_range: [0x000e1160-0x000e1253]   * MP-table mpc
> [    0.000000]     memblock_x86_reserve_range: [0x0348b000-0x0348b070]              BRK
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x0
> [    0.000000]  memory.cnt  = 0x1
> [    0.000000]  memory[0x0]     [0x00000000000000-0xffffffffffffffff], 0x0 bytes
> [    0.000000]  reserved.cnt  = 0x6
> [    0.000000]  reserved[0x0]   [0x0000000009fc00-0x000000000fffff], 0x60400 bytes
> [    0.000000]  reserved[0x1]   [0x0000000009fff0-0x0000000009ffff], 0x10 bytes
> [    0.000000]  reserved[0x2]   [0x000000000e1160-0x000000000e1253], 0xf4 bytes
> [    0.000000]  reserved[0x3]   [0x00000001000000-0x0000000348a6d7], 0x248a6d8 bytes
> [    0.000000]  reserved[0x4]   [0x0000000348b000-0x0000000348b070], 0x71 bytes
> [    0.000000]  reserved[0x5]   [0x0000007c59d000-0x0000007fffefff], 0x3a62000 bytes
> [    0.000000] initial memory mapped : 0 - 20000000
> [    0.000000] Kernel panic - not syncing: Cannot allocate trampoline
> [    0.000000] 
> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc1-tip-yh-02102-g776a022-dirty #276
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff81cd3f14>] panic+0x91/0x1a3
> [    0.000000]  [<ffffffff82783490>] reserve_trampoline_memory+0x46/0x72
> [    0.000000]  [<ffffffff827801fe>] setup_arch+0x5b0/0xae3
> [    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
> [    0.000000]  [<ffffffff81cd4067>] ? printk+0x41/0x43
> [    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
> [    0.000000]  [<ffffffff8277bb0d>] start_kernel+0xd7/0x3e8
> [    0.000000]  [<ffffffff8277b2cc>] x86_64_start_reservations+0x9c/0xa0
> [    0.000000]  [<ffffffff8277b3e4>] x86_64_start_kernel+0x114/0x11b
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: at kernel/lockdep.c:2322 trace_hardirqs_on_caller+0xc3/0x178()
> [    0.000000] Hardware name: VirtualBox
> [    0.000000] Modules linked in:
> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc1-tip-yh-02102-g776a022-dirty #276
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff8107bda0>] warn_slowpath_common+0x85/0x9d
> [    0.000000]  [<ffffffff81cd3fe1>] ? panic+0x15e/0x1a3
> [    0.000000]  [<ffffffff8107bdd2>] warn_slowpath_null+0x1a/0x1c
> [    0.000000]  [<ffffffff810a8e59>] trace_hardirqs_on_caller+0xc3/0x178
> [    0.000000]  [<ffffffff810a8f1b>] trace_hardirqs_on+0xd/0xf
> [    0.000000]  [<ffffffff81cd3fe1>] panic+0x15e/0x1a3
> [    0.000000]  [<ffffffff82783490>] reserve_trampoline_memory+0x46/0x72
> [    0.000000]  [<ffffffff827801fe>] setup_arch+0x5b0/0xae3
> [    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
> [    0.000000]  [<ffffffff81cd4067>] ? printk+0x41/0x43
> [    0.000000]  [<ffffffff827cb3a0>] ? boot_command_line+0x0/0x800
> [    0.000000]  [<ffffffff8277bb0d>] start_kernel+0xd7/0x3e8
> [    0.000000]  [<ffffffff8277b2cc>] x86_64_start_reservations+0x9c/0xa0
> [    0.000000]  [<ffffffff8277b3e4>] x86_64_start_kernel+0x114/0x11b
> [    0.000000] ---[ end trace a7919e7f17c0a725 ]---

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-11-14  1:38                     ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu num Yinghai Lu
  2010-11-14 17:32                       ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu Wu Fengguang
@ 2010-12-15 22:01                       ` H. Peter Anvin
  2010-12-15 22:40                         ` Yinghai Lu
  1 sibling, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2010-12-15 22:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Wu Fengguang,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 11/13/2010 05:38 PM, Yinghai Lu wrote:
> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
> @@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
>  {
>  	unsigned int ver = 0;
>  
> +#ifdef CONFIG_X86_64
> +	if (id >= (MAX_APICS-1)) {
> +		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
> +		return;
> +	}
> +#endif
> +
>  	if (!enabled) {
>  		++disabled_cpus;
>  		return;

Why the #ifdef?


> Index: linux-2.6/arch/x86/mm/srat_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/srat_64.c
> +++ linux-2.6/arch/x86/mm/srat_64.c
> @@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
>  	}
>  
>  	apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped that apicid too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> @@ -168,6 +172,10 @@ acpi_numa_processor_affinity_init(struct
>  		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
>  	else
>  		apic_id = pa->apic_id;
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
> +		return;
> +	}
>  	apicid_to_node[apic_id] = node;
>  	node_set(node, cpu_nodes_parsed);
>  	acpi_numa = 1;
> Index: linux-2.6/drivers/acpi/numa.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/numa.c
> +++ linux-2.6/drivers/acpi/numa.c
> @@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
>  int __init acpi_numa_init(void)
>  {
>  	int ret = 0;
> +	int nr_cpu_entries = nr_cpu_ids;
> +
> +#ifdef CONFIG_X86_64
> +	/*
> +	 * Should not limit number with cpu num that will handle,
> +	 * SRAT cpu entries could have different order with that in MADT.
> +	 * So go over all cpu entries in SRAT to get apicid to node mapping.
> +	 */
> +	nr_cpu_entries = MAX_LOCAL_APIC;
> +#endif
>  

Same here...

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-12-15 22:01                       ` H. Peter Anvin
@ 2010-12-15 22:40                         ` Yinghai Lu
  2010-12-15 22:53                           ` H. Peter Anvin
  0 siblings, 1 reply; 22+ messages in thread
From: Yinghai Lu @ 2010-12-15 22:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Wu Fengguang,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 12/15/2010 02:01 PM, H. Peter Anvin wrote:
> On 11/13/2010 05:38 PM, Yinghai Lu wrote:
>> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
>> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
>> @@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
>>  {
>>  	unsigned int ver = 0;
>>  
>> +#ifdef CONFIG_X86_64
>> +	if (id >= (MAX_APICS-1)) {
>> +		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
>> +		return;
>> +	}
>> +#endif
>> +
>>  	if (!enabled) {
>>  		++disabled_cpus;
>>  		return;
> 
> Why the #ifdef?

try to limit the effects on 32bit's bunch of sub archs etc.

Thanks

	Yinghai


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-12-15 22:40                         ` Yinghai Lu
@ 2010-12-15 22:53                           ` H. Peter Anvin
  2010-12-15 22:57                             ` Yinghai Lu
  2010-12-17  3:09                             ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit Yinghai Lu
  0 siblings, 2 replies; 22+ messages in thread
From: H. Peter Anvin @ 2010-12-15 22:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Wu Fengguang,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 12/15/2010 02:40 PM, Yinghai Lu wrote:
> On 12/15/2010 02:01 PM, H. Peter Anvin wrote:
>> On 11/13/2010 05:38 PM, Yinghai Lu wrote:
>>> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
>>> ===================================================================
>>> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
>>> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
>>> @@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
>>>  {
>>>  	unsigned int ver = 0;
>>>  
>>> +#ifdef CONFIG_X86_64
>>> +	if (id >= (MAX_APICS-1)) {
>>> +		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
>>> +		return;
>>> +	}
>>> +#endif
>>> +
>>>  	if (!enabled) {
>>>  		++disabled_cpus;
>>>  		return;
>>
>> Why the #ifdef?
> 
> try to limit the effects on 32bit's bunch of sub archs etc.
> 

I really, really don't like that... we want more unification, not less...

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu
  2010-12-15 22:53                           ` H. Peter Anvin
@ 2010-12-15 22:57                             ` Yinghai Lu
  2010-12-17  3:09                             ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit Yinghai Lu
  1 sibling, 0 replies; 22+ messages in thread
From: Yinghai Lu @ 2010-12-15 22:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Wu Fengguang,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

On 12/15/2010 02:53 PM, H. Peter Anvin wrote:
> On 12/15/2010 02:40 PM, Yinghai Lu wrote:
>> On 12/15/2010 02:01 PM, H. Peter Anvin wrote:
>>> On 11/13/2010 05:38 PM, Yinghai Lu wrote:
>>>> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
>>>> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
>>>> @@ -198,6 +198,13 @@ static void __cpuinit acpi_register_lapi
>>>>  {
>>>>  	unsigned int ver = 0;
>>>>  
>>>> +#ifdef CONFIG_X86_64
>>>> +	if (id >= (MAX_APICS-1)) {
>>>> +		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
>>>> +		return;
>>>> +	}
>>>> +#endif
>>>> +
>>>>  	if (!enabled) {
>>>>  		++disabled_cpus;
>>>>  		return;
>>>
>>> Why the #ifdef?
>>
>> try to limit the effects on 32bit's bunch of sub archs etc.
>>
> 
> I really, really don't like that... we want more unification, not less...

ok, will try to remove them.

Yinghai

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit
  2010-12-15 22:53                           ` H. Peter Anvin
  2010-12-15 22:57                             ` Yinghai Lu
@ 2010-12-17  3:09                             ` Yinghai Lu
  2010-12-17  3:09                               ` [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have cpu Yinghai Lu
  2010-12-17 20:56                               ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit David Rientjes
  1 sibling, 2 replies; 22+ messages in thread
From: Yinghai Lu @ 2010-12-17  3:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Wu Fengguang,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa

We should use MAX_LOCAL_APIC for the maximum apic id
and MAX_APICS for the number of local apics.

Also, the apic_version[] array is indexed by apic id, so it should be sized by MAX_LOCAL_APIC.
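
To illustrate the distinction, here is a small user-space sketch. The
MAX_APICS value is made up for the example, and record_apic_version()
and lookup_apic_version() are hypothetical helpers, not kernel
functions. The point is only that a table indexed by apic id has to
cover the id range, which can be much larger than the number of apics.

```c
#define MAX_APICS	8	/* assumed: number of local apics supported */
#define MAX_LOCAL_APIC	256	/* assumed: one more than the largest apic id */

/* Indexed by apic id, so sized by the id range rather than the apic count. */
static int apic_version[MAX_LOCAL_APIC];

/* Returns 0 on success, -1 if the apic id cannot index the table. */
int record_apic_version(unsigned int apicid, int version)
{
	if (apicid >= MAX_LOCAL_APIC)
		return -1;
	apic_version[apicid] = version;
	return 0;
}

/* Returns the stored version, or -1 for an out-of-range apic id. */
int lookup_apic_version(unsigned int apicid)
{
	if (apicid >= MAX_LOCAL_APIC)
		return -1;
	return apic_version[apicid];
}
```

With a table of only MAX_APICS entries, an apic id such as 200 would
land outside the array even though far fewer than 200 apics exist.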

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/apicdef.h |    1 +
 arch/x86/include/asm/mpspec.h  |    2 +-
 arch/x86/kernel/apic/apic.c    |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/include/asm/apicdef.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/apicdef.h
+++ linux-2.6/arch/x86/include/asm/apicdef.h
@@ -145,6 +145,7 @@
 
 #ifdef CONFIG_X86_32
 # define MAX_IO_APICS 64
+# define MAX_LOCAL_APIC 256
 #else
 # define MAX_IO_APICS 128
 # define MAX_LOCAL_APIC 32768
Index: linux-2.6/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apic/apic.c
+++ linux-2.6/arch/x86/kernel/apic/apic.c
@@ -1689,7 +1689,7 @@ void __init register_lapic_address(unsig
  * This initializes the IO-APIC and APIC hardware if this is
  * a UP kernel.
  */
-int apic_version[MAX_APICS];
+int apic_version[MAX_LOCAL_APIC];
 
 int __init APIC_init_uniprocessor(void)
 {
Index: linux-2.6/arch/x86/include/asm/mpspec.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/mpspec.h
+++ linux-2.6/arch/x86/include/asm/mpspec.h
@@ -6,7 +6,7 @@
 #include <asm/mpspec_def.h>
 #include <asm/x86_init.h>
 
-extern int apic_version[MAX_APICS];
+extern int apic_version[];
 extern int pic_mode;
 
 #ifdef CONFIG_X86_32

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have cpu
  2010-12-17  3:09                             ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit Yinghai Lu
@ 2010-12-17  3:09                               ` Yinghai Lu
  2010-12-17 18:53                                 ` [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have Venkatesh Pallipadi
  2010-12-17 20:56                               ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit David Rientjes
  1 sibling, 1 reply; 22+ messages in thread
From: Yinghai Lu @ 2010-12-17  3:09 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Wu Fengguang,
	Peter Zijlstra, LKML, Nikanth Karthikesan, David Rientjes,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa


Recent Intel systems use a different order in the MADT: they list all thread0
entries first, then all thread1 entries.
But the SRAT still uses the old order: it lists all cpus of one socket together.

If the user has compiled with a limited NR_CPUS or boots with nr_cpus=, we could miss
putting some cpus' apic id to node mapping into apicid_to_node[].

For example, a 4 socket system with 64 cpus booted with nr_cpus=32 will crash...

[    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[    9.235021] divide error: 0000 [#1] SMP 
[    9.235315] last sysfs file: 
[    9.235481] CPU 1 
[    9.235592] Modules linked in:
[    9.245398] 
[    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
[    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
...
[    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[    9.665356]  RSP <ffff88103f8d1c40>
[    9.665568] ---[ end trace 2296156d35fdfc87 ]---

So let's just parse all cpu entries in SRAT.

Also add apicid checking against MAX_LOCAL_APIC, in case we go out of the boundaries of
apicid_to_node[].
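
A minimal user-space sketch of the failure mode, in case it helps
review. It is not kernel code: it assumes 4 sockets x 8 cores x 2
threads and a hypothetical apic id layout of
(socket << 5) | (core << 1) | thread, and count_unmapped() and the
other names are made up for illustration.

```c
#define SOCKETS		4
#define CORES		8	/* cores per socket (assumed) */
#define THREADS		2	/* threads per core (assumed) */
#define NR_ENTRIES	(SOCKETS * CORES * THREADS)	/* 64 cpus */
#define MAX_ID		256	/* stand-in for MAX_LOCAL_APIC */
#define NO_NODE		(-1)

/* Hypothetical apic id layout, chosen only for this sketch. */
static int make_apicid(int socket, int core, int thread)
{
	return (socket << 5) | (core << 1) | thread;
}

/* MADT order: all thread0 entries first, then all thread1 entries. */
static void fill_madt(int *ids)
{
	int t, s, c, n = 0;

	for (t = 0; t < THREADS; t++)
		for (s = 0; s < SOCKETS; s++)
			for (c = 0; c < CORES; c++)
				ids[n++] = make_apicid(s, c, t);
}

/* SRAT order: all cpus of one socket together; node == socket here. */
static void fill_srat(int *ids, int *nodes)
{
	int t, s, c, n = 0;

	for (s = 0; s < SOCKETS; s++)
		for (c = 0; c < CORES; c++)
			for (t = 0; t < THREADS; t++) {
				ids[n] = make_apicid(s, c, t);
				nodes[n] = s;
				n++;
			}
}

/*
 * Boot the first nr_cpus MADT entries, parse only the first srat_limit
 * SRAT entries, and count booted cpus left without a node mapping.
 */
int count_unmapped(int nr_cpus, int srat_limit)
{
	int madt[NR_ENTRIES], srat[NR_ENTRIES], node[NR_ENTRIES];
	int apicid_to_node[MAX_ID];
	int i, unmapped = 0;

	fill_madt(madt);
	fill_srat(srat, node);
	for (i = 0; i < MAX_ID; i++)
		apicid_to_node[i] = NO_NODE;

	for (i = 0; i < srat_limit && i < NR_ENTRIES; i++) {
		if (srat[i] >= MAX_ID)	/* skip apic ids that are too big */
			continue;
		apicid_to_node[srat[i]] = node[i];
	}

	for (i = 0; i < nr_cpus && i < NR_ENTRIES; i++)
		if (apicid_to_node[madt[i]] == NO_NODE)
			unmapped++;

	return unmapped;
}
```

Under these assumptions count_unmapped(32, 32) leaves 16 booted cpus
(the thread0 cpus of sockets 2 and 3) without a node, while
count_unmapped(32, 64), i.e. parsing every SRAT entry, leaves none.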

It fixes the following bug too.
https://bugzilla.kernel.org/show_bug.cgi?id=22662

-v2: expand to 32bit according to hpa
   need to add MAX_LOCAL_APIC for 32bit

Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/acpi/boot.c |    5 +++++
 arch/x86/mm/srat_32.c       |    1 +
 arch/x86/mm/srat_64.c       |   10 ++++++++++
 drivers/acpi/numa.c         |   14 ++++++++++++--
 4 files changed, 28 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/acpi/boot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
+++ linux-2.6/arch/x86/kernel/acpi/boot.c
@@ -198,6 +198,11 @@ static void __cpuinit acpi_register_lapi
 {
 	unsigned int ver = 0;
 
+	if (id >= (MAX_LOCAL_APIC-1)) {
+		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
+		return;
+	}
+
 	if (!enabled) {
 		++disabled_cpus;
 		return;
Index: linux-2.6/arch/x86/mm/srat_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_64.c
+++ linux-2.6/arch/x86/mm/srat_64.c
@@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
 	}
 
 	apic_id = pa->apic_id;
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+		return;
+	}
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
@@ -168,6 +172,12 @@ acpi_numa_processor_affinity_init(struct
 		apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
 	else
 		apic_id = pa->apic_id;
+
+	if (apic_id >= MAX_LOCAL_APIC) {
+		printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
+		return;
+	}
+
 	apicid_to_node[apic_id] = node;
 	node_set(node, cpu_nodes_parsed);
 	acpi_numa = 1;
Index: linux-2.6/drivers/acpi/numa.c
===================================================================
--- linux-2.6.orig/drivers/acpi/numa.c
+++ linux-2.6/drivers/acpi/numa.c
@@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
 int __init acpi_numa_init(void)
 {
 	int ret = 0;
+	int nr_cpu_entries = nr_cpu_ids;
+
+#ifdef CONFIG_X86
+	/*
+	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
+	 * SRAT cpu entries could have different order with that in MADT.
+	 * So go over all cpu entries in SRAT to get apicid to node mapping.
+	 */
+	nr_cpu_entries = MAX_LOCAL_APIC;
+#endif
 
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				     acpi_parse_x2apic_affinity, nr_cpu_ids);
+				     acpi_parse_x2apic_affinity, nr_cpu_entries);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				     acpi_parse_processor_affinity, nr_cpu_ids);
+				     acpi_parse_processor_affinity, nr_cpu_entries);
 		ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
 					    acpi_parse_memory_affinity,
 					    NR_NODE_MEMBLKS);
Index: linux-2.6/arch/x86/mm/srat_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/srat_32.c
+++ linux-2.6/arch/x86/mm/srat_32.c
@@ -92,6 +92,7 @@ acpi_numa_processor_affinity_init(struct
 	/* mark this node as "seen" in node bitmap */
 	BMAP_SET(pxm_bitmap, cpu_affinity->proximity_domain_lo);
 
+	/* don't need to check apic_id here, because it is always 8 bits */
 	apicid_to_pxm[cpu_affinity->apic_id] = cpu_affinity->proximity_domain_lo;
 
 	printk(KERN_DEBUG "CPU %02x in proximity domain %02x\n",


* Re: [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have
  2010-12-17  3:09                               ` [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have cpu Yinghai Lu
@ 2010-12-17 18:53                                 ` Venkatesh Pallipadi
  2010-12-17 19:27                                   ` Yinghai Lu
  0 siblings, 1 reply; 22+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-17 18:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Wu Fengguang, Peter Zijlstra, LKML, Nikanth Karthikesan,
	David Rientjes, Zheng, Shaohui, linux-hotplug@vger.kernel.org,
	Eric Dumazet, Bjorn Helgaas, Nikhil Rao, Takuya Yoshikawa

Linus' git + these two patches still fails on my test system with the
divide error. The failure dump is similar to what I reported here:
http://lkml.indiana.edu/hypermail//linux/kernel/1012.1/03641.html

This patch description talks about new Intel systems. The test system I
am seeing the failure on is an ancient Intel (2 socket P4 HT) system.
AFAICS, it does not even have an SRAT table (no "ACPI: SRAT" message
in dmesg).

Thanks,
Venki

On Thu, Dec 16, 2010 at 7:09 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>
> Recent Intel systems use a different order in the MADT: it lists all thread0
> entries first, then all thread1 entries.
> But the SRAT still uses the old order and lists all cpus of one socket together.
>
> If the user has compiled with a limited NR_CPUS or boots with nr_cpus=, we could
> miss putting some cpus' apic id to node mapping into apicid_to_node[].
>
> For example, a 4 socket system with 64 cpus booted with nr_cpus=32 will crash:
>
> [    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
> [    9.235021] divide error: 0000 [#1] SMP
> [    9.235315] last sysfs file:
> [    9.235481] CPU 1
> [    9.235592] Modules linked in:
> [    9.245398]
> [    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
> [    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> ...
> [    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
> [    9.665356]  RSP <ffff88103f8d1c40>
> [    9.665568] ---[ end trace 2296156d35fdfc87 ]---
>
> So let's just parse all cpu entries in the SRAT.
>
> Also add apicid checking against MAX_LOCAL_APIC, in case we could go out of the
> bounds of apicid_to_node[].
>
> It fixes the following bug too:
> https://bugzilla.kernel.org/show_bug.cgi?id=22662
>
> -v2: expand to 32bit according to hpa
>   need to add MAX_LOCAL_APIC for 32bit
>
> Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
> Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> Tested-by: Myron Stowe <myron.stowe@hp.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  arch/x86/kernel/acpi/boot.c |    5 +++++
>  arch/x86/mm/srat_32.c       |    1 +
>  arch/x86/mm/srat_64.c       |   10 ++++++++++
>  drivers/acpi/numa.c         |   14 ++++++++++++--
>  4 files changed, 28 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/kernel/acpi/boot.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/acpi/boot.c
> +++ linux-2.6/arch/x86/kernel/acpi/boot.c
> @@ -198,6 +198,11 @@ static void __cpuinit acpi_register_lapi
>  {
>        unsigned int ver = 0;
>
> +       if (id >= (MAX_LOCAL_APIC-1)) {
> +               printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
> +               return;
> +       }
> +
>        if (!enabled) {
>                ++disabled_cpus;
>                return;
> Index: linux-2.6/arch/x86/mm/srat_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/srat_64.c
> +++ linux-2.6/arch/x86/mm/srat_64.c
> @@ -134,6 +134,10 @@ acpi_numa_x2apic_affinity_init(struct ac
>        }
>
>        apic_id = pa->apic_id;
> +       if (apic_id >= MAX_LOCAL_APIC) {
> +               printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
> +               return;
> +       }
>        apicid_to_node[apic_id] = node;
>        node_set(node, cpu_nodes_parsed);
>        acpi_numa = 1;
> @@ -168,6 +172,12 @@ acpi_numa_processor_affinity_init(struct
>                apic_id = (pa->apic_id << 8) | pa->local_sapic_eid;
>        else
>                apic_id = pa->apic_id;
> +
> +       if (apic_id >= MAX_LOCAL_APIC) {
> +               printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node);
> +               return;
> +       }
> +
>        apicid_to_node[apic_id] = node;
>        node_set(node, cpu_nodes_parsed);
>        acpi_numa = 1;
> Index: linux-2.6/drivers/acpi/numa.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/numa.c
> +++ linux-2.6/drivers/acpi/numa.c
> @@ -275,13 +275,23 @@ acpi_table_parse_srat(enum acpi_srat_typ
>  int __init acpi_numa_init(void)
>  {
>        int ret = 0;
> +       int nr_cpu_entries = nr_cpu_ids;
> +
> +#ifdef CONFIG_X86
> +       /*
> +        * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
> +        * SRAT cpu entries could have different order with that in MADT.
> +        * So go over all cpu entries in SRAT to get apicid to node mapping.
> +        */
> +       nr_cpu_entries = MAX_LOCAL_APIC;
> +#endif
>
>        /* SRAT: Static Resource Affinity Table */
>        if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
>                acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
> -                                    acpi_parse_x2apic_affinity, nr_cpu_ids);
> +                                    acpi_parse_x2apic_affinity, nr_cpu_entries);
>                acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
> -                                    acpi_parse_processor_affinity, nr_cpu_ids);
> +                                    acpi_parse_processor_affinity, nr_cpu_entries);
>                ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
>                                            acpi_parse_memory_affinity,
>                                            NR_NODE_MEMBLKS);
> Index: linux-2.6/arch/x86/mm/srat_32.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/srat_32.c
> +++ linux-2.6/arch/x86/mm/srat_32.c
> @@ -92,6 +92,7 @@ acpi_numa_processor_affinity_init(struct
>        /* mark this node as "seen" in node bitmap */
>        BMAP_SET(pxm_bitmap, cpu_affinity->proximity_domain_lo);
>
> +       /* don't need to check apic_id here, because it is always 8 bits */
>        apicid_to_pxm[cpu_affinity->apic_id] = cpu_affinity->proximity_domain_lo;
>
>        printk(KERN_DEBUG "CPU %02x in proximity domain %02x\n",
>


* Re: [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have
  2010-12-17 18:53                                 ` [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have Venkatesh Pallipadi
@ 2010-12-17 19:27                                   ` Yinghai Lu
  2010-12-17 19:35                                     ` Kay Sievers
  0 siblings, 1 reply; 22+ messages in thread
From: Yinghai Lu @ 2010-12-17 19:27 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: H. Peter Anvin, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Wu Fengguang, Peter Zijlstra, LKML, Nikanth Karthikesan,
	David Rientjes, Zheng, Shaohui, linux-hotplug@vger.kernel.org,
	Eric Dumazet, Bjorn Helgaas, Nikhil Rao, Takuya Yoshikawa

On 12/17/2010 10:53 AM, Venkatesh Pallipadi wrote:
> Linus' git + these two patches still fails on my test system with the
> divide error. The failure dump is similar to what I reported here:
> http://lkml.indiana.edu/hypermail//linux/kernel/1012.1/03641.html
> 
> This patch description talks about new Intel systems. The test system I
> am seeing the failure on is an ancient Intel (2 socket P4 HT) system.
> AFAICS, it does not even have an SRAT table (no "ACPI: SRAT" message
> in dmesg).

That could be a different cause.

Do you have the whole boot log with debug etc.?



* Re: [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have
  2010-12-17 19:27                                   ` Yinghai Lu
@ 2010-12-17 19:35                                     ` Kay Sievers
  0 siblings, 0 replies; 22+ messages in thread
From: Kay Sievers @ 2010-12-17 19:35 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Venkatesh Pallipadi, H. Peter Anvin, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Wu Fengguang, Peter Zijlstra, LKML,
	Nikanth Karthikesan, David Rientjes, Zheng, Shaohui,
	linux-hotplug@vger.kernel.org, Eric Dumazet, Bjorn Helgaas,
	Nikhil Rao, Takuya Yoshikawa

Guys, mind dropping linux-hotplug@ here? Linux-kernel@ should be
enough. This list is almost entirely about *userspace* hotplug issues.

Thanks,
Kay

On Fri, Dec 17, 2010 at 20:27, Yinghai Lu <yinghai@kernel.org> wrote:
> On 12/17/2010 10:53 AM, Venkatesh Pallipadi wrote:
>> Linus' git + these two patches still fails on my test system with the
>> divide error. The failure dump is similar to what I reported here:
>> http://lkml.indiana.edu/hypermail//linux/kernel/1012.1/03641.html
>>
>> This patch description talks about new Intel systems. The test system I
>> am seeing the failure on is an ancient Intel (2 socket P4 HT) system.
>> AFAICS, it does not even have an SRAT table (no "ACPI: SRAT" message
>> in dmesg).
>
> That could be a different cause.
>
> Do you have the whole boot log with debug etc.?
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit
  2010-12-17  3:09                             ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit Yinghai Lu
  2010-12-17  3:09                               ` [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have cpu Yinghai Lu
@ 2010-12-17 20:56                               ` David Rientjes
  1 sibling, 0 replies; 22+ messages in thread
From: David Rientjes @ 2010-12-17 20:56 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Wu Fengguang, Peter Zijlstra, LKML, Nikanth Karthikesan,
	Zheng, Shaohui, linux-hotplug@vger.kernel.org, Eric Dumazet,
	Bjorn Helgaas, Venkatesh Pallipadi, Nikhil Rao, Takuya Yoshikawa,
	Tejun Heo

On Thu, 16 Dec 2010, Yinghai Lu wrote:

> Index: linux-2.6/arch/x86/include/asm/apicdef.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/apicdef.h
> +++ linux-2.6/arch/x86/include/asm/apicdef.h
> @@ -145,6 +145,7 @@
>  
>  #ifdef CONFIG_X86_32
>  # define MAX_IO_APICS 64
> +# define MAX_LOCAL_APIC 256
>  #else
>  # define MAX_IO_APICS 128
>  # define MAX_LOCAL_APIC 32768
> Index: linux-2.6/arch/x86/kernel/apic/apic.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/apic/apic.c
> +++ linux-2.6/arch/x86/kernel/apic/apic.c
> @@ -1689,7 +1689,7 @@ void __init register_lapic_address(unsig
>   * This initializes the IO-APIC and APIC hardware if this is
>   * a UP kernel.
>   */
> -int apic_version[MAX_APICS];
> +int apic_version[MAX_LOCAL_APIC];
>  
>  int __init APIC_init_uniprocessor(void)
>  {
> Index: linux-2.6/arch/x86/include/asm/mpspec.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/mpspec.h
> +++ linux-2.6/arch/x86/include/asm/mpspec.h
> @@ -6,7 +6,7 @@
>  #include <asm/mpspec_def.h>
>  #include <asm/x86_init.h>
>  
> -extern int apic_version[MAX_APICS];
> +extern int apic_version[];
>  extern int pic_mode;
>  
>  #ifdef CONFIG_X86_32
> 

This looks like it duplicates and conflicts with Tejun's NUMA unification 
patchset, specifically http://marc.info/?l=linux-kernel&m=129087155712396


end of thread, other threads:[~2010-12-17 20:56 UTC | newest]

Thread overview: 22+ messages
     [not found] <20101111100628.GA24728@localhost>
     [not found] ` <1289478978.2084.74.camel@laptop>
     [not found]   ` <20101111124015.GA9706@localhost>
     [not found]     ` <1289480656.2084.80.camel@laptop>
     [not found]       ` <20101113084018.GA23098@localhost>
2010-11-13 10:30         ` [BUG 2.6.27-rc1] find_busiest_group() LOCKUP Peter Zijlstra
2010-11-13 12:00           ` Wu Fengguang
2010-11-13 12:57             ` Peter Zijlstra
2010-11-13 13:10               ` Wu Fengguang
2010-11-13 19:12                 ` Yinghai Lu
2010-11-13 19:41                   ` Peter Zijlstra
     [not found]                   ` <20101113235746.GA9458@localhost>
2010-11-14  0:18                     ` Yinghai Lu
2010-11-14  1:38                     ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu num Yinghai Lu
2010-11-14 17:32                       ` [PATCH] x86, acpi: Handle all SRAT cpu entries even have cpu Wu Fengguang
2010-11-14 18:02                         ` Yinghai Lu
2010-11-14 18:19                         ` Yinghai Lu
2010-11-15  1:22                           ` Wu Fengguang
2010-12-15 22:01                       ` H. Peter Anvin
2010-12-15 22:40                         ` Yinghai Lu
2010-12-15 22:53                           ` H. Peter Anvin
2010-12-15 22:57                             ` Yinghai Lu
2010-12-17  3:09                             ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit Yinghai Lu
2010-12-17  3:09                               ` [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have cpu Yinghai Lu
2010-12-17 18:53                                 ` [PATCH -v2 2/2] x86, acpi: Parse all SRAT cpu entries even have Venkatesh Pallipadi
2010-12-17 19:27                                   ` Yinghai Lu
2010-12-17 19:35                                     ` Kay Sievers
2010-12-17 20:56                               ` [PATCH 1/2] x86, acpi: add MAX_LOCAL_APIC for 32bit David Rientjes
