From: Yinghai Lu <yinghai@kernel.org>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
Nikanth Karthikesan <knikanth@suse.de>,
David Rientjes <rientjes@google.com>,
"Zheng, Shaohui" <shaohui.zheng@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-hotplug@vger.kernel.org" <linux-hotplug@vger.kernel.org>,
Eric Dumazet <eric.dumazet@gmail.com>,
Bjorn Helgaas <bjorn.helgaas@hp.com>,
Venkatesh Pallipadi <venki@google.com>,
Nikhil Rao <ncrao@google.com>,
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Subject: Re: [BUG 2.6.27-rc1] find_busiest_group() LOCKUP
Date: Sat, 13 Nov 2010 19:12:20 +0000
Message-ID: <4CDEE314.6090107@kernel.org>
In-Reply-To: <20101113131042.GA5522@localhost>
On 11/13/2010 05:10 AM, Wu Fengguang wrote:
> On Sat, Nov 13, 2010 at 08:57:58PM +0800, Peter Zijlstra wrote:
>> On Sat, 2010-11-13 at 20:00 +0800, Wu Fengguang wrote:
>>> On Sat, Nov 13, 2010 at 06:30:24PM +0800, Peter Zijlstra wrote:
>>>> On Sat, 2010-11-13 at 16:40 +0800, Wu Fengguang wrote:
>>>>>> Will try and figure out how the heck that's happening, Ingo any clue?
>>>>>
>>>>> It's back to normal on 2.6.37-rc1 when reverting commit 50f2d7f682f9
>>>>> ("x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA").
>>>>>
>>>>> The interesting part is, the commit was introduced in
>>>>> 2.6.36-rc7..2.6.36, however 2.6.36 boots OK, while 2.6.37-rc1 panics.
>>>>
>>>> Argh, that commit again..
>>>>
>>>> Does this fix it: http://lkml.org/lkml/2010/11/12/8
>>>
>>> No it still panics. Here is the dmesg.
>>
>> OK, I'll let Nikanth have a look, if all else fails we can always
>> revert that patch.
>
> It's the same bug.
>
> Just tried another machine, I get the same divide error. The patch
> posted in lkml/2010/11/12/8 does not fix it. But after reverting
> commit 50f2d7f682f9, it boots OK.
>
> Thanks,
> Fengguang
> ---
> PS. dmesg with divide error
>
> [ 0.000000] console [ttyS0] enabled, bootconsole disabled
> [ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> [ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
> [ 0.000000] ... MAX_LOCK_DEPTH: 48
> [ 0.000000] ... MAX_LOCKDEP_KEYS: 8191
> [ 0.000000] ... CLASSHASH_SIZE: 4096
> [ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384
> [ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768
> [ 0.000000] ... CHAINHASH_SIZE: 16384
> [ 0.000000] memory used by lock dependency info: 6367 kB
> [ 0.000000] per task-struct memory footprint: 2688 bytes
> [ 0.000000] allocated 167772160 bytes of page_cgroup
> [ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
> [ 0.000000] ODEBUG: 15 of 15 active objects replaced
> [ 0.000000] hpet clockevent registered
> [ 0.001000] Fast TSC calibration using PIT
> [ 0.002000] Detected 2800.469 MHz processor.
> [ 0.000010] Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.93 BogoMIPS (lpj=2800469)
> [ 0.010818] pid_max: default: 32768 minimum: 301
> [ 0.021745] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
> [ 0.035657] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
> [ 0.044553] Mount-cache hash table entries: 256
> [ 0.049469] Initializing cgroup subsys debug
> [ 0.053834] Initializing cgroup subsys ns
> [ 0.057940] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
> [ 0.066968] Initializing cgroup subsys cpuacct
> [ 0.071511] Initializing cgroup subsys memory
> [ 0.075988] Initializing cgroup subsys devices
> [ 0.080527] Initializing cgroup subsys freezer
> [ 0.085107] CPU: Physical Processor ID: 0
> [ 0.089209] CPU: Processor Core ID: 0
> [ 0.092974] mce: CPU supports 9 MCE banks
> [ 0.097095] CPU0: Thermal monitoring enabled (TM1)
> [ 0.101990] using mwait in idle threads.
> [ 0.106006] Performance Events: PEBS fmt1+, Westmere events, Intel PMU driver.
> [ 0.113535] ... version: 3
> [ 0.117641] ... bit width: 48
> [ 0.121828] ... generic registers: 4
> [ 0.125926] ... value mask: 0000ffffffffffff
> [ 0.131328] ... max period: 000000007fffffff
> [ 0.136734] ... fixed-purpose events: 3
> [ 0.140839] ... event mask: 000000070000000f
> [ 0.147297] ACPI: Core revision 20101013
> [ 0.175646] ftrace: allocating 24175 entries in 95 pages
> [ 0.190912] Setting APIC routing to flat
> [ 0.195562] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.211643] CPU0: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz stepping 01
> [ 0.325243] lockdep: fixing up alternatives.
> [ 0.330242] Booting Node 0, Processors #1lockdep: fixing up alternatives.
> [ 0.430140] #2lockdep: fixing up alternatives.
> [ 0.526962] #3lockdep: fixing up alternatives.
> [ 0.623755] #4lockdep: fixing up alternatives.
> [ 0.720588] Ok.
> [ 0.722525] Booting Node 1, Processors #5lockdep: fixing up alternatives.
> [ 0.822389] Ok.
> [ 0.824327] Booting Node 0, Processors #6
> [ 0.919089] TSC synchronization [CPU#0 -> CPU#6]:
> [ 0.924155] Measured 296 cycles TSC warp between CPUs, turning off TSC clock.
> [ 0.003999] Marking TSC unstable due to check_tsc_sync_source failed
> [ 0.557048] lockdep: fixing up alternatives.
> [ 0.558041] Ok.
> [ 0.559004] Booting Node 1, Processors #7 Ok.
> [ 0.632157] Brought up 8 CPUs
> [ 0.633006] Total of 8 processors activated (44799.46 BogoMIPS).
assume that when you have
CONFIG_NR_CPUS=16
instead of
CONFIG_NR_CPUS=8
it will boot ok?
Thanks
Yinghai