From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758559AbYJGVXX (ORCPT );
	Tue, 7 Oct 2008 17:23:23 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1757323AbYJGVWd (ORCPT );
	Tue, 7 Oct 2008 17:22:33 -0400
Received: from e1.ny.us.ibm.com ([32.97.182.141]:44596 "EHLO e1.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1758455AbYJGVWS (ORCPT );
	Tue, 7 Oct 2008 17:22:18 -0400
Date: Tue, 7 Oct 2008 14:22:15 -0700
From: "Paul E. McKenney"
To: Andi Kleen
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org, rjw@sisk.pl,
	dipankar@in.ibm.com, tglx@linutronix.de
Subject: Re: RCU hang on cpu re-hotplug with 2.6.27rc8
Message-ID: <20081007212215.GN6384@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20081006141220.GA14160@basil.nowhere.org>
	<20081006232837.GA1157@basil.nowhere.org>
	<20081007030822.GC6820@linux.vnet.ibm.com>
	<20081007071544.GC20740@one.firstfloor.org>
	<20081007152629.GH6384@linux.vnet.ibm.com>
	<20081007154939.GN20740@one.firstfloor.org>
	<20081007163401.GJ6384@linux.vnet.ibm.com>
	<20081007210947.GP20740@one.firstfloor.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20081007210947.GP20740@one.firstfloor.org>
User-Agent: Mutt/1.5.15+20070412 (2007-04-11)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 07, 2008 at 11:09:47PM +0200, Andi Kleen wrote:
> On Tue, Oct 07, 2008 at 09:34:01AM -0700, Paul E. McKenney wrote:
> > Thank you!  Hmmm, classic RCU worked just fine in 2.6.27-rc7 with
> > Thomas's patch.  I was doing random onlines and offlines in a loop,
> > with about 3 seconds between each operation, continuously for more
> > than ten hours, both x86 and Power.  So could you please try
> > 2.6.27-rc7 with Thomas's patch as follows?
> >
> > http://www.rdrop.com/users/paulmck/patches/2.6.27-rc7-tglx-timer-1.patch
>
> Same effect.  Hung on the first try.
>
> bash          D 00000000ffff25c1     0  4755   4742
>  ffff88027b127bf8 0000000000000086 ffff88027b127c18 0000000000000296
>  ffff88027c80b330 ffff8804be488b90 ffff88027c80b578 0000000300000296
>  ffff88027b127c18 ffffffff808cbd18 ffff88002805d600 ffff88027d182098
> Call Trace:
>  [] schedule_timeout+0x22/0xb4
>  [] ? __switch_to+0x320/0x330
>  [] ? cpupri_set+0xc5/0xd8
>  [] wait_for_common+0xcd/0x131
>  [] ? default_wake_function+0x0/0xf
>  [] wait_for_completion+0x18/0x1a
>  [] synchronize_rcu+0x35/0x3c
>  [] ? wakeme_after_rcu+0x0/0x12
>  [] partition_sched_domains+0x9b/0x1dd
>  [] ? wake_up_process+0x10/0x12
>  [] update_sched_domains+0x2e/0x35
>  [] notifier_call_chain+0x33/0x5b
>  [] __raw_notifier_call_chain+0x9/0xb
>  [] raw_notifier_call_chain+0xf/0x11
>  [] _cpu_up+0xd3/0x10c
>  [] cpu_up+0x57/0x67
>  [] store_online+0x4d/0x75
>  [] sysdev_store+0x1b/0x1d
>  [] sysfs_write_file+0xe0/0x11c
>  [] vfs_write+0xae/0x137
>  [] sys_write+0x47/0x6f
>  [] system_call_fastpath+0x16/0x1b

Thus far, as usual, I cannot reproduce, either on x86 or Power.

You are running on hyperthreaded machines?  If so, what happens if you
disable CONFIG_SCHED_SMT and CONFIG_SCHED_MC?

You are running on a 16-CPU x86-64 box?

							Thanx, Paul
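The hotplug stress test Paul describes (repeated offlines and onlines with about 3 seconds between operations, driven through the same sysfs `online` files that `store_online` in the trace above handles) could be sketched roughly as below. This is an illustrative sketch, not the actual script used in the thread: the `DRYRUN`, `ITERATIONS`, and `DELAY` knobs and the fixed CPU cycle are made up here; the real test randomized the CPU choice. The `/sys/devices/system/cpu/cpuN/online` path is the standard Linux hotplug interface, and CPU 0 is skipped because many x86 systems cannot offline the boot CPU.

```shell
#!/bin/sh
# Sketch of a CPU-hotplug stress loop: offline a CPU, wait, online it, repeat.
# DRYRUN=1 (default) only prints the sysfs writes; DRYRUN=0 performs them
# for real, which requires root and CONFIG_HOTPLUG_CPU.
: "${DRYRUN:=1}"
: "${ITERATIONS:=3}"
: "${DELAY:=3}"       # seconds between operations, as in the test described

toggle() {            # $1 = CPU number, $2 = 0 (offline) or 1 (online)
    f="/sys/devices/system/cpu/cpu$1/online"
    if [ "$DRYRUN" -eq 1 ]; then
        echo "echo $2 > $f"
    else
        echo "$2" > "$f"
    fi
}

i=0
while [ "$i" -lt "$ITERATIONS" ]; do
    cpu=$(( (i % 3) + 1 ))   # cycle CPUs 1..3 in this sketch; randomize in practice
    toggle "$cpu" 0
    sleep "$DELAY"
    toggle "$cpu" 1
    sleep "$DELAY"
    i=$((i + 1))
done
```

Running many such cycles back to back is what repeatedly drives the `cpu_up`/notifier path shown in the trace, where `update_sched_domains` ends up blocked in `synchronize_rcu`.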