From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753517Ab1EOGBt (ORCPT ); Sun, 15 May 2011 02:01:49 -0400
Received: from e5.ny.us.ibm.com ([32.97.182.145]:58998 "EHLO e5.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751300Ab1EOGBr (ORCPT );
	Sun, 15 May 2011 02:01:47 -0400
Date: Sat, 14 May 2011 23:01:43 -0700
From: "Paul E. McKenney"
To: Yinghai Lu
Cc: Ingo Molnar, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL rcu/next] rcu commits for 2.6.40
Message-ID: <20110515060143.GE2258@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <4DCB8BCD.1080607@kernel.org> <4DCB8F7A.90603@kernel.org>
	<20110512092013.GJ2258@linux.vnet.ibm.com> <4DCC52FB.6030500@kernel.org>
	<20110514142621.GB2258@linux.vnet.ibm.com> <20110514153118.GA24311@linux.vnet.ibm.com>
	<20110514183453.GA32756@linux.vnet.ibm.com> <4DCF5322.7030305@kernel.org>
	<4DCF6789.70701@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DCF6789.70701@kernel.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, May 14, 2011 at 10:41:29PM -0700, Yinghai Lu wrote:
> On 05/14/2011 09:14 PM, Yinghai Lu wrote:
> > On 05/14/2011 11:34 AM, Paul E. McKenney wrote:
> >>> and do the inspection afterwards.
> >>
> >> And here is a lightly-tested patch, which applies on tip/core/rcu.
> >>
> >> This problem could account for both the long delays seen with e59fb312
> >> (Decrease memory-barrier usage based on semi-formal proof) and the
> >> shorter delays seen with a26ac245 (move TREE_RCU from softirq to kthread).
> >
> > yes. it fixes the problem.
> >
> > for 1024g system when hotadd mem enabled in kernel config
> >
> > [ 31.814803] cpu_dev_init done
> > [ 35.437163] memory_dev_init done
> >
> > even it is with gcc from opensuse 11.3
>
> got:
>
> [ 86.931217] Switched to NOHz mode on CPU #0
> [ 86.931272] Switched to NOHz mode on CPU #25
> [ 86.931278] ------------[ cut here ]------------
> [ 86.931290] WARNING: at kernel/rcutree.c:364 rcu_enter_nohz+0x44/0x76()
> [ 86.931294] Hardware name: Sun Fire X4800 M2
> [ 86.931297] Modules linked in:
> [ 86.931303] Pid: 0, comm: swapper Not tainted 2.6.39-rc7-tip-yh-04836-g5e42dc2-dirty #3
> [ 86.931307] Call Trace:
> [ 86.931333] [] warn_slowpath_common+0x85/0x9d
> [ 86.931338] Switched to NOHz mode on CPU #74
> [ 86.931346] [] warn_slowpath_null+0x1a/0x1c
> [ 86.931356] [] rcu_enter_nohz+0x44/0x76
> [ 86.931370] [] tick_nohz_stop_sched_tick+0x27d/0x366
> [ 86.931381] [] cpu_idle+0x7a/0xcc
> [ 86.931397] [] rest_init+0xb7/0xbe
> [ 86.931408] [] ? csum_partial_copy_generic+0x16c/0x16c
> [ 86.931423] [] start_kernel+0x3b2/0x3bd
> [ 86.931428] Switched to NOHz mode on CPU #94
> [ 86.931436] [] x86_64_start_reservations+0x9c/0xa0
> [ 86.931446] [] x86_64_start_kernel+0x1d8/0x1e3
> [ 86.931463] ---[ end trace 2cfc591bf7de931f ]---
> [ 86.931598] Switched to NOHz mode on CPU #151
> [ 86.931613] Switched to NOHz mode on CPU #152

As I expected!  There is a dyntick entry/exit mismatch somewhere.
I haven't yet been able to find it by inspection, and I cannot
reproduce on the systems that I have access to.

One way that this could happen is if the interrupt-exit code on
your architecture sometimes failed to call irq_exit().

							Thanx, Paul