Date: Tue, 1 Dec 2015 11:55:37 -0800
From: "Paul E. McKenney"
To: fupan li
Cc: linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: RCU stall and the system boot hang
Message-ID: <20151201195537.GA13568@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20151127162809.GM26643@linux.vnet.ibm.com> <20151128145342.GO26643@linux.vnet.ibm.com> <20151129060537.GP26643@linux.vnet.ibm.com> <20151130171918.GR26643@linux.vnet.ibm.com>
In-Reply-To: <20151130171918.GR26643@linux.vnet.ibm.com>

On Mon, Nov 30, 2015 at 09:19:18AM -0800, Paul E. McKenney wrote:
> On Mon, Nov 30, 2015 at 02:54:13PM +0800, fupan li wrote:

[ . . . ]

> > No, just a normal boot, and these stalls happened before
> > systemd services were running.
>
> Interesting.  My testing shows v4.1 being OK, with the first issues showing
> up somewhere between v4.1 and v4.2.  Or at least v4.1 is reliable enough
> that it passes 42 hours of focused rcutorture testing, where v4.2 tends
> to fail in under two hours.  And it seems to happen only on multisocket
> systems -- I seem to be able to hammer as hard as I want on my four-core
> (eight hardware thread) laptop without an issue.

And I take it back.  After beating on it for the better part of a week,
I did get one failure on my single-socket laptop.

So maybe I need to make my rcutorture scripts force tests to cross
socket boundaries where possible...

							Thanx, Paul