Date: Tue, 22 Mar 2016 18:59:32 -0700
From: "Paul E. McKenney"
To: Bart Van Assche
Cc: "linux-kernel@vger.kernel.org"
Subject: Re: RCU stall

On Tue, Mar 22, 2016 at 04:53:26PM -0700, Bart Van Assche wrote:
> On 03/22/2016 01:45 PM, Paul E. McKenney wrote:
> >You are getting a soft lockup as well as an RCU CPU stall warning, so
> >it looks like something is taking a very long time in blk_done_softirq().
> >
> >You have multiple occurrences at different times, so it looks to be
> >a long time as opposed to an infinite time. Are you perhaps doing
> >something that would make a huge amount of work for blk_done_softirq()?
> >
> >See Documentation/RCU/stallwarn.txt in the kernel source tree for more
> >info on how to debug this sort of thing.
>
> Hello Paul,
>
> None of the drivers involved in the test I ran contain RCU code that
> has been changed recently.
> The block and SCSI subsystems process
> I/O completions in softirq context but until last week I hadn't seen
> any RCU lockup complaints when I ran an SRP test against a kernel
> with lockdep and several other kernel debugging options enabled.
> This is why I sent an e-mail to you. I have read
> Documentation/RCU/stallwarn.txt after I received your reply but this
> didn't provide me any clue about where to look for the root cause.
> Any further help would be appreciated.

My suggestion would be to check whether the block/SCSI softirq handler
has event traces. If it does, enable them and see what the loop is
doing. Documentation/trace/ftrace.txt describes how to enable existing
event tracing. If there is no event tracing, consider adding some in
your local tree. Failing that, there is always printk(). ;-) Or
perhaps you have some sort of debug setup.

Either way, the next step is to work out why that CPU is spending so
much time in that loop.

							Thanx, Paul
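If the softirq handler has no event traces, the "add some locally, or
fall back to printk()" suggestion boils down to measuring how many
completions each invocation drains and how long that takes. The sketch
below is a userspace C analog written under that assumption only:
struct completion_item, drain_completions(), and the warn_ns threshold
are hypothetical names, not kernel APIs. In a real kernel tree the
fprintf() would become printk() or trace_printk() inside the handler's
loop.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for one queued I/O completion. */
struct completion_item {
	struct completion_item *next;
	int done;	/* set to 1 once this item has been "completed" */
};

/* Monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Drain a completion list the way a softirq handler drains its queue,
 * but count iterations and time the whole batch. If the batch takes
 * longer than warn_ns, emit one summary line (the userspace analog of
 * a printk() debug statement). Returns the number of items processed.
 */
static size_t drain_completions(struct completion_item *head,
				long long warn_ns)
{
	long long start = now_ns();
	size_t n = 0;

	while (head) {
		struct completion_item *next = head->next;

		head->done = 1;	/* stand-in for completing the request */
		head = next;
		n++;
	}

	long long elapsed = now_ns() - start;

	if (elapsed > warn_ns)
		fprintf(stderr, "drain: %zu completions took %lld ns\n",
			n, elapsed);
	return n;
}
```

Printing one summary line per batch, rather than one per item, keeps
the instrumentation itself from noticeably lengthening the loop being
measured. A batch count that keeps growing would point at a completion
backlog; a small count with a long elapsed time would point at slow
per-item work.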