From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1759089AbcK3VD6 (ORCPT );
 Wed, 30 Nov 2016 16:03:58 -0500
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:35899 "EHLO
 mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1757058AbcK3VB4 (ORCPT );
 Wed, 30 Nov 2016 16:01:56 -0500
Date: Wed, 30 Nov 2016 13:01:52 -0800
From: "Paul E. McKenney"
To: Guenter Roeck
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton ,
 sparclinux@vger.kernel.org, davem@davemloft.net
Subject: Re: next: Commit 'mm: Prevent __alloc_pages_nodemask() RCU CPU
 stall ...' causing hang on sparc32 qemu
Reply-To: paulmck@linux.vnet.ibm.com
References: <20161129212308.GA12447@roeck-us.net>
 <20161130012817.GH3924@linux.vnet.ibm.com>
 <20161130070212.GM3924@linux.vnet.ibm.com>
 <929f6b29-461a-6e94-fcfd-710c3da789e9@roeck-us.net>
 <20161130120333.GQ3924@linux.vnet.ibm.com>
 <20161130192159.GB22216@roeck-us.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161130192159.GB22216@roeck-us.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16113021-0008-0000-0000-000006378036
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00006169; HX=3.00000240; KW=3.00000007;
 PH=3.00000004; SC=3.00000193; SDB=6.00787514; UDB=6.00380947;
 IPR=6.00565194; BA=6.00004933; NDR=6.00000001; ZLA=6.00000005;
 ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000;
 ZU=6.00000002; MB=3.00013494; XFM=3.00000011;
 UTC=2016-11-30 21:01:54
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 16113021-0009-0000-0000-00003D7780D0
Message-Id: <20161130210152.GL3924@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
 definitions=2016-11-30_16:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0
 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.0.1-1609300000 definitions=main-1611300336
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 30, 2016 at 11:21:59AM -0800, Guenter Roeck wrote:
> On Wed, Nov 30, 2016 at 04:03:33AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 30, 2016 at 02:52:11AM -0800, Guenter Roeck wrote:
> > > On 11/29/2016 11:02 PM, Paul E. McKenney wrote:
> > > >On Tue, Nov 29, 2016 at 08:32:51PM -0800, Guenter Roeck wrote:
> > > >>On 11/29/2016 05:28 PM, Paul E. McKenney wrote:
> > > >>>On Tue, Nov 29, 2016 at 01:23:08PM -0800, Guenter Roeck wrote:
> > > >>>>Hi Paul,
> > > >>>>
> > > >>>>most of my qemu tests for sparc32 targets started to fail in next-20161129.
> > > >>>>The problem is only seen in SMP builds; non-SMP builds are fine.
> > > >>>>Bisect points to commit 2d66cccd73436 ("mm: Prevent __alloc_pages_nodemask()
> > > >>>>RCU CPU stall warnings"); reverting that commit fixes the problem.
> > And I have dropped this patch. Michal Hocko showed me the error of
> > my ways with this patch.
> >
> > :-)
>
> On another note, I still get RCU tracebacks in the s390 tests.
>
> BUG: sleeping function called from invalid context at mm/page_alloc.c:3775
>
> That is caused by 'rcu: Maintain special bits at bottom of ->dynticks counter';
> if I recall correctly we had discussed that earlier.

Indeed, I had missed a dyntick counter update back on Nov 11, which
meant that some of the code was still looking at the low-order bit
instead of the next bit up. This is now fixed.

So to get to the error message you call out above, I need to have
improperly left the system in bh state or left irqs disabled, while
the system was running normally without an oops. I am having a hard
time seeing how this patch can do that. I would be more suspicious of
f2a471ffc8a8 ("rcu: Allow boot-time use of cond_resched_rcu_qs()").

So you bisected or did a revert to work out which was the offending
commit?

							Thanx, Paul