Date: Mon, 1 May 2017 11:58:19 -0700
From: "Paul E. McKenney"
To: Tejun Heo
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170501165747.GA993@linux.vnet.ibm.com> <20170501183807.GA7054@linux.vnet.ibm.com> <20170501184402.GB8921@htj.duckdns.org>
In-Reply-To: <20170501184402.GB8921@htj.duckdns.org>
Message-Id: <20170501185819.GJ3956@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 01, 2017 at
02:44:02PM -0400, Tejun Heo wrote:
> Hello, Paul.
>
> On Mon, May 01, 2017 at 11:38:07AM -0700, Paul E. McKenney wrote:
> > On Mon, May 01, 2017 at 09:57:47AM -0700, Paul E. McKenney wrote:
> > > Hello!
> > >
> > > I am hitting this WARN_ON_ONCE() in process_one_work() and am wondering
> > > what I did wrong to make this happen:
> >
> > Oh, wait...  Rescuer, it says.  Might this be due to the fact that RCU's
> > expedited grace periods block within a workqueue handler?  Might this
> > in turn run the system out of workqueue kthreads?  If this is the likely
> > cause, my approach would be to rework the expedited-grace-period workqueue
> > handler to return when waiting for the grace period to complete, and to
> > replace the current wakeup with a schedule_work() or something similar.
>
> That should be completely fine.  It could just be that the rescuer
> path has a bug around CPU hotplug handling.  Can you please confirm
> either way on the cpuset usage?

I have no explicit cpuset usage, and I set no explicit affinity on the
workqueue handlers themselves.  However, this is thus far happening only
in CONFIG_NO_HZ_FULL=y runs, in this case with the kernel boot parameter
nohz_full=2-9 on a system with 16 CPUs.  IIRC, this sets up a
"housekeeping" cpuset that pushes normal tasks away from the nohz_full
CPUs.

I do build with CONFIG_HOTPLUG_CPU=y, and the test does a lot of
hotplugging.  Also, other kthreads (but again, not the workqueue handlers)
do a lot of explicit CPU-affinity manipulation.

							Thanx, Paul
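
To make the nohz_full=2-9 setup above concrete, here is a minimal user-space sketch that works out which of the 16 CPUs would be left over for housekeeping. It is not kernel code: the names parse_cpulist() and housekeeping_cpus() are made up for this illustration, and it assumes the housekeeping set is simply every CPU not listed in nohz_full. Only plain numbers and inclusive ranges are parsed.

```python
def parse_cpulist(spec):
    """Parse a CPU list such as "2-9" or "0,4-7" into a set of ints.

    Hypothetical helper for illustration. Only single numbers and
    inclusive ranges are handled; any richer list syntax is left out.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def housekeeping_cpus(nr_cpus, nohz_full_spec):
    """Return, in ascending order, the CPUs not listed in nohz_full.

    Assumption: the housekeeping set is the complement of the
    nohz_full set within CPUs 0..nr_cpus-1.
    """
    return sorted(set(range(nr_cpus)) - parse_cpulist(nohz_full_spec))


if __name__ == "__main__":
    # The configuration described in the message: 16 CPUs, nohz_full=2-9.
    print(housekeeping_cpus(16, "2-9"))
```

Under those assumptions, normal tasks would be confined to CPUs 0-1 and 10-15, so half of the system is off limits to them while the test also takes CPUs offline and online.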