Date: Mon, 13 Feb 2017 16:16:00 -0800
From: "Paul E. McKenney"
To: Tejun Heo
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: Is it really safe to use workqueues to drive expedited grace periods?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170210212158.GA20183@linux.vnet.ibm.com> <20170211023541.GB19050@mtj.duckdns.org>
In-Reply-To: <20170211023541.GB19050@mtj.duckdns.org>
Message-Id: <20170214001600.GZ30506@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 11, 2017 at 11:35:41AM +0900, Tejun Heo wrote:
> Hello, Paul.
>
> On Fri, Feb 10, 2017 at 01:21:58PM -0800, Paul E. McKenney wrote:
> > So RCU's expedited grace periods have been using workqueues for a
> > little while, and things seem to be working. But as usual, I worry...
> > Is this use subject to some sort of deadlock where RCU's workqueue cannot
> > start running until after a grace period completes, but that grace
> > period is the one needing the workqueue? Note that there are ways to
> > set up your kernel so that all RCU grace periods are expedited.
> >
> > Should I be worried? If not, what prevents this from being a problem,
> > especially given that workqueue handlers are allowed to wait for RCU
> > grace periods to complete?
>
> A per-cpu (normal) workqueue's concurrency is regulated automatically
> so that there are at least one worker running for the worker pool on a
> given CPU.
>
> Let's say there are two work items queued on a workqueue. The first
> one is something which will do synchronize_rcu() and the second is the
> expedited grace period work item. When the first one runs
> synchronize_rcu(), it'd block. If there are no other work items
> running at the time, workqueue will dispatch another worker so that
> there's at least one actively running, which in this case will be the
> expedited rcu grace period work item.
>
> The dispatching of a new worker can be delayed by two things - memory
> pressure preventing creation of a new worker and the workqueue hitting
> maximum concurrency limit.
>
> If expedited RCU grace period is something that memory reclaim path
> may depend on, the workqueue that it executes on should have
> WQ_MEM_RECLAIM set, which will guarantee that there's at least one
> worker (across all CPUs) which is ready to serve the work items on
> that workqueue regardless of memory pressure.
>
> The latter, concurrency limit, would only matter if the RCU work items
> use system_wq.
> system_wq's concurrency limit is very high (512 per
> CPU), but it is theoretically possible to fill all up with work items
> doing synchronize_rcu() with the expedited RCU work item scheduled
> behind it. The system would already be in a very messed up state
> outside the RCU situation tho.

Thank you for the information!

So if I am to continue using workqueues for expedited RCU grace periods,
I believe that I need to do the following:

1.	Use alloc_workqueue() to create my own WQ_MEM_RECLAIM
	workqueue.

2.	Rework my workqueue handler to avoid blocking waiting for
	the expedited grace period to complete. I should be able
	to do a small number of timed waits, but if I actually
	wait for the grace period to complete, I might end up
	hogging the reserved items. (Or does my workqueue supply
	them for me? If so, so much the better!)

3.	Concurrency would not be a problem -- there can be no more
	than four work elements in flight across both possible
	flavors of expedited grace periods.

Anything I am missing here?

							Thanx, Paul
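
For illustration only, the three-step plan above might be sketched as a
minimal kernel module. This is not RCU's actual code. Every exp_gp_* name
is hypothetical, exp_gp_start() is a placeholder that completes
immediately, and the max_active value of 4 and the 100 ms timeout are
assumptions chosen only to mirror the numbers discussed in the message.
The sketch builds only against a kernel tree, not in userspace.

```c
/* Sketch only: all exp_gp_* names are hypothetical, not RCU's real code. */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

static struct workqueue_struct *exp_gp_wq;	/* hypothetical dedicated workqueue */
static struct work_struct exp_gp_work;
static DECLARE_COMPLETION(exp_gp_done);

/*
 * Hypothetical stand-in for whatever drives the grace period.  A real
 * implementation would arrange for complete() to run only once the
 * grace period has actually ended; here it completes immediately.
 */
static void exp_gp_start(void)
{
	complete(&exp_gp_done);
}

/*
 * Step 2: the handler starts the grace period and returns promptly
 * instead of sleeping until the grace period ends.
 */
static void exp_gp_workfn(struct work_struct *work)
{
	exp_gp_start();
}

static int __init exp_gp_init(void)
{
	/*
	 * Step 1: a dedicated workqueue with WQ_MEM_RECLAIM set.
	 * Step 3: max_active of 4 is an assumption mirroring the
	 * "no more than four work elements in flight" estimate.
	 */
	exp_gp_wq = alloc_workqueue("exp_gp_sketch", WQ_MEM_RECLAIM, 4);
	if (!exp_gp_wq)
		return -ENOMEM;

	INIT_WORK(&exp_gp_work, exp_gp_workfn);
	queue_work(exp_gp_wq, &exp_gp_work);

	/* The requester, not the worker, does the bounded waiting. */
	if (!wait_for_completion_timeout(&exp_gp_done, msecs_to_jiffies(100)))
		pr_warn("exp_gp_sketch: grace period still in progress\n");
	return 0;
}

static void __exit exp_gp_exit(void)
{
	destroy_workqueue(exp_gp_wq);	/* drains any pending work first */
}

module_init(exp_gp_init);
module_exit(exp_gp_exit);
MODULE_LICENSE("GPL");
```

The design choice illustrated is that the waiting happens in the
requesting context with a timeout, so the worker serving the
WQ_MEM_RECLAIM workqueue is released as soon as the grace period has
been started.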