From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760684AbZBESDr (ORCPT );
	Thu, 5 Feb 2009 13:03:47 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1755085AbZBESDc (ORCPT );
	Thu, 5 Feb 2009 13:03:32 -0500
Received: from mx2.redhat.com ([66.187.237.31]:48966 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756098AbZBESDa (ORCPT );
	Thu, 5 Feb 2009 13:03:30 -0500
Date: Thu, 5 Feb 2009 19:00:15 +0100
From: Oleg Nesterov
To: Frederic Weisbecker
Cc: Lai Jiangshan, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Eric Dumazet, Linux Kernel Mailing List
Subject: Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue
Message-ID: <20090205180015.GA28738@redhat.com>
References: <497838F0.7020408@cn.fujitsu.com>
	<20090122093046.GC5891@nowhere>
	<20090122093649.GD24758@elte.hu>
	<1232622615.4890.114.camel@laptop>
	<498AA0F1.2030003@cn.fujitsu.com>
	<20090205170156.GA25517@redhat.com>
	<20090205172429.GA23531@nowhere>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090205172429.GA23531@nowhere>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/05, Frederic Weisbecker wrote:
>
> On Thu, Feb 05, 2009 at 06:01:56PM +0100, Oleg Nesterov wrote:
> > On 02/05, Lai Jiangshan wrote:
> > >
> > > DEADLOCK EXAMPLE for explain my above option:
> > >
> > > (work_func0() and work_func1() are work callback, and they
> > > calls flush_workqueue())
> > >
> > > CPU#0                               CPU#1
> > > run_workqueue()                     run_workqueue()
> > >   work_func0()                        work_func1()
> > >     flush_workqueue()                   flush_workqueue()
> > >       flush_cpu_workqueue(0)              .
> > >       flush_cpu_workqueue(cpu#1)          flush_cpu_workqueue(cpu#0)
> > >         waiting work_func1() in cpu#1       waiting work_func0 in cpu#0
> > >
> > > DEADLOCK!
> >
> > I am not sure. Note that when work_func0() calls run_workqueue(),
> > it will clear cwq->current_work, so another flush_ on CPU#1 will
> > not wait for work_func0, no?
>
> No but CPU#1 can wait for a completion that will never be done, because
> CWQ#0 is waiting for CWQ#1.

Still can't understand.

When work_func0()->run_workqueue() returns, we should have no works in
->worklist and ->current_work must be NULL. If we have a barrier which
was inserted before - it should be flushed.

But yes, deadlock is possible, if other works come after run_workqueue()
returns and before work_func1() starts the flush. Just the description
is not exactly accurate, imho.

And we have other problems. Just to say, nothing can guarantee that
run_workqueue() will ever return. It is correct if some work_struct
always re-queues itself and should be cancelled before
destroy_workqueue().

Oleg.
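
The argument above can be replayed with a toy userspace model. This is only a sketch, not kernel code: struct cwq, queue_work_on_cwq(), remote_flush_would_wait() and cpu1_waits_on_cwq0() are hypothetical stand-ins. It assumes that a flusher on another CPU waits only when ->worklist is non-empty or ->current_work is set, and that one extra work is already pending on CPU#0 so the nested run_workqueue() loop runs at least once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKS 8

struct work {
	const char *name;
};

/* Toy stand-in for a per-CPU workqueue: a FIFO plus current_work. */
struct cwq {
	struct work *worklist[MAX_WORKS];
	size_t head, tail;
	struct work *current_work;
};

bool worklist_empty(const struct cwq *cwq)
{
	return cwq->head == cwq->tail;
}

/* Hypothetical helper: append a work to the tail of the model queue. */
void queue_work_on_cwq(struct cwq *cwq, struct work *work)
{
	assert(cwq->tail < MAX_WORKS);
	cwq->worklist[cwq->tail++] = work;
}

/*
 * Drain the queue as described in the thread: each work becomes
 * current_work while it "runs" and current_work is reset to NULL
 * afterwards, even when the call is nested inside another work.
 */
void run_workqueue(struct cwq *cwq)
{
	while (!worklist_empty(cwq)) {
		cwq->current_work = cwq->worklist[cwq->head++];
		/* the work callback would run here */
		cwq->current_work = NULL;
	}
}

/* Assumed rule: would a flusher on another CPU insert a barrier and wait? */
bool remote_flush_would_wait(const struct cwq *cwq)
{
	return !worklist_empty(cwq) || cwq->current_work != NULL;
}

enum step { WORK_FUNC0_RUNNING, NESTED_RUN_RETURNED, LATE_WORK_QUEUED };

/* Replay the CPU#0 side up to stop_at, then ask what CPU#1 would do. */
bool cpu1_waits_on_cwq0(enum step stop_at)
{
	static struct work work_func0 = { "work_func0" };
	static struct work pending = { "pending" };
	static struct work late = { "late" };
	struct cwq cwq0 = { .head = 0, .tail = 0, .current_work = NULL };

	/* CPU#0 is inside work_func0(); one more work is already pending. */
	cwq0.current_work = &work_func0;
	queue_work_on_cwq(&cwq0, &pending);
	if (stop_at == WORK_FUNC0_RUNNING)
		return remote_flush_would_wait(&cwq0);

	/* work_func0() flushes its own queue: nested run_workqueue(). */
	run_workqueue(&cwq0);
	if (stop_at == NESTED_RUN_RETURNED)
		return remote_flush_would_wait(&cwq0);

	/* A new work arrives before work_func1() starts its flush. */
	queue_work_on_cwq(&cwq0, &late);
	return remote_flush_would_wait(&cwq0);
}
```

In this model cpu1_waits_on_cwq0(NESTED_RUN_RETURNED) is false, so the flush from CPU#1 has nothing to wait for once the nested run has returned. cpu1_waits_on_cwq0(LATE_WORK_QUEUED) is true: the barrier would sit behind a work that CPU#0 cannot run while work_func0() is itself blocked, which is the window in which the deadlock remains possible.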