Date: Mon, 26 Jan 2009 23:24:05 +0100
From: Ingo Molnar
To: Oleg Nesterov
Cc: Andrew Morton, a.p.zijlstra@chello.nl, rusty@rustcorp.com.au,
	travis@sgi.com, mingo@redhat.com, davej@redhat.com,
	cpufreq@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
Message-ID: <20090126222405.GA15896@elte.hu>
References: <20090125230130.bcdab2e5.akpm@linux-foundation.org>
	<20090126171618.GA32091@elte.hu>
	<20090126103529.cb124a58.akpm@linux-foundation.org>
	<20090126202022.GA8867@elte.hu>
	<20090126130046.37b8f34e.akpm@linux-foundation.org>
	<20090126212727.GA13670@elte.hu>
	<20090126133551.fab5e27a.akpm@linux-foundation.org>
	<20090126214516.GA22142@elte.hu>
	<20090126140116.35f9c173.akpm@linux-foundation.org>
	<20090126221502.GA4542@redhat.com>
In-Reply-To: <20090126221502.GA4542@redhat.com>

* Oleg Nesterov wrote:

> On 01/26, Andrew Morton wrote:
> >
> > On Mon, 26 Jan 2009 22:45:16 +0100
> > Ingo Molnar wrote:
> >
> > > that would change the concept of execution but indeed it would be
> > > interesting to try. It's outside the scope of late -rcs i guess, but
> > > worthwhile nevertheless.
> >
> >
> > Well it turns out that I was having a less-than-usually-senile moment:
> >
> > : commit b89deed32ccc96098bd6bc953c64bba6b847774f
> > : Author: Oleg Nesterov
> > : AuthorDate: Wed May 9 02:33:52 2007 -0700
> > : Commit: Linus Torvalds
> > : CommitDate: Wed May 9 12:30:50 2007 -0700
> > :
> > : implement flush_work()
> > :
> > : A basic problem with flush_scheduled_work() is that it blocks behind _all_
> > : presently-queued works, rather than just the work which the caller wants to
> > : flush. If the caller holds some lock, and if one of the queued work happens
> > : to want that lock as well then accidental deadlocks can occur.
> > :
> > : One example of this is the phy layer: it wants to flush work while holding
> > : rtnl_lock(). But if a linkwatch event happens to be queued, the phy code will
> > : deadlock because the linkwatch callback function takes rtnl_lock.
> > :
> > : So we implement a new function which will flush a *single* work - just the one
> > : which the caller wants to free up. Thus we avoid the accidental deadlocks
> > : which can arise from unrelated subsystems' callbacks taking shared locks.
> > :
> > : flush_work() non-blockingly dequeues the work_struct which we want to kill,
> > : then it waits for its handler to complete on all CPUs.
> > :
> > : Add ->current_work to the "struct cpu_workqueue_struct", it points to
> > : currently running "struct work_struct". When flush_work(work) detects
> > : ->current_work == work, it inserts a barrier at the _head_ of ->worklist
> > : (and thus right _after_ that work) and waits for completion. This means
> > : that the next work fired on that CPU will be this barrier, or another
> > : barrier queued by concurrent flush_work(), so the caller of flush_work()
> > : will be woken before any "regular" work has a chance to run.
> > :
> > : When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
> > : against CPU hotplug), CPU may go away. But in that case take_over_work() will
> > : move a barrier we queued to another CPU, it will be fired sometime, and
> > : wait_on_work() will be woken.
> > :
> > : Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
> > : take_over_work(), so cwq->thread should complete its ->worklist (and thus
> > : the barrier), because currently we don't check kthread_should_stop() in
> > : run_workqueue(). But even if we did, everything should be ok.
> >
> >
> > Why isn't that working in this case??
>
> Cough. Because that "flush_work()" was renamed to cancel_work_sync().
> Because it really cancels the work_struct if it can.
>
> Now we have flush_work() which does not cancel, but waits for completion
> of the single work_struct. Of course, it can hang if the caller holds
> the lock which can be taken by another work in that workqueue.
>
> Oleg.

Andrew's suggestion does make sense though: for any not-in-progress
worklet we can dequeue that worklet and execute it in the flushing
context. [ And if that worklet cannot be dequeued because it's being
processed then that's fine and we can wait on that single worklet,
without waiting on any other 'unrelated' worklets. ]

That does not help work_on_cpu() though: that facility really uses the
fact that workqueues are implemented via per-CPU threads - hence we
cannot remove the worklet from the queue and execute it in the flushing
context.

	Ingo
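
[ To make the dequeue-and-run idea above concrete, here is a rough
userspace sketch. It is a single-threaded toy model and not kernel
code: every toy_* name is hypothetical, there is no locking, and the
"handler already running" case is only reported to the caller rather
than actually waited on. ]

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */

enum toy_state { TOY_IDLE, TOY_PENDING, TOY_RUNNING };

struct toy_work {
	void (*func)(struct toy_work *w);
	enum toy_state state;
	struct toy_work *next;		/* link in the pending list */
};

struct toy_queue {
	struct toy_work *head;
	struct toy_work *tail;
};

enum toy_flush_result {
	TOY_FLUSH_NOTHING,	/* work was not queued at all */
	TOY_FLUSH_RAN_INLINE,	/* dequeued and executed by the flusher */
	TOY_FLUSH_MUST_WAIT,	/* handler in progress: wait for this one only */
};

/* Example handler used to observe how often a work item ran. */
static int toy_hits;

static void toy_count(struct toy_work *w)
{
	(void)w;
	toy_hits++;
}

/* Append an idle work item to the tail of the pending list. */
static void toy_queue_work(struct toy_queue *q, struct toy_work *w)
{
	assert(q && w);
	if (w->state != TOY_IDLE)
		return;
	w->state = TOY_PENDING;
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Unlink w from the pending list; returns 1 if it was found. */
static int toy_dequeue(struct toy_queue *q, struct toy_work *w)
{
	struct toy_work **pp = &q->head;
	struct toy_work *prev = NULL;

	while (*pp) {
		if (*pp == w) {
			*pp = w->next;
			if (q->tail == w)
				q->tail = prev;
			w->next = NULL;
			return 1;
		}
		prev = *pp;
		pp = &(*pp)->next;
	}
	return 0;
}

/*
 * Flush a single work item: a pending item is pulled off the queue and
 * run in the caller's (flushing) context, so the caller never waits
 * behind unrelated items. A running item cannot be dequeued, so the
 * caller is told to wait for that one item.
 */
static enum toy_flush_result toy_flush_work(struct toy_queue *q,
					    struct toy_work *w)
{
	assert(q && w);
	if (w->state == TOY_RUNNING)
		return TOY_FLUSH_MUST_WAIT;
	if (w->state == TOY_PENDING && toy_dequeue(q, w)) {
		w->state = TOY_RUNNING;
		w->func(w);
		w->state = TOY_IDLE;
		return TOY_FLUSH_RAN_INLINE;
	}
	return TOY_FLUSH_NOTHING;
}
```

[ Note that the sketch runs the handler on whatever thread calls
toy_flush_work(). That is exactly the property work_on_cpu() cannot
tolerate, since its function has to run on the target CPU's thread. ]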