Date: Mon, 9 Feb 2009 20:14:05 +0100
From: Oleg Nesterov
To: Lai Jiangshan
Cc: Peter Zijlstra, Ingo Molnar, Frédéric Weisbecker, Andrew Morton,
	Eric Dumazet, Linux Kernel Mailing List
Subject: Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue
Message-ID: <20090209191405.GA4561@redhat.com>
References: <497838F0.7020408@cn.fujitsu.com> <20090122093046.GC5891@nowhere>
	<20090122093649.GD24758@elte.hu> <1232622615.4890.114.camel@laptop>
	<498AA0F1.2030003@cn.fujitsu.com> <498B9675.3000202@cn.fujitsu.com>
In-Reply-To: <498B9675.3000202@cn.fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/06, Lai Jiangshan wrote:
>
> 1) lockdep will complain when recursion run_workqueue()
> 2) The recursive implement of run_workqueue() makes flush_workqueue()
>    and it's doc are inconsistent. It may hide deadlock and other bugs.
> 3) recursion run_workqueue() will poison cwq->current_work,
>    but flush_work() and __cancel_work_timer() ...etc. need
>    reliable cwq->current_work.

I think this change is good. If we still have users which call flush
from work->func() they should be fixed, imho.

And while I knew this recursive flush is bad, I didn't realize how bad it
is until Lai spelled this out. Thanks.

Acked-by: Oleg Nesterov

> Signed-off-by: Lai Jiangshan
> ---
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2f44583..1129cde 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -48,8 +48,6 @@ struct cpu_workqueue_struct {
>
>  	struct workqueue_struct *wq;
>  	struct task_struct *thread;
> -
> -	int run_depth;		/* Detect run_workqueue() recursion depth */
>  } ____cacheline_aligned;
>
>  /*
> @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
>  static void run_workqueue(struct cpu_workqueue_struct *cwq)
>  {
>  	spin_lock_irq(&cwq->lock);
> -	cwq->run_depth++;
> -	if (cwq->run_depth > 3) {
> -		/* morton gets to eat his hat */
> -		printk("%s: recursion depth exceeded: %d\n",
> -			__func__, cwq->run_depth);
> -		dump_stack();
> -	}
>  	while (!list_empty(&cwq->worklist)) {
>  		struct work_struct *work = list_entry(cwq->worklist.next,
>  						struct work_struct, entry);
> @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq)
>  		spin_lock_irq(&cwq->lock);
>  		cwq->current_work = NULL;
>  	}
> -	cwq->run_depth--;
>  	spin_unlock_irq(&cwq->lock);
>  }
>
> @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
>
>  static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
>  {
> -	int active;
> +	int active = 0;
> +	struct wq_barrier barr;
>
> -	if (cwq->thread == current) {
> -		/*
> -		 * Probably keventd trying to flush its own queue. So simply run
> -		 * it by hand rather than deadlocking.
> -		 */
> -		run_workqueue(cwq);
> -		active = 1;
> -	} else {
> -		struct wq_barrier barr;
> +	WARN_ON(cwq->thread == current);
>
> -		active = 0;
> -		spin_lock_irq(&cwq->lock);
> -		if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> -			insert_wq_barrier(cwq, &barr, &cwq->worklist);
> -			active = 1;
> -		}
> -		spin_unlock_irq(&cwq->lock);
> -
> -		if (active)
> -			wait_for_completion(&barr.done);
> +	spin_lock_irq(&cwq->lock);
> +	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> +		insert_wq_barrier(cwq, &barr, &cwq->worklist);
> +		active = 1;
>  	}
> +	spin_unlock_irq(&cwq->lock);
> +
> +	if (active)
> +		wait_for_completion(&barr.done);
>
>  	return active;
>  }
>
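
For readers who want to see point 3) concretely, here is a rough
single-threaded userspace model of the old behaviour. It is only a sketch,
not kernel code: struct cwq_model, run_workqueue_model() and the other
names are made up for illustration, there is no locking, and the worklist
is a fixed array. It only shows that when a work function re-enters the
run loop (as the removed "run it by hand" branch did), the nested loop
resets current_work to NULL while the outer work is still executing.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORK 8

struct cwq_model;
struct work;
typedef void (*work_fn)(struct cwq_model *cwq, struct work *w);

/* Hypothetical stand-in for work_struct. */
struct work {
	work_fn func;
	const struct work *seen_current; /* current_work seen after a nested run */
};

/* Hypothetical stand-in for cpu_workqueue_struct (no lock, no thread). */
struct cwq_model {
	struct work *list[MAX_WORK];
	size_t head, tail;
	struct work *current_work;
};

static void queue_work_model(struct cwq_model *cwq, struct work *w)
{
	assert(cwq->tail < MAX_WORK);
	cwq->list[cwq->tail++] = w;
}

/* Same shape as the loop in run_workqueue(): set, call, clear. */
static void run_workqueue_model(struct cwq_model *cwq)
{
	while (cwq->head < cwq->tail) {
		struct work *w = cwq->list[cwq->head++];

		cwq->current_work = w;
		w->func(cwq, w);
		cwq->current_work = NULL;
	}
}

static void noop_fn(struct cwq_model *cwq, struct work *w)
{
	(void)cwq;
	(void)w;
}

/* Models a work function that flushes its own queue the old way. */
static void self_flushing_fn(struct cwq_model *cwq, struct work *w)
{
	run_workqueue_model(cwq);		/* nested run */
	w->seen_current = cwq->current_work;	/* what a flusher would see now */
}

/* Returns 1 if the nested run clobbered current_work, 0 otherwise. */
int nested_run_clobbers_current_work(void)
{
	struct cwq_model cwq = {0};
	struct work outer = { self_flushing_fn, NULL };
	struct work inner = { noop_fn, NULL };

	outer.seen_current = &outer;	/* sentinel, overwritten by the func */
	queue_work_model(&cwq, &outer);
	queue_work_model(&cwq, &inner);
	run_workqueue_model(&cwq);

	return outer.seen_current != &outer;
}
```

In this model the outer work is still running when it observes
current_work == NULL, which is the unreliable state that flush_work() and
__cancel_work_timer() cannot tolerate. With the patch, the nested run is
gone and a self-flush only trips the WARN_ON().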