From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752129AbZAZHCK (ORCPT ); Mon, 26 Jan 2009 02:02:10 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751313AbZAZHBz (ORCPT ); Mon, 26 Jan 2009 02:01:55 -0500 Received: from smtp1.linux-foundation.org ([140.211.169.13]:49758 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750863AbZAZHBz (ORCPT ); Mon, 26 Jan 2009 02:01:55 -0500 Date: Sun, 25 Jan 2009 23:01:30 -0800 From: Andrew Morton To: Rusty Russell Cc: Mike Travis , Ingo Molnar , Dave Jones , cpufreq@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/3] work_on_cpu: Use our own workqueue. Message-Id: <20090125230130.bcdab2e5.akpm@linux-foundation.org> In-Reply-To: <200901261711.43943.rusty@rustcorp.com.au> References: <20090116191108.135927000@polaris-admin.engr.sgi.com> <20090116191108.533053000@polaris-admin.engr.sgi.com> <20090124001537.7cfde78e.akpm@linux-foundation.org> <200901261711.43943.rusty@rustcorp.com.au> X-Mailer: Sylpheed 2.4.8 (GTK+ 2.12.5; x86_64-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 26 Jan 2009 17:11:43 +1030 Rusty Russell wrote: > On Saturday 24 January 2009 18:45:37 Andrew Morton wrote: > > On Fri, 16 Jan 2009 11:11:10 -0800 Mike Travis wrote: > > > > > From: Rusty Russell > > > > > > Impact: remove potential clashes with generic kevent workqueue > > > > > > Annoyingly, some places we want to use work_on_cpu are already in > > > workqueues. As per Ingo's suggestion, we create a different workqueue > > > for work_on_cpu. > > > > > > Signed-off-by: Rusty Russell > > > Signed-off-by: Mike Travis > > > --- > > > kernel/workqueue.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > --- linux-2.6-for-ingo.orig/kernel/workqueue.c > > > +++ linux-2.6-for-ingo/kernel/workqueue.c > > > @@ -971,6 +971,8 @@ undo: > > > } > > > > > > #ifdef CONFIG_SMP > > > +static struct workqueue_struct *work_on_cpu_wq __read_mostly; > > > > Pity the poor reader who comes along trying to work out why this exists. (chirp, chirp) > > > struct work_for_cpu { > > > struct work_struct work; > > > long (*fn)(void *); > > > @@ -1001,7 +1003,7 @@ long work_on_cpu(unsigned int cpu, long > > > INIT_WORK(&wfc.work, do_work_for_cpu); > > > wfc.fn = fn; > > > wfc.arg = arg; > > > - schedule_work_on(cpu, &wfc.work); > > > + queue_work_on(cpu, work_on_cpu_wq, &wfc.work); > > > flush_work(&wfc.work); > > > > > > return wfc.ret; > > > @@ -1019,4 +1021,8 @@ void __init init_workqueues(void) > > > hotcpu_notifier(workqueue_cpu_callback, 0); > > > keventd_wq = create_workqueue("events"); > > > BUG_ON(!keventd_wq); > > > +#ifdef CONFIG_SMP > > > + work_on_cpu_wq = create_workqueue("work_on_cpu"); > > > + BUG_ON(!work_on_cpu_wq); > > > +#endif > > > > Yet another kernel thread for each CPU. All because of some dung way > > down in arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c. > > > > Is there no other way? > > Perhaps, but this works. Trying to be clever got me into this mess in the first place. > > We could stop using workqueues and change work_on_cpu to create a thread every time, which would give it a new failure mode so I don't know that everyone could use it any more. Or we could keep a single thread around to do all the cpus, and duplicate much of the workqueue code. > > None of these options are appealing... Can we try harder please? 10 screenfuls of kernel threads in the ps output is just irritating. How about banning the use of work_on_cpu() from schedule_work() handlers and then fixing that driver somehow? What _is_ the bug anyway? The only description we were given was Impact: remove potential clashes with generic kevent workqueue Annoyingly, some places we want to use work_on_cpu are already in workqueues. As per Ingo's suggestion, we create a different workqueue for work_on_cpu. which didn't bother telling anyone squat. When was this bug added? Was it added into that driver or was it due to infrastructural changes?