From: Mike Travis <travis@sgi.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>, Dave Jones <davej@redhat.com>,
cpufreq@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] work_on_cpu: Use our own workqueue.
Date: Wed, 28 Jan 2009 09:19:24 -0800 [thread overview]
Message-ID: <4980939C.6010102@sgi.com> (raw)
In-Reply-To: <200901282332.29429.rusty@rustcorp.com.au>
Hi Rusty,
I'm testing this now on x86_64 and one question comes up. The
initialization of the woc_wq thread happens quite late. Might it
be better to initialize it earlier?
(I haven't tested all work_on_cpu callers yet, plus there are a
couple that are waiting in the queue that I'm also testing.)
Oops, one problem just popped up. After running a couple of
offline/online tests, this came up:
[ 1080.185022] INFO: task kwork_on_cpu:491 blocked for more than 480 seconds.
[ 1080.205849] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.229745] kwork_on_cpu D 0000000000000000 3432 491 2
[ 1080.229750] ffff88012e043ec0 0000000000000046 000000002e043e60 ffffffff81d5ef00
[ 1080.229754] ffffffff81d5ef00 ffffffff81d5ef00 ffffffff81d5ef00 ffffffff81d5ef00
[ 1080.229757] ffffffff81d5ef00 ffffffff81d58580 ffffffff81d5ef00 ffff88012e03ab40
[ 1080.229760] Call Trace:
[ 1080.229770] [<ffffffff8106e241>] ? trace_hardirqs_on+0xd/0xf
[ 1080.229774] [<ffffffff810a6bc5>] do_work_on_cpu+0x72/0x13d
[ 1080.229778] [<ffffffff8105d994>] ? autoremove_wake_function+0x0/0x3d
[ 1080.229781] [<ffffffff810a6b53>] ? do_work_on_cpu+0x0/0x13d
[ 1080.229783] [<ffffffff810a6b53>] ? do_work_on_cpu+0x0/0x13d
[ 1080.229786] [<ffffffff8105d809>] kthread+0x4e/0x7d
[ 1080.229790] [<ffffffff8100d7ba>] child_rip+0xa/0x20
[ 1080.229793] [<ffffffff8100d1bc>] ? restore_args+0x0/0x30
[ 1080.229795] [<ffffffff8105d796>] ? kthreadd+0x167/0x18c
[ 1080.229798] [<ffffffff8105d7bb>] ? kthread+0x0/0x7d
[ 1080.229801] [<ffffffff8100d7b0>] ? child_rip+0x0/0x20
[ 1080.229802] INFO: lockdep is turned off.
Thanks,
Mike
Rusty Russell wrote:
> On Tuesday 27 January 2009 17:55:19 Andrew Morton wrote:
> [ lots of good stuff ]
>
>> Making work_on_cu() truly generic is quite hard!
>
> Yes. What do you think of this workqueue-less approach?
>
> Subject: work_on_cpu: non-workqueue solution.
>
> work_on_cpu was designed to replace the meme of:
>
> cpumask_t saved = current->cpus_allowed;
> set_cpus_allowed(cpumask_of(dst));
> ... do something on cpu dst ...
> set_cpus_allowed(saved);
>
> It should have been a trivial routine, and a trivial conversion.
>
> The original version did a get_online_cpus(), then put the work on the
> kevent queue, called flush_work(), then put_online_cpus(). See
> 2d3854a37e8b767a51aba38ed6d22817b0631e33.
>
> Several locking warnings resulted from different users being converted
> over the months this patch was used leading up to 2.6.29. One was
> fixed to use smp_call_function_single
> (b2bb85549134c005e997e5a7ed303bda6a1ae738), but it's always risky to
> convert code which was running in user context into interrupt context.
> When another one was reported, we dropped the get_online_cpus() and
> relied on the callers to do that (as they should have been doing
> before) as seen in 31ad9081200c06ccc350625d41d1f8b2d1cef29f.
>
> But there was still a locking issue with
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c, so that conversion was
> reverted in Ingo's tree. Turns out that it the conversion caused
> work_on_cpu to be called from inside keventd in some paths.
>
> The obvious solution was to change to using our own workqueue, so we
> could use it in ACPI and so it would actually be a generically useful
> function as intended.
>
> Andrew Morton complained about NR_CPUS new threads. Which I happen to
> agree with. (He also bitched about the comment being too short, so
> this one is a fucking novel).
>
> While I was at linux.conf.au and avoiding reading my mail, a long
> discussion ensued (See lkml "[PATCH 2/3] work_on_cpu: Use our own
> workqueue."), including useful advice such as not ever grabbing a lock
> inside a work function. Or something; it gets a bit confusing.
>
> To make the pain stop, this attempts to create work_on_cpu without
> using workqueues at all. It does create one new thread. But as long
> as you don't call work_on_cpu inside work_on_cpu, or create normal
> locking inversions, it should work just peachy.
>
> FIXME: It includes work_on_cpu.h from workqueue.h to avoid chasing all
> the users, but they should include it directly.
>
> I tested it lightly (as seen in this patch), but my test machine
> doesn't hit any work_on_cpu paths.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> diff --git a/include/linux/work_on_cpu.h b/include/linux/work_on_cpu.h
> new file mode 100644
> --- /dev/null
> +++ b/include/linux/work_on_cpu.h
> @@ -0,0 +1,13 @@
> +#ifndef _LINUX_WORK_ON_CPU_H
> +#define _LINUX_WORK_ON_CPU_H
> +
> +#ifndef CONFIG_SMP
> +static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> + return fn(arg);
> +}
> +#else
> +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
> +#endif /* CONFIG_SMP */
> +
> +#endif /* _LINUX_WORK_ON_CPU_H */
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -9,6 +9,7 @@
> #include <linux/linkage.h>
> #include <linux/bitops.h>
> #include <linux/lockdep.h>
> +#include <linux/work_on_cpu.h>
> #include <asm/atomic.h>
>
> struct workqueue_struct;
> @@ -251,13 +252,4 @@ void cancel_rearming_delayed_work(struct
> {
> cancel_delayed_work_sync(work);
> }
> -
> -#ifndef CONFIG_SMP
> -static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> -{
> - return fn(arg);
> -}
> -#else
> -long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
> -#endif /* CONFIG_SMP */
> #endif
> diff --git a/kernel/Makefile b/kernel/Makefile
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -92,6 +92,7 @@ obj-$(CONFIG_FUNCTION_TRACER) += trace/
> obj-$(CONFIG_FUNCTION_TRACER) += trace/
> obj-$(CONFIG_TRACING) += trace/
> obj-$(CONFIG_SMP) += sched_cpupri.o
> +obj-$(CONFIG_SMP) += work_on_cpu.o
>
> ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
> # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
> diff --git a/kernel/work_on_cpu.c b/kernel/work_on_cpu.c
> new file mode 100644
> --- /dev/null
> +++ b/kernel/work_on_cpu.c
> @@ -0,0 +1,108 @@
> +#include <linux/work_on_cpu.h>
> +#include <linux/kthread.h>
> +#include <linux/mutex.h>
> +#include <linux/completion.h>
> +#include <linux/cpumask.h>
> +#include <linux/module.h>
> +
> +/* The thread waits for new work on this waitqueue. */
> +static DECLARE_WAIT_QUEUE_HEAD(woc_wq);
> +/* The lock ensures only one job is done at a time. */
> +static DEFINE_MUTEX(woc_mutex);
> +
> +/* The details of the current job. */
> +struct work_for_cpu {
> + unsigned int cpu;
> + long (*fn)(void *);
> + void *arg;
> + long ret;
> + struct completion done;
> +};
> +
> +/* The pointer to the current job. NULL if nothing pending. */
> +static struct work_for_cpu *current_work;
> +
> +/* We leave our thread on whatever cpu it was on last. We can get
> + * batted onto another CPU by move_task_off_dead_cpu if that cpu goes
> + * down, but the caller of work_on_cpu() was supposed to ensure that
> + * doesn't happen, so it should only happen when idle. */
> +static int do_work_on_cpu(void *unused)
> +{
> + for (;;) {
> + struct completion *done;
> +
> + wait_event(woc_wq, current_work != NULL);
> +
> + set_cpus_allowed_ptr(current, cpumask_of(current_work->cpu));
> + WARN_ON(smp_processor_id() != current_work->cpu);
> +
> + current_work->ret = current_work->fn(current_work->arg);
> + /* Make sure ret is set before we complete(). Paranoia. */
> + wmb();
> +
> + /* Reset current_work so we don't spin. */
> + done = ¤t_work->done;
> + current_work = NULL;
> +
> + /* Reset current_work for next work_on_cpu(). */
> + complete(done);
> + }
> +}
> +
> +/**
> + * work_on_cpu - run a function in user context on a particular cpu
> + * @cpu: the cpu to run on
> + * @fn: the function to run
> + * @arg: the function arg
> + *
> + * This will return the value @fn returns.
> + * It is up to the caller to ensure that the cpu doesn't go offline.
> + */
> +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> + struct work_for_cpu work;
> +
> + work.cpu = cpu;
> + work.fn = fn;
> + work.arg = arg;
> + init_completion(&work.done);
> +
> + mutex_lock(&woc_mutex);
> + /* Make sure all is in place before it sees fn set. */
> + wmb();
> + current_work = &work;
> + wake_up(&woc_wq);
> +
> + wait_for_completion(&work.done);
> + BUG_ON(current_work);
> + mutex_unlock(&woc_mutex);
> +
> + return work.ret;
> +}
> +EXPORT_SYMBOL_GPL(work_on_cpu);
> +
> +#if 1
> +static long test_fn(void *arg)
> +{
> + printk("%u: %lu\n", smp_processor_id(), (long)arg);
> + return (long)arg + 100;
> +}
> +#endif
> +
> +static int __init init(void)
> +{
> + unsigned int i;
> +
> + kthread_run(do_work_on_cpu, NULL, "kwork_on_cpu");
> +
> +#if 1
> + for_each_online_cpu(i) {
> + long ret = work_on_cpu(i, test_fn, (void *)i);
> + printk("CPU %i returned %li\n", i, ret);
> + BUG_ON(ret != i + 100);
> + }
> +#endif
> +
> + return 0;
> +}
> +module_init(init);
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -970,47 +970,6 @@ undo:
> return ret;
> }
>
> -#ifdef CONFIG_SMP
> -static struct workqueue_struct *work_on_cpu_wq __read_mostly;
> -
> -struct work_for_cpu {
> - struct work_struct work;
> - long (*fn)(void *);
> - void *arg;
> - long ret;
> -};
> -
> -static void do_work_for_cpu(struct work_struct *w)
> -{
> - struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
> -
> - wfc->ret = wfc->fn(wfc->arg);
> -}
> -
> -/**
> - * work_on_cpu - run a function in user context on a particular cpu
> - * @cpu: the cpu to run on
> - * @fn: the function to run
> - * @arg: the function arg
> - *
> - * This will return the value @fn returns.
> - * It is up to the caller to ensure that the cpu doesn't go offline.
> - */
> -long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> -{
> - struct work_for_cpu wfc;
> -
> - INIT_WORK(&wfc.work, do_work_for_cpu);
> - wfc.fn = fn;
> - wfc.arg = arg;
> - queue_work_on(cpu, work_on_cpu_wq, &wfc.work);
> - flush_work(&wfc.work);
> -
> - return wfc.ret;
> -}
> -EXPORT_SYMBOL_GPL(work_on_cpu);
> -#endif /* CONFIG_SMP */
> -
> void __init init_workqueues(void)
> {
> alloc_cpumask_var(&cpu_populated_map, GFP_KERNEL);
> @@ -1021,8 +980,4 @@ void __init init_workqueues(void)
> hotcpu_notifier(workqueue_cpu_callback, 0);
> keventd_wq = create_workqueue("events");
> BUG_ON(!keventd_wq);
> -#ifdef CONFIG_SMP
> - work_on_cpu_wq = create_workqueue("work_on_cpu");
> - BUG_ON(!work_on_cpu_wq);
> -#endif
> }
>
next prev parent reply other threads:[~2009-01-28 17:19 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-16 19:11 [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq Mike Travis
2009-01-16 19:11 ` [PATCH 1/3] work_on_cpu: dont try to get_online_cpus() in work_on_cpu Mike Travis
2009-01-16 19:11 ` [PATCH 2/3] work_on_cpu: Use our own workqueue Mike Travis
2009-01-24 8:15 ` Andrew Morton
[not found] ` <200901261711.43943.rusty@rustcorp.com.au>
2009-01-26 7:01 ` Andrew Morton
2009-01-26 17:16 ` Ingo Molnar
2009-01-26 18:35 ` Andrew Morton
2009-01-26 20:20 ` Ingo Molnar
2009-01-26 20:43 ` Mike Travis
2009-01-26 21:00 ` Andrew Morton
2009-01-26 21:27 ` Ingo Molnar
2009-01-26 21:35 ` Andrew Morton
2009-01-26 21:45 ` Ingo Molnar
2009-01-26 22:01 ` Andrew Morton
2009-01-26 22:05 ` Ingo Molnar
2009-01-26 22:16 ` Andrew Morton
2009-01-26 22:20 ` Ingo Molnar
2009-01-26 22:50 ` Andrew Morton
2009-01-26 22:59 ` Ingo Molnar
2009-01-26 23:42 ` Andrew Morton
2009-01-26 23:53 ` Ingo Molnar
2009-01-27 0:42 ` Andrew Morton
2009-01-26 22:31 ` Oleg Nesterov
2009-01-26 22:15 ` Oleg Nesterov
2009-01-26 22:24 ` Ingo Molnar
2009-01-26 22:37 ` Oleg Nesterov
2009-01-26 22:42 ` Ingo Molnar
2009-01-26 21:50 ` Oleg Nesterov
2009-01-26 22:17 ` Ingo Molnar
2009-01-26 23:01 ` Mike Travis
2009-01-27 0:09 ` Oleg Nesterov
2009-01-27 7:15 ` Rusty Russell
2009-01-27 17:55 ` Oleg Nesterov
2009-01-27 7:05 ` Rusty Russell
2009-01-27 7:25 ` Andrew Morton
2009-01-27 15:28 ` Ingo Molnar
2009-01-27 16:51 ` Andrew Morton
2009-01-28 13:02 ` Rusty Russell
2009-01-28 17:19 ` Mike Travis [this message]
2009-01-28 17:32 ` Mike Travis
2009-01-29 10:39 ` Rusty Russell
2009-01-28 19:44 ` Andrew Morton
2009-01-29 1:43 ` Rusty Russell
2009-01-29 2:12 ` Andrew Morton
2009-01-30 6:03 ` Rusty Russell
2009-01-30 6:30 ` Andrew Morton
2009-01-30 13:49 ` Ingo Molnar
2009-01-30 17:08 ` Andrew Morton
2009-01-30 21:59 ` Rusty Russell
2009-01-30 22:17 ` Andrew Morton
2009-02-02 12:35 ` Rusty Russell
2009-02-03 4:06 ` Andrew Morton
2009-02-04 2:44 ` Rusty Russell
2009-02-04 3:01 ` Andrew Morton
2009-02-04 10:41 ` Rusty Russell
2009-02-04 15:36 ` Andrew Morton
2009-02-04 21:35 ` Ingo Molnar
2009-02-04 21:48 ` Andrew Morton
2009-02-04 21:54 ` Ingo Molnar
2009-02-04 23:45 ` Rusty Russell
2009-02-05 12:19 ` Pavel Machek
2009-02-05 17:44 ` Dmitry Adamushko
2009-02-10 8:54 ` Rusty Russell
2009-02-10 9:35 ` Andrew Morton
2009-02-11 0:32 ` Rusty Russell
2009-01-16 19:11 ` [PATCH 3/3] cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write Mike Travis
2009-01-16 23:38 ` [PATCH 0/3] cpu freq: fix problems with work_on_cpu usage in acpi-cpufreq [PULL request] Mike Travis
2009-01-17 22:08 ` Ingo Molnar
2009-01-19 17:11 ` Mike Travis
2009-01-19 17:26 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4980939C.6010102@sgi.com \
--to=travis@sgi.com \
--cc=akpm@linux-foundation.org \
--cc=cpufreq@vger.kernel.org \
--cc=davej@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=rusty@rustcorp.com.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox