* [PATCH 0/6] Lazy workqueues
@ 2009-08-20 10:19 Jens Axboe
2009-08-20 10:19 ` [PATCH 1/6] workqueue: replace singlethread/freezable/rt parameters and variables with flags Jens Axboe
` (8 more replies)
0 siblings, 9 replies; 30+ messages in thread
From: Jens Axboe @ 2009-08-20 10:19 UTC (permalink / raw)
To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan
(sorry for the resend, but apparently the directory had some patches
in it already. plus, stupid git send-email doesn't default to
no chain replies, really annoying)
Hi,
After yesterday's rant on having too many kernel threads and checking
how many I actually have running on this system (531!), I decided to
try and do something about it.
My goal was to retain the workqueue interface instead of coming up with
a new scheme that required conversion (or converting to slow_work which,
btw, is an awful name :-). I also wanted to retain the affinity
guarantees of workqueues as much as possible.
So this is a first step in that direction, it's probably full of races
and holes, but should get the idea across. It adds a
create_lazy_workqueue() helper, similar to the other variants that we
currently have. A lazy workqueue works like a normal workqueue, except
that it only (by default) starts a core thread instead of threads for
all online CPUs. When work is queued on a lazy workqueue for a CPU
that doesn't have a thread running, it will be placed on the core CPU's
list, and the core thread will then create the target thread and move
the work over to it.
Should task creation fail, the queued work will be executed on the
core CPU instead. Once a lazy workqueue thread has been idle for a
certain amount of time, it will again exit.
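
As an illustration of the interface (a sketch against this patchset,
not part of it; the "demo" queue name and work function are made up):

	static struct workqueue_struct *demo_wq;

	static void demo_fn(struct work_struct *work)
	{
		/* runs on the CPU the work was queued from, once the
		 * lazily created thread for that CPU exists */
	}

	static DECLARE_WORK(demo_work, demo_fn);

	static int __init demo_init(void)
	{
		/* only the core thread is started here; per-CPU
		 * threads appear on demand and exit again when idle */
		demo_wq = create_lazy_workqueue("demo");
		if (!demo_wq)
			return -ENOMEM;

		queue_work(demo_wq, &demo_work);
		return 0;
	}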
The patch boots here and I exercised the rpciod workqueue and
verified that it gets created, runs on the right CPU, and exits a while
later. So core functionality should be there, even if it has holes.
With this patchset, I am now down to 280 kernel threads on one of my test
boxes. Still too many, but it's a start and a net reduction of 251
threads here, or 47%!
The code can also be pulled from:
git://git.kernel.dk/linux-2.6-block.git workqueue
--
Jens Axboe

* [PATCH 1/6] workqueue: replace singlethread/freezable/rt parameters and variables with flags
  2009-08-20 10:19 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 10:19 UTC (permalink / raw)
To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe

Collapse the three ints into a flags variable, in preparation for
adding another flag.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 include/linux/workqueue.h |   32 ++++++++++++++++++--------------
 kernel/workqueue.c        |   22 ++++++++--------------
 2 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 13e1adf..f14e20e 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -165,12 +165,17 @@ struct execute_work {

 extern struct workqueue_struct *
-__create_workqueue_key(const char *name, int singlethread,
-		       int freezeable, int rt, struct lock_class_key *key,
-		       const char *lock_name);
+__create_workqueue_key(const char *name, unsigned int flags,
+		       struct lock_class_key *key, const char *lock_name);
+
+enum {
+	WQ_F_SINGLETHREAD	= 1,
+	WQ_F_FREEZABLE		= 2,
+	WQ_F_RT			= 4,
+};

 #ifdef CONFIG_LOCKDEP
-#define __create_workqueue(name, singlethread, freezeable, rt)	\
+#define __create_workqueue(name, flags)				\
 ({								\
	static struct lock_class_key __key;			\
	const char *__lock_name;				\
								\
@@ -180,20 +185,19 @@ __create_workqueue_key(const char *name, int singlethread,
	else						\
		__lock_name = #name;			\
							\
-	__create_workqueue_key((name), (singlethread),	\
-			       (freezeable), (rt), &__key,	\
-			       __lock_name);		\
+	__create_workqueue_key((name), (flags), &__key, __lock_name); \
 })
 #else
-#define __create_workqueue(name, singlethread, freezeable, rt)	\
-	__create_workqueue_key((name), (singlethread), (freezeable), (rt), \
-			       NULL, NULL)
+#define __create_workqueue(name, flags)	\
+	__create_workqueue_key((name), (flags), NULL, NULL)
 #endif

-#define create_workqueue(name) __create_workqueue((name), 0, 0, 0)
-#define create_rt_workqueue(name) __create_workqueue((name), 0, 0, 1)
-#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1, 0)
-#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0, 0)
+#define create_workqueue(name) __create_workqueue((name), 0)
+#define create_rt_workqueue(name) __create_workqueue((name), WQ_F_RT)
+#define create_freezeable_workqueue(name)	\
+	__create_workqueue((name), WQ_F_SINGLETHREAD | WQ_F_FREEZABLE)
+#define create_singlethread_workqueue(name)	\
+	__create_workqueue((name), WQ_F_SINGLETHREAD)

 extern void destroy_workqueue(struct workqueue_struct *wq);

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0668795..02ba7c9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -60,9 +60,7 @@ struct workqueue_struct {
	struct cpu_workqueue_struct *cpu_wq;
	struct list_head list;
	const char *name;
-	int singlethread;
-	int freezeable;		/* Freeze threads during suspend */
-	int rt;
+	unsigned int flags;	/* WQ_F_* flags */
 #ifdef CONFIG_LOCKDEP
	struct lockdep_map lockdep_map;
 #endif
@@ -84,9 +82,9 @@ static const struct cpumask *cpu_singlethread_map __read_mostly;
 static cpumask_var_t cpu_populated_map __read_mostly;

 /* If it's single threaded, it isn't in the list of workqueues. */
-static inline int is_wq_single_threaded(struct workqueue_struct *wq)
+static inline bool is_wq_single_threaded(struct workqueue_struct *wq)
 {
-	return wq->singlethread;
+	return wq->flags & WQ_F_SINGLETHREAD;
 }

 static const struct cpumask *wq_cpu_map(struct workqueue_struct *wq)
@@ -314,7 +312,7 @@ static int worker_thread(void *__cwq)
	struct cpu_workqueue_struct *cwq = __cwq;
	DEFINE_WAIT(wait);

-	if (cwq->wq->freezeable)
+	if (cwq->wq->flags & WQ_F_FREEZABLE)
		set_freezable();

	set_user_nice(current, -5);
@@ -768,7 +766,7 @@ static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
	 */
	if (IS_ERR(p))
		return PTR_ERR(p);
-	if (cwq->wq->rt)
+	if (cwq->wq->flags & WQ_F_RT)
		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
	cwq->thread = p;

@@ -789,9 +787,7 @@ static void start_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)
 }

 struct workqueue_struct *__create_workqueue_key(const char *name,
-						int singlethread,
-						int freezeable,
-						int rt,
+						unsigned int flags,
						struct lock_class_key *key,
						const char *lock_name)
 {
@@ -811,12 +807,10 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
	wq->name = name;
	lockdep_init_map(&wq->lockdep_map, lock_name, key, 0);
-	wq->singlethread = singlethread;
-	wq->freezeable = freezeable;
-	wq->rt = rt;
+	wq->flags = flags;
	INIT_LIST_HEAD(&wq->list);

-	if (singlethread) {
+	if (flags & WQ_F_SINGLETHREAD) {
		cwq = init_cpu_workqueue(wq, singlethread_cpu);
		err = create_workqueue_thread(cwq, singlethread_cpu);
		start_workqueue_thread(cwq, -1);
--
1.6.4.173.g3f189
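
For reference, a sketch (not part of the patch) of how the old
four-argument calls map onto the new flags; the "demo" name is made up:

	/* pre-patch: __create_workqueue("demo", 1, 1, 0) */
	struct workqueue_struct *wq =
		__create_workqueue("demo", WQ_F_SINGLETHREAD | WQ_F_FREEZABLE);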
* [PATCH 2/6] workqueue: add support for lazy workqueues
  2009-08-20 10:20 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 10:20 UTC (permalink / raw)
To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe

Lazy workqueues are like normal workqueues, except they don't
start a thread per CPU by default. Instead threads are started
when they are needed, and exit when they have been idle for
some time.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 include/linux/workqueue.h |    5 ++
 kernel/workqueue.c        |  152 ++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 147 insertions(+), 10 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index f14e20e..b2dd267 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -32,6 +32,7 @@ struct work_struct {
 #ifdef CONFIG_LOCKDEP
	struct lockdep_map lockdep_map;
 #endif
+	unsigned int cpu;
 };

 #define WORK_DATA_INIT()	ATOMIC_LONG_INIT(0)
@@ -172,6 +173,7 @@ enum {
	WQ_F_SINGLETHREAD	= 1,
	WQ_F_FREEZABLE		= 2,
	WQ_F_RT			= 4,
+	WQ_F_LAZY		= 8,
 };

 #ifdef CONFIG_LOCKDEP
@@ -198,6 +200,7 @@ enum {
	__create_workqueue((name), WQ_F_SINGLETHREAD | WQ_F_FREEZABLE)
 #define create_singlethread_workqueue(name)	\
	__create_workqueue((name), WQ_F_SINGLETHREAD)
+#define create_lazy_workqueue(name) __create_workqueue((name), WQ_F_LAZY)

 extern void destroy_workqueue(struct workqueue_struct *wq);

@@ -211,6 +214,8 @@ extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,

 extern void flush_workqueue(struct workqueue_struct *wq);
 extern void flush_scheduled_work(void);
+extern void workqueue_set_lazy_timeout(struct workqueue_struct *wq,
+					unsigned long timeout);

 extern int schedule_work(struct work_struct *work);
 extern int schedule_work_on(int cpu, struct work_struct *work);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 02ba7c9..d9ccebc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -61,11 +61,17 @@ struct workqueue_struct {
	struct list_head list;
	const char *name;
	unsigned int flags;	/* WQ_F_* flags */
+	unsigned long lazy_timeout;
+	unsigned int core_cpu;
 #ifdef CONFIG_LOCKDEP
	struct lockdep_map lockdep_map;
 #endif
 };

+/* Default lazy workqueue timeout */
+#define WQ_DEF_LAZY_TIMEOUT	(60 * HZ)
+
+
 /* Serializes the accesses to the list of workqueues. */
 static DEFINE_SPINLOCK(workqueue_lock);
 static LIST_HEAD(workqueues);
@@ -81,6 +87,8 @@ static const struct cpumask *cpu_singlethread_map __read_mostly;
  */
 static cpumask_var_t cpu_populated_map __read_mostly;

+static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu);
+
 /* If it's single threaded, it isn't in the list of workqueues. */
 static inline bool is_wq_single_threaded(struct workqueue_struct *wq)
 {
@@ -141,11 +149,29 @@ static void insert_work(struct cpu_workqueue_struct *cwq,
 static void __queue_work(struct cpu_workqueue_struct *cwq,
			 struct work_struct *work)
 {
+	struct workqueue_struct *wq = cwq->wq;
	unsigned long flags;

-	spin_lock_irqsave(&cwq->lock, flags);
-	insert_work(cwq, work, &cwq->worklist);
-	spin_unlock_irqrestore(&cwq->lock, flags);
+	/*
+	 * This is a lazy workqueue and this particular CPU thread has
+	 * exited. We can't create it from here, so add this work on our
+	 * static thread. It will create this thread and move the work there.
+	 */
+	if ((wq->flags & WQ_F_LAZY) && !cwq->thread) {
+		struct cpu_workqueue_struct *__cwq;
+
+		local_irq_save(flags);
+		__cwq = wq_per_cpu(wq, wq->core_cpu);
+		work->cpu = smp_processor_id();
+		spin_lock(&__cwq->lock);
+		insert_work(__cwq, work, &__cwq->worklist);
+		spin_unlock_irqrestore(&__cwq->lock, flags);
+	} else {
+		spin_lock_irqsave(&cwq->lock, flags);
+		work->cpu = smp_processor_id();
+		insert_work(cwq, work, &cwq->worklist);
+		spin_unlock_irqrestore(&cwq->lock, flags);
+	}
 }

 /**
@@ -259,13 +285,16 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work_on);

-static void run_workqueue(struct cpu_workqueue_struct *cwq)
+static int run_workqueue(struct cpu_workqueue_struct *cwq)
 {
+	int did_work = 0;
+
	spin_lock_irq(&cwq->lock);
	while (!list_empty(&cwq->worklist)) {
		struct work_struct *work = list_entry(cwq->worklist.next,
						struct work_struct, entry);
		work_func_t f = work->func;
+		int cpu;
 #ifdef CONFIG_LOCKDEP
		/*
		 * It is permissible to free the struct work_struct
@@ -280,7 +309,34 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq)
		trace_workqueue_execution(cwq->thread, work);
		cwq->current_work = work;
		list_del_init(cwq->worklist.next);
+		cpu = smp_processor_id();
		spin_unlock_irq(&cwq->lock);
+		did_work = 1;
+
+		/*
+		 * If work->cpu isn't us, then we need to create the target
+		 * workqueue thread (if someone didn't already do that) and
+		 * move the work over there.
+		 */
+		if ((cwq->wq->flags & WQ_F_LAZY) && work->cpu != cpu) {
+			struct cpu_workqueue_struct *__cwq;
+			struct task_struct *p;
+			int err;
+
+			__cwq = wq_per_cpu(cwq->wq, work->cpu);
+			p = __cwq->thread;
+			if (!p)
+				err = create_workqueue_thread(__cwq, work->cpu);
+			p = __cwq->thread;
+			if (p) {
+				if (work->cpu >= 0)
+					kthread_bind(p, work->cpu);
+				insert_work(__cwq, work, &__cwq->worklist);
+				wake_up_process(p);
+				goto out;
+			}
+		}
+
		BUG_ON(get_wq_data(work) != cwq);
		work_clear_pending(work);
@@ -305,24 +361,45 @@
		cwq->current_work = NULL;
	}
	spin_unlock_irq(&cwq->lock);
+out:
+	return did_work;
 }

 static int worker_thread(void *__cwq)
 {
	struct cpu_workqueue_struct *cwq = __cwq;
+	struct workqueue_struct *wq = cwq->wq;
+	unsigned long last_active = jiffies;
	DEFINE_WAIT(wait);
+	int may_exit;

-	if (cwq->wq->flags & WQ_F_FREEZABLE)
+	if (wq->flags & WQ_F_FREEZABLE)
		set_freezable();

	set_user_nice(current, -5);

+	/*
+	 * Allow exit if this isn't our core thread
+	 */
+	if ((wq->flags & WQ_F_LAZY) && smp_processor_id() != wq->core_cpu)
+		may_exit = 1;
+	else
+		may_exit = 0;
+
	for (;;) {
+		int did_work;
+
		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
		if (!freezing(current) &&
		    !kthread_should_stop() &&
-		    list_empty(&cwq->worklist))
-			schedule();
+		    list_empty(&cwq->worklist)) {
+			unsigned long timeout = wq->lazy_timeout;
+
+			if (timeout && may_exit)
+				schedule_timeout(timeout);
+			else
+				schedule();
+		}
		finish_wait(&cwq->more_work, &wait);

		try_to_freeze();
@@ -330,7 +407,19 @@ static int worker_thread(void *__cwq)
		if (kthread_should_stop())
			break;

-		run_workqueue(cwq);
+		did_work = run_workqueue(cwq);
+
+		/*
+		 * If we did no work for the defined timeout period and we are
+		 * allowed to exit, do so.
+		 */
+		if (did_work)
+			last_active = jiffies;
+		else if (time_after(jiffies, last_active + wq->lazy_timeout) &&
+			 may_exit) {
+			cwq->thread = NULL;
+			break;
+		}
	}

	return 0;
@@ -814,7 +903,10 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
		cwq = init_cpu_workqueue(wq, singlethread_cpu);
		err = create_workqueue_thread(cwq, singlethread_cpu);
		start_workqueue_thread(cwq, -1);
+		wq->core_cpu = singlethread_cpu;
	} else {
+		int created = 0;
+
		cpu_maps_update_begin();
		/*
		 * We must place this wq on list even if the code below fails.
@@ -833,10 +925,16 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
		 */
		for_each_possible_cpu(cpu) {
			cwq = init_cpu_workqueue(wq, cpu);
-			if (err || !cpu_online(cpu))
+			if (err || !cpu_online(cpu) ||
+			    (created && (wq->flags & WQ_F_LAZY)))
				continue;
			err = create_workqueue_thread(cwq, cpu);
			start_workqueue_thread(cwq, cpu);
+			if (!err) {
+				if (!created)
+					wq->core_cpu = cpu;
+				created++;
+			}
		}
		cpu_maps_update_done();
	}
@@ -844,7 +942,9 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
	if (err) {
		destroy_workqueue(wq);
		wq = NULL;
-	}
+	} else if (wq->flags & WQ_F_LAZY)
+		workqueue_set_lazy_timeout(wq, WQ_DEF_LAZY_TIMEOUT);
+
	return wq;
 }
 EXPORT_SYMBOL_GPL(__create_workqueue_key);
@@ -877,6 +977,13 @@ static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
	cwq->thread = NULL;
 }

+static bool hotplug_should_start_thread(struct workqueue_struct *wq, int cpu)
+{
+	if ((wq->flags & WQ_F_LAZY) && cpu != wq->core_cpu)
+		return 0;
+	return 1;
+}
+
 /**
  * destroy_workqueue - safely terminate a workqueue
  * @wq: target workqueue
@@ -923,6 +1030,8 @@ undo:
		switch (action) {
		case CPU_UP_PREPARE:
+			if (!hotplug_should_start_thread(wq, cpu))
+				break;
			if (!create_workqueue_thread(cwq, cpu))
				break;
			printk(KERN_ERR "workqueue [%s] for %i failed\n",
@@ -932,6 +1041,8 @@ undo:
			goto undo;

		case CPU_ONLINE:
+			if (!hotplug_should_start_thread(wq, cpu))
+				break;
			start_workqueue_thread(cwq, cpu);
			break;

@@ -999,6 +1110,27 @@ long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
 EXPORT_SYMBOL_GPL(work_on_cpu);
 #endif /* CONFIG_SMP */

+/**
+ * workqueue_set_lazy_timeout - set lazy exit timeout
+ * @wq: the associated workqueue_struct
+ * @timeout: timeout in jiffies
+ *
+ * This will set the timeout for a lazy workqueue. If no work has been
+ * processed for @timeout jiffies, then the workqueue is allowed to exit.
+ * It will be dynamically created again when work is queued to it.
+ *
+ * Note that this only works for workqueues created with
+ * create_lazy_workqueue().
+ */
+void workqueue_set_lazy_timeout(struct workqueue_struct *wq,
+				unsigned long timeout)
+{
+	if (WARN_ON(!(wq->flags & WQ_F_LAZY)))
+		return;
+
+	wq->lazy_timeout = timeout;
+}
+
 void __init init_workqueues(void)
 {
	alloc_cpumask_var(&cpu_populated_map, GFP_KERNEL);
--
1.6.4.173.g3f189
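
The lazy exit timeout set up above defaults to WQ_DEF_LAZY_TIMEOUT
(60 * HZ). A sketch of how a caller of this patchset could override it
(the "demo" queue is hypothetical):

	struct workqueue_struct *demo_wq;

	demo_wq = create_lazy_workqueue("demo");
	if (demo_wq)
		/* let idle per-CPU threads exit after 5 seconds instead */
		workqueue_set_lazy_timeout(demo_wq, 5 * HZ);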
* Re: [PATCH 2/6] workqueue: add support for lazy workqueues
  2009-08-20 12:01 ` Frederic Weisbecker

From: Frederic Weisbecker @ 2009-08-20 12:01 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan

On Thu, Aug 20, 2009 at 12:20:00PM +0200, Jens Axboe wrote:
> Lazy workqueues are like normal workqueues, except they don't
> start a thread per CPU by default. Instead threads are started
> when they are needed, and exit when they have been idle for
> some time.
>
> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
>
> [...]
>
> @@ -141,11 +149,29 @@ static void insert_work(struct cpu_workqueue_struct *cwq,
>  static void __queue_work(struct cpu_workqueue_struct *cwq,
> 			 struct work_struct *work)
>  {
> +	struct workqueue_struct *wq = cwq->wq;
> 	unsigned long flags;
>
> -	spin_lock_irqsave(&cwq->lock, flags);
> -	insert_work(cwq, work, &cwq->worklist);
> -	spin_unlock_irqrestore(&cwq->lock, flags);
> +	/*
> +	 * This is a lazy workqueue and this particular CPU thread has
> +	 * exited. We can't create it from here, so add this work on our
> +	 * static thread. It will create this thread and move the work there.
> +	 */
> +	if ((wq->flags & WQ_F_LAZY) && !cwq->thread) {

Isn't this part racy? If a work has just been queued but the thread
hasn't yet had enough time to start before we get there...?

> +		struct cpu_workqueue_struct *__cwq;
> +
> +		local_irq_save(flags);
> +		__cwq = wq_per_cpu(wq, wq->core_cpu);
> +		work->cpu = smp_processor_id();
> +		spin_lock(&__cwq->lock);
> +		insert_work(__cwq, work, &__cwq->worklist);
> +		spin_unlock_irqrestore(&__cwq->lock, flags);
> +	} else {
> +		spin_lock_irqsave(&cwq->lock, flags);
> +		work->cpu = smp_processor_id();
> +		insert_work(cwq, work, &cwq->worklist);
> +		spin_unlock_irqrestore(&cwq->lock, flags);
> +	}
>  }
* Re: [PATCH 2/6] workqueue: add support for lazy workqueues
  2009-08-20 12:10 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 12:10 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan

On Thu, Aug 20 2009, Frederic Weisbecker wrote:
> On Thu, Aug 20, 2009 at 12:20:00PM +0200, Jens Axboe wrote:
> > [...]
> > +	/*
> > +	 * This is a lazy workqueue and this particular CPU thread has
> > +	 * exited. We can't create it from here, so add this work on our
> > +	 * static thread. It will create this thread and move the work there.
> > +	 */
> > +	if ((wq->flags & WQ_F_LAZY) && !cwq->thread) {
>
> Isn't this part racy? If a work has just been queued but the thread
> hasn't yet had enough time to start before we get there...?

Sure it is, see my initial description about holes and races :-)
Thread re-creation and such need to ensure that one and only one gets
set up, of course. I just didn't want to spend a lot of time making it
air tight in case people had big complaints that mean I have to rewrite
bits of it.

--
Jens Axboe
* [PATCH 3/6] crypto: use lazy workqueues
  2009-08-20 10:20 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 10:20 UTC (permalink / raw)
To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 crypto/crypto_wq.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/crypto/crypto_wq.c b/crypto/crypto_wq.c
index fdcf624..88cccee 100644
--- a/crypto/crypto_wq.c
+++ b/crypto/crypto_wq.c
@@ -20,7 +20,7 @@ EXPORT_SYMBOL_GPL(kcrypto_wq);

 static int __init crypto_wq_init(void)
 {
-	kcrypto_wq = create_workqueue("crypto");
+	kcrypto_wq = create_lazy_workqueue("crypto");
	if (unlikely(!kcrypto_wq))
		return -ENOMEM;
	return 0;
--
1.6.4.173.g3f189
* [PATCH 4/6] libata: use lazy workqueues for the pio task
  2009-08-20 10:20 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 10:20 UTC (permalink / raw)
To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 drivers/ata/libata-core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 072ba5e..35f74c9 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6580,7 +6580,7 @@ static int __init ata_init(void)
 {
	ata_parse_force_param();

-	ata_wq = create_workqueue("ata");
+	ata_wq = create_lazy_workqueue("ata");
	if (!ata_wq)
		goto free_force_tbl;

--
1.6.4.173.g3f189
* Re: [PATCH 4/6] libata: use lazy workqueues for the pio task
  2009-08-20 12:40 ` Stefan Richter

From: Stefan Richter @ 2009-08-20 12:40 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan

Jens Axboe wrote:
> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> ---
>  drivers/ata/libata-core.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 072ba5e..35f74c9 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -6580,7 +6580,7 @@ static int __init ata_init(void)
>  {
> 	ata_parse_force_param();
>
> -	ata_wq = create_workqueue("ata");
> +	ata_wq = create_lazy_workqueue("ata");
> 	if (!ata_wq)
> 		goto free_force_tbl;
>

However, this does not solve the issue of lacking parallelism on UP
machines, does it?

--
Stefan Richter
-=====-==--= =--- =-=--
http://arcgraph.de/sr/
* Re: [PATCH 4/6] libata: use lazy workqueues for the pio task
  2009-08-20 12:48 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 12:48 UTC (permalink / raw)
To: Stefan Richter; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan

On Thu, Aug 20 2009, Stefan Richter wrote:
> Jens Axboe wrote:
> > [...]
> > -	ata_wq = create_workqueue("ata");
> > +	ata_wq = create_lazy_workqueue("ata");
> > 	if (!ata_wq)
> > 		goto free_force_tbl;
>
> However, this does not solve the issue of lacking parallelism on UP
> machines, does it?

No, the next step is needed there, having multiple threads. Pretty
similar to what Frederic described. Note that the current implementation
doesn't really solve that either, since work will be executed on the CPU
it is queued on. So there's no existing guarantee that it works, on UP
or SMP. This implementation doesn't modify that behaviour, it's
identical to the current workqueue implementation in that respect.

--
Jens Axboe
* [PATCH 5/6] aio: use lazy workqueues
  2009-08-20 10:20 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 10:20 UTC (permalink / raw)
To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 fs/aio.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index d065b2c..4103b59 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -72,7 +72,7 @@ static int __init aio_setup(void)
	kiocb_cachep = KMEM_CACHE(kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
	kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);

-	aio_wq = create_workqueue("aio");
+	aio_wq = create_lazy_workqueue("aio");

	pr_debug("aio_setup: sizeof(struct page) = %d\n", (int)sizeof(struct page));
--
1.6.4.173.g3f189
* Re: [PATCH 5/6] aio: use lazy workqueues
  2009-08-20 15:09 ` Jeff Moyer

From: Jeff Moyer @ 2009-08-20 15:09 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan, zach.brown

Jens Axboe <jens.axboe@oracle.com> writes:

> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> ---
>  fs/aio.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index d065b2c..4103b59 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -72,7 +72,7 @@ static int __init aio_setup(void)
>  	kiocb_cachep = KMEM_CACHE(kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
>  	kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
>
> -	aio_wq = create_workqueue("aio");
> +	aio_wq = create_lazy_workqueue("aio");
>
>  	pr_debug("aio_setup: sizeof(struct page) = %d\n", (int)sizeof(struct page));

So far as I can tell, the aio workqueue isn't used for much these days.
We could probably get away with switching to keventd. Zach, isn't
someone working on a patch to get rid of all of the -EIOCBRETRY
infrastructure? That patch would probably make things clearer in this
area.

Cheers,
Jeff
* Re: [PATCH 5/6] aio: use lazy workqueues
  2009-08-21 18:31 ` Zach Brown

From: Zach Brown @ 2009-08-21 18:31 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Jens Axboe, linux-kernel, jeff, benh, htejun, bzolnier, alan

> So far as I can tell, the aio workqueue isn't used for much these days.
> We could probably get away with switching to keventd.

It's only used by drivers/usb/gadget to implement O_DIRECT reads by
DMAing into kmalloc()ed memory and then performing the copy_to_user()
in the retry thread's task context after it has assumed the submitting
task's mm.

> Zach, isn't someone working on a patch to get rid of all of the
> -EIOCBRETRY infrastructure? That patch would probably make things
> clearer in this area.

Yeah, a startling amount of fs/aio.c vanishes if we get rid of
EIOCBRETRY. I'm puttering away at it, but I'll be on holiday next week
so it'll be a while before anything emerges.

- z
* [PATCH 6/6] sunrpc: use lazy workqueues
  2009-08-20 10:20 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 10:20 UTC (permalink / raw)
To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 net/sunrpc/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 8f459ab..ce99fe2 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -970,7 +970,7 @@ static int rpciod_start(void)
	 * Create the rpciod thread and wait for it to start.
	 */
	dprintk("RPC: creating workqueue rpciod\n");
-	wq = create_workqueue("rpciod");
+	wq = create_lazy_workqueue("rpciod");
	rpciod_workqueue = wq;
	return rpciod_workqueue != NULL;
 }
--
1.6.4.173.g3f189
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-20 12:04 ` Peter Zijlstra

From: Peter Zijlstra @ 2009-08-20 12:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan

On Thu, 2009-08-20 at 12:19 +0200, Jens Axboe wrote:
> (sorry for the resend, but apparently the directory had some patches
> in it already. plus, stupid git send-email doesn't default to
> no chain replies, really annoying)

Newer versions should, I made a stink about this some time ago.
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-20 12:08 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 12:08 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan

On Thu, Aug 20 2009, Peter Zijlstra wrote:
> On Thu, 2009-08-20 at 12:19 +0200, Jens Axboe wrote:
> > (sorry for the resend, but apparently the directory had some patches
> > in it already. plus, stupid git send-email doesn't default to
> > no chain replies, really annoying)
>
> Newer versions should, I made a stink about this some time ago.

git version 1.6.4.173.g3f189

That's pretty new... But perhaps I should complain too, it's been
annoying me forever.

--
Jens Axboe
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-20 12:16 ` Peter Zijlstra

From: Peter Zijlstra @ 2009-08-20 12:16 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan, Junio C Hamano

On Thu, 2009-08-20 at 14:08 +0200, Jens Axboe wrote:
> On Thu, Aug 20 2009, Peter Zijlstra wrote:
> > Newer versions should, I made a stink about this some time ago.
>
> git version 1.6.4.173.g3f189
>
> That's pretty new... But perhaps I should complain too, it's been
> annoying me forever.

http://marc.info/?l=git&m=123457137328461&w=2

Apparently it didn't happen, nor did I ever see a reply to that posting.

Junio, what happened here?
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-23  2:42 ` Junio C Hamano

From: Junio C Hamano @ 2009-08-23 2:42 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Jens Axboe, linux-kernel, jeff, benh, htejun, bzolnier, alan

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, 2009-08-20 at 14:08 +0200, Jens Axboe wrote:
> ...
> > That's pretty new... But perhaps I should complain too, it's been
> > annoying me forever.
>
> http://marc.info/?l=git&m=123457137328461&w=2
>
> Apparently it didn't happen, nor did I ever see a reply to that posting.
>
> Junio, what happened here?

Nothing happened.

I do not recall anybody objecting to it, but then when nothing happened
in either 1.6.3 or 1.6.4, nobody jumped up-and-down demanding the change
of default either. So the overall impression I got from this was that
nobody really cared deeply enough either way.

But we are talking about 1.7.0 to become a release to correct wrong
defaults we have had once and for all ;-), and I am tempted to roll this
topic into the mix. Here is what I queued to my 'next' branch tonight.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Sat, 22 Aug 2009 12:48:48 -0700
Subject: [PATCH] send-email: make --no-chain-reply-to the default

In http://article.gmane.org/gmane.comp.version-control.git/109790 I
threatened to announce a change to the default threading style used by
send-email to no-chain-reply-to (i.e. the second and subsequent messages
will all be replies to the first one), unless nobody objected, in 1.6.3.

Nobody objected, as far as I can dig the list archive. But when nothing
happened in 1.6.3 nor 1.6.4, nobody from the camp that complained loudly
(which led to that message) complained either. So I am guessing that
after all nobody cares about this. But 1.7.0 is a good time to change
this, and as I said in the message, I personally think it is a good
change, so here it is.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-send-email.txt |    6 +++---
 git-send-email.perl              |    4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 767cf4d..626c2dc 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -84,7 +84,7 @@ See the CONFIGURATION section for 'sendemail.multiedit'.
 --in-reply-to=<identifier>::
	Specify the contents of the first In-Reply-To header.
	Subsequent emails will refer to the previous email
-	instead of this if --chain-reply-to is set (the default)
+	instead of this if --chain-reply-to is set.
	Only necessary if --compose is also set.  If --compose
	is not set, this will be prompted for.

@@ -171,8 +171,8 @@ Automating
	email sent.  If disabled with "--no-chain-reply-to", all emails after
	the first will be sent as replies to the first email sent.  When using
	this, it is recommended that the first file given be an overview of the
-	entire patch series. Default is the value of the 'sendemail.chainreplyto'
-	configuration value; if that is unspecified, default to --chain-reply-to.
+	entire patch series. Disabled by default, but the 'sendemail.chainreplyto'
+	configuration variable can be used to enable it.

 --identity=<identity>::
	A configuration identity. When given, causes values in the

diff --git a/git-send-email.perl b/git-send-email.perl
index 0700d80..c1d0930 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -71,7 +71,7 @@ git send-email [options] <file | directory | rev-list options >
     --suppress-cc           <str>  * author, self, sob, cc, cccmd, body, bodycc, all.
     --[no-]signed-off-by-cc        * Send to Signed-off-by: addresses. Default on.
     --[no-]suppress-from           * Send to self. Default off.
-    --[no-]chain-reply-to          * Chain In-Reply-To: fields. Default on.
+    --[no-]chain-reply-to          * Chain In-Reply-To: fields. Default off.
     --[no-]thread                  * Use In-Reply-To: field. Default on.

 Administering:
@@ -188,7 +188,7 @@ my (@suppress_cc);

 my %config_bool_settings = (
     "thread" => [\$thread, 1],
-    "chainreplyto" => [\$chain_reply_to, 1],
+    "chainreplyto" => [\$chain_reply_to, undef],
     "suppressfrom" => [\$suppress_from, undef],
     "signedoffbycc" => [\$signed_off_by_cc, undef],
     "signedoffcc" => [\$signed_off_by_cc, undef],	# Deprecated
--
1.6.4.1.255.g5556a
* git send-email defaults
  2009-08-24  7:04 ` Peter Zijlstra

From: Peter Zijlstra @ 2009-08-24 7:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jens Axboe, linux-kernel, jeff, benh, htejun, bzolnier, alan

On Sat, 2009-08-22 at 19:42 -0700, Junio C Hamano wrote:
> Nothing happened.
>
> I do not recall anybody objecting to it, but then when nothing happened
> in either 1.6.3 or 1.6.4, nobody jumped up-and-down demanding the change
> of default either. So the overall impression I got from this was that
> nobody really cared deeply enough either way.

And here I was thinking it was settled when no objections came ;-)

> But we are talking about 1.7.0 to become a release to correct wrong
> defaults we have had once and for all ;-), and I am tempted to roll this
> topic into the mix. Here is what I queued to my 'next' branch tonight.

The sooner this hits the distros the better.. Thanks for committing the
change, looking forward to 1.7.
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-24  8:04 ` Jens Axboe

From: Jens Axboe @ 2009-08-24 8:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Peter Zijlstra, linux-kernel, jeff, benh, htejun, bzolnier, alan

On Sat, Aug 22 2009, Junio C Hamano wrote:
> Nothing happened.
>
> I do not recall anybody objecting to it, but then when nothing happened
> in either 1.6.3 or 1.6.4, nobody jumped up-and-down demanding the change
> of default either. So the overall impression I got from this was that
> nobody really cared deeply enough either way.

That's some strange logic right there :-). Of course nobody complained,
they thought it was a done deal.

> But we are talking about 1.7.0 to become a release to correct wrong
> defaults we have had once and for all ;-), and I am tempted to roll this
> topic into the mix. Here is what I queued to my 'next' branch tonight.

OK, that's at least something, looking forward to being able to prune
that argument from my scripts. It completely destroys viewability of
larger patchsets.

--
Jens Axboe
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-24  9:03 ` Junio C Hamano

From: Junio C Hamano @ 2009-08-24 9:03 UTC (permalink / raw)
To: Jens Axboe; +Cc: Peter Zijlstra, linux-kernel, jeff, benh, htejun, bzolnier, alan

Jens Axboe <jens.axboe@oracle.com> writes:

> OK, that's at least something, looking forward to being able to prune
> that argument from my scripts.

Ahahh.

An option everybody will want to pass but is prone to be forgotten and
hard to type from the command line is one thing, but if you are scripting
in order to reuse the script over and over, that is a separate story. Is
losing an option from your script really the goal of this fuss?

In any case, you need not wait for a new version nor a patch at all for
that goal. You can simply add

	[sendemail]
		chainreplyto = no

to your .git/config (or $HOME/.gitconfig). Both your script and your
command line invocation will default not to create deep threads with the
setting.
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-24  9:11 ` Peter Zijlstra

From: Peter Zijlstra @ 2009-08-24 9:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jens Axboe, linux-kernel, jeff, benh, htejun, bzolnier, alan

On Mon, 2009-08-24 at 02:03 -0700, Junio C Hamano wrote:
> In any case, you need not wait for a new version nor a patch at all for
> that goal. You can simply add
>
> 	[sendemail]
> 		chainreplyto = no
>
> to your .git/config (or $HOME/.gitconfig). Both your script and your
> command line invocation will default not to create deep threads with the
> setting.

For me it's about getting the default right, because lots of people
simply use git-send-email without scripts, and often .gitconfig gets
lost or simply doesn't get carried around the various development
machines.

Also, it stops every new person mailing patches from having to be told
to flip that setting.
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-20 12:22 ` Frederic Weisbecker

From: Frederic Weisbecker @ 2009-08-20 12:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan, Andrew Morton, Oleg Nesterov

On Thu, Aug 20, 2009 at 12:19:58PM +0200, Jens Axboe wrote:
> After yesterday's rant on having too many kernel threads and checking
> how many I actually have running on this system (531!), I decided to
> try and do something about it.
>
> [...]
>
> With this patchset, I am now down to 280 kernel threads on one of my
> test boxes. Still too many, but it's a start and a net reduction of
> 251 threads here, or 47%!

That looks like a nice idea that may indeed solve the problem of thread
proliferation with per-cpu workqueues.

Now I think there is another problem that has tainted workqueues from
the beginning, which is the deadlocks induced by one work that waits
for another one in the same workqueue. And since workqueues execute
their jobs serially, the effect is deadlocks.

Often, drivers need to move from the central events/%d to a dedicated
workqueue because of that.

An idea to solve this:

We could have one thread per struct work_struct. Similarly to this
patchset, this thread waits for queuing requests, but only for this
work struct. If the target cpu has no thread for this work, then
create one, like you do, etc...

Then the idea is to have one workqueue per struct work_struct, which
handles per-cpu task creation, etc... And this workqueue only handles
the given work.

That may solve the deadlock scenarios that are often reported and lead
to dedicated workqueue creation.

That also makes the serialization of work execution between different
worklets disappear. We just keep the serialization between instances of
the same work, which seems a pretty natural thing and is less haphazard
than multiple works of different natures randomly serialized between
them.

Note the effect would not only be a reduction of deadlocks but also
probably an increase in throughput, because works of different natures
won't need to wait for the previous one's completion anymore.

Also a reduction in latency (a high-prio work that waits for a
lower-prio work).

There are good chances that we won't need any more per-driver/subsys
workqueue creation after that, because everything would be per worklet.
We could use a single schedule_work() for all of them and not bother
choosing a specific workqueue or the central events/%d.

Hmm?
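
(To make the deadlock class above concrete: a minimal sketch, with
made-up names, against the existing workqueue API. A work that flushes a
second work on the same single-threaded queue can never finish, because
the only worker thread is busy running the first work:

	static struct workqueue_struct *demo_wq; /* single-threaded */
	static struct work_struct work_b;

	static void work_a_fn(struct work_struct *work)
	{
		queue_work(demo_wq, &work_b);
		/* the one worker is running work_a_fn, so work_b never
		 * starts and this wait never returns: deadlock */
		flush_work(&work_b);
	}

The per-worklet threads proposed here would remove exactly this
coupling.)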
* Re: [PATCH 0/6] Lazy workqueues
  2009-08-20 12:41 ` Jens Axboe

From: Jens Axboe @ 2009-08-20 12:41 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan, Andrew Morton, Oleg Nesterov

On Thu, Aug 20 2009, Frederic Weisbecker wrote:
> [...]
>
> An idea to solve this:
>
> We could have one thread per struct work_struct. Similarly to this
> patchset, this thread waits for queuing requests, but only for this
> work struct. If the target cpu has no thread for this work, then
> create one, like you do, etc...
>
> [...]
>
> There are good chances that we won't need any more per-driver/subsys
> workqueue creation after that, because everything would be per worklet.
> We could use a single schedule_work() for all of them and not bother
> choosing a specific workqueue or the central events/%d.
>
> Hmm?

I pretty much agree with you, my initial plan for a thread pool would be
very similar. I'll gradually work towards that goal.

--
Jens Axboe
* Re: [PATCH 0/6] Lazy workqueues
2009-08-20 12:41 ` Jens Axboe
@ 2009-08-20 13:04 ` Tejun Heo
0 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2009-08-20 13:04 UTC (permalink / raw)
To: Jens Axboe
Cc: Frederic Weisbecker, linux-kernel, jeff, benh, bzolnier, alan,
Andrew Morton, Oleg Nesterov

Jens Axboe wrote:
> On Thu, Aug 20 2009, Frederic Weisbecker wrote:
>> An idea to solve this:
>>
>> We could have one thread per struct work_struct. Similarly to this
>> patchset, this thread waits for queuing requests, but only for this
>> work struct. If the target cpu has no thread for this work, then
>> create one, like you do, etc...
>>
>> Then the idea is to have one workqueue per struct work_struct, which
>> handles per-cpu task creation, etc... And this workqueue only handles
>> the given work.
>>
>> That may solve the deadlock scenarios that are often reported and that
>> lead to dedicated workqueue creation.
>>
>> That also makes the serialization of work execution between different
>> worklets disappear. We just keep the serialization within the same
>> work, which seems a pretty natural thing and is less haphazard than
>> multiple works of different natures being randomly serialized against
>> each other.
>>
>> Note the effect would not only be a reduction in deadlocks but also
>> probably an increase in throughput, because works of different
>> natures won't need to wait for the previous one's completion anymore.
>>
>> Also a reduction in latency (a high prio work that waits for a lower
>> prio work).
>>
>> There is a good chance that we won't need per-driver/subsys workqueue
>> creation anymore after that, because everything would be per
>> worklet. We could use a single schedule_work() for all of them and
>> not bother choosing a specific workqueue or the central events/%d.
>>
>> Hmm?
>
> I pretty much agree with you, my initial plan for a thread pool would be
> very similar. I'll gradually work towards that goal.

Several issues that come to my mind with the above approach are...

* There will still be cases where you need a fixed dedicated thread.
Execution resources for anything which might be used during IO need to
be preallocated (at least some of them) to guarantee forward progress.

* Depending on how heavily works are used (and I think their usage will
grow with improvements like this), we might end up with many idling
threads again, and please note that thread creation / destruction is
quite costly compared to what works usually do.

* Having different threads executing different works all the time might
improve latency, but if works are used frequently enough it's likely to
lower throughput, because short works which can be handled in batch by
a single thread now need to be handled by different threads. Scheduling
overhead can be significant compared to what those works actually do,
and it will also cost much more in cache footprint.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 0/6] Lazy workqueues
2009-08-20 12:22 ` Frederic Weisbecker
2009-08-20 12:41 ` Jens Axboe
@ 2009-08-20 12:59 ` Steven Whitehouse
1 sibling, 0 replies; 30+ messages in thread
From: Steven Whitehouse @ 2009-08-20 12:59 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jens Axboe, linux-kernel, jeff, benh, htejun, bzolnier, alan,
Andrew Morton, Oleg Nesterov

Hi,

On Thu, 2009-08-20 at 14:22 +0200, Frederic Weisbecker wrote:
> On Thu, Aug 20, 2009 at 12:19:58PM +0200, Jens Axboe wrote:
> > (sorry for the resend, but apparently the directory had some patches
> > in it already. plus, stupid git send-email doesn't default to
> > no chain replies, really annoying)
> >
> > Hi,
> >
> > After yesterdays rant on having too many kernel threads and checking
> > how many I actually have running on this system (531!), I decided to
> > try and do something about it.
> >
> > My goal was to retain the workqueue interface instead of coming up with
> > a new scheme that required conversion (or converting to slow_work which,
> > btw, is an awful name :-). I also wanted to retain the affinity
> > guarantees of workqueues as much as possible.
> >
> > So this is a first step in that direction, it's probably full of races
> > and holes, but should get the idea across. It adds a
> > create_lazy_workqueue() helper, similar to the other variants that we
> > currently have. A lazy workqueue works like a normal workqueue, except
> > that it only (by default) starts a core thread instead of threads for
> > all online CPUs. When work is queued on a lazy workqueue for a CPU
> > that doesn't have a thread running, it will be placed on the core CPUs
> > list and that will then create and move the work to the right target.
> > Should task creation fail, the queued work will be executed on the
> > core CPU instead. Once a lazy workqueue thread has been idle for a
> > certain amount of time, it will again exit.
> >
> > The patch boots here and I exercised the rpciod workqueue and
> > verified that it gets created, runs on the right CPU, and exits a while
> > later. So core functionality should be there, even if it has holes.
> >
> > With this patchset, I am now down to 280 kernel threads on one of my test
> > boxes. Still too many, but it's a start and a net reduction of 251
> > threads here, or 47%!
> >
> > The code can also be pulled from:
> >
> > git://git.kernel.dk/linux-2.6-block.git workqueue
> >
> > --
> > Jens Axboe
>
> That looks like a nice idea that may indeed solve the problem of
> thread proliferation with per-CPU workqueues.
>
> Now I think there is another problem that has tainted workqueues from
> the beginning, which is the deadlocks induced by one work waiting for
> another one in the same workqueue. And since workqueues execute their
> jobs serially, the effect is deadlock.
>
In GFS2 we've also got an additional issue. We cannot create threads at
the point of use (or let pending work block on thread creation) because
it implies a GFP_KERNEL memory allocation which could call back into
the fs. This is a particular issue with journal recovery (which uses
slow_work now, older versions used a private thread) and the code which
deals with inodes which have been unlinked remotely.

In addition to that, the glock workqueue which we are using would be
much better turned into a tasklet, or similar. The reason why we cannot
do this is that submission of block I/O is only possible from process
context.
At some stage it might be possible to partially solve the problem by
separating the parts of the state machine which submit I/O from those
which don't, but I'm not convinced that the effort is worth it.

There is also the issue of ordering of I/O requests. The glocks are
(for those which submit I/O) in a 1:1 relationship with inodes or
resource groups and thus indexed by disk block number. In the past I
have considered creating a workqueue with an elevator-based work
submission interface. This would greatly improve the I/O patterns
created by multiple submissions of glock work items. In particular, it
would make a big difference when the glock shrinker marks dirty glocks
for removal from the glock cache (under memory pressure) or when
processing large numbers of remote callbacks.

I've not yet come to any conclusion as to whether the "elevator
workqueue" is a good idea or not; any suggestions of a better solution
are very welcome,

Steve.

^ permalink raw reply	[flat|nested] 30+ messages in thread
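As a rough illustration of what Steve's "elevator workqueue" submission
interface might look like: work items carry a disk block key and are kept
sorted at queueing time, so the worker dequeues them in roughly ascending
disk order. This is purely a sketch; none of these names exist in the
kernel or in any posted patch:

	#include <linux/list.h>
	#include <linux/spinlock.h>
	#include <linux/types.h>

	/* Hypothetical elevator-ordered work item. */
	struct elv_work {
		struct list_head list;
		sector_t block;		/* disk block this work will touch */
		void (*fn)(struct elv_work *);
	};

	static LIST_HEAD(elv_pending);
	static DEFINE_SPINLOCK(elv_lock);

	/* Sorted insert: keep elv_pending ordered by ascending block number. */
	static void elv_queue_work(struct elv_work *new)
	{
		struct elv_work *pos;
		unsigned long flags;

		spin_lock_irqsave(&elv_lock, flags);
		list_for_each_entry(pos, &elv_pending, list) {
			if (pos->block > new->block)
				break;
		}
		/*
		 * Insert before the first larger key; if the loop ran off
		 * the end, &pos->list aliases &elv_pending and this appends
		 * at the tail.
		 */
		list_add_tail(&new->list, &pos->list);
		spin_unlock_irqrestore(&elv_lock, flags);
		/* ...then wake the worker thread (not shown). */
	}

A real implementation would also need to bound how long a low-numbered
item can starve newly queued higher-numbered ones, which is the same
fairness problem the block-layer elevators already deal with.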
* Re: [PATCH 0/6] Lazy workqueues 2009-08-20 10:19 [PATCH 0/6] Lazy workqueues Jens Axboe ` (7 preceding siblings ...) 2009-08-20 12:22 ` Frederic Weisbecker @ 2009-08-20 12:55 ` Tejun Heo 2009-08-21 6:58 ` Jens Axboe 8 siblings, 1 reply; 30+ messages in thread From: Tejun Heo @ 2009-08-20 12:55 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-kernel, jeff, benh, bzolnier, alan Hello, Jens. Jens Axboe wrote: > After yesterdays rant on having too many kernel threads and checking > how many I actually have running on this system (531!), I decided to > try and do something about it. Heh... that's a lot. How many cpus do you have there? Care to share the output of "ps -ef"? > My goal was to retain the workqueue interface instead of coming up with > a new scheme that required conversion (or converting to slow_work which, > btw, is an awful name :-). I also wanted to retain the affinity > guarantees of workqueues as much as possible. > > So this is a first step in that direction, it's probably full of races > and holes, but should get the idea across. It adds a > create_lazy_workqueue() helper, similar to the other variants that we > currently have. A lazy workqueue works like a normal workqueue, except > that it only (by default) starts a core thread instead of threads for > all online CPUs. When work is queued on a lazy workqueue for a CPU > that doesn't have a thread running, it will be placed on the core CPUs > list and that will then create and move the work to the right target. > Should task creation fail, the queued work will be executed on the > core CPU instead. Once a lazy workqueue thread has been idle for a > certain amount of time, it will again exit. Yeap, the approach seems simple and nice and resolves the problem of too many idle workers. > The patch boots here and I exercised the rpciod workqueue and > verified that it gets created, runs on the right CPU, and exits a while > later. So core functionality should be there, even if it has holes. > > With this patchset, I am now down to 280 kernel threads on one of my test > boxes. Still too many, but it's a start and a net reduction of 251 > threads here, or 47%! I'm trying to find out whether the perfect concurrency idea I talked about on the other thread can be implemented in reasonable manner. Would you mind holding for a few days before investing too much effort into expanding this one to handle multiple workers? Thanks. -- tejun ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/6] Lazy workqueues 2009-08-20 12:55 ` Tejun Heo @ 2009-08-21 6:58 ` Jens Axboe 0 siblings, 0 replies; 30+ messages in thread From: Jens Axboe @ 2009-08-21 6:58 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, jeff, benh, bzolnier, alan [-- Attachment #1: Type: text/plain, Size: 2757 bytes --] On Thu, Aug 20 2009, Tejun Heo wrote: > Hello, Jens. > > Jens Axboe wrote: > > After yesterdays rant on having too many kernel threads and checking > > how many I actually have running on this system (531!), I decided to > > try and do something about it. > > Heh... that's a lot. How many cpus do you have there? Care to share > the output of "ps -ef"? That system has 64 cpus. ps -ef attached. > > My goal was to retain the workqueue interface instead of coming up with > > a new scheme that required conversion (or converting to slow_work which, > > btw, is an awful name :-). I also wanted to retain the affinity > > guarantees of workqueues as much as possible. > > > > So this is a first step in that direction, it's probably full of races > > and holes, but should get the idea across. It adds a > > create_lazy_workqueue() helper, similar to the other variants that we > > currently have. A lazy workqueue works like a normal workqueue, except > > that it only (by default) starts a core thread instead of threads for > > all online CPUs. When work is queued on a lazy workqueue for a CPU > > that doesn't have a thread running, it will be placed on the core CPUs > > list and that will then create and move the work to the right target. > > Should task creation fail, the queued work will be executed on the > > core CPU instead. Once a lazy workqueue thread has been idle for a > > certain amount of time, it will again exit. > > Yeap, the approach seems simple and nice and resolves the problem of > too many idle workers. I think so too :-) > > The patch boots here and I exercised the rpciod workqueue and > > verified that it gets created, runs on the right CPU, and exits a while > > later. So core functionality should be there, even if it has holes. > > > > With this patchset, I am now down to 280 kernel threads on one of my test > > boxes. Still too many, but it's a start and a net reduction of 251 > > threads here, or 47%! > > I'm trying to find out whether the perfect concurrency idea I talked > about on the other thread can be implemented in reasonable manner. > Would you mind holding for a few days before investing too much effort > into expanding this one to handle multiple workers? No problem, I'll just get the races closed up in the existing version. I think we basically have two classes of users here - one that the existing workqueue scheme works well for, high performance work execution where CPU affinity matters. The other is just slow work execution (like the libata pio task stuff), which would be better handled by a generic thread pool implementation. I think we should start converting those users to slow_work, in fact I think I'll try libata to try and set a good example :-) -- Jens Axboe [-- Attachment #2: ps-ef.txt --] [-- Type: text/plain, Size: 25896 bytes --] UID PID PPID C STIME TTY TIME CMD root 1 0 3 09:53 ? 00:00:06 init [2] root 2 0 0 09:53 ? 00:00:00 [kthreadd] root 3 2 0 09:53 ? 00:00:00 [migration/0] root 4 2 0 09:53 ? 00:00:00 [ksoftirqd/0] root 5 2 0 09:53 ? 00:00:00 [watchdog/0] root 6 2 0 09:53 ? 00:00:00 [migration/1] root 7 2 0 09:53 ? 00:00:00 [ksoftirqd/1] root 8 2 0 09:53 ? 00:00:00 [watchdog/1] root 9 2 0 09:53 ? 00:00:00 [migration/2] root 10 2 0 09:53 ? 
00:00:00 [ksoftirqd/2] root 11 2 0 09:53 ? 00:00:00 [watchdog/2] root 12 2 0 09:53 ? 00:00:00 [migration/3] root 13 2 0 09:53 ? 00:00:00 [ksoftirqd/3] root 14 2 0 09:53 ? 00:00:00 [watchdog/3] root 15 2 0 09:53 ? 00:00:00 [migration/4] root 16 2 0 09:53 ? 00:00:00 [ksoftirqd/4] root 17 2 0 09:53 ? 00:00:00 [watchdog/4] root 18 2 0 09:53 ? 00:00:00 [migration/5] root 19 2 0 09:53 ? 00:00:00 [ksoftirqd/5] root 20 2 0 09:53 ? 00:00:00 [watchdog/5] root 21 2 0 09:53 ? 00:00:00 [migration/6] root 22 2 0 09:53 ? 00:00:00 [ksoftirqd/6] root 23 2 0 09:53 ? 00:00:00 [watchdog/6] root 24 2 0 09:53 ? 00:00:00 [migration/7] root 25 2 0 09:53 ? 00:00:00 [ksoftirqd/7] root 26 2 0 09:53 ? 00:00:00 [watchdog/7] root 27 2 0 09:53 ? 00:00:00 [migration/8] root 28 2 0 09:53 ? 00:00:00 [ksoftirqd/8] root 29 2 0 09:53 ? 00:00:00 [watchdog/8] root 30 2 0 09:53 ? 00:00:00 [migration/9] root 31 2 0 09:53 ? 00:00:00 [ksoftirqd/9] root 32 2 0 09:53 ? 00:00:00 [watchdog/9] root 33 2 0 09:53 ? 00:00:00 [migration/10] root 34 2 0 09:53 ? 00:00:00 [ksoftirqd/10] root 35 2 0 09:53 ? 00:00:00 [watchdog/10] root 36 2 0 09:53 ? 00:00:00 [migration/11] root 37 2 0 09:53 ? 00:00:00 [ksoftirqd/11] root 38 2 0 09:53 ? 00:00:00 [watchdog/11] root 39 2 0 09:53 ? 00:00:00 [migration/12] root 40 2 0 09:53 ? 00:00:00 [ksoftirqd/12] root 41 2 0 09:53 ? 00:00:00 [watchdog/12] root 42 2 0 09:53 ? 00:00:00 [migration/13] root 43 2 0 09:53 ? 00:00:00 [ksoftirqd/13] root 44 2 0 09:53 ? 00:00:00 [watchdog/13] root 45 2 0 09:53 ? 00:00:00 [migration/14] root 46 2 0 09:53 ? 00:00:00 [ksoftirqd/14] root 47 2 0 09:53 ? 00:00:00 [watchdog/14] root 48 2 0 09:53 ? 00:00:00 [migration/15] root 49 2 0 09:53 ? 00:00:00 [ksoftirqd/15] root 50 2 0 09:53 ? 00:00:00 [watchdog/15] root 51 2 0 09:53 ? 00:00:00 [migration/16] root 52 2 0 09:53 ? 00:00:00 [ksoftirqd/16] root 53 2 0 09:53 ? 00:00:00 [watchdog/16] root 54 2 0 09:53 ? 00:00:00 [migration/17] root 55 2 0 09:53 ? 00:00:00 [ksoftirqd/17] root 56 2 0 09:53 ? 00:00:00 [watchdog/17] root 57 2 0 09:53 ? 00:00:00 [migration/18] root 58 2 0 09:53 ? 00:00:00 [ksoftirqd/18] root 59 2 0 09:53 ? 00:00:00 [watchdog/18] root 60 2 0 09:53 ? 00:00:00 [migration/19] root 61 2 0 09:53 ? 00:00:00 [ksoftirqd/19] root 62 2 0 09:53 ? 00:00:00 [watchdog/19] root 63 2 0 09:53 ? 00:00:00 [migration/20] root 64 2 0 09:53 ? 00:00:00 [ksoftirqd/20] root 65 2 0 09:53 ? 00:00:00 [watchdog/20] root 66 2 0 09:53 ? 00:00:00 [migration/21] root 67 2 0 09:53 ? 00:00:00 [ksoftirqd/21] root 68 2 0 09:53 ? 00:00:00 [watchdog/21] root 69 2 0 09:53 ? 00:00:00 [migration/22] root 70 2 0 09:53 ? 00:00:00 [ksoftirqd/22] root 71 2 0 09:53 ? 00:00:00 [watchdog/22] root 72 2 0 09:53 ? 00:00:00 [migration/23] root 73 2 0 09:53 ? 00:00:00 [ksoftirqd/23] root 74 2 0 09:53 ? 00:00:00 [watchdog/23] root 75 2 0 09:53 ? 00:00:00 [migration/24] root 76 2 0 09:53 ? 00:00:00 [ksoftirqd/24] root 77 2 0 09:53 ? 00:00:00 [watchdog/24] root 78 2 0 09:53 ? 00:00:00 [migration/25] root 79 2 0 09:53 ? 00:00:00 [ksoftirqd/25] root 80 2 0 09:53 ? 00:00:00 [watchdog/25] root 81 2 0 09:53 ? 00:00:00 [migration/26] root 82 2 0 09:53 ? 00:00:00 [ksoftirqd/26] root 83 2 0 09:53 ? 00:00:00 [watchdog/26] root 84 2 0 09:53 ? 00:00:00 [migration/27] root 85 2 0 09:53 ? 00:00:00 [ksoftirqd/27] root 86 2 0 09:53 ? 00:00:00 [watchdog/27] root 87 2 0 09:53 ? 00:00:00 [migration/28] root 88 2 0 09:53 ? 00:00:00 [ksoftirqd/28] root 89 2 0 09:53 ? 00:00:00 [watchdog/28] root 90 2 0 09:53 ? 00:00:00 [migration/29] root 91 2 0 09:53 ? 
00:00:00 [ksoftirqd/29] root 92 2 0 09:53 ? 00:00:00 [watchdog/29] root 93 2 0 09:53 ? 00:00:00 [migration/30] root 94 2 0 09:53 ? 00:00:00 [ksoftirqd/30] root 95 2 0 09:53 ? 00:00:00 [watchdog/30] root 96 2 0 09:53 ? 00:00:00 [migration/31] root 97 2 0 09:53 ? 00:00:00 [ksoftirqd/31] root 98 2 0 09:53 ? 00:00:00 [watchdog/31] root 99 2 0 09:53 ? 00:00:00 [migration/32] root 100 2 0 09:53 ? 00:00:00 [ksoftirqd/32] root 101 2 0 09:53 ? 00:00:00 [watchdog/32] root 102 2 0 09:53 ? 00:00:00 [migration/33] root 103 2 0 09:53 ? 00:00:00 [ksoftirqd/33] root 104 2 0 09:53 ? 00:00:00 [watchdog/33] root 105 2 0 09:53 ? 00:00:00 [migration/34] root 106 2 0 09:53 ? 00:00:00 [ksoftirqd/34] root 107 2 0 09:53 ? 00:00:00 [watchdog/34] root 108 2 0 09:53 ? 00:00:00 [migration/35] root 109 2 0 09:53 ? 00:00:00 [ksoftirqd/35] root 110 2 0 09:53 ? 00:00:00 [watchdog/35] root 111 2 0 09:53 ? 00:00:00 [migration/36] root 112 2 0 09:53 ? 00:00:00 [ksoftirqd/36] root 113 2 0 09:53 ? 00:00:00 [watchdog/36] root 114 2 0 09:53 ? 00:00:00 [migration/37] root 115 2 0 09:53 ? 00:00:00 [ksoftirqd/37] root 116 2 0 09:53 ? 00:00:00 [watchdog/37] root 117 2 0 09:53 ? 00:00:00 [migration/38] root 118 2 0 09:53 ? 00:00:00 [ksoftirqd/38] root 119 2 0 09:53 ? 00:00:00 [watchdog/38] root 120 2 0 09:53 ? 00:00:00 [migration/39] root 121 2 0 09:53 ? 00:00:00 [ksoftirqd/39] root 122 2 0 09:53 ? 00:00:00 [watchdog/39] root 123 2 0 09:53 ? 00:00:00 [migration/40] root 124 2 0 09:53 ? 00:00:00 [ksoftirqd/40] root 125 2 0 09:53 ? 00:00:00 [watchdog/40] root 126 2 0 09:53 ? 00:00:00 [migration/41] root 127 2 0 09:53 ? 00:00:00 [ksoftirqd/41] root 128 2 0 09:53 ? 00:00:00 [watchdog/41] root 129 2 0 09:53 ? 00:00:00 [migration/42] root 130 2 0 09:53 ? 00:00:00 [ksoftirqd/42] root 131 2 0 09:53 ? 00:00:00 [watchdog/42] root 132 2 0 09:53 ? 00:00:00 [migration/43] root 133 2 0 09:53 ? 00:00:00 [ksoftirqd/43] root 134 2 0 09:53 ? 00:00:00 [watchdog/43] root 135 2 0 09:53 ? 00:00:00 [migration/44] root 136 2 0 09:53 ? 00:00:00 [ksoftirqd/44] root 137 2 0 09:53 ? 00:00:00 [watchdog/44] root 138 2 0 09:53 ? 00:00:00 [migration/45] root 139 2 0 09:53 ? 00:00:00 [ksoftirqd/45] root 140 2 0 09:53 ? 00:00:00 [watchdog/45] root 141 2 0 09:53 ? 00:00:00 [migration/46] root 142 2 0 09:53 ? 00:00:00 [ksoftirqd/46] root 143 2 0 09:53 ? 00:00:00 [watchdog/46] root 144 2 0 09:53 ? 00:00:00 [migration/47] root 145 2 0 09:53 ? 00:00:00 [ksoftirqd/47] root 146 2 0 09:53 ? 00:00:00 [watchdog/47] root 147 2 0 09:53 ? 00:00:00 [migration/48] root 148 2 0 09:53 ? 00:00:00 [ksoftirqd/48] root 149 2 0 09:53 ? 00:00:00 [watchdog/48] root 150 2 0 09:53 ? 00:00:00 [migration/49] root 151 2 0 09:53 ? 00:00:00 [ksoftirqd/49] root 152 2 0 09:53 ? 00:00:00 [watchdog/49] root 153 2 0 09:53 ? 00:00:00 [migration/50] root 154 2 0 09:53 ? 00:00:00 [ksoftirqd/50] root 155 2 0 09:53 ? 00:00:00 [watchdog/50] root 156 2 0 09:53 ? 00:00:00 [migration/51] root 157 2 0 09:53 ? 00:00:00 [ksoftirqd/51] root 158 2 0 09:53 ? 00:00:00 [watchdog/51] root 159 2 0 09:53 ? 00:00:00 [migration/52] root 160 2 0 09:53 ? 00:00:00 [ksoftirqd/52] root 161 2 0 09:53 ? 00:00:00 [watchdog/52] root 162 2 0 09:53 ? 00:00:00 [migration/53] root 163 2 0 09:53 ? 00:00:00 [ksoftirqd/53] root 164 2 0 09:53 ? 00:00:00 [watchdog/53] root 165 2 0 09:53 ? 00:00:00 [migration/54] root 166 2 0 09:53 ? 00:00:00 [ksoftirqd/54] root 167 2 0 09:53 ? 00:00:00 [watchdog/54] root 168 2 0 09:53 ? 00:00:00 [migration/55] root 169 2 0 09:53 ? 00:00:00 [ksoftirqd/55] root 170 2 0 09:53 ? 
00:00:00 [watchdog/55] root 171 2 0 09:53 ? 00:00:00 [migration/56] root 172 2 0 09:53 ? 00:00:00 [ksoftirqd/56] root 173 2 0 09:53 ? 00:00:00 [watchdog/56] root 174 2 0 09:53 ? 00:00:00 [migration/57] root 175 2 0 09:53 ? 00:00:00 [ksoftirqd/57] root 176 2 0 09:53 ? 00:00:00 [watchdog/57] root 177 2 0 09:53 ? 00:00:00 [migration/58] root 178 2 0 09:53 ? 00:00:00 [ksoftirqd/58] root 179 2 0 09:53 ? 00:00:00 [watchdog/58] root 180 2 0 09:53 ? 00:00:00 [migration/59] root 181 2 0 09:53 ? 00:00:00 [ksoftirqd/59] root 182 2 0 09:53 ? 00:00:00 [watchdog/59] root 183 2 0 09:53 ? 00:00:00 [migration/60] root 184 2 0 09:53 ? 00:00:00 [ksoftirqd/60] root 185 2 0 09:53 ? 00:00:00 [watchdog/60] root 186 2 0 09:53 ? 00:00:00 [migration/61] root 187 2 0 09:53 ? 00:00:00 [ksoftirqd/61] root 188 2 0 09:53 ? 00:00:00 [watchdog/61] root 189 2 0 09:53 ? 00:00:00 [migration/62] root 190 2 0 09:53 ? 00:00:00 [ksoftirqd/62] root 191 2 0 09:53 ? 00:00:00 [watchdog/62] root 192 2 0 09:53 ? 00:00:00 [migration/63] root 193 2 0 09:53 ? 00:00:00 [ksoftirqd/63] root 194 2 0 09:53 ? 00:00:00 [watchdog/63] root 195 2 0 09:53 ? 00:00:00 [events/0] root 196 2 0 09:53 ? 00:00:00 [events/1] root 197 2 0 09:53 ? 00:00:00 [events/2] root 198 2 0 09:53 ? 00:00:00 [events/3] root 199 2 0 09:53 ? 00:00:00 [events/4] root 200 2 0 09:53 ? 00:00:00 [events/5] root 201 2 0 09:53 ? 00:00:00 [events/6] root 202 2 0 09:53 ? 00:00:00 [events/7] root 203 2 0 09:53 ? 00:00:00 [events/8] root 204 2 0 09:53 ? 00:00:00 [events/9] root 205 2 0 09:53 ? 00:00:00 [events/10] root 206 2 0 09:53 ? 00:00:00 [events/11] root 207 2 0 09:53 ? 00:00:00 [events/12] root 208 2 0 09:53 ? 00:00:00 [events/13] root 209 2 0 09:53 ? 00:00:00 [events/14] root 210 2 0 09:53 ? 00:00:00 [events/15] root 211 2 0 09:53 ? 00:00:00 [events/16] root 212 2 0 09:53 ? 00:00:00 [events/17] root 213 2 0 09:53 ? 00:00:00 [events/18] root 214 2 0 09:53 ? 00:00:00 [events/19] root 215 2 0 09:53 ? 00:00:00 [events/20] root 216 2 0 09:53 ? 00:00:00 [events/21] root 217 2 0 09:53 ? 00:00:00 [events/22] root 218 2 0 09:53 ? 00:00:00 [events/23] root 219 2 0 09:53 ? 00:00:00 [events/24] root 220 2 0 09:53 ? 00:00:00 [events/25] root 221 2 0 09:53 ? 00:00:00 [events/26] root 222 2 0 09:53 ? 00:00:00 [events/27] root 223 2 0 09:53 ? 00:00:00 [events/28] root 224 2 0 09:53 ? 00:00:00 [events/29] root 225 2 0 09:53 ? 00:00:00 [events/30] root 226 2 0 09:53 ? 00:00:00 [events/31] root 227 2 0 09:53 ? 00:00:00 [events/32] root 228 2 0 09:53 ? 00:00:00 [events/33] root 229 2 0 09:53 ? 00:00:00 [events/34] root 230 2 0 09:53 ? 00:00:00 [events/35] root 231 2 0 09:53 ? 00:00:00 [events/36] root 232 2 0 09:53 ? 00:00:00 [events/37] root 233 2 0 09:53 ? 00:00:00 [events/38] root 234 2 0 09:53 ? 00:00:00 [events/39] root 235 2 0 09:53 ? 00:00:00 [events/40] root 236 2 0 09:53 ? 00:00:00 [events/41] root 237 2 0 09:53 ? 00:00:00 [events/42] root 238 2 0 09:53 ? 00:00:00 [events/43] root 239 2 0 09:53 ? 00:00:00 [events/44] root 240 2 0 09:53 ? 00:00:00 [events/45] root 241 2 0 09:53 ? 00:00:00 [events/46] root 242 2 0 09:53 ? 00:00:00 [events/47] root 243 2 0 09:53 ? 00:00:00 [events/48] root 244 2 0 09:53 ? 00:00:00 [events/49] root 245 2 0 09:53 ? 00:00:00 [events/50] root 246 2 0 09:53 ? 00:00:00 [events/51] root 247 2 0 09:53 ? 00:00:00 [events/52] root 248 2 0 09:53 ? 00:00:00 [events/53] root 249 2 0 09:53 ? 00:00:00 [events/54] root 250 2 0 09:53 ? 00:00:00 [events/55] root 251 2 0 09:53 ? 00:00:00 [events/56] root 252 2 0 09:53 ? 00:00:00 [events/57] root 253 2 0 09:53 ? 
00:00:00 [events/58] root 254 2 0 09:53 ? 00:00:00 [events/59] root 255 2 0 09:53 ? 00:00:00 [events/60] root 256 2 0 09:53 ? 00:00:00 [events/61] root 257 2 0 09:53 ? 00:00:00 [events/62] root 258 2 0 09:53 ? 00:00:00 [events/63] root 259 2 0 09:53 ? 00:00:00 [khelper] root 264 2 0 09:53 ? 00:00:00 [async/mgr] root 432 2 0 09:53 ? 00:00:00 [sync_supers] root 434 2 0 09:53 ? 00:00:00 [bdi-default] root 435 2 0 09:53 ? 00:00:00 [kblockd/0] root 436 2 0 09:53 ? 00:00:00 [kblockd/1] root 437 2 0 09:53 ? 00:00:00 [kblockd/2] root 438 2 0 09:53 ? 00:00:00 [kblockd/3] root 439 2 0 09:53 ? 00:00:00 [kblockd/4] root 440 2 0 09:53 ? 00:00:00 [kblockd/5] root 441 2 0 09:53 ? 00:00:00 [kblockd/6] root 442 2 0 09:53 ? 00:00:00 [kblockd/7] root 443 2 0 09:53 ? 00:00:00 [kblockd/8] root 444 2 0 09:53 ? 00:00:00 [kblockd/9] root 445 2 0 09:53 ? 00:00:00 [kblockd/10] root 446 2 0 09:53 ? 00:00:00 [kblockd/11] root 447 2 0 09:53 ? 00:00:00 [kblockd/12] root 448 2 0 09:53 ? 00:00:00 [kblockd/13] root 449 2 0 09:53 ? 00:00:00 [kblockd/14] root 450 2 0 09:53 ? 00:00:00 [kblockd/15] root 451 2 0 09:53 ? 00:00:00 [kblockd/16] root 452 2 0 09:53 ? 00:00:00 [kblockd/17] root 453 2 0 09:53 ? 00:00:00 [kblockd/18] root 454 2 0 09:53 ? 00:00:00 [kblockd/19] root 455 2 0 09:53 ? 00:00:00 [kblockd/20] root 456 2 0 09:53 ? 00:00:00 [kblockd/21] root 457 2 0 09:53 ? 00:00:00 [kblockd/22] root 458 2 0 09:53 ? 00:00:00 [kblockd/23] root 459 2 0 09:53 ? 00:00:00 [kblockd/24] root 460 2 0 09:53 ? 00:00:00 [kblockd/25] root 461 2 0 09:53 ? 00:00:00 [kblockd/26] root 462 2 0 09:53 ? 00:00:00 [kblockd/27] root 463 2 0 09:53 ? 00:00:00 [kblockd/28] root 464 2 0 09:53 ? 00:00:00 [kblockd/29] root 465 2 0 09:53 ? 00:00:00 [kblockd/30] root 466 2 0 09:53 ? 00:00:00 [kblockd/31] root 467 2 0 09:53 ? 00:00:00 [kblockd/32] root 468 2 0 09:53 ? 00:00:00 [kblockd/33] root 469 2 0 09:53 ? 00:00:00 [kblockd/34] root 470 2 0 09:53 ? 00:00:00 [kblockd/35] root 471 2 0 09:53 ? 00:00:00 [kblockd/36] root 472 2 0 09:53 ? 00:00:00 [kblockd/37] root 473 2 0 09:53 ? 00:00:00 [kblockd/38] root 474 2 0 09:53 ? 00:00:00 [kblockd/39] root 475 2 0 09:53 ? 00:00:00 [kblockd/40] root 476 2 0 09:53 ? 00:00:00 [kblockd/41] root 477 2 0 09:53 ? 00:00:00 [kblockd/42] root 478 2 0 09:53 ? 00:00:00 [kblockd/43] root 479 2 0 09:53 ? 00:00:00 [kblockd/44] root 480 2 0 09:53 ? 00:00:00 [kblockd/45] root 481 2 0 09:53 ? 00:00:00 [kblockd/46] root 482 2 0 09:53 ? 00:00:00 [kblockd/47] root 483 2 0 09:53 ? 00:00:00 [kblockd/48] root 484 2 0 09:53 ? 00:00:00 [kblockd/49] root 485 2 0 09:53 ? 00:00:00 [kblockd/50] root 486 2 0 09:53 ? 00:00:00 [kblockd/51] root 487 2 0 09:53 ? 00:00:00 [kblockd/52] root 488 2 0 09:53 ? 00:00:00 [kblockd/53] root 489 2 0 09:53 ? 00:00:00 [kblockd/54] root 490 2 0 09:53 ? 00:00:00 [kblockd/55] root 491 2 0 09:53 ? 00:00:00 [kblockd/56] root 492 2 0 09:53 ? 00:00:00 [kblockd/57] root 493 2 0 09:53 ? 00:00:00 [kblockd/58] root 494 2 0 09:53 ? 00:00:00 [kblockd/59] root 495 2 0 09:53 ? 00:00:00 [kblockd/60] root 496 2 0 09:53 ? 00:00:00 [kblockd/61] root 497 2 0 09:53 ? 00:00:00 [kblockd/62] root 498 2 0 09:53 ? 00:00:00 [kblockd/63] root 500 2 0 09:53 ? 00:00:00 [kacpid] root 501 2 0 09:53 ? 00:00:00 [kacpi_notify] root 502 2 0 09:53 ? 00:00:00 [kacpi_hotplug] root 720 2 0 09:53 ? 00:00:00 [ata/0] root 721 2 0 09:53 ? 00:00:00 [ata_aux] root 723 2 0 09:53 ? 00:00:00 [kseriod] root 757 2 0 09:53 ? 00:00:00 [kondemand/0] root 1287 2 0 09:53 ? 00:00:00 [khungtaskd] root 1288 2 0 09:53 ? 00:00:00 [kswapd0] root 1335 2 0 09:53 ? 
00:00:00 [aio/0] root 1349 2 0 09:53 ? 00:00:00 [nfsiod] root 2154 2 0 09:53 ? 00:00:00 [scsi_eh_0] root 2181 2 0 09:53 ? 00:00:00 [scsi_eh_1] root 2184 2 0 09:53 ? 00:00:00 [scsi_eh_2] root 2186 2 0 09:53 ? 00:00:00 [scsi_eh_3] root 2188 2 0 09:53 ? 00:00:00 [scsi_eh_4] root 2190 2 0 09:53 ? 00:00:00 [scsi_eh_5] root 2192 2 0 09:53 ? 00:00:00 [scsi_eh_6] root 2223 2 0 09:53 ? 00:00:00 [kpsmoused] root 2227 2 0 09:53 ? 00:00:00 [rpciod/0] root 2245 2 0 09:53 ? 00:00:00 [kondemand/1] root 2246 2 0 09:53 ? 00:00:00 [kondemand/2] root 2247 2 0 09:53 ? 00:00:00 [kondemand/4] root 2278 2 0 09:53 ? 00:00:00 [kondemand/36] root 2279 2 0 09:53 ? 00:00:00 [kondemand/39] root 2301 2 0 09:53 ? 00:00:00 [kondemand/63] root 2304 2 0 09:53 ? 00:00:00 [kondemand/43] root 2308 2 0 09:53 ? 00:00:00 [kondemand/37] root 2309 2 0 09:53 ? 00:00:00 [kondemand/32] root 2310 2 0 09:53 ? 00:00:00 [kondemand/8] root 2313 2 0 09:53 ? 00:00:00 [kjournald] root 2314 2 0 09:53 ? 00:00:00 [kondemand/44] root 2317 2 0 09:53 ? 00:00:00 [kondemand/41] root 2322 2 0 09:53 ? 00:00:00 [kondemand/52] root 2327 2 0 09:53 ? 00:00:00 [kondemand/60] root 2329 2 0 09:53 ? 00:00:00 [kondemand/48] root 2331 2 0 09:53 ? 00:00:00 [kondemand/56] root 2352 2 0 09:53 ? 00:00:00 [kondemand/28] root 2369 2 0 09:53 ? 00:00:00 [kondemand/16] root 2383 1 1 09:53 ? 00:00:03 udevd --daemon root 2392 2 0 09:53 ? 00:00:00 [kondemand/40] root 2395 2 0 09:53 ? 00:00:00 [kondemand/45] root 2398 2 0 09:53 ? 00:00:00 [kondemand/11] root 2401 2 0 09:53 ? 00:00:00 [kondemand/33] root 2427 2 0 09:53 ? 00:00:00 [kondemand/49] root 2437 2 0 09:53 ? 00:00:00 [kondemand/47] root 2442 2 0 09:53 ? 00:00:00 [kondemand/13] root 2447 2 0 09:53 ? 00:00:00 [kondemand/51] root 2451 2 0 09:53 ? 00:00:00 [kondemand/55] root 2452 2 0 09:53 ? 00:00:00 [kondemand/59] root 2474 2 0 09:53 ? 00:00:00 [kondemand/53] root 2480 2 0 09:53 ? 00:00:00 [kondemand/57] root 2515 2 0 09:53 ? 00:00:00 [kondemand/7] root 2564 2 0 09:53 ? 00:00:00 [kondemand/61] root 2577 2 0 09:53 ? 00:00:00 [kondemand/35] root 3648 2 0 09:53 ? 00:00:00 [ksuspend_usbd] root 3655 2 0 09:53 ? 00:00:00 [khubd] root 3710 2 0 09:53 ? 00:00:00 [mpt_poll_0] root 3711 2 0 09:53 ? 00:00:00 [mpt/0] root 3873 2 0 09:53 ? 00:00:00 [kondemand/20] root 3901 2 0 09:53 ? 00:00:00 [usbhid_resumer] root 3931 2 0 09:53 ? 00:00:00 [kondemand/29] root 3932 2 0 09:53 ? 00:00:00 [kondemand/5] root 3987 2 0 09:53 ? 00:00:00 [kondemand/9] root 4094 2 0 09:53 ? 00:00:00 [scsi_eh_7] root 4109 2 0 09:53 ? 00:00:00 [kondemand/12] root 4130 2 0 09:53 ? 00:00:00 [kondemand/17] root 4132 2 0 09:53 ? 00:00:00 [kondemand/21] root 4199 2 0 09:53 ? 00:00:00 [kondemand/25] root 4459 2 0 09:54 ? 00:00:00 [kjournald] root 4525 2 0 09:54 ? 00:00:00 [flush-8:0] root 4543 1 0 09:54 ? 00:00:00 dhclient3 -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp3/dhclient.eth0.leases eth0 daemon 4560 1 0 09:54 ? 00:00:00 /sbin/portmap statd 4571 1 0 09:54 ? 00:00:00 /sbin/rpc.statd root 4732 1 0 09:54 ? 00:00:00 /usr/sbin/rsyslogd -c3 root 4743 1 0 09:54 ? 00:00:00 /usr/sbin/acpid root 4756 1 0 09:54 ? 00:00:00 /usr/sbin/sshd 101 5061 1 0 09:54 ? 00:00:00 /usr/sbin/exim4 -bd -q30m daemon 5088 1 0 09:54 ? 00:00:00 /usr/sbin/atd root 5108 1 0 09:54 ? 
00:00:00 /usr/sbin/cron root 5125 1 0 09:54 tty1 00:00:00 /sbin/getty 38400 tty1 root 5126 1 0 09:54 tty2 00:00:00 /sbin/getty 38400 tty2 root 5127 1 0 09:54 tty3 00:00:00 /sbin/getty 38400 tty3 root 5128 1 0 09:54 tty4 00:00:00 /sbin/getty 38400 tty4 root 5129 1 0 09:54 tty5 00:00:00 /sbin/getty 38400 tty5 root 5130 1 0 09:54 tty6 00:00:00 /sbin/getty 38400 tty6 root 5159 2 0 09:55 ? 00:00:00 [kondemand/38] root 5182 4756 1 09:56 ? 00:00:00 sshd: axboe [priv] axboe 5186 5182 0 09:56 ? 00:00:00 sshd: axboe@pts/0 axboe 5187 5186 0 09:56 pts/0 00:00:00 -bash axboe 5190 5187 0 09:56 pts/0 00:00:00 ps -ef ^ permalink raw reply [flat|nested] 30+ messages in thread
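The libata-to-slow_work conversion Jens mentions above would follow the
standard pattern of that API. A rough sketch against the 2.6.31-era
slow_work interface (struct my_task and the my_* names are illustrative,
not an actual patch):

	#include <linux/slow-work.h>

	struct my_task {
		struct slow_work work;
		/* ... task state ... */
	};

	static int my_task_get_ref(struct slow_work *work)
	{
		/* Pin whatever object 'work' is embedded in. */
		return 0;
	}

	static void my_task_put_ref(struct slow_work *work)
	{
		/* Drop the reference taken in get_ref(). */
	}

	static void my_task_execute(struct slow_work *work)
	{
		struct my_task *task = container_of(work, struct my_task, work);

		/* Runs in one of the shared slow_work threads. */
		(void)task;
	}

	static const struct slow_work_ops my_task_ops = {
		.get_ref	= my_task_get_ref,
		.put_ref	= my_task_put_ref,
		.execute	= my_task_execute,
	};

	static int my_subsys_init(void)
	{
		int ret = slow_work_register_user();
		if (ret)
			return ret;
		/* Then, per task:
		 *	slow_work_init(&task->work, &my_task_ops);
		 *	slow_work_enqueue(&task->work);
		 */
		return 0;
	}

The point of the conversion is that such users share the slow_work thread
pool instead of each keeping per-CPU threads of their own around.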
* [PATCH 0/6] Lazy workqueues @ 2009-08-20 10:17 Jens Axboe 2009-08-20 10:17 ` [PATCH 1/4] direct-io: unify argument passing by adding a dio_args structure Jens Axboe 0 siblings, 1 reply; 30+ messages in thread From: Jens Axboe @ 2009-08-20 10:17 UTC (permalink / raw) To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan Hi, After yesterdays rant on having too many kernel threads and checking how many I actually have running on this system (531!), I decided to try and do something about it. My goal was to retain the workqueue interface instead of coming up with a new scheme that required conversion (or converting to slow_work which, btw, is an awful name :-). I also wanted to retain the affinity guarantees of workqueues as much as possible. So this is a first step in that direction, it's probably full of races and holes, but should get the idea across. It adds a create_lazy_workqueue() helper, similar to the other variants that we currently have. A lazy workqueue works like a normal workqueue, except that it only (by default) starts a core thread instead of threads for all online CPUs. When work is queued on a lazy workqueue for a CPU that doesn't have a thread running, it will be placed on the core CPUs list and that will then create and move the work to the right target. Should task creation fail, the queued work will be executed on the core CPU instead. Once a lazy workqueue thread has been idle for a certain amount of time, it will again exit. The patch boots here and I exercised the rpciod workqueue and verified that it gets created, runs on the right CPU, and exits a while later. So core functionality should be there, even if it has holes. With this patchset, I am now down to 280 kernel threads on one of my test boxes. Still too many, but it's a start and a net reduction of 251 threads here, or 47%! The code can also be pulled from: git://git.kernel.dk/linux-2.6-block.git workqueue -- Jens Axboe ^ permalink raw reply [flat|nested] 30+ messages in thread
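The cover letter above describes the core lazy behaviour: queue to an
existing per-CPU thread if there is one, otherwise bounce the work via the
always-present core CPU, which creates a worker (or runs the work itself
if thread creation fails). As a sketch of that decision, with the helper
names (wq_has_thread, insert_work_on, wq->core_cpu) made up for
illustration rather than taken from the actual patch:

	/* Sketch of the lazy queueing decision; names are hypothetical. */
	static void lazy_queue_work(struct workqueue_struct *wq,
				    struct work_struct *work, int cpu)
	{
		if (wq_has_thread(wq, cpu)) {
			/* Fast path: target CPU already has a worker. */
			insert_work_on(wq, work, cpu);
			return;
		}

		/*
		 * Slow path: hand the work to the core CPU.  Its thread
		 * creates a worker for 'cpu' and migrates the work there;
		 * if thread creation fails, it executes the work on the
		 * core CPU itself.  Idle workers exit again after a
		 * timeout, so the thread count tracks actual usage.
		 */
		insert_work_on(wq, work, wq->core_cpu);
	}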
* [PATCH 1/4] direct-io: unify argument passing by adding a dio_args structure 2009-08-20 10:17 Jens Axboe @ 2009-08-20 10:17 ` Jens Axboe 2009-08-20 10:17 ` [PATCH 1/6] workqueue: replace singlethread/freezable/rt parameters and variables with flags Jens Axboe 0 siblings, 1 reply; 30+ messages in thread From: Jens Axboe @ 2009-08-20 10:17 UTC (permalink / raw) To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe The O_DIRECT IO path is a mess of arguments. Clean that up by passing those arguments in a dedicated dio_args structure. This is in preparation for changing the internal implementation to be page based instead of using iovecs. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> --- fs/block_dev.c | 7 ++-- fs/btrfs/inode.c | 4 +-- fs/direct-io.c | 70 +++++++++++++++++++++++++++---------------- fs/ext2/inode.c | 8 ++--- fs/ext3/inode.c | 15 ++++----- fs/ext4/inode.c | 15 ++++----- fs/fat/inode.c | 12 +++---- fs/gfs2/aops.c | 11 ++---- fs/hfs/inode.c | 7 ++-- fs/hfsplus/inode.c | 8 ++-- fs/jfs/inode.c | 7 ++-- fs/nfs/direct.c | 9 ++---- fs/nilfs2/inode.c | 9 ++--- fs/ocfs2/aops.c | 11 ++----- fs/reiserfs/inode.c | 7 +--- fs/xfs/linux-2.6/xfs_aops.c | 19 ++++-------- include/linux/fs.h | 59 +++++++++++++++++++++--------------- include/linux/nfs_fs.h | 3 +- mm/filemap.c | 8 +++-- 19 files changed, 141 insertions(+), 148 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 94dfda2..2e494fa 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -166,14 +166,13 @@ blkdev_get_blocks(struct inode *inode, sector_t iblock, } static ssize_t -blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, - loff_t offset, unsigned long nr_segs) +blkdev_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; - return blockdev_direct_IO_no_locking(rw, iocb, inode, I_BDEV(inode), - iov, offset, nr_segs, blkdev_get_blocks, NULL); + return blockdev_direct_IO_no_locking(iocb, inode, I_BDEV(inode), + args, blkdev_get_blocks, NULL); } int __sync_blockdev(struct block_device *bdev, int wait) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 272b9b2..094e3a7 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4308,9 +4308,7 @@ out: return em; } -static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, - unsigned long nr_segs) +static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct dio_args *args) { return -EINVAL; } diff --git a/fs/direct-io.c b/fs/direct-io.c index 8b10b87..181848c 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -929,14 +929,14 @@ out: * Releases both i_mutex and i_alloc_sem */ static ssize_t -direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, - const struct iovec *iov, loff_t offset, unsigned long nr_segs, - unsigned blkbits, get_block_t get_block, dio_iodone_t end_io, - struct dio *dio) +direct_io_worker(struct kiocb *iocb, struct inode *inode, + struct dio_args *args, unsigned blkbits, get_block_t get_block, + dio_iodone_t end_io, struct dio *dio) { - unsigned long user_addr; + const struct iovec *iov = args->iov; + unsigned long user_addr; unsigned long flags; - int seg; + int seg, rw = args->rw; ssize_t ret = 0; ssize_t ret2; size_t bytes; @@ -945,7 +945,7 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, dio->rw = rw; dio->blkbits = blkbits; dio->blkfactor = inode->i_blkbits - blkbits; - dio->block_in_file = offset >> blkbits; + dio->block_in_file = args->offset >> blkbits; 
dio->get_block = get_block; dio->end_io = end_io; @@ -965,14 +965,14 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, if (unlikely(dio->blkfactor)) dio->pages_in_io = 2; - for (seg = 0; seg < nr_segs; seg++) { - user_addr = (unsigned long)iov[seg].iov_base; + for (seg = 0; seg < args->nr_segs; seg++) { + user_addr = (unsigned long) iov[seg].iov_base; dio->pages_in_io += ((user_addr+iov[seg].iov_len +PAGE_SIZE-1)/PAGE_SIZE - user_addr/PAGE_SIZE); } - for (seg = 0; seg < nr_segs; seg++) { + for (seg = 0; seg < args->nr_segs; seg++) { user_addr = (unsigned long)iov[seg].iov_base; dio->size += bytes = iov[seg].iov_len; @@ -1076,7 +1076,7 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, spin_unlock_irqrestore(&dio->bio_lock, flags); if (ret2 == 0) { - ret = dio_complete(dio, offset, ret); + ret = dio_complete(dio, args->offset, ret); kfree(dio); } else BUG_ON(ret != -EIOCBQUEUED); @@ -1106,10 +1106,9 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode, * Additional i_alloc_sem locking requirements described inline below. */ ssize_t -__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, - struct block_device *bdev, const struct iovec *iov, loff_t offset, - unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io, - int dio_lock_type) +__blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, + struct block_device *bdev, struct dio_args *args, get_block_t get_block, + dio_iodone_t end_io, int dio_lock_type) { int seg; size_t size; @@ -1118,10 +1117,11 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, unsigned bdev_blkbits = 0; unsigned blocksize_mask = (1 << blkbits) - 1; ssize_t retval = -EINVAL; - loff_t end = offset; + loff_t end = args->offset; struct dio *dio; int release_i_mutex = 0; int acquire_i_mutex = 0; + int rw = args->rw; if (rw & WRITE) rw = WRITE_ODIRECT; @@ -1129,18 +1129,18 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, if (bdev) bdev_blkbits = blksize_bits(bdev_logical_block_size(bdev)); - if (offset & blocksize_mask) { + if (args->offset & blocksize_mask) { if (bdev) blkbits = bdev_blkbits; blocksize_mask = (1 << blkbits) - 1; - if (offset & blocksize_mask) + if (args->offset & blocksize_mask) goto out; } /* Check the memory alignment. 
Blocks cannot straddle pages */ - for (seg = 0; seg < nr_segs; seg++) { - addr = (unsigned long)iov[seg].iov_base; - size = iov[seg].iov_len; + for (seg = 0; seg < args->nr_segs; seg++) { + addr = (unsigned long) args->iov[seg].iov_base; + size = args->iov[seg].iov_len; end += size; if ((addr & blocksize_mask) || (size & blocksize_mask)) { if (bdev) @@ -1168,7 +1168,7 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, dio->lock_type = dio_lock_type; if (dio_lock_type != DIO_NO_LOCKING) { /* watch out for a 0 len io from a tricksy fs */ - if (rw == READ && end > offset) { + if (rw == READ && end > args->offset) { struct address_space *mapping; mapping = iocb->ki_filp->f_mapping; @@ -1177,8 +1177,8 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, release_i_mutex = 1; } - retval = filemap_write_and_wait_range(mapping, offset, - end - 1); + retval = filemap_write_and_wait_range(mapping, + args->offset, end - 1); if (retval) { kfree(dio); goto out; @@ -1204,8 +1204,8 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) && (end > i_size_read(inode))); - retval = direct_io_worker(rw, iocb, inode, iov, offset, - nr_segs, blkbits, get_block, end_io, dio); + retval = direct_io_worker(iocb, inode, args, blkbits, get_block, end_io, + dio); /* * In case of error extending write may have instantiated a few @@ -1231,3 +1231,21 @@ out: return retval; } EXPORT_SYMBOL(__blockdev_direct_IO); + +ssize_t generic_file_direct_IO(int rw, struct address_space *mapping, + struct kiocb *iocb, const struct iovec *iov, + loff_t offset, unsigned long nr_segs) +{ + struct dio_args args = { + .rw = rw, + .iov = iov, + .length = iov_length(iov, nr_segs), + .offset = offset, + .nr_segs = nr_segs, + }; + + if (mapping->a_ops->direct_IO) + return mapping->a_ops->direct_IO(iocb, &args); + + return -EINVAL; +} diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index e271303..e813df7 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -790,15 +790,13 @@ static sector_t ext2_bmap(struct address_space *mapping, sector_t block) return generic_block_bmap(mapping,block,ext2_get_block); } -static ssize_t -ext2_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, - loff_t offset, unsigned long nr_segs) +static ssize_t ext2_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; - return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, ext2_get_block, NULL); + return blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, + ext2_get_block, NULL); } static int diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index b49908a..11dc0d1 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1713,9 +1713,7 @@ static int ext3_releasepage(struct page *page, gfp_t wait) * crashes then stale disk data _may_ be exposed inside the file. But current * VFS code falls back into buffered path in that case so we are safe. 
*/ -static ssize_t ext3_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, - unsigned long nr_segs) +static ssize_t ext3_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; @@ -1723,10 +1721,10 @@ static ssize_t ext3_direct_IO(int rw, struct kiocb *iocb, handle_t *handle; ssize_t ret; int orphan = 0; - size_t count = iov_length(iov, nr_segs); + size_t count = args->length; - if (rw == WRITE) { - loff_t final_size = offset + count; + if (args->rw == WRITE) { + loff_t final_size = args->offset + count; if (final_size > inode->i_size) { /* Credits for sb + inode write */ @@ -1746,8 +1744,7 @@ static ssize_t ext3_direct_IO(int rw, struct kiocb *iocb, } } - ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, + ret = blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, ext3_get_block, NULL); if (orphan) { @@ -1765,7 +1762,7 @@ static ssize_t ext3_direct_IO(int rw, struct kiocb *iocb, if (inode->i_nlink) ext3_orphan_del(handle, inode); if (ret > 0) { - loff_t end = offset + ret; + loff_t end = args->offset + ret; if (end > inode->i_size) { ei->i_disksize = end; i_size_write(inode, end); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f9c642b..164fdb3 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3267,9 +3267,7 @@ static int ext4_releasepage(struct page *page, gfp_t wait) * crashes then stale disk data _may_ be exposed inside the file. But current * VFS code falls back into buffered path in that case so we are safe. */ -static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, - unsigned long nr_segs) +static ssize_t ext4_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; @@ -3277,10 +3275,10 @@ static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb, handle_t *handle; ssize_t ret; int orphan = 0; - size_t count = iov_length(iov, nr_segs); + size_t count = args->length; - if (rw == WRITE) { - loff_t final_size = offset + count; + if (args->rw == WRITE) { + loff_t final_size = args->offset + count; if (final_size > inode->i_size) { /* Credits for sb + inode write */ @@ -3300,8 +3298,7 @@ static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb, } } - ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, + ret = blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, ext4_get_block, NULL); if (orphan) { @@ -3319,7 +3316,7 @@ static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb, if (inode->i_nlink) ext4_orphan_del(handle, inode); if (ret > 0) { - loff_t end = offset + ret; + loff_t end = args->offset + ret; if (end > inode->i_size) { ei->i_disksize = end; i_size_write(inode, end); diff --git a/fs/fat/inode.c b/fs/fat/inode.c index 8970d8c..9d41851 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -167,14 +167,12 @@ static int fat_write_end(struct file *file, struct address_space *mapping, return err; } -static ssize_t fat_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, - loff_t offset, unsigned long nr_segs) +static ssize_t fat_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; - if (rw == WRITE) { + if (args->rw == WRITE) { /* * FIXME: blockdev_direct_IO() doesn't use ->write_begin(), * so we need to update the ->mmu_private to block boundary. 
@@ -184,7 +182,7 @@ static ssize_t fat_direct_IO(int rw, struct kiocb *iocb, * * Return 0, and fallback to normal buffered write. */ - loff_t size = offset + iov_length(iov, nr_segs); + loff_t size = args->offset + args->length; if (MSDOS_I(inode)->mmu_private < size) return 0; } @@ -193,8 +191,8 @@ static ssize_t fat_direct_IO(int rw, struct kiocb *iocb, * FAT need to use the DIO_LOCKING for avoiding the race * condition of fat_get_block() and ->truncate(). */ - return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, fat_get_block, NULL); + return blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, + fat_get_block, NULL); } static sector_t _fat_bmap(struct address_space *mapping, sector_t block) diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 7ebae9a..a9422a2 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -1021,9 +1021,7 @@ static int gfs2_ok_for_dio(struct gfs2_inode *ip, int rw, loff_t offset) -static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, - unsigned long nr_segs) +static ssize_t gfs2_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; @@ -1043,13 +1041,12 @@ static ssize_t gfs2_direct_IO(int rw, struct kiocb *iocb, rv = gfs2_glock_nq(&gh); if (rv) return rv; - rv = gfs2_ok_for_dio(ip, rw, offset); + rv = gfs2_ok_for_dio(ip, args->rw, args->offset); if (rv != 1) goto out; /* dio not valid, fall back to buffered i/o */ - rv = blockdev_direct_IO_no_locking(rw, iocb, inode, inode->i_sb->s_bdev, - iov, offset, nr_segs, - gfs2_get_block_direct, NULL); + rv = blockdev_direct_IO_no_locking(iocb, inode, inode->i_sb->s_bdev, + args, gfs2_get_block_direct, NULL); out: gfs2_glock_dq_m(1, &gh); gfs2_holder_uninit(&gh); diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c index a1cbff2..2998914 100644 --- a/fs/hfs/inode.c +++ b/fs/hfs/inode.c @@ -107,14 +107,13 @@ static int hfs_releasepage(struct page *page, gfp_t mask) return res ? try_to_free_buffers(page) : 0; } -static ssize_t hfs_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, unsigned long nr_segs) +static ssize_t hfs_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host; - return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, hfs_get_block, NULL); + return blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, + hfs_get_block, NULL); } static int hfs_writepages(struct address_space *mapping, diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c index 1bcf597..dd7102b 100644 --- a/fs/hfsplus/inode.c +++ b/fs/hfsplus/inode.c @@ -100,14 +100,14 @@ static int hfsplus_releasepage(struct page *page, gfp_t mask) return res ? 
try_to_free_buffers(page) : 0; } -static ssize_t hfsplus_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, unsigned long nr_segs) +static ssize_t hfsplus_direct_IO(struct kiocb *iocb, + struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host; - return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, hfsplus_get_block, NULL); + return blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, + hfsplus_get_block, NULL); } static int hfsplus_writepages(struct address_space *mapping, diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c index b2ae190..e1420de 100644 --- a/fs/jfs/inode.c +++ b/fs/jfs/inode.c @@ -306,14 +306,13 @@ static sector_t jfs_bmap(struct address_space *mapping, sector_t block) return generic_block_bmap(mapping, block, jfs_get_block); } -static ssize_t jfs_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, unsigned long nr_segs) +static ssize_t jfs_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; - return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, jfs_get_block, NULL); + return blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, + jfs_get_block, NULL); } const struct address_space_operations jfs_aops = { diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index e4e089a..45d931b 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -103,21 +103,18 @@ static inline int put_dreq(struct nfs_direct_req *dreq) /** * nfs_direct_IO - NFS address space operation for direct I/O * @rw: direction (read or write) - * @iocb: target I/O control block - * @iov: array of vectors that define I/O buffer - * @pos: offset in file to begin the operation - * @nr_segs: size of iovec array + * @args: IO arguments * * The presence of this routine in the address space ops vector means * the NFS client supports direct I/O. However, we shunt off direct * read and write requests before the VFS gets them, so this method * should never be called. 
*/ -ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, loff_t pos, unsigned long nr_segs) +ssize_t nfs_direct_IO(struct kiocb *iocb, struct dio_args *args) { dprintk("NFS: nfs_direct_IO (%s) off/no(%Ld/%lu) EINVAL\n", iocb->ki_filp->f_path.dentry->d_name.name, - (long long) pos, nr_segs); + (long long) args->offset, args->nr_segs); return -EINVAL; } diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c index fe9d8f2..840c307 100644 --- a/fs/nilfs2/inode.c +++ b/fs/nilfs2/inode.c @@ -222,19 +222,18 @@ static int nilfs_write_end(struct file *file, struct address_space *mapping, } static ssize_t -nilfs_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, - loff_t offset, unsigned long nr_segs) +nilfs_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; ssize_t size; - if (rw == WRITE) + if (args->rw == WRITE) return 0; /* Needs synchronization with the cleaner */ - size = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, nilfs_get_block, NULL); + size = blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, + nilfs_get_block, NULL); return size; } diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index b401654..56e61ba 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -668,11 +668,7 @@ static int ocfs2_releasepage(struct page *page, gfp_t wait) return jbd2_journal_try_to_free_buffers(journal, page, wait); } -static ssize_t ocfs2_direct_IO(int rw, - struct kiocb *iocb, - const struct iovec *iov, - loff_t offset, - unsigned long nr_segs) +static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host; @@ -687,9 +683,8 @@ static ssize_t ocfs2_direct_IO(int rw, if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) return 0; - ret = blockdev_direct_IO_no_locking(rw, iocb, inode, - inode->i_sb->s_bdev, iov, offset, - nr_segs, + ret = blockdev_direct_IO_no_locking(iocb, inode, + inode->i_sb->s_bdev, args, ocfs2_direct_IO_get_blocks, ocfs2_dio_end_io); diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index a14d6cd..201e6ca 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -3025,15 +3025,12 @@ static int reiserfs_releasepage(struct page *page, gfp_t unused_gfp_flags) /* We thank Mingming Cao for helping us understand in great detail what to do in this section of the code. 
*/ -static ssize_t reiserfs_direct_IO(int rw, struct kiocb *iocb, - const struct iovec *iov, loff_t offset, - unsigned long nr_segs) +static ssize_t reiserfs_direct_IO(struct kiocb *iocb, struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; - return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, - offset, nr_segs, + return blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args, reiserfs_get_blocks_direct_io, NULL); } diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c index aecf251..0faf1fe 100644 --- a/fs/xfs/linux-2.6/xfs_aops.c +++ b/fs/xfs/linux-2.6/xfs_aops.c @@ -1532,11 +1532,8 @@ xfs_end_io_direct( STATIC ssize_t xfs_vm_direct_IO( - int rw, struct kiocb *iocb, - const struct iovec *iov, - loff_t offset, - unsigned long nr_segs) + struct dio_args *args) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; @@ -1545,18 +1542,14 @@ xfs_vm_direct_IO( bdev = xfs_find_bdev_for_inode(XFS_I(inode)); - if (rw == WRITE) { + if (args->rw == WRITE) { iocb->private = xfs_alloc_ioend(inode, IOMAP_UNWRITTEN); - ret = blockdev_direct_IO_own_locking(rw, iocb, inode, - bdev, iov, offset, nr_segs, - xfs_get_blocks_direct, - xfs_end_io_direct); + ret = blockdev_direct_IO_own_locking(iocb, inode, bdev, args, + xfs_get_blocks_direct, xfs_end_io_direct); } else { iocb->private = xfs_alloc_ioend(inode, IOMAP_READ); - ret = blockdev_direct_IO_no_locking(rw, iocb, inode, - bdev, iov, offset, nr_segs, - xfs_get_blocks_direct, - xfs_end_io_direct); + ret = blockdev_direct_IO_no_locking(iocb, inode, + bdev, args, xfs_get_blocks_direct, xfs_end_io_direct); } if (unlikely(ret != -EIOCBQUEUED && iocb->private)) diff --git a/include/linux/fs.h b/include/linux/fs.h index 67888a9..5971116 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -560,6 +560,7 @@ typedef struct { typedef int (*read_actor_t)(read_descriptor_t *, struct page *, unsigned long, unsigned long); +struct dio_args; struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); @@ -585,8 +586,7 @@ struct address_space_operations { sector_t (*bmap)(struct address_space *, sector_t); void (*invalidatepage) (struct page *, unsigned long); int (*releasepage) (struct page *, gfp_t); - ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov, - loff_t offset, unsigned long nr_segs); + ssize_t (*direct_IO)(struct kiocb *, struct dio_args *); int (*get_xip_mem)(struct address_space *, pgoff_t, int, void **, unsigned long *); /* migrate the contents of a page to the specified target */ @@ -2241,10 +2241,24 @@ static inline int xip_truncate_page(struct address_space *mapping, loff_t from) #endif #ifdef CONFIG_BLOCK -ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, - struct block_device *bdev, const struct iovec *iov, loff_t offset, - unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io, - int lock_type); + +/* + * Arguments passed to aops->direct_IO() + */ +struct dio_args { + int rw; + const struct iovec *iov; + unsigned long length; + loff_t offset; + unsigned long nr_segs; +}; + +ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, + struct block_device *bdev, struct dio_args *args, get_block_t get_block, + dio_iodone_t end_io, int lock_type); + +ssize_t generic_file_direct_IO(int, struct address_space *, struct kiocb *, + const struct iovec *, loff_t, unsigned long); enum {
DIO_LOCKING = 1, /* need locking between buffered and direct access */ @@ -2252,31 +2266,28 @@ enum { DIO_OWN_LOCKING, /* filesystem locks buffered and direct internally */ }; -static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb, - struct inode *inode, struct block_device *bdev, const struct iovec *iov, - loff_t offset, unsigned long nr_segs, get_block_t get_block, - dio_iodone_t end_io) +static inline ssize_t blockdev_direct_IO(struct kiocb *iocb, + struct inode *inode, struct block_device *bdev, struct dio_args *args, + get_block_t get_block, dio_iodone_t end_io) { - return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset, - nr_segs, get_block, end_io, DIO_LOCKING); + return __blockdev_direct_IO(iocb, inode, bdev, args, + get_block, end_io, DIO_LOCKING); } -static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb, - struct inode *inode, struct block_device *bdev, const struct iovec *iov, - loff_t offset, unsigned long nr_segs, get_block_t get_block, - dio_iodone_t end_io) +static inline ssize_t blockdev_direct_IO_no_locking(struct kiocb *iocb, + struct inode *inode, struct block_device *bdev, struct dio_args *args, + get_block_t get_block, dio_iodone_t end_io) { - return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset, - nr_segs, get_block, end_io, DIO_NO_LOCKING); + return __blockdev_direct_IO(iocb, inode, bdev, args, + get_block, end_io, DIO_NO_LOCKING); } -static inline ssize_t blockdev_direct_IO_own_locking(int rw, struct kiocb *iocb, - struct inode *inode, struct block_device *bdev, const struct iovec *iov, - loff_t offset, unsigned long nr_segs, get_block_t get_block, - dio_iodone_t end_io) +static inline ssize_t blockdev_direct_IO_own_locking(struct kiocb *iocb, + struct inode *inode, struct block_device *bdev, struct dio_args *args, + get_block_t get_block, dio_iodone_t end_io) { - return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset, - nr_segs, get_block, end_io, DIO_OWN_LOCKING); + return __blockdev_direct_IO(iocb, inode, bdev, args, + get_block, end_io, DIO_OWN_LOCKING); } #endif diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index f6b9024..97a2383 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -408,8 +408,7 @@ extern int nfs3_removexattr (struct dentry *, const char *name); /* * linux/fs/nfs/direct.c */ -extern ssize_t nfs_direct_IO(int, struct kiocb *, const struct iovec *, loff_t, - unsigned long); +extern ssize_t nfs_direct_IO(struct kiocb *, struct dio_args *); extern ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos); diff --git a/mm/filemap.c b/mm/filemap.c index ccea3b6..cf85298 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1345,8 +1345,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, retval = filemap_write_and_wait_range(mapping, pos, pos + iov_length(iov, nr_segs) - 1); if (!retval) { - retval = mapping->a_ops->direct_IO(READ, iocb, - iov, pos, nr_segs); + retval = generic_file_direct_IO(READ, mapping, + iocb, iov, + pos, nr_segs); } if (retval > 0) *ppos = pos + retval; @@ -2144,7 +2145,8 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, } } - written = mapping->a_ops->direct_IO(WRITE, iocb, iov, pos, *nr_segs); + written = generic_file_direct_IO(WRITE, mapping, iocb, iov, pos, + *nr_segs); /* * Finally, try again to invalidate clean pages which might have been -- 1.6.4.53.g3f55e ^ permalink raw reply related [flat|nested] 30+ messages in thread
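To make the new calling convention concrete, here is a minimal sketch of what a converted filesystem looks like with this patch applied. The filesystem name and its get_block helper are invented for illustration; struct dio_args, the ->direct_IO() prototype and blockdev_direct_IO() are the interfaces introduced above.

/*
 * Hypothetical "examplefs" conversion sketch -- examplefs_get_block()
 * stands in for the filesystem's real get_block_t callback.
 */
static ssize_t examplefs_direct_IO(struct kiocb *iocb, struct dio_args *args)
{
	struct inode *inode = iocb->ki_filp->f_mapping->host;

	/* rw, iov, offset and nr_segs now all travel inside *args */
	return blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, args,
				  examplefs_get_block, NULL);
}

static const struct address_space_operations examplefs_aops = {
	.direct_IO	= examplefs_direct_IO,
	/* remaining address_space operations elided */
};

Since every wrapper and every ->direct_IO() instance now has the same two-argument shape, growing or shrinking struct dio_args no longer requires touching each filesystem's prototype.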
* [PATCH 2/4] direct-io: make O_DIRECT IO path be page based 2009-08-20 10:17 ` [PATCH 1/6] workqueue: replace singlethread/freezable/rt parameters and variables with flags Jens Axboe @ 2009-08-20 10:17 ` Jens Axboe 2009-08-20 10:17 ` [PATCH 2/6] workqueue: add support for lazy workqueues Jens Axboe 0 siblings, 1 reply; 30+ messages in thread From: Jens Axboe @ 2009-08-20 10:17 UTC (permalink / raw) To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe Currently we pass in the iovec array and let the O_DIRECT core handle the get_user_pages() business. This works, but it means that we can only ever use user pages for O_DIRECT. Switch the aops->direct_IO() and below code to use page arrays instead, so that it doesn't make any assumptions about who the pages belong to. This works directly for all users but NFS, which just uses the same helper that the generic mapping read/write functions also call. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> --- fs/direct-io.c | 304 ++++++++++++++++++++---------------------- fs/nfs/direct.c | 161 +++++++++---------------- fs/nfs/file.c | 8 +- include/linux/fs.h | 15 ++- include/linux/nfs_fs.h | 7 +- mm/filemap.c | 6 +- 6 files changed, 206 insertions(+), 295 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 181848c..22a945b 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -38,12 +38,6 @@ #include <asm/atomic.h> /* - * How many user pages to map in one call to get_user_pages(). This determines - * the size of a structure on the stack. - */ -#define DIO_PAGES 64 - -/* * This code generally works in units of "dio_blocks". A dio_block is * somewhere between the hard sector size and the filesystem block size. it * is determined on a per-invocation basis. When talking to the filesystem @@ -105,20 +99,13 @@ struct dio { sector_t cur_page_block; /* Where it starts */ /* - * Page fetching state. These variables belong to dio_refill_pages(). - */ - int curr_page; /* changes */ - int total_pages; /* doesn't change */ - unsigned long curr_user_address;/* changes */ - - /* * Page queue. These variables belong to dio_refill_pages() and * dio_get_page(). */ - struct page *pages[DIO_PAGES]; /* page buffer */ - unsigned head; /* next page to process */ - unsigned tail; /* last valid page + 1 */ - int page_errors; /* errno from get_user_pages() */ + struct page **pages; /* page buffer */ + unsigned int head_page; /* next page to process */ + unsigned int total_pages; /* last valid page + 1 */ + unsigned int first_page_off; /* offset into first page in map */ /* BIO completion state */ spinlock_t bio_lock; /* protects BIO fields below */ @@ -134,57 +121,6 @@ struct dio { }; /* - * How many pages are in the queue? - */ -static inline unsigned dio_pages_present(struct dio *dio) -{ - return dio->tail - dio->head; -} - -/* - * Go grab and pin some userspace pages. Typically we'll get 64 at a time. - */ -static int dio_refill_pages(struct dio *dio) -{ - int ret; - int nr_pages; - - nr_pages = min(dio->total_pages - dio->curr_page, DIO_PAGES); - ret = get_user_pages_fast( - dio->curr_user_address, /* Where from? */ - nr_pages, /* How many pages? */ - dio->rw == READ, /* Write to memory? */ - &dio->pages[0]); /* Put results here */ - - if (ret < 0 && dio->blocks_available && (dio->rw & WRITE)) { - struct page *page = ZERO_PAGE(0); - /* - * A memory fault, but the filesystem has some outstanding - * mapped blocks. We need to use those blocks up to avoid - * leaking stale data in the file.
- */ - if (dio->page_errors == 0) - dio->page_errors = ret; - page_cache_get(page); - dio->pages[0] = page; - dio->head = 0; - dio->tail = 1; - ret = 0; - goto out; - } - - if (ret >= 0) { - dio->curr_user_address += ret * PAGE_SIZE; - dio->curr_page += ret; - dio->head = 0; - dio->tail = ret; - ret = 0; - } -out: - return ret; -} - -/* * Get another userspace page. Returns an ERR_PTR on error. Pages are * buffered inside the dio so that we can call get_user_pages() against a * decent number of pages, less frequently. To provide nicer use of the @@ -192,15 +128,10 @@ out: */ static struct page *dio_get_page(struct dio *dio) { - if (dio_pages_present(dio) == 0) { - int ret; + if (dio->head_page < dio->total_pages) + return dio->pages[dio->head_page++]; - ret = dio_refill_pages(dio); - if (ret) - return ERR_PTR(ret); - BUG_ON(dio_pages_present(dio) == 0); - } - return dio->pages[dio->head++]; + return NULL; } /** @@ -245,8 +176,6 @@ static int dio_complete(struct dio *dio, loff_t offset, int ret) up_read_non_owner(&dio->inode->i_alloc_sem); if (ret == 0) - ret = dio->page_errors; - if (ret == 0) ret = dio->io_error; if (ret == 0) ret = transferred; @@ -351,8 +280,10 @@ static void dio_bio_submit(struct dio *dio) */ static void dio_cleanup(struct dio *dio) { - while (dio_pages_present(dio)) - page_cache_release(dio_get_page(dio)); + struct page *page; + + while ((page = dio_get_page(dio)) != NULL) + page_cache_release(page); } /* @@ -490,7 +421,6 @@ static int dio_bio_reap(struct dio *dio) */ static int get_more_blocks(struct dio *dio) { - int ret; struct buffer_head *map_bh = &dio->map_bh; sector_t fs_startblk; /* Into file, in filesystem-sized blocks */ unsigned long fs_count; /* Number of filesystem-sized blocks */ @@ -502,38 +432,33 @@ static int get_more_blocks(struct dio *dio) * If there was a memory error and we've overwritten all the * mapped blocks then we can now return that memory error */ - ret = dio->page_errors; - if (ret == 0) { - BUG_ON(dio->block_in_file >= dio->final_block_in_request); - fs_startblk = dio->block_in_file >> dio->blkfactor; - dio_count = dio->final_block_in_request - dio->block_in_file; - fs_count = dio_count >> dio->blkfactor; - blkmask = (1 << dio->blkfactor) - 1; - if (dio_count & blkmask) - fs_count++; - - map_bh->b_state = 0; - map_bh->b_size = fs_count << dio->inode->i_blkbits; - - create = dio->rw & WRITE; - if (dio->lock_type == DIO_LOCKING) { - if (dio->block_in_file < (i_size_read(dio->inode) >> - dio->blkbits)) - create = 0; - } else if (dio->lock_type == DIO_NO_LOCKING) { + BUG_ON(dio->block_in_file >= dio->final_block_in_request); + fs_startblk = dio->block_in_file >> dio->blkfactor; + dio_count = dio->final_block_in_request - dio->block_in_file; + fs_count = dio_count >> dio->blkfactor; + blkmask = (1 << dio->blkfactor) - 1; + if (dio_count & blkmask) + fs_count++; + + map_bh->b_state = 0; + map_bh->b_size = fs_count << dio->inode->i_blkbits; + + create = dio->rw & WRITE; + if (dio->lock_type == DIO_LOCKING) { + if (dio->block_in_file < (i_size_read(dio->inode) >> + dio->blkbits)) create = 0; - } - - /* - * For writes inside i_size we forbid block creations: only - * overwrites are permitted. We fall back to buffered writes - * at a higher level for inside-i_size block-instantiating - * writes. - */ - ret = (*dio->get_block)(dio->inode, fs_startblk, - map_bh, create); + } else if (dio->lock_type == DIO_NO_LOCKING) { + create = 0; } - return ret; + + /* + * For writes inside i_size we forbid block creations: only + * overwrites are permitted. 
We fall back to buffered writes + * at a higher level for inside-i_size block-instantiating + * writes. + */ + return dio->get_block(dio->inode, fs_startblk, map_bh, create); } /* @@ -567,8 +492,8 @@ static int dio_bio_add_page(struct dio *dio) { int ret; - ret = bio_add_page(dio->bio, dio->cur_page, - dio->cur_page_len, dio->cur_page_offset); + ret = bio_add_page(dio->bio, dio->cur_page, dio->cur_page_len, + dio->cur_page_offset); if (ret == dio->cur_page_len) { /* * Decrement count only, if we are done with this page @@ -804,6 +729,9 @@ static int do_direct_IO(struct dio *dio) unsigned this_chunk_blocks; /* # of blocks */ unsigned u; + offset_in_page += dio->first_page_off; + dio->first_page_off = 0; + if (dio->blocks_available == 0) { /* * Need to go and map some more disk @@ -933,13 +861,10 @@ direct_io_worker(struct kiocb *iocb, struct inode *inode, struct dio_args *args, unsigned blkbits, get_block_t get_block, dio_iodone_t end_io, struct dio *dio) { - const struct iovec *iov = args->iov; - unsigned long user_addr; unsigned long flags; - int seg, rw = args->rw; + int rw = args->rw; ssize_t ret = 0; ssize_t ret2; - size_t bytes; dio->inode = inode; dio->rw = rw; @@ -965,46 +890,25 @@ direct_io_worker(struct kiocb *iocb, struct inode *inode, if (unlikely(dio->blkfactor)) dio->pages_in_io = 2; - for (seg = 0; seg < args->nr_segs; seg++) { - user_addr = (unsigned long) iov[seg].iov_base; - dio->pages_in_io += - ((user_addr+iov[seg].iov_len +PAGE_SIZE-1)/PAGE_SIZE - - user_addr/PAGE_SIZE); - } + dio->pages_in_io += args->nr_segs; + dio->size = args->length; + if (args->user_addr) { + dio->first_page_off = args->user_addr & ~PAGE_MASK; + dio->first_block_in_page = dio->first_page_off >> blkbits; + if (dio->first_block_in_page) + dio->first_page_off -= 1 << blkbits; + } else + dio->first_page_off = args->first_page_off; - for (seg = 0; seg < args->nr_segs; seg++) { - user_addr = (unsigned long)iov[seg].iov_base; - dio->size += bytes = iov[seg].iov_len; - - /* Index into the first page of the first block */ - dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits; - dio->final_block_in_request = dio->block_in_file + - (bytes >> blkbits); - /* Page fetching state */ - dio->head = 0; - dio->tail = 0; - dio->curr_page = 0; - - dio->total_pages = 0; - if (user_addr & (PAGE_SIZE-1)) { - dio->total_pages++; - bytes -= PAGE_SIZE - (user_addr & (PAGE_SIZE - 1)); - } - dio->total_pages += (bytes + PAGE_SIZE - 1) / PAGE_SIZE; - dio->curr_user_address = user_addr; - - ret = do_direct_IO(dio); + dio->final_block_in_request = dio->block_in_file + (dio->size >> blkbits); + dio->head_page = 0; + dio->total_pages = args->nr_segs; - dio->result += iov[seg].iov_len - + ret = do_direct_IO(dio); + + dio->result += args->length - ((dio->final_block_in_request - dio->block_in_file) << blkbits); - - if (ret) { - dio_cleanup(dio); - break; - } - } /* end iovec loop */ - if (ret == -ENOTBLK && (rw & WRITE)) { /* * The remaining part of the request will be @@ -1110,9 +1014,6 @@ __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct dio_args *args, get_block_t get_block, dio_iodone_t end_io, int dio_lock_type) { - int seg; - size_t size; - unsigned long addr; unsigned blkbits = inode->i_blkbits; unsigned bdev_blkbits = 0; unsigned blocksize_mask = (1 << blkbits) - 1; @@ -1138,17 +1039,14 @@ __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, } /* Check the memory alignment. 
Blocks cannot straddle pages */ - for (seg = 0; seg < args->nr_segs; seg++) { - addr = (unsigned long) args->iov[seg].iov_base; - size = args->iov[seg].iov_len; - end += size; - if ((addr & blocksize_mask) || (size & blocksize_mask)) { - if (bdev) - blkbits = bdev_blkbits; - blocksize_mask = (1 << blkbits) - 1; - if ((addr & blocksize_mask) || (size & blocksize_mask)) - goto out; - } + if ((args->user_addr & blocksize_mask) || + (args->length & blocksize_mask)) { + if (bdev) + blkbits = bdev_blkbits; + blocksize_mask = (1 << blkbits) - 1; + if ((args->user_addr & blocksize_mask) || + (args->length & blocksize_mask)) + goto out; } dio = kzalloc(sizeof(*dio), GFP_KERNEL); @@ -1156,6 +1054,8 @@ __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, if (!dio) goto out; + dio->pages = args->pages; + /* * For block device access DIO_NO_LOCKING is used, * neither readers nor writers do any locking at all @@ -1232,20 +1132,70 @@ out: } EXPORT_SYMBOL(__blockdev_direct_IO); -ssize_t generic_file_direct_IO(int rw, struct address_space *mapping, - struct kiocb *iocb, const struct iovec *iov, - loff_t offset, unsigned long nr_segs) +static ssize_t __generic_file_direct_IO(int rw, struct address_space *mapping, + struct kiocb *iocb, + const struct iovec *iov, loff_t offset, + dio_io_actor *actor) { + struct page *stack_pages[UIO_FASTIOV]; + unsigned long nr_pages, start, end; struct dio_args args = { - .rw = rw, - .iov = iov, - .length = iov_length(iov, nr_segs), + .pages = stack_pages, + .length = iov->iov_len, + .user_addr = (unsigned long) iov->iov_base, .offset = offset, - .nr_segs = nr_segs, }; + ssize_t ret; + + end = (args.user_addr + iov->iov_len + PAGE_SIZE - 1) >> PAGE_SHIFT; + start = args.user_addr >> PAGE_SHIFT; + nr_pages = end - start; + + if (nr_pages >= UIO_FASTIOV) { + args.pages = kzalloc(nr_pages * sizeof(struct page *), + GFP_KERNEL); + if (!args.pages) + return -ENOMEM; + } + + ret = get_user_pages_fast(args.user_addr, nr_pages, rw == READ, + args.pages); + if (ret > 0) { + args.nr_segs = ret; + ret = actor(iocb, &args); + } - if (mapping->a_ops->direct_IO) - return mapping->a_ops->direct_IO(iocb, &args); + if (args.pages != stack_pages) + kfree(args.pages); - return -EINVAL; + return ret; +} + +/* + * Transform the iov into a page based structure for passing into the lower + * parts of O_DIRECT handling + */ +ssize_t generic_file_direct_IO(int rw, struct address_space *mapping, + struct kiocb *kiocb, const struct iovec *iov, + loff_t offset, unsigned long nr_segs, + dio_io_actor *actor) +{ + ssize_t ret = 0, ret2; + unsigned long i; + + for (i = 0; i < nr_segs; i++) { + ret2 = __generic_file_direct_IO(rw, mapping, kiocb, iov, offset, + actor); + if (ret2 < 0) { + if (!ret) + ret = ret2; + break; + } + iov++; + offset += ret2; + ret += ret2; + } + + return ret; } +EXPORT_SYMBOL_GPL(generic_file_direct_IO); diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 45d931b..d9da548 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -271,13 +271,12 @@ static const struct rpc_call_ops nfs_read_direct_ops = { * no requests have been sent, just return an error. 
*/ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq, - const struct iovec *iov, - loff_t pos) + struct dio_args *args) { struct nfs_open_context *ctx = dreq->ctx; struct inode *inode = ctx->path.dentry->d_inode; - unsigned long user_addr = (unsigned long)iov->iov_base; - size_t count = iov->iov_len; + unsigned long user_addr = args->user_addr; + size_t count = args->length; size_t rsize = NFS_SERVER(inode)->rsize; struct rpc_task *task; struct rpc_message msg = { @@ -306,24 +305,8 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq, if (unlikely(!data)) break; - down_read(&current->mm->mmap_sem); - result = get_user_pages(current, current->mm, user_addr, - data->npages, 1, 0, data->pagevec, NULL); - up_read(&current->mm->mmap_sem); - if (result < 0) { - nfs_readdata_free(data); - break; - } - if ((unsigned)result < data->npages) { - bytes = result * PAGE_SIZE; - if (bytes <= pgbase) { - nfs_direct_release_pages(data->pagevec, result); - nfs_readdata_free(data); - break; - } - bytes -= pgbase; - data->npages = result; - } + data->pagevec = args->pages; + data->npages = args->nr_segs; get_dreq(dreq); @@ -332,7 +315,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq, data->cred = msg.rpc_cred; data->args.fh = NFS_FH(inode); data->args.context = ctx; - data->args.offset = pos; + data->args.offset = args->offset; data->args.pgbase = pgbase; data->args.pages = data->pagevec; data->args.count = bytes; @@ -361,7 +344,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq, started += bytes; user_addr += bytes; - pos += bytes; + args->offset += bytes; /* FIXME: Remove this unnecessary math from final patch */ pgbase += bytes; pgbase &= ~PAGE_MASK; @@ -376,26 +359,19 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq, } static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, - const struct iovec *iov, - unsigned long nr_segs, - loff_t pos) + struct dio_args *args) { ssize_t result = -EINVAL; size_t requested_bytes = 0; - unsigned long seg; get_dreq(dreq); - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *vec = &iov[seg]; - result = nfs_direct_read_schedule_segment(dreq, vec, pos); - if (result < 0) - break; - requested_bytes += result; - if ((size_t)result < vec->iov_len) - break; - pos += vec->iov_len; - } + result = nfs_direct_read_schedule_segment(dreq, args); + if (result < 0) + goto out; + + requested_bytes += result; + args->offset += result; if (put_dreq(dreq)) nfs_direct_complete(dreq); @@ -403,13 +379,13 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, if (requested_bytes != 0) return 0; +out: if (result < 0) return result; return -EIO; } -static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov, - unsigned long nr_segs, loff_t pos) +static ssize_t nfs_direct_read(struct kiocb *iocb, struct dio_args *args) { ssize_t result = 0; struct inode *inode = iocb->ki_filp->f_mapping->host; @@ -424,7 +400,7 @@ static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov, if (!is_sync_kiocb(iocb)) dreq->iocb = iocb; - result = nfs_direct_read_schedule_iovec(dreq, iov, nr_segs, pos); + result = nfs_direct_read_schedule_iovec(dreq, args); if (!result) result = nfs_direct_wait(dreq); nfs_direct_req_release(dreq); @@ -691,13 +667,13 @@ static const struct rpc_call_ops nfs_write_direct_ops = { * no requests have been sent, just return an error.
*/ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq, - const struct iovec *iov, - loff_t pos, int sync) + struct dio_args *args, + int sync) { struct nfs_open_context *ctx = dreq->ctx; struct inode *inode = ctx->path.dentry->d_inode; - unsigned long user_addr = (unsigned long)iov->iov_base; - size_t count = iov->iov_len; + unsigned long user_addr = args->user_addr; + size_t count = args->length; struct rpc_task *task; struct rpc_message msg = { .rpc_cred = ctx->cred, @@ -726,24 +702,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq, if (unlikely(!data)) break; - down_read(&current->mm->mmap_sem); - result = get_user_pages(current, current->mm, user_addr, - data->npages, 0, 0, data->pagevec, NULL); - up_read(&current->mm->mmap_sem); - if (result < 0) { - nfs_writedata_free(data); - break; - } - if ((unsigned)result < data->npages) { - bytes = result * PAGE_SIZE; - if (bytes <= pgbase) { - nfs_direct_release_pages(data->pagevec, result); - nfs_writedata_free(data); - break; - } - bytes -= pgbase; - data->npages = result; - } + data->pagevec = args->pages; + data->npages = args->nr_segs; get_dreq(dreq); @@ -754,7 +714,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq, data->inode = inode; data->cred = msg.rpc_cred; data->args.fh = NFS_FH(inode); data->args.context = ctx; - data->args.offset = pos; + data->args.offset = args->offset; data->args.pgbase = pgbase; data->args.pages = data->pagevec; data->args.count = bytes; @@ -784,7 +744,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq, started += bytes; user_addr += bytes; - pos += bytes; + args->offset += bytes; /* FIXME: Remove this useless math from the final patch */ pgbase += bytes; @@ -800,27 +760,19 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq, } static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, - const struct iovec *iov, - unsigned long nr_segs, - loff_t pos, int sync) + struct dio_args *args, int sync) { ssize_t result = 0; size_t requested_bytes = 0; - unsigned long seg; get_dreq(dreq); - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *vec = &iov[seg]; - result = nfs_direct_write_schedule_segment(dreq, vec, - pos, sync); - if (result < 0) - break; - requested_bytes += result; - if ((size_t)result < vec->iov_len) - break; - pos += vec->iov_len; - } + result = nfs_direct_write_schedule_segment(dreq, args, sync); + if (result < 0) + goto out; + + requested_bytes += result; + args->offset += result; if (put_dreq(dreq)) nfs_direct_write_complete(dreq, dreq->inode); @@ -828,14 +780,13 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, if (requested_bytes != 0) return 0; +out: if (result < 0) return result; return -EIO; } -static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov, - unsigned long nr_segs, loff_t pos, - size_t count) +static ssize_t nfs_direct_write(struct kiocb *iocb, struct dio_args *args) { ssize_t result = 0; struct inode *inode = iocb->ki_filp->f_mapping->host; @@ -848,7 +799,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov, return -ENOMEM; nfs_alloc_commit_data(dreq); - if (dreq->commit_data == NULL || count < wsize) + if (dreq->commit_data == NULL || args->length < wsize) sync = NFS_FILE_SYNC; dreq->inode = inode; @@ -856,7 +807,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov, if (!is_sync_kiocb(iocb)) dreq->iocb = iocb; - result =
nfs_direct_write_schedule_iovec(dreq, iov, nr_segs, pos, sync); + result = nfs_direct_write_schedule_iovec(dreq, args, sync); if (!result) result = nfs_direct_wait(dreq); nfs_direct_req_release(dreq); @@ -867,9 +818,7 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov, /** * nfs_file_direct_read - file direct read operation for NFS files * @iocb: target I/O control block - * @iov: vector of user buffers into which to read data - * @nr_segs: size of iov vector - * @pos: byte offset in file where reading starts + * @args: direct IO arguments * * We use this function for direct reads instead of calling * generic_file_aio_read() in order to avoid gfar's check to see if @@ -885,21 +834,20 @@ static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov, * client must read the updated atime from the server back into its * cache. */ -ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov, - unsigned long nr_segs, loff_t pos) +static ssize_t nfs_file_direct_read(struct kiocb *iocb, struct dio_args *args) { ssize_t retval = -EINVAL; struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; size_t count; - count = iov_length(iov, nr_segs); + count = args->length; nfs_add_stats(mapping->host, NFSIOS_DIRECTREADBYTES, count); dfprintk(FILE, "NFS: direct read(%s/%s, %zd@%Ld)\n", file->f_path.dentry->d_parent->d_name.name, file->f_path.dentry->d_name.name, - count, (long long) pos); + count, (long long) args->offset); retval = 0; if (!count) @@ -909,9 +857,9 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, const struct iovec *iov, if (retval) goto out; - retval = nfs_direct_read(iocb, iov, nr_segs, pos); + retval = nfs_direct_read(iocb, args); if (retval > 0) - iocb->ki_pos = pos + retval; + iocb->ki_pos = args->offset + retval; out: return retval; @@ -920,9 +868,7 @@ out: /** * nfs_file_direct_write - file direct write operation for NFS files * @iocb: target I/O control block - * @iov: vector of user buffers from which to write data - * @nr_segs: size of iov vector - * @pos: byte offset in file where writing starts + * @args: direct IO arguments * * We use this function for direct writes instead of calling * generic_file_aio_write() in order to avoid taking the inode @@ -942,23 +888,22 @@ out: * Note that O_APPEND is not supported for NFS direct writes, as there * is no atomic O_APPEND write facility in the NFS protocol. 
*/ -ssize_t nfs_file_direct_write(struct kiocb *iocb, const struct iovec *iov, - unsigned long nr_segs, loff_t pos) +static ssize_t nfs_file_direct_write(struct kiocb *iocb, struct dio_args *args) { ssize_t retval = -EINVAL; struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; size_t count; - count = iov_length(iov, nr_segs); + count = args->length; nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES, count); dfprintk(FILE, "NFS: direct write(%s/%s, %zd@%Ld)\n", file->f_path.dentry->d_parent->d_name.name, file->f_path.dentry->d_name.name, - count, (long long) pos); + count, (long long) args->offset); - retval = generic_write_checks(file, &pos, &count, 0); + retval = generic_write_checks(file, &args->offset, &count, 0); if (retval) goto out; @@ -973,15 +918,23 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, const struct iovec *iov, if (retval) goto out; - retval = nfs_direct_write(iocb, iov, nr_segs, pos, count); + retval = nfs_direct_write(iocb, args); if (retval > 0) - iocb->ki_pos = pos + retval; + iocb->ki_pos = args->offset + retval; out: return retval; } +ssize_t nfs_file_direct_io(struct kiocb *kiocb, struct dio_args *args) +{ + if (args->rw == READ) + return nfs_file_direct_read(kiocb, args); + + return nfs_file_direct_write(kiocb, args); +} + /** * nfs_init_directcache - create a slab cache for nfs_direct_req structures * diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 0506232..97d8cc7 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -249,13 +249,15 @@ static ssize_t nfs_file_read(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos) { + struct address_space *mapping = iocb->ki_filp->f_mapping; struct dentry * dentry = iocb->ki_filp->f_path.dentry; struct inode * inode = dentry->d_inode; ssize_t result; size_t count = iov_length(iov, nr_segs); if (iocb->ki_filp->f_flags & O_DIRECT) - return nfs_file_direct_read(iocb, iov, nr_segs, pos); + return generic_file_direct_IO(READ, mapping, iocb, iov, pos, + nr_segs, nfs_file_direct_io); dprintk("NFS: read(%s/%s, %lu@%lu)\n", dentry->d_parent->d_name.name, dentry->d_name.name, @@ -546,13 +548,15 @@ static int nfs_need_sync_write(struct file *filp, struct inode *inode) static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov, unsigned long nr_segs, loff_t pos) { + struct address_space *mapping = iocb->ki_filp->f_mapping; struct dentry * dentry = iocb->ki_filp->f_path.dentry; struct inode * inode = dentry->d_inode; ssize_t result; size_t count = iov_length(iov, nr_segs); if (iocb->ki_filp->f_flags & O_DIRECT) - return nfs_file_direct_write(iocb, iov, nr_segs, pos); + return generic_file_direct_IO(WRITE, mapping, iocb, iov, pos, + nr_segs, nfs_file_direct_io); dprintk("NFS: write(%s/%s, %lu@%Ld)\n", dentry->d_parent->d_name.name, dentry->d_name.name, diff --git a/include/linux/fs.h b/include/linux/fs.h index 5971116..539994a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2247,18 +2247,27 @@ static inline int xip_truncate_page(struct address_space *mapping, loff_t from) */ struct dio_args { int rw; - const struct iovec *iov; + struct page **pages; + unsigned int first_page_off; + unsigned long nr_segs; unsigned long length; loff_t offset; - unsigned long nr_segs; + + /* + * Original user pointer, we'll get rid of this + */ + unsigned long user_addr; }; ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct dio_args *args, get_block_t get_block, dio_iodone_t end_io, int lock_type); +typedef ssize_t 
(dio_io_actor)(struct kiocb *, struct dio_args *); + ssize_t generic_file_direct_IO(int, struct address_space *, struct kiocb *, - const struct iovec *, loff_t, unsigned long); + const struct iovec *, loff_t, unsigned long, + dio_io_actor); enum { DIO_LOCKING = 1, /* need locking between buffered and direct access */ diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 97a2383..ded8337 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -409,12 +409,7 @@ extern int nfs3_removexattr (struct dentry *, const char *name); * linux/fs/nfs/direct.c */ extern ssize_t nfs_direct_IO(struct kiocb *, struct dio_args *); -extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - const struct iovec *iov, unsigned long nr_segs, - loff_t pos); -extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - const struct iovec *iov, unsigned long nr_segs, - loff_t pos); +extern ssize_t nfs_file_direct_io(struct kiocb *, struct dio_args *); /* * linux/fs/nfs/dir.c diff --git a/mm/filemap.c b/mm/filemap.c index cf85298..3e03021 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1346,8 +1346,8 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, pos + iov_length(iov, nr_segs) - 1); if (!retval) { retval = generic_file_direct_IO(READ, mapping, - iocb, iov, - pos, nr_segs); + iocb, iov, pos, nr_segs, + mapping->a_ops->direct_IO); } if (retval > 0) *ppos = pos + retval; @@ -2146,7 +2146,7 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, } written = generic_file_direct_IO(WRITE, mapping, iocb, iov, pos, - *nr_segs); + *nr_segs, mapping->a_ops->direct_IO); /* * Finally, try again to invalidate clean pages which might have been -- 1.6.4.53.g3f55e ^ permalink raw reply related [flat|nested] 30+ messages in thread
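The dio_io_actor indirection added here is what lets NFS, which bypasses the blockdev code entirely, share the page-pinning front end with the block-based filesystems. A sketch of an O_DIRECT read path wired up this way, closely following the nfs_file_read() and generic_file_aio_read() hunks above (the function and filesystem names are hypothetical):

static ssize_t examplefs_file_read(struct kiocb *iocb, const struct iovec *iov,
				   unsigned long nr_segs, loff_t pos)
{
	struct address_space *mapping = iocb->ki_filp->f_mapping;

	if (iocb->ki_filp->f_flags & O_DIRECT) {
		/*
		 * generic_file_direct_IO() pins the user pages one iovec
		 * segment at a time and hands each batch to the actor in
		 * the page-based struct dio_args.
		 */
		return generic_file_direct_IO(READ, mapping, iocb, iov,
					      pos, nr_segs,
					      mapping->a_ops->direct_IO);
	}

	return generic_file_aio_read(iocb, iov, nr_segs, pos);
}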
* [PATCH 2/6] workqueue: add support for lazy workqueues 2009-08-20 10:17 ` [PATCH 2/4] direct-io: make O_DIRECT IO path be page based Jens Axboe @ 2009-08-20 10:17 ` Jens Axboe 2009-08-21 0:20 ` Andrew Morton 0 siblings, 1 reply; 30+ messages in thread From: Jens Axboe @ 2009-08-20 10:17 UTC (permalink / raw) To: linux-kernel; +Cc: jeff, benh, htejun, bzolnier, alan, Jens Axboe Lazy workqueues are like normal workqueues, except they don't start a thread per CPU by default. Instead threads are started when they are needed, and exit when they have been idle for some time. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> --- include/linux/workqueue.h | 5 ++ kernel/workqueue.c | 152 ++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 147 insertions(+), 10 deletions(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index f14e20e..b2dd267 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -32,6 +32,7 @@ struct work_struct { #ifdef CONFIG_LOCKDEP struct lockdep_map lockdep_map; #endif + unsigned int cpu; }; #define WORK_DATA_INIT() ATOMIC_LONG_INIT(0) @@ -172,6 +173,7 @@ enum { WQ_F_SINGLETHREAD = 1, WQ_F_FREEZABLE = 2, WQ_F_RT = 4, + WQ_F_LAZY = 8, }; #ifdef CONFIG_LOCKDEP @@ -198,6 +200,7 @@ enum { __create_workqueue((name), WQ_F_SINGLETHREAD | WQ_F_FREEZABLE) #define create_singlethread_workqueue(name) \ __create_workqueue((name), WQ_F_SINGLETHREAD) +#define create_lazy_workqueue(name) __create_workqueue((name), WQ_F_LAZY) extern void destroy_workqueue(struct workqueue_struct *wq); @@ -211,6 +214,8 @@ extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq, extern void flush_workqueue(struct workqueue_struct *wq); extern void flush_scheduled_work(void); +extern void workqueue_set_lazy_timeout(struct workqueue_struct *wq, + unsigned long timeout); extern int schedule_work(struct work_struct *work); extern int schedule_work_on(int cpu, struct work_struct *work); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 02ba7c9..d9ccebc 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -61,11 +61,17 @@ struct workqueue_struct { struct list_head list; const char *name; unsigned int flags; /* WQ_F_* flags */ + unsigned long lazy_timeout; + unsigned int core_cpu; #ifdef CONFIG_LOCKDEP struct lockdep_map lockdep_map; #endif }; +/* Default lazy workqueue timeout */ +#define WQ_DEF_LAZY_TIMEOUT (60 * HZ) + + /* Serializes the accesses to the list of workqueues. */ static DEFINE_SPINLOCK(workqueue_lock); static LIST_HEAD(workqueues); @@ -81,6 +87,8 @@ static const struct cpumask *cpu_singlethread_map __read_mostly; */ static cpumask_var_t cpu_populated_map __read_mostly; +static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu); + /* If it's single threaded, it isn't in the list of workqueues. */ static inline bool is_wq_single_threaded(struct workqueue_struct *wq) { @@ -141,11 +149,29 @@ static void insert_work(struct cpu_workqueue_struct *cwq, static void __queue_work(struct cpu_workqueue_struct *cwq, struct work_struct *work) { + struct workqueue_struct *wq = cwq->wq; unsigned long flags; - spin_lock_irqsave(&cwq->lock, flags); - insert_work(cwq, work, &cwq->worklist); - spin_unlock_irqrestore(&cwq->lock, flags); + /* + * This is a lazy workqueue and this particular CPU thread has + * exited. We can't create it from here, so add this work on our + * static thread. It will create this thread and move the work there. 
+ */ + if ((wq->flags & WQ_F_LAZY) && !cwq->thread) { + struct cpu_workqueue_struct *__cwq; + + local_irq_save(flags); + __cwq = wq_per_cpu(wq, wq->core_cpu); + work->cpu = smp_processor_id(); + spin_lock(&__cwq->lock); + insert_work(__cwq, work, &__cwq->worklist); + spin_unlock_irqrestore(&__cwq->lock, flags); + } else { + spin_lock_irqsave(&cwq->lock, flags); + work->cpu = smp_processor_id(); + insert_work(cwq, work, &cwq->worklist); + spin_unlock_irqrestore(&cwq->lock, flags); + } } /** @@ -259,13 +285,16 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq, } EXPORT_SYMBOL_GPL(queue_delayed_work_on); -static void run_workqueue(struct cpu_workqueue_struct *cwq) +static int run_workqueue(struct cpu_workqueue_struct *cwq) { + int did_work = 0; + spin_lock_irq(&cwq->lock); while (!list_empty(&cwq->worklist)) { struct work_struct *work = list_entry(cwq->worklist.next, struct work_struct, entry); work_func_t f = work->func; + int cpu; #ifdef CONFIG_LOCKDEP /* * It is permissible to free the struct work_struct @@ -280,7 +309,34 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) trace_workqueue_execution(cwq->thread, work); cwq->current_work = work; list_del_init(cwq->worklist.next); + cpu = smp_processor_id(); spin_unlock_irq(&cwq->lock); + did_work = 1; + + /* + * If work->cpu isn't us, then we need to create the target + * workqueue thread (if someone didn't already do that) and + * move the work over there. + */ + if ((cwq->wq->flags & WQ_F_LAZY) && work->cpu != cpu) { + struct cpu_workqueue_struct *__cwq; + struct task_struct *p; + int err; + + __cwq = wq_per_cpu(cwq->wq, work->cpu); + p = __cwq->thread; + if (!p) + err = create_workqueue_thread(__cwq, work->cpu); + p = __cwq->thread; + if (p) { + if (work->cpu >= 0) + kthread_bind(p, work->cpu); + insert_work(__cwq, work, &__cwq->worklist); + wake_up_process(p); + goto out; + } + } + BUG_ON(get_wq_data(work) != cwq); work_clear_pending(work); @@ -305,24 +361,45 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) cwq->current_work = NULL; } spin_unlock_irq(&cwq->lock); +out: + return did_work; } static int worker_thread(void *__cwq) { struct cpu_workqueue_struct *cwq = __cwq; + struct workqueue_struct *wq = cwq->wq; + unsigned long last_active = jiffies; DEFINE_WAIT(wait); + int may_exit; - if (cwq->wq->flags & WQ_F_FREEZABLE) + if (wq->flags & WQ_F_FREEZABLE) set_freezable(); set_user_nice(current, -5); + /* + * Allow exit if this isn't our core thread + */ + if ((wq->flags & WQ_F_LAZY) && smp_processor_id() != wq->core_cpu) + may_exit = 1; + else + may_exit = 0; + for (;;) { + int did_work; + prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE); if (!freezing(current) && !kthread_should_stop() && - list_empty(&cwq->worklist)) - schedule(); + list_empty(&cwq->worklist)) { + unsigned long timeout = wq->lazy_timeout; + + if (timeout && may_exit) + schedule_timeout(timeout); + else + schedule(); + } finish_wait(&cwq->more_work, &wait); try_to_freeze(); @@ -330,7 +407,19 @@ static int worker_thread(void *__cwq) if (kthread_should_stop()) break; - run_workqueue(cwq); + did_work = run_workqueue(cwq); + + /* + * If we did no work for the defined timeout period and we are + * allowed to exit, do so. 
+ */ + if (did_work) + last_active = jiffies; + else if (time_after(jiffies, last_active + wq->lazy_timeout) && + may_exit) { + cwq->thread = NULL; + break; + } } return 0; @@ -814,7 +903,10 @@ struct workqueue_struct *__create_workqueue_key(const char *name, cwq = init_cpu_workqueue(wq, singlethread_cpu); err = create_workqueue_thread(cwq, singlethread_cpu); start_workqueue_thread(cwq, -1); + wq->core_cpu = singlethread_cpu; } else { + int created = 0; + cpu_maps_update_begin(); /* * We must place this wq on list even if the code below fails. @@ -833,10 +925,16 @@ struct workqueue_struct *__create_workqueue_key(const char *name, */ for_each_possible_cpu(cpu) { cwq = init_cpu_workqueue(wq, cpu); - if (err || !cpu_online(cpu)) + if (err || !cpu_online(cpu) || + (created && (wq->flags & WQ_F_LAZY))) continue; err = create_workqueue_thread(cwq, cpu); start_workqueue_thread(cwq, cpu); + if (!err) { + if (!created) + wq->core_cpu = cpu; + created++; + } } cpu_maps_update_done(); } @@ -844,7 +942,9 @@ struct workqueue_struct *__create_workqueue_key(const char *name, if (err) { destroy_workqueue(wq); wq = NULL; - } + } else if (wq->flags & WQ_F_LAZY) + workqueue_set_lazy_timeout(wq, WQ_DEF_LAZY_TIMEOUT); + return wq; } EXPORT_SYMBOL_GPL(__create_workqueue_key); @@ -877,6 +977,13 @@ static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq) cwq->thread = NULL; } +static bool hotplug_should_start_thread(struct workqueue_struct *wq, int cpu) +{ + if ((wq->flags & WQ_F_LAZY) && cpu != wq->core_cpu) + return 0; + return 1; +} + /** * destroy_workqueue - safely terminate a workqueue * @wq: target workqueue @@ -923,6 +1030,8 @@ undo: switch (action) { case CPU_UP_PREPARE: + if (!hotplug_should_start_thread(wq, cpu)) + break; if (!create_workqueue_thread(cwq, cpu)) break; printk(KERN_ERR "workqueue [%s] for %i failed\n", @@ -932,6 +1041,8 @@ undo: goto undo; case CPU_ONLINE: + if (!hotplug_should_start_thread(wq, cpu)) + break; start_workqueue_thread(cwq, cpu); break; @@ -999,6 +1110,27 @@ long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg) EXPORT_SYMBOL_GPL(work_on_cpu); #endif /* CONFIG_SMP */ +/** + * workqueue_set_lazy_timeout - set lazy exit timeout + * @wq: the associated workqueue_struct + * @timeout: timeout in jiffies + * + * This will set the timeout for a lazy workqueue. If no work has been + * processed for @timeout jiffies, then the workqueue is allowed to exit. + * It will be dynamically created again when work is queued to it. + * + * Note that this only works for workqueues created with + * create_lazy_workqueue(). + */ +void workqueue_set_lazy_timeout(struct workqueue_struct *wq, + unsigned long timeout) +{ + if (WARN_ON(!(wq->flags & WQ_F_LAZY))) + return; + + wq->lazy_timeout = timeout; +} + void __init init_workqueues(void) { alloc_cpumask_var(&cpu_populated_map, GFP_KERNEL); -- 1.6.4.173.g3f189 ^ permalink raw reply related [flat|nested] 30+ messages in thread
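For reference, the whole life cycle of a lazy workqueue under this patch can be sketched as a minimal (hypothetical) module-style user; create_lazy_workqueue() and workqueue_set_lazy_timeout() are the interfaces added above, the rest is the existing workqueue API:

static struct workqueue_struct *example_wq;
static struct work_struct example_work;

static void example_work_fn(struct work_struct *work)
{
	/* runs on the queueing CPU once its thread has been spawned */
}

static int __init example_init(void)
{
	/* starts only the core thread, not one thread per online CPU */
	example_wq = create_lazy_workqueue("example");
	if (!example_wq)
		return -ENOMEM;

	/* optionally shorten the idle exit period from the 60*HZ default */
	workqueue_set_lazy_timeout(example_wq, 10 * HZ);

	INIT_WORK(&example_work, example_work_fn);
	queue_work(example_wq, &example_work);
	return 0;
}

static void __exit example_exit(void)
{
	destroy_workqueue(example_wq);
}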
* Re: [PATCH 2/6] workqueue: add support for lazy workqueues 2009-08-20 10:17 ` [PATCH 2/6] workqueue: add support for lazy workqueues Jens Axboe @ 2009-08-21 0:20 ` Andrew Morton 2009-08-24 8:06 ` Jens Axboe 0 siblings, 1 reply; 30+ messages in thread From: Andrew Morton @ 2009-08-21 0:20 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan, jens.axboe On Thu, 20 Aug 2009 12:17:39 +0200 Jens Axboe <jens.axboe@oracle.com> wrote: > Lazy workqueues are like normal workqueues, except they don't > start a thread per CPU by default. Instead threads are started > when they are needed, and exit when they have been idle for > some time. > > > ... > > @@ -280,7 +309,34 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) > trace_workqueue_execution(cwq->thread, work); > cwq->current_work = work; > list_del_init(cwq->worklist.next); > + cpu = smp_processor_id(); > spin_unlock_irq(&cwq->lock); > + did_work = 1; > + > + /* > + * If work->cpu isn't us, then we need to create the target > + * workqueue thread (if someone didn't already do that) and > + * move the work over there. > + */ > + if ((cwq->wq->flags & WQ_F_LAZY) && work->cpu != cpu) { > + struct cpu_workqueue_struct *__cwq; > + struct task_struct *p; > + int err; > + > + __cwq = wq_per_cpu(cwq->wq, work->cpu); > + p = __cwq->thread; > + if (!p) > + err = create_workqueue_thread(__cwq, work->cpu); > + p = __cwq->thread; > + if (p) { > + if (work->cpu >= 0) It's an unsigned int. This test is always true. > + kthread_bind(p, work->cpu); I wonder what happens if work->cpu isn't online any more. > + insert_work(__cwq, work, &__cwq->worklist); > + wake_up_process(p); > + goto out; > + } > + } > + > > BUG_ON(get_wq_data(work) != cwq); > work_clear_pending(work); > > ... > ^ permalink raw reply [flat|nested] 30+ messages in thread
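The signedness problem is easy to see in isolation: with the "unsigned int cpu" member added to struct work_struct in this patch, a -1 sentinel wraps to UINT_MAX and the guard can never fire. A minimal sketch of the point (the -1 sentinel policy is assumed here for illustration; the patch itself only ever assigns smp_processor_id() to work->cpu):

unsigned int cpu = -1;	/* wraps to UINT_MAX */
if (cpu >= 0)		/* always true for an unsigned type */
	/* kthread_bind() would run even in the intended "no CPU" case */;

/* one possible fix: make the member signed so a sentinel is expressible */
struct work_struct {
	/* ... existing members ... */
	int cpu;	/* -1 would mean "no CPU recorded" */
};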
* Re: [PATCH 2/6] workqueue: add support for lazy workqueues 2009-08-21 0:20 ` Andrew Morton @ 2009-08-24 8:06 ` Jens Axboe 0 siblings, 0 replies; 30+ messages in thread From: Jens Axboe @ 2009-08-24 8:06 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, jeff, benh, htejun, bzolnier, alan On Thu, Aug 20 2009, Andrew Morton wrote: > On Thu, 20 Aug 2009 12:17:39 +0200 > Jens Axboe <jens.axboe@oracle.com> wrote: > > > Lazy workqueues are like normal workqueues, except they don't > > start a thread per CPU by default. Instead threads are started > > when they are needed, and exit when they have been idle for > > some time. > > > > > > ... > > > > @@ -280,7 +309,34 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) > > trace_workqueue_execution(cwq->thread, work); > > cwq->current_work = work; > > list_del_init(cwq->worklist.next); > > + cpu = smp_processor_id(); > > spin_unlock_irq(&cwq->lock); > > + did_work = 1; > > + > > + /* > > + * If work->cpu isn't us, then we need to create the target > > + * workqueue thread (if someone didn't already do that) and > > + * move the work over there. > > + */ > > + if ((cwq->wq->flags & WQ_F_LAZY) && work->cpu != cpu) { > > + struct cpu_workqueue_struct *__cwq; > > + struct task_struct *p; > > + int err; > > + > > + __cwq = wq_per_cpu(cwq->wq, work->cpu); > > + p = __cwq->thread; > > + if (!p) > > + err = create_workqueue_thread(__cwq, work->cpu); > > + p = __cwq->thread; > > + if (p) { > > + if (work->cpu >= 0) > > It's an unsigned int. This test is always true. > > > + kthread_bind(p, work->cpu); > > I wonder what happens if work->cpu isn't online any more. That's a good question. The workqueue "documentation" states that it is the caller's responsibility to ensure that the CPU stays online, but I think that requirement is pretty much ignored. Probably since it'd be costly to do. So that bit needs looking into. -- Jens Axboe ^ permalink raw reply [flat|nested] 30+ messages in thread
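Something like the following is probably the shape of the missing check -- an untested sketch only: cpu_online() and get_online_cpus()/put_online_cpus() are existing kernel interfaces, but the policy of falling back to the current CPU when the target has gone away is invented here, and the locking around insert_work() is elided for brevity:

get_online_cpus();
if (cpu_online(work->cpu) && __cwq->thread) {
	/* target CPU still there: bind and hand the work over */
	kthread_bind(__cwq->thread, work->cpu);
	insert_work(__cwq, work, &__cwq->worklist);
	wake_up_process(__cwq->thread);
} else {
	/* target CPU went away: let the current CPU run the work */
	insert_work(cwq, work, &cwq->worklist);
}
put_online_cpus();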
Thread overview: 30+ messages 2009-08-20 10:19 [PATCH 0/6] Lazy workqueues Jens Axboe 2009-08-20 10:19 ` [PATCH 1/6] workqueue: replace singlethread/freezable/rt parameters and variables with flags Jens Axboe 2009-08-20 10:20 ` [PATCH 2/6] workqueue: add support for lazy workqueues Jens Axboe 2009-08-20 12:01 ` Frederic Weisbecker 2009-08-20 12:10 ` Jens Axboe 2009-08-20 10:20 ` [PATCH 3/6] crypto: use " Jens Axboe 2009-08-20 10:20 ` [PATCH 4/6] libata: use lazy workqueues for the pio task Jens Axboe 2009-08-20 12:40 ` Stefan Richter 2009-08-20 12:48 ` Jens Axboe 2009-08-20 10:20 ` [PATCH 5/6] aio: use lazy workqueues Jens Axboe 2009-08-20 15:09 ` Jeff Moyer 2009-08-21 18:31 ` Zach Brown 2009-08-20 10:20 ` [PATCH 6/6] sunrpc: " Jens Axboe 2009-08-20 12:04 ` [PATCH 0/6] Lazy workqueues Peter Zijlstra 2009-08-20 12:08 ` Jens Axboe 2009-08-20 12:16 ` Peter Zijlstra 2009-08-23 2:42 ` Junio C Hamano 2009-08-24 7:04 ` git send-email defaults Peter Zijlstra 2009-08-24 8:04 ` [PATCH 0/6] Lazy workqueues Jens Axboe 2009-08-24 9:03 ` Junio C Hamano 2009-08-24 9:11 ` Peter Zijlstra 2009-08-20 12:22 ` Frederic Weisbecker 2009-08-20 12:41 ` Jens Axboe 2009-08-20 13:04 ` Tejun Heo 2009-08-20 12:59 ` Steven Whitehouse 2009-08-20 12:55 ` Tejun Heo 2009-08-21 6:58 ` Jens Axboe -- strict thread matches above, loose matches on Subject: below -- 2009-08-20 10:17 Jens Axboe 2009-08-20 10:17 ` [PATCH 1/4] direct-io: unify argument passing by adding a dio_args structure Jens Axboe 2009-08-20 10:17 ` [PATCH 1/6] workqueue: replace singlethread/freezable/rt parameters and variables with flags Jens Axboe 2009-08-20 10:17 ` [PATCH 2/4] direct-io: make O_DIRECT IO path be page based Jens Axboe 2009-08-20 10:17 ` [PATCH 2/6] workqueue: add support for lazy workqueues Jens Axboe 2009-08-21 0:20 ` Andrew Morton 2009-08-24 8:06 ` Jens Axboe