* [PATCH] fastboot: keep at least one thread per cpu during boot
@ 2009-02-09 3:48 Frederic Weisbecker
From: Frederic Weisbecker @ 2009-02-09 3:48 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Cornelia Huck, lkml
Async threads are created and destroyed depending on the number of jobs in
the queue. This means that several async threads can be created for a
specific batch of work; the threads then die after this batch completes,
even though they may be needed again shortly afterwards for another batch.
During boot, such repeated thread creation can be wasteful, so this patch
proposes to keep at least one thread per cpu alive (once they have been
created). Keeping this threshold of threads alive avoids part of the
thread creation overhead.
The threshold is dropped once the system_state switches from SYSTEM_BOOTING
to SYSTEM_RUNNING.
Note:
_ If this patch is accepted, I will try to extend it to module loading on boot.
_ One thread per cpu may sound a bit arbitrary. It is actually a compromise
between memory savings (if we just created lots of async threads for a large
batch of jobs) and task creation overhead.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
include/linux/async.h | 2 +
init/main.c | 1 +
kernel/async.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/include/linux/async.h b/include/linux/async.h
index 68a9530..71a09e2 100644
--- a/include/linux/async.h
+++ b/include/linux/async.h
@@ -25,3 +25,5 @@ extern void async_synchronize_cookie(async_cookie_t cookie);
extern void async_synchronize_cookie_domain(async_cookie_t cookie,
struct list_head *list);
+extern void async_finish_boot(void);
+
diff --git a/init/main.c b/init/main.c
index 36de89b..fa99928 100644
--- a/init/main.c
+++ b/init/main.c
@@ -806,6 +806,7 @@ static noinline int init_post(void)
unlock_kernel();
mark_rodata_ro();
system_state = SYSTEM_RUNNING;
+ async_finish_boot();
numa_default_policy();
if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
diff --git a/kernel/async.c b/kernel/async.c
index f565891..25c12d0 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -176,7 +176,7 @@ static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct l
struct async_entry *entry;
unsigned long flags;
async_cookie_t newcookie;
-
+
/* allow irq-off callers */
entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
@@ -313,6 +313,54 @@ void async_synchronize_cookie(async_cookie_t cookie)
EXPORT_SYMBOL_GPL(async_synchronize_cookie);
+/**
+ * async_finish_boot - wake up the async threads that stayed alive only to
+ * keep the minimum threshold of async threads during boot.
+ */
+void async_finish_boot(void)
+{
+ wake_up(&async_new);
+}
+
+/**
+ * Adaptive wait function for the async threads.
+ * While booting, we want to keep about one thread per
+ * cpu to avoid wasteful thread creations/deletions.
+ * We return to the normal async thread creation/deletion mode once
+ * boot is finished, since async is mostly used during boot.
+ *
+ * @return: 0 if we assume the thread should be destroyed
+ */
+static int async_thread_sleep(int timeout)
+{
+ static atomic_t nb_sleeping = ATOMIC_INIT(-1);
+ int tc;
+ int ret;
+
+ /*
+ * If several async threads come here together while we are in the
+ * boot stage, those which exceed the boot thread threshold
+ * will sleep and then assume they have to die...
+ */
+ tc = atomic_read(&thread_count) - atomic_inc_return(&nb_sleeping);
+
+ if (system_state == SYSTEM_BOOTING && tc <= num_online_cpus()) {
+ schedule();
+ if (system_state == SYSTEM_RUNNING)
+ /* We may have been awoken by async_finish_boot() */
+ ret = 0;
+ else
+ /* We may have a job to handle */
+ ret = timeout;
+ } else {
+ ret = schedule_timeout(timeout);
+ }
+
+ atomic_dec(&nb_sleeping);
+
+ return ret;
+}
+
static int async_thread(void *unused)
{
DECLARE_WAITQUEUE(wq, current);
@@ -330,7 +378,7 @@ static int async_thread(void *unused)
if (!list_empty(&async_pending))
run_one_entry();
else
- ret = schedule_timeout(HZ);
+ ret = async_thread_sleep(ret);
if (ret == 0) {
/*
* Re: [PATCH] fastboot: keep at least one thread per cpu during boot
2009-02-09 3:48 [PATCH] fastboot: keep at least one thread per cpu during boot Frederic Weisbecker
@ 2009-02-09 5:27 ` Arjan van de Ven
From: Arjan van de Ven @ 2009-02-09 5:27 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: Cornelia Huck, lkml
On Mon, 9 Feb 2009 04:48:27 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:
> Async threads are created and destroyed depending on the number of
> jobs in the queue. This means that several async threads can be
> created for a specific batch of work; the threads then die after this
> batch completes, even though they may be needed again shortly
> afterwards for another batch. During boot, such repeated thread
> creation can be wasteful, so this patch proposes to keep at least one
> thread per cpu alive (once they have been created). Keeping this
> threshold of threads alive avoids part of the thread creation
> overhead. The threshold is dropped once the system_state switches
> from SYSTEM_BOOTING to SYSTEM_RUNNING.
I'm not very fond of this, to be honest;
at least during boot there's enough activity, and the time is so short
(that's the point of the parallel stuff!) that this will not kick in to
make a difference; specifically, on every boot I've seen, the number of
threads is highest near the end, and the total kernel boot time is
below 1.5 seconds or so, not long enough for the threads to die.
Creating a thread is *CHEAP*. Really really cheap. You can do 100
thousand/second on even a modest CPU. If you have a high frequency of
events you don't want this, sure, and that is why there is a one
second delay to give an opportunity for reuse... but really....
Now, if async function calls get used more, I can see the point of
always keeping one thread alive, just for both performance and VM low
memory issues; but that's not what your patch is doing.
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: [PATCH] fastboot: keep at least one thread per cpu during boot
2009-02-09 5:27 ` Arjan van de Ven
@ 2009-02-09 10:17 ` Cornelia Huck
From: Cornelia Huck @ 2009-02-09 10:17 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Frederic Weisbecker, lkml
On Sun, 8 Feb 2009 21:27:48 -0800,
Arjan van de Ven <arjan@infradead.org> wrote:
> On Mon, 9 Feb 2009 04:48:27 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > Async threads are created and destroyed depending on the number of
> > jobs in the queue. This means that several async threads can be
> > created for a specific batch of work; the threads then die after
> > this batch completes, even though they may be needed again shortly
> > afterwards for another batch. During boot, such repeated thread
> > creation can be wasteful, so this patch proposes to keep at least
> > one thread per cpu alive (once they have been created). Keeping
> > this threshold of threads alive avoids part of the thread creation
> > overhead. The threshold is dropped once the system_state switches
> > from SYSTEM_BOOTING to SYSTEM_RUNNING.
>
> I'm not very fond of this, to be honest;
> at least during boot there's enough activity, and the time is so short
> (that's the point of the parallel stuff!) that this will not kick in to
> make a difference; specifically, on every boot I've seen, the number of
> threads is highest near the end, and the total kernel boot time is
> below 1.5 seconds or so, not long enough for the threads to die.
>
> Creating a thread is *CHEAP*. Really really cheap. You can do 100
> thousand/second on even a modest CPU. If you have a high frequency of
> events you don't want this, sure, and that is why there is a one
> second delay to give an opportunity for reuse... but really....
Agreed.
>
>
> Now, if async function calls get used more, I can see the point of
> always keeping one thread alive, just for both performance and VM low
> memory issues; but that's not what your patch is doing.
I'd argue that the ability to _schedule_ async stuff without memory
allocation would help more in low memory situations - after all, the
work has been scheduled for later.
(For that, I have hacked up the following completely untested patch,
but I'm not yet completely happy with it.)
---
include/linux/async.h | 19 +++++++++
kernel/async.c | 105 +++++++++++++++++++++++++++++++++-----------------
2 files changed, 89 insertions(+), 35 deletions(-)
--- linux-2.6.orig/include/linux/async.h
+++ linux-2.6/include/linux/async.h
@@ -16,9 +16,28 @@
typedef u64 async_cookie_t;
typedef void (async_func_ptr) (void *data, async_cookie_t cookie);
+/**
+ * struct async_entry - entry for asynchronous scheduling
+ * @list: anchor for internal lists
+ * @cookie: cookie for checkpointing
+ * @func: asynchronous function to execute
+ * @data: data to pass to the function
+ * @running: synchronization domain to use
+ * @persistent: 1 if the entry must not be deleted by the core
+ */
+struct async_entry {
+ struct list_head list;
+ async_cookie_t cookie;
+ async_func_ptr *func;
+ void *data;
+ struct list_head *running;
+ int persistent;
+};
+
extern async_cookie_t async_schedule(async_func_ptr *ptr, void *data);
extern async_cookie_t async_schedule_domain(async_func_ptr *ptr, void *data,
struct list_head *list);
+extern async_cookie_t async_schedule_prealloc(struct async_entry *entry);
extern void async_synchronize_full(void);
extern void async_synchronize_full_domain(struct list_head *list);
extern void async_synchronize_cookie(async_cookie_t cookie);
--- linux-2.6.orig/kernel/async.c
+++ linux-2.6/kernel/async.c
@@ -68,14 +68,6 @@ static DEFINE_SPINLOCK(async_lock);
static int async_enabled = 0;
-struct async_entry {
- struct list_head list;
- async_cookie_t cookie;
- async_func_ptr *func;
- void *data;
- struct list_head *running;
-};
-
static DECLARE_WAIT_QUEUE_HEAD(async_done);
static DECLARE_WAIT_QUEUE_HEAD(async_new);
@@ -157,7 +149,8 @@ static void run_one_entry(void)
list_del(&entry->list);
/* 5) free the entry */
- kfree(entry);
+ if (!entry->persistent)
+ kfree(entry);
atomic_dec(&entry_count);
spin_unlock_irqrestore(&async_lock, flags);
@@ -170,34 +163,24 @@ out:
spin_unlock_irqrestore(&async_lock, flags);
}
-
-static async_cookie_t __async_schedule(async_func_ptr *ptr, void *data, struct list_head *running)
+static async_cookie_t __async_run_sync(async_func_ptr *ptr, void *data)
{
- struct async_entry *entry;
- unsigned long flags;
async_cookie_t newcookie;
-
+ unsigned long flags;
- /* allow irq-off callers */
- entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
+ spin_lock_irqsave(&async_lock, flags);
+ newcookie = next_cookie++;
+ spin_unlock_irqrestore(&async_lock, flags);
- /*
- * If we're out of memory or if there's too much work
- * pending already, we execute synchronously.
- */
- if (!async_enabled || !entry || atomic_read(&entry_count) > MAX_WORK) {
- kfree(entry);
- spin_lock_irqsave(&async_lock, flags);
- newcookie = next_cookie++;
- spin_unlock_irqrestore(&async_lock, flags);
-
- /* low on memory.. run synchronously */
- ptr(data, newcookie);
- return newcookie;
- }
- entry->func = ptr;
- entry->data = data;
- entry->running = running;
+ /* Run synchronously */
+ ptr(data, newcookie);
+ return newcookie;
+}
+
+static async_cookie_t __async_schedule(struct async_entry *entry)
+{
+ unsigned long flags;
+ async_cookie_t newcookie;
spin_lock_irqsave(&async_lock, flags);
newcookie = entry->cookie = next_cookie++;
@@ -208,6 +191,24 @@ static async_cookie_t __async_schedule(a
return newcookie;
}
+static struct async_entry *__async_generate_entry(async_func_ptr *ptr,
+ void *data,
+ struct list_head *running)
+{
+ struct async_entry *entry;
+
+ if (!async_enabled || atomic_read(&entry_count) > MAX_WORK)
+ return NULL;
+ /* allow irq-off callers */
+ entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
+ if (entry) {
+ entry->func = ptr;
+ entry->data = data;
+ entry->running = running;
+ }
+ return entry;
+}
+
/**
* async_schedule - schedule a function for asynchronous execution
* @ptr: function to execute asynchronously
@@ -218,7 +219,13 @@ static async_cookie_t __async_schedule(a
*/
async_cookie_t async_schedule(async_func_ptr *ptr, void *data)
{
- return __async_schedule(ptr, data, &async_running);
+ struct async_entry *entry;
+
+ entry = __async_generate_entry(ptr, data, &async_running);
+ if (entry)
+ return __async_schedule(entry);
+ else
+ return __async_run_sync(ptr, data);
}
EXPORT_SYMBOL_GPL(async_schedule);
@@ -237,11 +244,39 @@ EXPORT_SYMBOL_GPL(async_schedule);
async_cookie_t async_schedule_domain(async_func_ptr *ptr, void *data,
struct list_head *running)
{
- return __async_schedule(ptr, data, running);
+ struct async_entry *entry;
+
+ entry = __async_generate_entry(ptr, data, running);
+ if (entry)
+ return __async_schedule(entry);
+ else
+ return __async_run_sync(ptr, data);
}
EXPORT_SYMBOL_GPL(async_schedule_domain);
/**
+ * async_schedule_prealloc - schedule a preallocated asynchronous entry
+ * @entry: pointer to asynchronous entry
+ *
+ * Returns an async_cookie_t that may be used for checkpointing later.
+ * The caller must have setup @entry before calling this function
+ * (especially @entry->func) and must make sure an entry is not scheduled
+ * multiple times simultaneously. @entry->running may be left NULL to
+ * use the default synchronization domain.
+ * Note: This function may be called from atomic or non-atomic contexts.
+ */
+async_cookie_t async_schedule_prealloc(struct async_entry *entry)
+{
+ if (!entry->running)
+ entry->running = &async_running;
+ if (async_enabled && atomic_read(&entry_count) <= MAX_WORK)
+ return __async_schedule(entry);
+ else
+ return __async_run_sync(entry->func, entry->data);
+}
+EXPORT_SYMBOL_GPL(async_schedule_prealloc);
+
+/**
* async_synchronize_full - synchronize all asynchronous function calls
*
* This function waits until all asynchronous function calls have been done.
* Re: [PATCH] fastboot: keep at least one thread per cpu during boot
2009-02-09 5:27 ` Arjan van de Ven
@ 2009-02-09 13:34 ` Frederic Weisbecker
From: Frederic Weisbecker @ 2009-02-09 13:34 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Cornelia Huck, lkml
On Sun, Feb 08, 2009 at 09:27:48PM -0800, Arjan van de Ven wrote:
> On Mon, 9 Feb 2009 04:48:27 +0100
> Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > Async threads are created and destroyed depending on the number of
> > jobs in the queue. This means that several async threads can be
> > created for a specific batch of work; the threads then die after
> > this batch completes, even though they may be needed again shortly
> > afterwards for another batch. During boot, such repeated thread
> > creation can be wasteful, so this patch proposes to keep at least
> > one thread per cpu alive (once they have been created). Keeping
> > this threshold of threads alive avoids part of the thread creation
> > overhead. The threshold is dropped once the system_state switches
> > from SYSTEM_BOOTING to SYSTEM_RUNNING.
>
> I'm not very fond of this, to be honest;
> at least during boot there's enough activity, and the time is so short
> (that's the point of the parallel stuff!) that this will not kick in to
> make a difference; specifically, on every boot I've seen, the number of
> threads is highest near the end, and the total kernel boot time is
> below 1.5 seconds or so, not long enough for the threads to die.
My boot takes more time (about 5 seconds before module loading).
> Creating a thread is *CHEAP*. Really really cheap. You can do 100
> thousand/second on even a modest CPU. If you have a high frequency of
> events you don't want this, sure, and that is why there is a one
> second delay to give an opportunity for reuse... but really....
Ok. And that's a problem with my patch. I did not have a suitable
testcase to produce a relevant benchmark: the async insertions were too
close in time to capture anything interesting.
Had I seen the result of such a testcase, I would probably have seen no
difference :-)
I guess you're right, this would have added new code to maintain for only a
micro-optimization...
>
> Now, if async function calls get used more, I can see the point of
> always keeping one thread alive, just for both performance and VM low
> memory issues; but that's not what your patch is doing.
Ok.
Perhaps the testcase would be relevant on embedded systems.
I will perhaps test that one day :-)
Thanks!
> --
> Arjan van de Ven Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org