* [PATCH 2/3] make kthread_create() more scalable
@ 2007-04-13 13:02 Oleg Nesterov
2007-04-13 21:31 ` Andrew Morton
2007-04-13 23:34 ` Eric W. Biederman
0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2007-04-13 13:02 UTC (permalink / raw)
To: Andrew Morton
Cc: Davide Libenzi, Eric W. Biederman, Ingo Molnar, Linus Torvalds,
Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel
If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
complete(&create->started) + schedule(). After that it can't be woken because
nobody can see the new task yet. This means:
- we don't need tasklist_lock for find_task_by_pid().
- create_kthread() doesn't need to wait for create->started. Instead,
kthread_create() first waits for create->created to get the result of
kernel_thread(), then waits for create->started to synchronize with
kthread().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
--- 2.6.21-rc5/kernel/kthread.c~1_CREATE 2007-04-13 14:39:21.000000000 +0400
+++ 2.6.21-rc5/kernel/kthread.c 2007-04-13 14:52:44.000000000 +0400
@@ -24,11 +24,11 @@ struct kthread_create_info
/* Information passed to kthread() from kthreadd. */
int (*threadfn)(void *data);
void *data;
+ struct completion created;
struct completion started;
/* Result passed back to kthread_create() from kthreadd. */
- struct task_struct *result;
- struct completion done;
+ pid_t result;
struct list_head list;
};
@@ -91,15 +91,9 @@ static void create_kthread(struct kthrea
/* We want our own signal handler (we take no signals by default). */
pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
- if (pid < 0) {
- create->result = ERR_PTR(pid);
- } else {
- wait_for_completion(&create->started);
- read_lock(&tasklist_lock);
- create->result = find_task_by_pid(pid);
- read_unlock(&tasklist_lock);
- }
- complete(&create->done);
+ create->result = pid;
+
+ complete(&create->created);
}
/**
@@ -127,27 +121,31 @@ struct task_struct *kthread_create(int (
...)
{
struct kthread_create_info create;
+ struct task_struct *ret;
+ va_list args;
create.threadfn = threadfn;
create.data = data;
+ init_completion(&create.created);
init_completion(&create.started);
- init_completion(&create.done);
spin_lock(&kthread_create_lock);
list_add_tail(&create.list, &kthread_create_list);
- wake_up_process(kthreadd_task);
spin_unlock(&kthread_create_lock);
+ wake_up_process(kthreadd_task);
- wait_for_completion(&create.done);
+ wait_for_completion(&create.created);
+ if (create.result < 0)
+ return ERR_PTR(create.result);
- if (!IS_ERR(create.result)) {
- va_list args;
- va_start(args, namefmt);
- vsnprintf(create.result->comm, sizeof(create.result->comm),
- namefmt, args);
- va_end(args);
- }
- return create.result;
+ wait_for_completion(&create.started);
+ ret = find_task_by_pid(create.result);
+
+ va_start(args, namefmt);
+ vsnprintf(ret->comm, sizeof(ret->comm), namefmt, args);
+ va_end(args);
+
+ return ret;
}
EXPORT_SYMBOL(kthread_create);
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH 2/3] make kthread_create() more scalable
2007-04-13 13:02 [PATCH 2/3] make kthread_create() more scalable Oleg Nesterov
@ 2007-04-13 21:31 ` Andrew Morton
2007-04-13 21:51 ` Eric W. Biederman
2007-04-13 23:34 ` Eric W. Biederman
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-04-13 21:31 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Davide Libenzi, Eric W. Biederman, Ingo Molnar, Linus Torvalds,
Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel
On Fri, 13 Apr 2007 17:02:01 +0400
Oleg Nesterov <oleg@tv-sign.ru> wrote:
> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
> complete(&create->started) + schedule(). After that it can't be woken because
> nobody can see the new task yet. This means:
>
> - we don't need tasklist_lock for find_task_by_pid().
>
> - create_kthread() doesn't need to wait for create->started. Instead,
> kthread_create() first waits for create->created to get the result of
> kernel_thread(), then waits for create->started to synchronize with
> kthread().
Why don't we need tasklist_lock for find_task_by_pid()? I'd have though that
we'd at least need rcu_read_lock(), and I'm not sure that the implicit
understanding of pid-management internals here is a great idea.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] make kthread_create() more scalable
2007-04-13 21:31 ` Andrew Morton
@ 2007-04-13 21:51 ` Eric W. Biederman
2007-04-13 22:08 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2007-04-13 21:51 UTC (permalink / raw)
To: Andrew Morton
Cc: Oleg Nesterov, Davide Libenzi, Ingo Molnar, Linus Torvalds,
Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel
Andrew Morton <akpm@linux-foundation.org> writes:
> On Fri, 13 Apr 2007 17:02:01 +0400
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
>> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
>> complete(&create->started) + schedule(). After that it can't be woken because
>> nobody can see the new task yet. This means:
>>
>> - we don't need tasklist_lock for find_task_by_pid().
>>
>> - create_kthread() doesn't need to wait for create->started. Instead,
>> kthread_create() first waits for create->created to get the result of
>> kernel_thread(), then waits for create->started to synchronize with
>> kthread().
>
> Why don't we need tasklist_lock for find_task_by_pid()? I'd have though that
> we'd at least need rcu_read_lock(), and I'm not sure that the implicit
> understanding of pid-management internals here is a great idea.
We need rcu_read_lock(). Or else something could permute the pid hash table
and get us into trouble.
Eric
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] make kthread_create() more scalable
2007-04-13 21:51 ` Eric W. Biederman
@ 2007-04-13 22:08 ` Andrew Morton
2007-04-13 23:06 ` Eric W. Biederman
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-04-13 22:08 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Oleg Nesterov, Davide Libenzi, Ingo Molnar, Linus Torvalds,
Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel
On Fri, 13 Apr 2007 15:51:29 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Fri, 13 Apr 2007 17:02:01 +0400
> > Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> >> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
> >> complete(&create->started) + schedule(). After that it can't be woken because
> >> nobody can see the new task yet. This means:
> >>
> >> - we don't need tasklist_lock for find_task_by_pid().
> >>
> >> - create_kthread() doesn't need to wait for create->started. Instead,
> >> kthread_create() first waits for create->created to get the result of
> >> kernel_thread(), then waits for create->started to synchronize with
> >> kthread().
> >
> > Why don't we need tasklist_lock for find_task_by_pid()? I'd have though that
> > we'd at least need rcu_read_lock(), and I'm not sure that the implicit
> > understanding of pid-management internals here is a great idea.
>
> We need rcu_read_lock(). Or else something could permute the pid hash table
> and get us into trouble.
>
OK, I fixed that up.
The next patch (make-kthread_stop-scalable) removes the find_task_by_pid()
anyway.
Our kthread creation performance will be pretty poor anyway, due to the
need to do two (or more?) context switches. If we ever need
super-low-latency kernel thread creation (eg, on-demand threads for AIO)
then that code would need to go direct to kernel_thread(), I guess.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] make kthread_create() more scalable
2007-04-13 22:08 ` Andrew Morton
@ 2007-04-13 23:06 ` Eric W. Biederman
0 siblings, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2007-04-13 23:06 UTC (permalink / raw)
To: Andrew Morton
Cc: Oleg Nesterov, Davide Libenzi, Ingo Molnar, Linus Torvalds,
Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel
Andrew Morton <akpm@linux-foundation.org> writes:
>
> OK, I fixed that up.
>
> The next patch (make-kthread_stop-scalable) removes the find_task_by_pid()
> anyway.
Ok. Neat. I still need to review these a little more I have a different
set of criteria, but it is interesting work..
> Our kthread creation performance will be pretty poor anyway, due to the
> need to do two (or more?) context switches. If we ever need
> super-low-latency kernel thread creation (eg, on-demand threads for AIO)
> then that code would need to go direct to kernel_thread(), I guess.
Sure. AIO is a little bit of a different beast as it is IO for user space.
If low latency is important for starting kernel threads the right
answer would be to dig into the code and have a version rewrite
kernel_thread so that we copied a reference process instead of the
current.
Right now my practical target is killing all of the kernel threads
started with kernel_thread that then call daemonize. So we can remove
daemonize, as it is a serious maintenance hazard. kthread needs just
a little bit more work to support that.
Eric
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] make kthread_create() more scalable
2007-04-13 13:02 [PATCH 2/3] make kthread_create() more scalable Oleg Nesterov
2007-04-13 21:31 ` Andrew Morton
@ 2007-04-13 23:34 ` Eric W. Biederman
1 sibling, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2007-04-13 23:34 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Davide Libenzi, Ingo Molnar, Linus Torvalds,
Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel
Oleg Nesterov <oleg@tv-sign.ru> writes:
> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
> complete(&create->started) + schedule(). After that it can't be woken because
> nobody can see the new task yet. This means:
>
> - we don't need tasklist_lock for find_task_by_pid().
Well we do need rcu_read_lock();
But if we are going to wait until the thread has run we can just pass
back current.
> - create_kthread() doesn't need to wait for create->started. Instead,
> kthread_create() first waits for create->created to get the result of
> kernel_thread(), then waits for create->started to synchronize with
> kthread().
We can do even better. We only need a single completion.
If it was an error create_create can complete it.
Otherwise kthread() can complete it.
Patch in a bit as soon as I finish testing...
Eric
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-04-13 23:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-13 13:02 [PATCH 2/3] make kthread_create() more scalable Oleg Nesterov
2007-04-13 21:31 ` Andrew Morton
2007-04-13 21:51 ` Eric W. Biederman
2007-04-13 22:08 ` Andrew Morton
2007-04-13 23:06 ` Eric W. Biederman
2007-04-13 23:34 ` Eric W. Biederman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox