All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/3] make kthread_create() more scalable
@ 2007-04-13 13:02 Oleg Nesterov
  2007-04-13 21:31 ` Andrew Morton
  2007-04-13 23:34 ` Eric W. Biederman
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2007-04-13 13:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Davide Libenzi, Eric W. Biederman, Ingo Molnar, Linus Torvalds,
	Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel

If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
complete(&create->started) + schedule(). After that it can't be woken because
nobody can see the new task yet. This means:

	- we don't need tasklist_lock for find_task_by_pid().

	- create_kthread() doesn't need to wait for create->started. Instead,
	  kthread_create() first waits for create->created to get the result of
	  kernel_thread(), then waits for create->started to synchronize with
	  kthread().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 2.6.21-rc5/kernel/kthread.c~1_CREATE	2007-04-13 14:39:21.000000000 +0400
+++ 2.6.21-rc5/kernel/kthread.c	2007-04-13 14:52:44.000000000 +0400
@@ -24,11 +24,11 @@ struct kthread_create_info
 	/* Information passed to kthread() from kthreadd. */
 	int (*threadfn)(void *data);
 	void *data;
+	struct completion created;
 	struct completion started;
 
 	/* Result passed back to kthread_create() from kthreadd. */
-	struct task_struct *result;
-	struct completion done;
+	pid_t result;
 
 	struct list_head list;
 };
@@ -91,15 +91,9 @@ static void create_kthread(struct kthrea
 
 	/* We want our own signal handler (we take no signals by default). */
 	pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
-	if (pid < 0) {
-		create->result = ERR_PTR(pid);
-	} else {
-		wait_for_completion(&create->started);
-		read_lock(&tasklist_lock);
-		create->result = find_task_by_pid(pid);
-		read_unlock(&tasklist_lock);
-	}
-	complete(&create->done);
+	create->result = pid;
+
+	complete(&create->created);
 }
 
 /**
@@ -127,27 +121,31 @@ struct task_struct *kthread_create(int (
 				   ...)
 {
 	struct kthread_create_info create;
+	struct task_struct *ret;
+	va_list args;
 
 	create.threadfn = threadfn;
 	create.data = data;
+	init_completion(&create.created);
 	init_completion(&create.started);
-	init_completion(&create.done);
 
 	spin_lock(&kthread_create_lock);
 	list_add_tail(&create.list, &kthread_create_list);
-	wake_up_process(kthreadd_task);
 	spin_unlock(&kthread_create_lock);
+	wake_up_process(kthreadd_task);
 
-	wait_for_completion(&create.done);
+	wait_for_completion(&create.created);
+	if (create.result < 0)
+		return ERR_PTR(create.result);
 
-	if (!IS_ERR(create.result)) {
-		va_list args;
-		va_start(args, namefmt);
-		vsnprintf(create.result->comm, sizeof(create.result->comm),
-			  namefmt, args);
-		va_end(args);
-	}
-	return create.result;
+	wait_for_completion(&create.started);
+	ret = find_task_by_pid(create.result);
+
+	va_start(args, namefmt);
+	vsnprintf(ret->comm, sizeof(ret->comm), namefmt, args);
+	va_end(args);
+
+	return ret;
 }
 EXPORT_SYMBOL(kthread_create);
 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] make kthread_create() more scalable
  2007-04-13 13:02 [PATCH 2/3] make kthread_create() more scalable Oleg Nesterov
@ 2007-04-13 21:31 ` Andrew Morton
  2007-04-13 21:51   ` Eric W. Biederman
  2007-04-13 23:34 ` Eric W. Biederman
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-04-13 21:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Davide Libenzi, Eric W. Biederman, Ingo Molnar, Linus Torvalds,
	Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel

On Fri, 13 Apr 2007 17:02:01 +0400
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
> complete(&create->started) + schedule(). After that it can't be woken because
> nobody can see the new task yet. This means:
> 
> 	- we don't need tasklist_lock for find_task_by_pid().
> 
> 	- create_kthread() doesn't need to wait for create->started. Instead,
> 	  kthread_create() first waits for create->created to get the result of
> 	  kernel_thread(), then waits for create->started to synchronize with
> 	  kthread().

Why don't we need tasklist_lock for find_task_by_pid()?  I'd have though that
we'd at least need rcu_read_lock(), and I'm not sure that the implicit
understanding of pid-management internals here is a great idea.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] make kthread_create() more scalable
  2007-04-13 21:31 ` Andrew Morton
@ 2007-04-13 21:51   ` Eric W. Biederman
  2007-04-13 22:08     ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2007-04-13 21:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Davide Libenzi, Ingo Molnar, Linus Torvalds,
	Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel

Andrew Morton <akpm@linux-foundation.org> writes:

> On Fri, 13 Apr 2007 17:02:01 +0400
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
>> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
>> complete(&create->started) + schedule(). After that it can't be woken because
>> nobody can see the new task yet. This means:
>> 
>> 	- we don't need tasklist_lock for find_task_by_pid().
>> 
>> 	- create_kthread() doesn't need to wait for create->started. Instead,
>> 	  kthread_create() first waits for create->created to get the result of
>> 	  kernel_thread(), then waits for create->started to synchronize with
>> 	  kthread().
>
> Why don't we need tasklist_lock for find_task_by_pid()?  I'd have though that
> we'd at least need rcu_read_lock(), and I'm not sure that the implicit
> understanding of pid-management internals here is a great idea.

We need rcu_read_lock().  Or else something could permute the pid hash table
and get us into trouble.

Eric






^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] make kthread_create() more scalable
  2007-04-13 21:51   ` Eric W. Biederman
@ 2007-04-13 22:08     ` Andrew Morton
  2007-04-13 23:06       ` Eric W. Biederman
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-04-13 22:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Davide Libenzi, Ingo Molnar, Linus Torvalds,
	Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel

On Fri, 13 Apr 2007 15:51:29 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > On Fri, 13 Apr 2007 17:02:01 +0400
> > Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> >> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
> >> complete(&create->started) + schedule(). After that it can't be woken because
> >> nobody can see the new task yet. This means:
> >> 
> >> 	- we don't need tasklist_lock for find_task_by_pid().
> >> 
> >> 	- create_kthread() doesn't need to wait for create->started. Instead,
> >> 	  kthread_create() first waits for create->created to get the result of
> >> 	  kernel_thread(), then waits for create->started to synchronize with
> >> 	  kthread().
> >
> > Why don't we need tasklist_lock for find_task_by_pid()?  I'd have though that
> > we'd at least need rcu_read_lock(), and I'm not sure that the implicit
> > understanding of pid-management internals here is a great idea.
> 
> We need rcu_read_lock().  Or else something could permute the pid hash table
> and get us into trouble.
> 

OK, I fixed that up.

The next patch (make-kthread_stop-scalable) removes the find_task_by_pid()
anyway.

Our kthread creation performance will be pretty poor anyway, due to the
need to do two (or more?) context switches.  If we ever need
super-low-latency kernel thread creation (eg, on-demand threads for AIO)
then that code would need to go direct to kernel_thread(), I guess.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] make kthread_create() more scalable
  2007-04-13 22:08     ` Andrew Morton
@ 2007-04-13 23:06       ` Eric W. Biederman
  0 siblings, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2007-04-13 23:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oleg Nesterov, Davide Libenzi, Ingo Molnar, Linus Torvalds,
	Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel

Andrew Morton <akpm@linux-foundation.org> writes:
>
> OK, I fixed that up.
>
> The next patch (make-kthread_stop-scalable) removes the find_task_by_pid()
> anyway.

Ok. Neat.  I still need to review these a little more I have a different
set of criteria, but it is interesting work..

> Our kthread creation performance will be pretty poor anyway, due to the
> need to do two (or more?) context switches.  If we ever need
> super-low-latency kernel thread creation (eg, on-demand threads for AIO)
> then that code would need to go direct to kernel_thread(), I guess.

Sure.  AIO is a little bit of a different beast as it is IO for user space.

If low latency is important for starting kernel threads the right
answer would be to dig into the code and have a version rewrite
kernel_thread so that we copied a reference process instead of the
current.

Right now my practical target is killing all of the kernel threads
started with kernel_thread that then call daemonize.  So we can remove
daemonize, as it is a serious maintenance hazard.  kthread needs just
a little bit more work to support that.

Eric

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] make kthread_create() more scalable
  2007-04-13 13:02 [PATCH 2/3] make kthread_create() more scalable Oleg Nesterov
  2007-04-13 21:31 ` Andrew Morton
@ 2007-04-13 23:34 ` Eric W. Biederman
  1 sibling, 0 replies; 6+ messages in thread
From: Eric W. Biederman @ 2007-04-13 23:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Davide Libenzi, Ingo Molnar, Linus Torvalds,
	Rafael J. Wysocki, Roland McGrath, Rusty Russell, linux-kernel

Oleg Nesterov <oleg@tv-sign.ru> writes:

> If kernel_thread(kthread) succeeds, kthread() can not fail on its path to
> complete(&create->started) + schedule(). After that it can't be woken because
> nobody can see the new task yet. This means:
>
> 	- we don't need tasklist_lock for find_task_by_pid().

Well we do need rcu_read_lock();
But if we are going to wait until the thread has run we can just pass
back current.

> 	- create_kthread() doesn't need to wait for create->started. Instead,
> 	  kthread_create() first waits for create->created to get the result of
> 	  kernel_thread(), then waits for create->started to synchronize with
> 	  kthread().

We can do even better.  We only need a single completion. 
If it was an error create_create can complete it.
Otherwise kthread() can complete it.

Patch in a bit as soon as I finish testing...


Eric

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-04-13 23:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-13 13:02 [PATCH 2/3] make kthread_create() more scalable Oleg Nesterov
2007-04-13 21:31 ` Andrew Morton
2007-04-13 21:51   ` Eric W. Biederman
2007-04-13 22:08     ` Andrew Morton
2007-04-13 23:06       ` Eric W. Biederman
2007-04-13 23:34 ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.