Linux Container Development
* [PATCH] pid: improved namespaced iteration over processes list
From: Gowrishankar M @ 2008-12-15 16:49 UTC
  To: containers; +Cc: Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir

The patch below provides a common solution for any place where a process
must be checked for membership in the caller's PID namespace. At present,
we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
the current namespace.

To avoid repeating this check in all code related to PID namespaces,
this patch reworks the iteration macros for_each_process and do_each_thread.

This patch can also reduce the latency of process list lookups inside a
container, as we walk the pidmap instead of every process in the system.

Signed-off-by: Gowrishankar M <gowrishankar.m-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>
---
 include/linux/sched.h |    8 +++++---
 kernel/pid.c          |   17 +++++++++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2e46189..8d3b520 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1917,17 +1917,19 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
 }
 #endif
 
-#define next_task(p)	list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
+#include <linux/nsproxy.h>
+#define next_task(p)	pid_task(find_ge_tgid(task_pid_vnr(p) + 1, p->nsproxy->pid_ns), PIDTYPE_PID)
+#define ns_init_task	(current->nsproxy->pid_ns == &init_pid_ns ? next_task((&init_task)) : find_task_by_vpid(1))
 
 #define for_each_process(p) \
-	for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+	for (p = ns_init_task ; p != NULL ; p = next_task(p))
 
 /*
  * Careful: do_each_thread/while_each_thread is a double loop so
  *          'break' will not work as expected - use goto instead.
  */
 #define do_each_thread(g, t) \
-	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
+	for (g = t = ns_init_task ; g  != NULL ; (g = t = next_task(g))) do
 
 #define while_each_thread(g, t) \
 	while ((t = next_thread(t)) != g)
diff --git a/kernel/pid.c b/kernel/pid.c
index 064e76a..3273a96 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -493,6 +493,23 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 	return pid;
 }
 
+struct pid *find_ge_tgid(int nr,  struct pid_namespace *ns)
+{
+	struct pid* pid;
+	struct task_struct* task;
+
+retry:
+	pid = find_ge_pid(nr, ns);
+	if (pid) {
+		task = pid_task(pid, PIDTYPE_PID);
+		if (!task || !has_group_leader_pid(task)) {
+			nr += 1;
+			goto retry;
+		}
+	}
+	return pid;
+}
+
 /*
  * The pid hash table is scaled according to the amount of memory in the
  * machine.  From a minimum of 16 slots up to 4096 slots at one gigabyte or
-- 
1.5.5.1

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Dave Hansen @ 2008-12-15 18:32 UTC
  To: Gowrishankar M
  Cc: containers, Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir

On Mon, 2008-12-15 at 22:19 +0530, Gowrishankar M wrote:
> The patch below provides a common solution for any place where a process
> must be checked for membership in the caller's PID namespace. At present,
> we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
> the current namespace.
> 
> To avoid repeating this check in all code related to PID namespaces,
> this patch reworks the iteration macros for_each_process and do_each_thread.
> 
> This patch can also reduce the latency of process list lookups inside a
> container, as we walk the pidmap instead of every process in the system.
> 
> Signed-off-by: Gowrishankar M <gowrishankar.m-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>
> ---
>  include/linux/sched.h |    8 +++++---
>  kernel/pid.c          |   17 +++++++++++++++++
>  2 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2e46189..8d3b520 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1917,17 +1917,19 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
>  }
>  #endif
> 
> -#define next_task(p)	list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
> +#include <linux/nsproxy.h>
> +#define next_task(p)	pid_task(find_ge_tgid(task_pid_vnr(p) + 1, p->nsproxy->pid_ns), PIDTYPE_PID)
> +#define ns_init_task	(current->nsproxy->pid_ns == &init_pid_ns ? next_task((&init_task)) : find_task_by_vpid(1))

Can you turn these into static inlines so that they're a bit more
readable?

>  #define for_each_process(p) \
> -	for (p = &init_task ; (p = next_task(p)) != &init_task ; )
> +	for (p = ns_init_task ; p != NULL ; p = next_task(p))
> 
>  /*
>   * Careful: do_each_thread/while_each_thread is a double loop so
>   *          'break' will not work as expected - use goto instead.
>   */
>  #define do_each_thread(g, t) \
> -	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> +	for (g = t = ns_init_task ; g  != NULL ; (g = t = next_task(g))) do

I have to wonder whether we should be changing this globally or adding a
new do_each_ns_thread() or something.  Are you worried this will cause
some collateral damage?

> +struct pid *find_ge_tgid(int nr,  struct pid_namespace *ns)
> +{
> +	struct pid* pid;
> +	struct task_struct* task;
> +
> +retry:
> +	pid = find_ge_pid(nr, ns);
> +	if (pid) {
> +		task = pid_task(pid, PIDTYPE_PID);
> +		if (!task || !has_group_leader_pid(task)) {
> +			nr += 1;
> +			goto retry;
> +		}
> +	}
> +	return pid;
> +}

I might have written that loop a bit differently.  Does this work?  Is
it any more clear?

	while ((pid = find_ge_pid(nr, ns))) {
		task = pid_task(pid, PIDTYPE_PID);
		if (task && has_group_leader_pid(task))
			break;
		nr++;
	}
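
For readers comparing the two loop shapes, here is a minimal user-space
sketch, not kernel code. It assumes a small array standing in for the pid
table; model_task, model_find_ge_pid and MODEL_PID_MAX are hypothetical
names invented for this illustration, and -1 plays the role of the NULL
that the kernel functions would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
#define MODEL_PID_MAX 16

struct model_task {
	bool in_use;       /* slot holds a live task */
	bool group_leader; /* task is a thread group leader */
};

/* Models find_ge_pid(): smallest in-use number >= nr, or -1 if none. */
static int model_find_ge_pid(const struct model_task *tbl, int nr)
{
	assert(nr >= 0);
	for (; nr < MODEL_PID_MAX; nr++)
		if (tbl[nr].in_use)
			return nr;
	return -1;
}

/* goto-based form, following the shape of the posted patch. */
static int model_find_ge_tgid_goto(const struct model_task *tbl, int nr)
{
	int pid;

retry:
	pid = model_find_ge_pid(tbl, nr);
	if (pid >= 0) {
		if (!tbl[pid].group_leader) {
			nr += 1;
			goto retry;
		}
	}
	return pid;
}

/* while-based form, following the shape of the suggested rewrite. */
static int model_find_ge_tgid_loop(const struct model_task *tbl, int nr)
{
	int pid;

	while ((pid = model_find_ge_pid(tbl, nr)) >= 0) {
		if (tbl[pid].group_leader)
			break;
		nr++;
	}
	return pid;
}
```

In this model, with leaders at 1 and 7 and non-leader threads at 2 and 3,
both forms return 7 when started at 2, and both return -1 once no leader
remains at or above the starting number.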


-- Dave

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Sukadev Bhattiprolu @ 2008-12-15 19:46 UTC
  To: Dave Hansen; +Cc: containers, Balbir, ebiederm-aS9lmoZGLiVWk0Htik3J/w

Dave Hansen [dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org] wrote:
| On Mon, 2008-12-15 at 22:19 +0530, Gowrishankar M wrote:
| > The patch below provides a common solution for any place where a process
| > must be checked for membership in the caller's PID namespace. At present,
| > we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
| > the current namespace.
| > 
| > To avoid repeating this check in all code related to PID namespaces,
| > this patch reworks the iteration macros for_each_process and do_each_thread.
| > 
| > This patch can also reduce the latency of process list lookups inside a
| > container, as we walk the pidmap instead of every process in the system.

The obvious trade-off is with systems that don't use containers, which
are probably the majority at present. For them next_task() now becomes
more expensive: instead of simply going to the next item on the list, they
have to do a lookup in the pidmap and a lookup in the pid hash table,
followed by mapping the pid back to a task. I think there was a discussion
on this once, and the conclusion was that things like "kill sig -1" are
inherently expensive.
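
To make the extra steps concrete, here is a rough user-space sketch, not
kernel code and not a measurement. It assumes a 64-bit bitmap standing in
for the pidmap and a tiny chained hash table; model_ns, model_pid,
model_task and the function names are hypothetical and exist only for this
illustration. The order of the steps follows the description above rather
than any particular kernel version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
#define MODEL_PID_MAX   64
#define MODEL_HASH_SIZE 8

struct model_task {
	int pid;
	struct model_task *next;  /* global task list link */
};

struct model_pid {
	int nr;
	struct model_task *task;  /* task this pid maps back to */
	struct model_pid *chain;  /* next entry in the same hash bucket */
};

struct model_ns {
	unsigned long long pidmap;                /* bit n set: pid n allocated */
	struct model_pid *hash[MODEL_HASH_SIZE];
};

/* List-based step: a single pointer dereference. */
static struct model_task *model_next_by_list(struct model_task *p)
{
	return p->next;
}

/* Step 1: scan the bitmap for the first allocated number >= nr. */
static int model_pidmap_next(const struct model_ns *ns, int nr)
{
	assert(nr >= 0);
	for (; nr < MODEL_PID_MAX; nr++)
		if (ns->pidmap & (1ULL << nr))
			return nr;
	return -1;
}

/* Step 2: find the pid entry for that number in the hash table. */
static struct model_pid *model_hash_find(const struct model_ns *ns, int nr)
{
	struct model_pid *pid;

	for (pid = ns->hash[nr % MODEL_HASH_SIZE]; pid; pid = pid->chain)
		if (pid->nr == nr)
			return pid;
	return NULL;
}

/* Steps 1 to 3 combined: bitmap scan, hash lookup, pid back to task. */
static struct model_task *model_next_by_pidmap(const struct model_ns *ns,
					       int cur)
{
	int nr = model_pidmap_next(ns, cur + 1);
	struct model_pid *pid = nr < 0 ? NULL : model_hash_find(ns, nr);

	return pid ? pid->task : NULL;
}
```

The sketch only shows the shape of the two paths; whether the longer path
costs more in practice depends on how many tasks the list walk would have
had to skip, which this model does not try to capture.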

Do you need these to be optimized for containers for some other reason?

Sukadev

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Eric W. Biederman @ 2008-12-15 21:47 UTC
  To: Gowrishankar M
  Cc: containers, Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir

Gowrishankar M <gomuthuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> The patch below provides a common solution for any place where a process
> must be checked for membership in the caller's PID namespace. At present,
> we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
> the current namespace.
>
> To avoid repeating this check in all code related to PID namespaces,
> this patch reworks the iteration macros for_each_process and do_each_thread.

Which is just wrong.  Most of the time when we call for_each_process
and do_each_thread we are iterating for kernel-internal purposes,
not because of a user space request.

> This patch can also reduce the latency of process list lookups inside a
> container, as we walk the pidmap instead of every process in the system.

I support walking pidmap, in those cases where it makes sense.  kill -1
in particular.

But I don't think there are any significant unconverted instances of
that problem.

So specific helpers to do the job are fine (if the problem is more general
than kill -1), but changing the generic helpers looks like a good way
to introduce lots of subtle bugs into the kernel.  So different names,
please.
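
As a rough illustration of "different names", here is a user-space sketch,
not kernel code. It assumes a circular global task list and a
per-namespace table indexed by the pid seen inside that namespace;
model_for_each_process, model_for_each_ns_process, model_pid_ns and
model_ns_next are hypothetical names made up for this example. The generic
walk keeps its existing meaning, and the namespace-aware walk is a
separate helper that callers opt into.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
#define MODEL_PID_MAX 8

struct model_task {
	int global_pid;
	struct model_task *next;  /* circular global task list */
};

struct model_pid_ns {
	/* index = pid as seen inside this namespace, NULL = unused */
	struct model_task *map[MODEL_PID_MAX];
};

/* Generic walk, meaning unchanged: every task on the list after 'head'. */
#define model_for_each_process(p, head) \
	for ((p) = (head); ((p) = (p)->next) != (head); )

/* Advance *nr to the next used slot; return its task or NULL at the end. */
static struct model_task *model_ns_next(const struct model_pid_ns *ns, int *nr)
{
	assert(*nr >= 0);
	while (*nr < MODEL_PID_MAX && ns->map[*nr] == NULL)
		(*nr)++;
	return *nr < MODEL_PID_MAX ? ns->map[*nr] : NULL;
}

/* Separately named walk: only tasks visible in 'ns', in pid order. */
#define model_for_each_ns_process(p, ns, nr) \
	for ((nr) = 1; ((p) = model_ns_next((ns), &(nr))) != NULL; (nr)++)

/* Counts tasks reached by the generic walk. */
static int model_count_all(struct model_task *head)
{
	struct model_task *p;
	int n = 0;

	model_for_each_process(p, head)
		n++;
	return n;
}

/* Counts tasks reached by the namespace-aware walk. */
static int model_count_ns(const struct model_pid_ns *ns)
{
	struct model_task *p;
	int nr, n = 0;

	model_for_each_ns_process(p, ns, nr)
		n++;
	return n;
}
```

With three tasks on the global list and two of them mapped into a
namespace, the generic walk in this model still visits all three, while
the namespace walk visits only the two; existing callers of the generic
walk see no change in behaviour.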

Eric

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Eric W. Biederman @ 2008-12-15 21:50 UTC
  To: Sukadev Bhattiprolu; +Cc: containers, Balbir, Dave Hansen

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> The obvious trade-off is with systems that don't use containers, which
> are probably the majority at present. For them next_task() now becomes
> more expensive: instead of simply going to the next item on the list, they
> have to do a lookup in the pidmap and a lookup in the pid hash table,
> followed by mapping the pid back to a task. I think there was a discussion
> on this once, and the conclusion was that things like "kill sig -1" are
> inherently expensive.

Cost-wise it would be worth measuring.  I have a report that when that
change was made to /proc, readdir in /proc sped up.

The problem that I see is that changing generic methods is not generally
applicable.

> Do you need these to be optimized for containers for some other reason?

A good question.

Eric
