* [PATCH] pid: improved namespaced iteration over processes list
From: Gowrishankar M @ 2008-12-15 16:49 UTC
To: containers; +Cc: Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir
The patch below provides a common solution for any place where a process
must be checked for membership in the caller's PID namespace. At present,
we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
the current namespace.
To avoid repeating this check in all code related to PID namespaces,
this patch reworks the iteration macros for_each_process and do_each_thread.
This patch can also reduce the latency of process list lookups inside a
container, as we walk the pidmap instead of every process in the system.
Signed-off-by: Gowrishankar M <gowrishankar.m-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>
---
include/linux/sched.h | 8 +++++---
kernel/pid.c | 17 +++++++++++++++++
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2e46189..8d3b520 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1917,17 +1917,19 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
}
#endif
-#define next_task(p) list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
+#include <linux/nsproxy.h>
+#define next_task(p) pid_task(find_ge_tgid(task_pid_vnr(p) + 1, p->nsproxy->pid_ns), PIDTYPE_PID)
+#define ns_init_task (current->nsproxy->pid_ns == &init_pid_ns ? next_task((&init_task)) : find_task_by_vpid(1))
#define for_each_process(p) \
- for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+ for (p = ns_init_task ; p != NULL ; p = next_task(p))
/*
* Careful: do_each_thread/while_each_thread is a double loop so
* 'break' will not work as expected - use goto instead.
*/
#define do_each_thread(g, t) \
- for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
+ for (g = t = ns_init_task ; g != NULL ; (g = t = next_task(g))) do
#define while_each_thread(g, t) \
while ((t = next_thread(t)) != g)
diff --git a/kernel/pid.c b/kernel/pid.c
index 064e76a..3273a96 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -493,6 +493,23 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
return pid;
}
+struct pid *find_ge_tgid(int nr, struct pid_namespace *ns)
+{
+ struct pid* pid;
+ struct task_struct* task;
+
+retry:
+ pid = find_ge_pid(nr, ns);
+ if (pid) {
+ task = pid_task(pid, PIDTYPE_PID);
+ if (!task || !has_group_leader_pid(task)) {
+ nr += 1;
+ goto retry;
+ }
+ }
+ return pid;
+}
+
/*
* The pid hash table is scaled according to the amount of memory in the
* machine. From a minimum of 16 slots up to 4096 slots at one gigabyte or
--
1.5.5.1
* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Dave Hansen @ 2008-12-15 18:32 UTC
To: Gowrishankar M
Cc: containers, Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir
On Mon, 2008-12-15 at 22:19 +0530, Gowrishankar M wrote:
> The patch below provides a common solution for any place where a process
> must be checked for membership in the caller's PID namespace. At present,
> we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
> the current namespace.
>
> To avoid repeating this check in all code related to PID namespaces,
> this patch reworks the iteration macros for_each_process and do_each_thread.
>
> This patch can also reduce the latency of process list lookups inside a
> container, as we walk the pidmap instead of every process in the system.
>
> Signed-off-by: Gowrishankar M <gowrishankar.m-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>
> ---
> include/linux/sched.h | 8 +++++---
> kernel/pid.c | 17 +++++++++++++++++
> 2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2e46189..8d3b520 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1917,17 +1917,19 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
> }
> #endif
>
> -#define next_task(p) list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
> +#include <linux/nsproxy.h>
> +#define next_task(p) pid_task(find_ge_tgid(task_pid_vnr(p) + 1, p->nsproxy->pid_ns), PIDTYPE_PID)
> +#define ns_init_task (current->nsproxy->pid_ns == &init_pid_ns ? next_task((&init_task)) : find_task_by_vpid(1))
Can you turn these into static inlines so that they're a bit more
readable?
> #define for_each_process(p) \
> - for (p = &init_task ; (p = next_task(p)) != &init_task ; )
> + for (p = ns_init_task ; p != NULL ; p = next_task(p))
>
> /*
> * Careful: do_each_thread/while_each_thread is a double loop so
> * 'break' will not work as expected - use goto instead.
> */
> #define do_each_thread(g, t) \
> - for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> + for (g = t = ns_init_task ; g != NULL ; (g = t = next_task(g))) do
I have to wonder whether we should be changing this globally or adding a
new do_each_ns_thread() or something. Are you worried this will cause
some collateral damage?
> +struct pid *find_ge_tgid(int nr, struct pid_namespace *ns)
> +{
> + struct pid* pid;
> + struct task_struct* task;
> +
> +retry:
> + pid = find_ge_pid(nr, ns);
> + if (pid) {
> + task = pid_task(pid, PIDTYPE_PID);
> + if (!task || !has_group_leader_pid(task)) {
> + nr += 1;
> + goto retry;
> + }
> + }
> + return pid;
> +}
I might have written that loop a bit differently. Does this work? Is
it any more clear?
	while ((pid = find_ge_pid(nr, ns))) {
		task = pid_task(pid, PIDTYPE_PID);
		if (task && has_group_leader_pid(task))
			break;
		nr++;
	}
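For readers who want to try the loop shape outside the kernel, here is a
minimal userspace sketch. It is not kernel code: struct model_task,
struct model_ns, model_find_ge_pid() and model_find_ge_tgid() are
hypothetical stand-ins for the pidmap and for find_ge_pid()/find_ge_tgid().
It also assumes one deliberate deviation from the loop above: after
rejecting a pid it resumes from pid + 1 rather than nr + 1, so the same
pid is not looked up again.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PID_MAX 16

/* Hypothetical stand-in for a task: only what the loop needs. */
struct model_task {
	int in_use;		/* nonzero if this pid number is allocated */
	int group_leader;	/* nonzero if the task leads its thread group */
};

/* Index = pid number; a toy replacement for a per-namespace pidmap. */
struct model_ns {
	struct model_task slot[MODEL_PID_MAX];
};

/* Model of find_ge_pid(): first allocated pid number >= nr, or -1. */
static int model_find_ge_pid(int nr, const struct model_ns *ns)
{
	assert(nr >= 0);
	for (; nr < MODEL_PID_MAX; nr++)
		if (ns->slot[nr].in_use)
			return nr;
	return -1;
}

/* Model of find_ge_tgid(): first group leader with pid >= nr, or -1. */
static int model_find_ge_tgid(int nr, const struct model_ns *ns)
{
	int pid;

	while ((pid = model_find_ge_pid(nr, ns)) >= 0) {
		if (ns->slot[pid].group_leader)
			break;
		nr = pid + 1;	/* resume just past the rejected pid */
	}
	return pid;
}
```

With pids 1 and 5 as group leaders and pid 2 as a non-leader thread, a
search starting at 2 skips pid 2 and lands on 5; a search starting past
the last allocated pid falls out of the loop and returns -1.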
-- Dave
* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Sukadev Bhattiprolu @ 2008-12-15 19:46 UTC
To: Dave Hansen; +Cc: containers, Balbir, ebiederm-aS9lmoZGLiVWk0Htik3J/w
Dave Hansen [dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org] wrote:
| On Mon, 2008-12-15 at 22:19 +0530, Gowrishankar M wrote:
| > The patch below provides a common solution for any place where a process
| > must be checked for membership in the caller's PID namespace. At present,
| > we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
| > the current namespace.
| >
| > To avoid repeating this check in all code related to PID namespaces,
| > this patch reworks the iteration macros for_each_process and do_each_thread.
| >
| > This patch can also reduce the latency of process list lookups inside a
| > container, as we walk the pidmap instead of every process in the system.
The obvious trade-off is with systems that don't use containers, which
are probably the majority at present. For them next_task() now becomes
more expensive (instead of simply going to the next item on the list, they
have a lookup in the pidmap and a lookup in the pid hash table, followed by
mapping the pid back to the task). I think there was a discussion once on
this and the conclusion was that things like "kill sig -1" are inherently
expensive.
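To make the trade-off concrete, here is a toy userspace step counter. It
is an illustration under stated assumptions, not a measurement: struct
model_task and both model_*_steps() functions are hypothetical, the pidmap
is modeled as one probe per pid slot (the real code scans a bitmap), and
each found pid is charged two extra steps for the hash lookup and the
pid-to-task mapping.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PID_MAX 64

/* Hypothetical stand-in for a task on the global task list. */
struct model_task {
	int pid;
	struct model_task *next;	/* NULL-terminated in this model */
};

/* Plain list walk: one step per task, independent of pid values. */
static int model_list_walk_steps(const struct model_task *head)
{
	int steps = 0;

	for (; head != NULL; head = head->next)
		steps++;
	return steps;
}

/* Pid-number walk: one probe per slot, plus two lookups per live pid. */
static int model_pidmap_walk_steps(const struct model_task *head)
{
	unsigned char in_use[MODEL_PID_MAX] = {0};
	int steps = 0;
	int nr;

	/* Build the toy pidmap from the task list. */
	for (; head != NULL; head = head->next) {
		assert(head->pid >= 0 && head->pid < MODEL_PID_MAX);
		in_use[head->pid] = 1;
	}
	for (nr = 0; nr < MODEL_PID_MAX; nr++) {
		steps++;			/* probe one pidmap slot */
		if (in_use[nr])
			steps += 2;		/* hash lookup, then pid -> task */
	}
	return steps;
}
```

In this model the list walk costs one step per task, while the pid-number
walk costs the size of the pid space plus two steps per task, so the gap
grows as the pid space gets sparser. Real costs depend on cache behaviour
and on how the bitmap is scanned, which this sketch does not capture.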
Do you need these to be optimized for containers for some other reason ?
Sukadev
* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Eric W. Biederman @ 2008-12-15 21:47 UTC
To: Gowrishankar M
Cc: containers, Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir
Gowrishankar M <gomuthuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
> The patch below provides a common solution for any place where a process
> must be checked for membership in the caller's PID namespace. At present,
> we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
> the current namespace.
>
> To avoid repeating this check in all code related to PID namespaces,
> this patch reworks the iteration macros for_each_process and do_each_thread.
Which is just wrong. Most of the time when we call for_each_process
and do_each_thread, we are iterating through them for kernel-internal
purposes, not because of a user space request.
> This patch can also reduce the latency of process list lookups inside a
> container, as we walk the pidmap instead of every process in the system.
I support walking the pidmap in those cases where it makes sense, kill -1
in particular.
But I don't think there are any significant unconverted instances of
that problem.
So specific helpers to do the job are fine (if the problem is more general
than kill -1), but changing the generic helpers looks like a good way
to introduce lots of subtle bugs into the kernel. So different names,
please.
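The "different names" point can be sketched in userspace as below. This is
a hedged model, not a proposed kernel interface: every model_* name is
hypothetical, namespaces are reduced to an integer id with 0 standing for
the initial namespace, and the namespace-aware walk simply filters the
global list with the 'vnr > 0' check rather than walking the pidmap. The
only thing it is meant to show is that the generic iterator keeps its
meaning while callers that want namespace filtering opt in by name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task on the global task list. */
struct model_task {
	int global_pid;			/* pid seen from the initial namespace */
	int ns_id;			/* toy namespace id, 0 = initial */
	int vpid;			/* pid seen from inside ns_id */
	struct model_task *next;	/* NULL-terminated in this model */
};

/* Model of task_pid_vnr(): pid of t as seen from ns_id, 0 if not visible. */
static int model_pid_vnr(const struct model_task *t, int ns_id)
{
	assert(t != NULL);
	if (ns_id == 0)
		return t->global_pid;	/* initial namespace sees every task */
	return t->ns_id == ns_id ? t->vpid : 0;
}

/* Generic walk, left untouched: visits every task in the system. */
#define model_for_each_process(p, head) \
	for ((p) = (head); (p) != NULL; (p) = (p)->next)

/* First task at or after p that is visible in ns_id, or NULL. */
static struct model_task *model_next_in_ns(struct model_task *p, int ns_id)
{
	while (p != NULL && model_pid_vnr(p, ns_id) <= 0)
		p = p->next;
	return p;
}

/* Separately named namespace-aware walk; existing callers are unaffected. */
#define model_for_each_process_in_ns(p, head, ns_id) \
	for ((p) = model_next_in_ns((head), (ns_id)); (p) != NULL; \
	     (p) = model_next_in_ns((p)->next, (ns_id)))

/* Count tasks seen by the generic walk. */
static int model_count_all(struct model_task *head)
{
	struct model_task *p;
	int n = 0;

	model_for_each_process(p, head)
		n++;
	return n;
}

/* Count tasks seen by the namespace-aware walk. */
static int model_count_in_ns(struct model_task *head, int ns_id)
{
	struct model_task *p;
	int n = 0;

	model_for_each_process_in_ns(p, head, ns_id)
		n++;
	return n;
}
```

Kernel-internal callers would keep using the generic walk and still see
every task, while a caller acting on a user space request would pick the
namespace-aware one explicitly.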
Eric
* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Eric W. Biederman @ 2008-12-15 21:50 UTC
To: Sukadev Bhattiprolu; +Cc: containers, Balbir, Dave Hansen
Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
> The obvious trade-off is with systems that don't use containers, which
> are probably the majority at present. For them next_task() now becomes
> more expensive (instead of simply going to the next item on the list, they
> have a lookup in the pidmap and a lookup in the pid hash table, followed by
> mapping the pid back to the task). I think there was a discussion once on
> this and the conclusion was that things like "kill sig -1" are inherently
> expensive.
Cost-wise it would be worth measuring. I have a report that when that
change was made to /proc, readdir in /proc sped up.
The problem that I see is that changing generic methods is not generally
applicable.
> Do you need these to be optimized for containers for some other reason ?
A good question.
Eric