* [PATCH] pid: improved namespaced iteration over processes list
@ 2008-12-15 16:49 Gowrishankar M
From: Gowrishankar M @ 2008-12-15 16:49 UTC
To: containers; +Cc: Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir

The patch below provides a common solution for any place where a process
must be checked for association with the caller's namespace. At present,
we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
the current namespace.

To avoid applying this check in every piece of code related to PID
namespaces, this patch reworks the iteration macros for_each_process and
do_each_thread.

This patch can also reduce the latency of process list lookups inside a
container, as we walk the pidmap instead of every process in the system.

Signed-off-by: Gowrishankar M <gowrishankar.m-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>
---
include/linux/sched.h | 8 +++++---
kernel/pid.c | 17 +++++++++++++++++
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2e46189..8d3b520 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1917,17 +1917,19 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
}
#endif
-#define next_task(p) list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
+#include <linux/nsproxy.h>
+#define next_task(p) pid_task(find_ge_tgid(task_pid_vnr(p) + 1, p->nsproxy->pid_ns), PIDTYPE_PID)
+#define ns_init_task (current->nsproxy->pid_ns == &init_pid_ns ? next_task((&init_task)) : find_task_by_vpid(1))
#define for_each_process(p) \
- for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+ for (p = ns_init_task ; p != NULL ; p = next_task(p))
/*
* Careful: do_each_thread/while_each_thread is a double loop so
* 'break' will not work as expected - use goto instead.
*/
#define do_each_thread(g, t) \
- for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
+ for (g = t = ns_init_task ; g != NULL ; (g = t = next_task(g))) do
#define while_each_thread(g, t) \
while ((t = next_thread(t)) != g)
diff --git a/kernel/pid.c b/kernel/pid.c
index 064e76a..3273a96 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -493,6 +493,23 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
return pid;
}
+struct pid *find_ge_tgid(int nr, struct pid_namespace *ns)
+{
+ struct pid* pid;
+ struct task_struct* task;
+
+retry:
+ pid = find_ge_pid(nr, ns);
+ if (pid) {
+ task = pid_task(pid, PIDTYPE_PID);
+ if (!task || !has_group_leader_pid(task)) {
+ nr += 1;
+ goto retry;
+ }
+ }
+ return pid;
+}
+
/*
* The pid hash table is scaled according to the amount of memory in the
* machine. From a minimum of 16 slots up to 4096 slots at one gigabyte or
--
1.5.5.1
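
The two iteration strategies contrasted in the patch description can be modelled in userspace. The following is only a minimal sketch under stated assumptions: `struct model_task`, `count_visible_by_filter()`, `model_find_ge_vpid()` and `count_visible_by_pidmap()` are hypothetical names that do not exist in the kernel, and a plain byte array stands in for the namespace's pidmap. It counts how many tasks each strategy has to examine. It does not model the cost of scanning the bitmap or of the pid hash lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a task; not the kernel's task_struct. */
struct model_task {
	int global_pid;	/* pid in the initial namespace */
	int vpid;	/* pid seen from the container, 0 if not visible there */
};

/*
 * Strategy 1: visit every task in the system and filter, mirroring the
 * 'task_pid_vnr(t) > 0' idiom. Returns the number of visible tasks and
 * stores the number of tasks examined in *visited.
 */
static int count_visible_by_filter(const struct model_task *tasks, size_t n,
				   size_t *visited)
{
	int count = 0;

	assert(tasks != NULL && visited != NULL);
	*visited = 0;
	for (size_t i = 0; i < n; i++) {
		(*visited)++;
		if (tasks[i].vpid > 0)
			count++;
	}
	return count;
}

/* Model of find_ge_pid(): smallest vpid >= nr that is in use, or -1. */
static int model_find_ge_vpid(const unsigned char *pidmap, int map_len, int nr)
{
	assert(nr >= 0);
	for (int p = nr; p < map_len; p++)
		if (pidmap[p])
			return p;
	return -1;
}

/*
 * Strategy 2: walk only the container's own pid map, so tasks outside the
 * namespace are never looked up. *visited counts the pids returned.
 */
static int count_visible_by_pidmap(const unsigned char *pidmap, int map_len,
				   size_t *visited)
{
	int count = 0;

	assert(pidmap != NULL && visited != NULL);
	*visited = 0;
	for (int p = model_find_ge_vpid(pidmap, map_len, 1); p != -1;
	     p = model_find_ge_vpid(pidmap, map_len, p + 1)) {
		(*visited)++;
		count++;
	}
	return count;
}
```

With six tasks in the system and two of them inside the container, the filter strategy examines all six tasks while the pidmap strategy examines only the two that belong to the namespace.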

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Dave Hansen @ 2008-12-15 18:32 UTC
To: Gowrishankar M; +Cc: containers, Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir

On Mon, 2008-12-15 at 22:19 +0530, Gowrishankar M wrote:
> The patch below provides a common solution for any place where a process
> must be checked for association with the caller's namespace. At present,
> we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
> the current namespace.
>
> To avoid applying this check in every piece of code related to PID
> namespaces, this patch reworks the iteration macros for_each_process and
> do_each_thread.
>
> This patch can also reduce the latency of process list lookups inside a
> container, as we walk the pidmap instead of every process in the system.
>
> Signed-off-by: Gowrishankar M <gowrishankar.m-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>
> ---
> include/linux/sched.h | 8 +++++---
> kernel/pid.c | 17 +++++++++++++++++
> 2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2e46189..8d3b520 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1917,17 +1917,19 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
> }
> #endif
>
> -#define next_task(p) list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)
> +#include <linux/nsproxy.h>
> +#define next_task(p) pid_task(find_ge_tgid(task_pid_vnr(p) + 1, p->nsproxy->pid_ns), PIDTYPE_PID)
> +#define ns_init_task (current->nsproxy->pid_ns == &init_pid_ns ? next_task((&init_task)) : find_task_by_vpid(1))

Can you turn these into static inlines so that they're a bit more
readable?

> #define for_each_process(p) \
> - for (p = &init_task ; (p = next_task(p)) != &init_task ; )
> + for (p = ns_init_task ; p != NULL ; p = next_task(p))
>
> /*
> * Careful: do_each_thread/while_each_thread is a double loop so
> * 'break' will not work as expected - use goto instead.
> */
> #define do_each_thread(g, t) \
> - for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> + for (g = t = ns_init_task ; g != NULL ; (g = t = next_task(g))) do

I have to wonder whether we should be changing this globally or adding a
new do_each_ns_thread() or something. Are you worried this will cause
some collateral damage?

> +struct pid *find_ge_tgid(int nr, struct pid_namespace *ns)
> +{
> + struct pid* pid;
> + struct task_struct* task;
> +
> +retry:
> + pid = find_ge_pid(nr, ns);
> + if (pid) {
> + task = pid_task(pid, PIDTYPE_PID);
> + if (!task || !has_group_leader_pid(task)) {
> + nr += 1;
> + goto retry;
> + }
> + }
> + return pid;
> +}

I might have written that loop a bit differently. Does this work? Is
it any more clear?

	while ((pid = find_ge_pid(nr, ns))) {
		task = pid_task(pid, PIDTYPE_PID);
		if (task && has_group_leader_pid(task))
			break;
		nr++;
	}

-- Dave
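
To compare the two loop shapes discussed in this reply, here is a small userspace model, offered as a sketch only. Everything in it is an assumption made for illustration: `enum slot`, `MODEL_MAX_PID`, `model_find_ge_pid()` and the two `model_find_ge_tgid_*()` functions are hypothetical, and an array indexed by pid number stands in for the pidmap plus the `pid_task()` / `has_group_leader_pid()` checks. It returns -1 where the kernel code would return NULL.

```c
#include <assert.h>
#include <stddef.h>

/* State of one pid number in the model. */
enum slot { SLOT_FREE = 0, SLOT_THREAD, SLOT_LEADER };

#define MODEL_MAX_PID 16

/* Model of find_ge_pid(): smallest allocated pid >= nr, or -1 if none. */
static int model_find_ge_pid(const enum slot *map, int nr)
{
	assert(map != NULL && nr >= 0);
	for (int p = nr; p < MODEL_MAX_PID; p++)
		if (map[p] != SLOT_FREE)
			return p;
	return -1;
}

/* Shape used in the patch: retry via goto until a group leader is found. */
static int model_find_ge_tgid_goto(const enum slot *map, int nr)
{
	int pid;

retry:
	pid = model_find_ge_pid(map, nr);
	if (pid != -1) {
		if (map[pid] != SLOT_LEADER) {
			nr += 1;
			goto retry;
		}
	}
	return pid;
}

/* Shape suggested in the review: a while loop with an early break. */
static int model_find_ge_tgid_loop(const enum slot *map, int nr)
{
	int pid;

	while ((pid = model_find_ge_pid(map, nr)) != -1) {
		if (map[pid] == SLOT_LEADER)
			break;
		nr++;
	}
	return pid;
}
```

In this model both shapes return the same result for every starting `nr`. Both also advance `nr` by one per retry, so a non-leader pid well above `nr` is found again on each step until `nr` moves past it.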

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Sukadev Bhattiprolu @ 2008-12-15 19:46 UTC
To: Dave Hansen; +Cc: containers, Balbir, ebiederm-aS9lmoZGLiVWk0Htik3J/w

Dave Hansen [dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org] wrote:
| On Mon, 2008-12-15 at 22:19 +0530, Gowrishankar M wrote:
| > The patch below provides a common solution for any place where a process
| > must be checked for association with the caller's namespace. At present,
| > we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
| > the current namespace.
| >
| > To avoid applying this check in every piece of code related to PID
| > namespaces, this patch reworks the iteration macros for_each_process and
| > do_each_thread.
| >
| > This patch can also reduce the latency of process list lookups inside a
| > container, as we walk the pidmap instead of every process in the system.

The obvious trade-off is with systems that don't use containers, which
are probably the majority at present. For them next_task() now becomes
more expensive (instead of simply going to the next item on the list, they
have a lookup in the pidmap, a lookup in the pid hash table, followed by
mapping the pid back to the task). I think there was a discussion once on
this and the conclusion was that things like "kill sig -1" are inherently
expensive.

Do you need these to be optimized for containers for some other reason?

Sukadev

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Eric W. Biederman @ 2008-12-15 21:50 UTC
To: Sukadev Bhattiprolu; +Cc: containers, Balbir, Dave Hansen

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> The obvious trade-off is with systems that don't use containers, which
> are probably the majority at present. For them next_task() now becomes
> more expensive (instead of simply going to the next item on the list, they
> have a lookup in the pidmap, a lookup in the pid hash table, followed by
> mapping the pid back to the task). I think there was a discussion once on
> this and the conclusion was that things like "kill sig -1" are inherently
> expensive.

Cost-wise it would be worth measuring. I have a report that when that
change was made to /proc, readdir in /proc sped up.

The problem that I see is that changing generic methods is not
generally applicable.

> Do you need these to be optimized for containers for some other reason?

A good question.

Eric

* Re: [PATCH] pid: improved namespaced iteration over processes list
From: Eric W. Biederman @ 2008-12-15 21:47 UTC
To: Gowrishankar M; +Cc: containers, Sukadev, ebiederm-aS9lmoZGLiVWk0Htik3J/w, Balbir

Gowrishankar M <gomuthuk-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> The patch below provides a common solution for any place where a process
> must be checked for association with the caller's namespace. At present,
> we use 'task_pid_vnr(t) > 0' to decide whether to proceed with task 't' in
> the current namespace.
>
> To avoid applying this check in every piece of code related to PID
> namespaces, this patch reworks the iteration macros for_each_process and
> do_each_thread.

Which is just wrong. Most of the time when we call for_each_process and
do_each_thread we are iterating through them for kernel internal purposes,
not because of a user space request.

> This patch can also reduce the latency of process list lookups inside a
> container, as we walk the pidmap instead of every process in the system.

I support walking pidmap, in those cases where it makes sense. kill -1 in
particular. But I don't think there are any significant unconverted
instances of that problem.

So specific helpers to do the job are fine (if the problem is more general
than kill -1), but changing the generic helpers looks like a good way to
introduce lots of subtle bugs into the kernel.

So different names please.

Eric
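
The "different names" request can be illustrated with a userspace sketch in which the generic iterator is left untouched and a namespace-aware iterator is added beside it. This is a hedged illustration, not a proposed kernel interface: `struct ns_task`, `model_for_each_process()`, `model_for_each_process_ns()`, `count_all()` and `count_in_ns()` are hypothetical names, and an integer `ns_id` stands in for a pid namespace. For brevity the namespace-aware variant filters the full list rather than walking a pidmap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a task. */
struct ns_task {
	int ns_id;		/* hypothetical namespace identifier */
	struct ns_task *next;	/* NULL-terminated list of all tasks */
};

/* Generic iterator: behaviour unchanged, always visits every task. */
#define model_for_each_process(p, head) \
	for ((p) = (head); (p) != NULL; (p) = (p)->next)

/*
 * Separately named, namespace-aware iterator (hypothetical name). The
 * trailing 'if ... else' skips tasks outside 'ns' and lets the caller's
 * loop body attach to the 'else' branch.
 */
#define model_for_each_process_ns(p, head, ns) \
	model_for_each_process(p, head) \
		if ((p)->ns_id != (ns)) continue; else

/* Kernel-internal style use: sees every task regardless of namespace. */
static int count_all(struct ns_task *head)
{
	struct ns_task *p;
	int n = 0;

	model_for_each_process(p, head)
		n++;
	return n;
}

/* User-request style use: sees only the tasks in namespace 'ns'. */
static int count_in_ns(struct ns_task *head, int ns)
{
	struct ns_task *p;
	int n = 0;

	assert(ns >= 0);
	model_for_each_process_ns(p, head, ns)
		n++;
	return n;
}
```

Callers that iterate for internal bookkeeping keep using the generic macro and see no change in behaviour. Only call sites that act on behalf of a namespaced request opt in to the new name.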