* [RFC][PATCH] Implement ns_of_pid()
@ 2008-11-12 6:41 Sukadev Bhattiprolu
[not found] ` <20081112064139.GA27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Sukadev Bhattiprolu @ 2008-11-12 6:41 UTC (permalink / raw)
To: Oleg Nesterov, Eric W. Biederman; +Cc: Containers, Pavel Emelyanov
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date: Mon, 10 Nov 2008 19:03:56 -0800
Subject: [PATCH 1/3] pid: Implement ns_of_pid
A current problem with the pid namespace is that it is
easy to do pid related work after exit_task_namespaces which
drops the nsproxy pointer.
However if we are doing pid namespace related work we are
always operating on some struct pid which retains the pid_namespace
pointer of the pid namespace it was allocated in.
So provide ns_of_pid which allows us to find the pid
namespace a pid was allocated in.
Using this we have the needed infrastructure to do pid
namespace related work at anytime we have a struct pid,
removing the chance of accidentally having a NULL
pointer dereference when accessing current->nsproxy.
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
include/linux/pid.h | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/include/linux/pid.h b/include/linux/pid.h
index d7e98ff..e9aec85 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -122,6 +122,17 @@ int next_pidmap(struct pid_namespace *pid_ns, int last);
extern struct pid *alloc_pid(struct pid_namespace *ns);
extern void free_pid(struct pid *pid);
+/* ns_of_pid returns the pid namespace in which the specified
+ * pid was allocated.
+ */
+static inline struct pid_namespace *ns_of_pid(struct pid *pid)
+{
+ struct pid_namespace *ns = NULL;
+ if (pid)
+ ns = pid->numbers[pid->level].ns;
+ return ns;
+}
+
/*
* the helpers to get the pid's id seen from different namespaces
*
--
1.5.2.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC][PATCH 2/3] Generalize task_active_pid_ns()
[not found] ` <20081112064139.GA27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-11-12 6:44 ` Sukadev Bhattiprolu
2008-11-12 6:48 ` [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace Sukadev Bhattiprolu
2008-11-12 6:53 ` [RFC][PATCH] Implement ns_of_pid() Sukadev Bhattiprolu
2 siblings, 0 replies; 8+ messages in thread
From: Sukadev Bhattiprolu @ 2008-11-12 6:44 UTC (permalink / raw)
To: Oleg Nesterov, Eric W. Biederman; +Cc: Containers, Pavel Emelyanov
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date: Mon, 10 Nov 2008 19:12:02 -0800
Subject: [PATCH 2/3] pid: Generalize task_active_pid_ns
Currently task_active_pid_ns is not safe to call after a
task becomes a zombie and exit_task_namespaces is called,
as nsproxy becomes NULL. By reading the pid namespace from
the pid of the task we can trivially solve this problem at
the cost of one extra memory read in what should be the
same cacheline as we read the namespace from.
When moving things around I have made task_active_pid_ns
out of line because keeping it in pid_namespace.h would
require adding includes of pid.h and sched.h that I
don't think we want.
This change does make task_active_pid_ns unsafe to call during
copy_process until we attach a pid on the task_struct which
seems to be a reasonable trade off.
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
include/linux/pid_namespace.h | 6 +-----
kernel/fork.c | 4 ++--
kernel/pid.c | 6 ++++++
3 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index d82fe82..38d1032 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -79,11 +79,7 @@ static inline void zap_pid_ns_processes(struct pid_namespace *ns)
}
#endif /* CONFIG_PID_NS */
-static inline struct pid_namespace *task_active_pid_ns(struct task_struct *tsk)
-{
- return tsk->nsproxy->pid_ns;
-}
-
+extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
void pidhash_init(void);
void pidmap_init(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index f608356..28be39a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1111,12 +1111,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if (pid != &init_struct_pid) {
retval = -ENOMEM;
- pid = alloc_pid(task_active_pid_ns(p));
+ pid = alloc_pid(p->nsproxy->pid_ns);
if (!pid)
goto bad_fork_cleanup_io;
if (clone_flags & CLONE_NEWPID) {
- retval = pid_ns_prepare_proc(task_active_pid_ns(p));
+ retval = pid_ns_prepare_proc(p->nsproxy->pid_ns);
if (retval < 0)
goto bad_fork_free_pid;
}
diff --git a/kernel/pid.c b/kernel/pid.c
index 064e76a..c5513fe 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -474,6 +474,12 @@ pid_t task_session_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
}
EXPORT_SYMBOL(task_session_nr_ns);
+struct pid_namespace *task_active_pid_ns(struct task_struct *tsk)
+{
+ return ns_of_pid(task_pid(tsk));
+}
+EXPORT_SYMBOL_GPL(task_active_pid_ns);
+
/*
* Used by proc to find the first pid that is greater then or equal to nr.
*
--
1.5.2.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace
[not found] ` <20081112064139.GA27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-11-12 6:44 ` [RFC][PATCH 2/3] Generalize task_active_pid_ns() Sukadev Bhattiprolu
@ 2008-11-12 6:48 ` Sukadev Bhattiprolu
[not found] ` <20081112064819.GC27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-11-12 6:53 ` [RFC][PATCH] Implement ns_of_pid() Sukadev Bhattiprolu
2 siblings, 1 reply; 8+ messages in thread
From: Sukadev Bhattiprolu @ 2008-11-12 6:48 UTC (permalink / raw)
To: Oleg Nesterov, Eric W. Biederman; +Cc: Containers, Pavel Emelyanov
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date: Tue, 11 Nov 2008 17:00:36 -0800
Subject: [PATCH 3/3] sig: Handle pid namespace crossing when sending signals.
Setting si_pid correctly in the context of pid namespaces is tricky.
Currently with the special cases in do_notify_parent and
do_notify_parent_cldstop we handle all of the day to day cases
properly except sending a signal to a task in a child pid namespace.
For that case we need to pretend to be the kernel and set si_pid to 0.
There are also a few theoretical cases where we can trigger sending a
signal from a task in one pid namespace to a task in another pid
namespace. With no necessary correlation between one or the other.
In those cases when the source pid namespace is a child of the
destination pid namespace we actually have a valid pid value
we can and should report to user space.
This patch modifies the code to handle the full general case for
setting si_pid. The code is a little longer but occurs only once and
making it some easier to understand and verify it is correct.
I add a struct pid sender parameter to __group_send_sig_info, as that is
the only function called with si_pid != task_tgid_vnr(current). So we can
correctly handle the sending of a signal to the parent of an arbitrary
task.
Changelog[v2]:
- Port to 2.6.28-rc3
- Add in_interrupt() check in set_sigqueue_pid() (for SIGIO)
Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
drivers/char/tty_io.c | 4 +-
include/linux/signal.h | 3 +-
ipc/mqueue.c | 2 +-
kernel/posix-cpu-timers.c | 12 +++---
kernel/signal.c | 88 +++++++++++++++++++++++++++-----------------
5 files changed, 65 insertions(+), 44 deletions(-)
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 59f4721..51ab382 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -615,8 +615,8 @@ static void do_tty_hangup(struct work_struct *work)
spin_unlock_irq(&p->sighand->siglock);
continue;
}
- __group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
- __group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+ __group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p, NULL);
+ __group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p, NULL);
put_pid(p->signal->tty_old_pgrp); /* A noop */
spin_lock_irqsave(&tty->ctrl_lock, flags);
if (tty->pgrp)
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 84f997f..a576de8 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -234,7 +234,8 @@ static inline int valid_signal(unsigned long sig)
extern int next_signal(struct sigpending *pending, sigset_t *mask);
extern int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p);
-extern int __group_send_sig_info(int, struct siginfo *, struct task_struct *);
+extern int __group_send_sig_info(int, struct siginfo *, struct task_struct *,
+ struct pid *sender);
extern long do_sigpending(void __user *, unsigned long);
extern int sigprocmask(int, sigset_t *, sigset_t *);
extern int show_unhandled_signals;
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 68eb857..8968a4c 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -506,7 +506,7 @@ static void __do_notify(struct mqueue_inode_info *info)
sig_i.si_errno = 0;
sig_i.si_code = SI_MESGQ;
sig_i.si_value = info->notify.sigev_value;
- sig_i.si_pid = task_tgid_vnr(current);
+ sig_i.si_pid = 0; /* Uses default current tgid */
sig_i.si_uid = current->uid;
kill_pid_info(info->notify.sigev_signo,
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 153dcb2..c72efb1 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -1014,7 +1014,7 @@ static void check_thread_timers(struct task_struct *tsk,
* At the hard limit, we just die.
* No need to calculate anything else now.
*/
- __group_send_sig_info(SIGKILL, SEND_SIG_PRIV, tsk);
+ __group_send_sig_info(SIGKILL, SEND_SIG_PRIV, tsk, NULL);
return;
}
if (tsk->rt.timeout > DIV_ROUND_UP(*soft, USEC_PER_SEC/HZ)) {
@@ -1029,7 +1029,7 @@ static void check_thread_timers(struct task_struct *tsk,
printk(KERN_INFO
"RT Watchdog Timeout: %s[%d]\n",
tsk->comm, task_pid_nr(tsk));
- __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+ __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk, NULL);
}
}
}
@@ -1122,7 +1122,7 @@ static void check_process_timers(struct task_struct *tsk,
sig->it_prof_expires = cputime_add(
sig->it_prof_expires, ptime);
}
- __group_send_sig_info(SIGPROF, SEND_SIG_PRIV, tsk);
+ __group_send_sig_info(SIGPROF, SEND_SIG_PRIV, tsk, NULL);
}
if (!cputime_eq(sig->it_prof_expires, cputime_zero) &&
(cputime_eq(prof_expires, cputime_zero) ||
@@ -1138,7 +1138,7 @@ static void check_process_timers(struct task_struct *tsk,
sig->it_virt_expires = cputime_add(
sig->it_virt_expires, utime);
}
- __group_send_sig_info(SIGVTALRM, SEND_SIG_PRIV, tsk);
+ __group_send_sig_info(SIGVTALRM, SEND_SIG_PRIV, tsk, NULL);
}
if (!cputime_eq(sig->it_virt_expires, cputime_zero) &&
(cputime_eq(virt_expires, cputime_zero) ||
@@ -1154,14 +1154,14 @@ static void check_process_timers(struct task_struct *tsk,
* At the hard limit, we just die.
* No need to calculate anything else now.
*/
- __group_send_sig_info(SIGKILL, SEND_SIG_PRIV, tsk);
+ __group_send_sig_info(SIGKILL, SEND_SIG_PRIV, tsk, NULL);
return;
}
if (psecs >= sig->rlim[RLIMIT_CPU].rlim_cur) {
/*
* At the soft limit, send a SIGXCPU every second.
*/
- __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+ __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk, NULL);
if (sig->rlim[RLIMIT_CPU].rlim_cur
< sig->rlim[RLIMIT_CPU].rlim_max) {
sig->rlim[RLIMIT_CPU].rlim_cur++;
diff --git a/kernel/signal.c b/kernel/signal.c
index 4530fc6..28a48ee 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -28,6 +28,7 @@
#include <linux/pid_namespace.h>
#include <linux/nsproxy.h>
#include <trace/sched.h>
+#include <linux/hardirq.h>
#include <asm/param.h>
#include <asm/uaccess.h>
@@ -798,8 +799,40 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
+static void set_sigqueue_pid(struct sigqueue *q, struct task_struct *t,
+ struct pid *sender)
+{
+ struct pid_namespace *ns;
+
+ /* Set si_pid to the pid number of sender in the pid namespace of
+ * our destination task for all siginfo types that support it.
+ */
+ switch(q->info.si_code & __SI_MASK) {
+ /* siginfo without si_pid */
+ case __SI_TIMER:
+ case __SI_POLL:
+ case __SI_FAULT:
+ break;
+ /* siginfo with si_pid */
+ case __SI_KILL:
+ case __SI_CHLD:
+ case __SI_RT:
+ case __SI_MESGQ:
+ default:
+ /* si_pid for SI_KERNEL is always 0 */
+ if (q->info.si_code == SI_KERNEL || in_interrupt())
+ break;
+ /* Is current not the sending task? */
+ if (!sender)
+ sender = task_tgid(current);
+ ns = task_active_pid_ns(t);
+ q->info.si_pid = pid_nr_ns(sender, ns);
+ break;
+ }
+}
+
static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
- int group)
+ int group, struct pid *sender)
{
struct sigpending *pending;
struct sigqueue *q;
@@ -843,7 +876,7 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
q->info.si_signo = sig;
q->info.si_errno = 0;
q->info.si_code = SI_USER;
- q->info.si_pid = task_pid_vnr(current);
+ q->info.si_pid = 0; /* Uses current tgid */
q->info.si_uid = current->uid;
break;
case (unsigned long) SEND_SIG_PRIV:
@@ -857,6 +890,7 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
copy_siginfo(&q->info, info);
break;
}
+ set_sigqueue_pid(q, t, sender);
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER)
/*
@@ -906,15 +940,16 @@ static int __init setup_print_fatal_signals(char *str)
__setup("print-fatal-signals=", setup_print_fatal_signals);
int
-__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p,
+ struct pid *sender)
{
- return send_signal(sig, info, p, 1);
+ return send_signal(sig, info, p, 1, sender);
}
static int
specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t)
{
- return send_signal(sig, info, t, 0);
+ return send_signal(sig, info, t, 0, NULL);
}
/*
@@ -1018,7 +1053,7 @@ int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
if (!ret && sig) {
ret = -ESRCH;
if (lock_task_sighand(p, &flags)) {
- ret = __group_send_sig_info(sig, info, p);
+ ret = __group_send_sig_info(sig, info, p, NULL);
unlock_task_sighand(p, &flags);
}
}
@@ -1108,7 +1143,7 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
if (sig && p->sighand) {
unsigned long flags;
spin_lock_irqsave(&p->sighand->siglock, flags);
- ret = __group_send_sig_info(sig, info, p);
+ ret = __group_send_sig_info(sig, info, p, NULL);
spin_unlock_irqrestore(&p->sighand->siglock, flags);
}
out_unlock:
@@ -1344,6 +1379,7 @@ int do_notify_parent(struct task_struct *tsk, int sig)
struct sighand_struct *psig;
struct task_cputime cputime;
int ret = sig;
+ struct pid *sender;
BUG_ON(sig == -1);
@@ -1353,24 +1389,11 @@ int do_notify_parent(struct task_struct *tsk, int sig)
BUG_ON(!tsk->ptrace &&
(tsk->group_leader != tsk || !thread_group_empty(tsk)));
+ /* We are under tasklist_lock so no need to call get_pid */
+ sender = task_pid(tsk);
info.si_signo = sig;
info.si_errno = 0;
- /*
- * we are under tasklist_lock here so our parent is tied to
- * us and cannot exit and release its namespace.
- *
- * the only it can is to switch its nsproxy with sys_unshare,
- * bu uncharing pid namespaces is not allowed, so we'll always
- * see relevant namespace
- *
- * write_lock() currently calls preempt_disable() which is the
- * same as rcu_read_lock(), but according to Oleg, this is not
- * correct to rely on this
- */
- rcu_read_lock();
- info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
- rcu_read_unlock();
-
+ info.si_pid = 0; /* Filled in later from sender */
info.si_uid = tsk->uid;
thread_group_cputime(tsk, &cputime);
@@ -1412,7 +1435,7 @@ int do_notify_parent(struct task_struct *tsk, int sig)
sig = -1;
}
if (valid_signal(sig) && sig > 0)
- __group_send_sig_info(sig, &info, tsk->parent);
+ __group_send_sig_info(sig, &info, tsk->parent, sender);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);
@@ -1425,6 +1448,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
unsigned long flags;
struct task_struct *parent;
struct sighand_struct *sighand;
+ struct pid *sender;
if (tsk->ptrace & PT_PTRACED)
parent = tsk->parent;
@@ -1433,15 +1457,11 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
parent = tsk->real_parent;
}
+ /* We are under tasklist_lock so no need to call get_pid */
+ sender = task_pid(tsk);
info.si_signo = SIGCHLD;
info.si_errno = 0;
- /*
- * see comment in do_notify_parent() abot the following 3 lines
- */
- rcu_read_lock();
- info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
- rcu_read_unlock();
-
+ info.si_pid = 0; /* Filled in later from sender */
info.si_uid = tsk->uid;
info.si_utime = cputime_to_clock_t(tsk->utime);
@@ -1466,7 +1486,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
spin_lock_irqsave(&sighand->siglock, flags);
if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
!(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
- __group_send_sig_info(SIGCHLD, &info, parent);
+ __group_send_sig_info(SIGCHLD, &info, parent, sender);
/*
* Even if SIGCHLD is not generated, we must wake up wait4 calls.
*/
@@ -2210,7 +2230,7 @@ sys_kill(pid_t pid, int sig)
info.si_signo = sig;
info.si_errno = 0;
info.si_code = SI_USER;
- info.si_pid = task_tgid_vnr(current);
+ info.si_pid = 0; /* Uses default current tgid */
info.si_uid = current->uid;
return kill_something_info(sig, &info, pid);
@@ -2227,7 +2247,7 @@ static int do_tkill(pid_t tgid, pid_t pid, int sig)
info.si_signo = sig;
info.si_errno = 0;
info.si_code = SI_TKILL;
- info.si_pid = task_tgid_vnr(current);
+ info.si_pid = 0; /* Uses default current tgid */
info.si_uid = current->uid;
rcu_read_lock();
--
1.5.2.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC][PATCH] Implement ns_of_pid()
[not found] ` <20081112064139.GA27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-11-12 6:44 ` [RFC][PATCH 2/3] Generalize task_active_pid_ns() Sukadev Bhattiprolu
2008-11-12 6:48 ` [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace Sukadev Bhattiprolu
@ 2008-11-12 6:53 ` Sukadev Bhattiprolu
2 siblings, 0 replies; 8+ messages in thread
From: Sukadev Bhattiprolu @ 2008-11-12 6:53 UTC (permalink / raw)
To: Oleg Nesterov, Eric W. Biederman; +Cc: Containers, Pavel Emelyanov
Oops, the subject should say [PATCH 1/3]
I forward ported these patches from Eric's patchset and touch
tested them again. I think we were in agreement on this set.
Should we push these 3 patches upstream (after some more testing).
Sukadev
Sukadev Bhattiprolu [sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org] wrote:
|
| From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
| Date: Mon, 10 Nov 2008 19:03:56 -0800
| Subject: [PATCH 1/3] pid: Implement ns_of_pid
|
| A current problem with the pid namespace is that it is
| easy to do pid related work after exit_task_namespaces which
| drops the nsproxy pointer.
|
| However if we are doing pid namespace related work we are
| always operating on some struct pid which retains the pid_namespace
| pointer of the pid namespace it was allocated in.
|
| So provide ns_of_pid which allows us to find the pid
| namespace a pid was allocated in.
|
| Using this we have the needed infrastructure to do pid
| namespace related work at anytime we have a struct pid,
| removing the chance of accidentally having a NULL
| pointer dereference when accessing current->nsproxy.
|
| Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
| ---
| include/linux/pid.h | 11 +++++++++++
| 1 files changed, 11 insertions(+), 0 deletions(-)
|
| diff --git a/include/linux/pid.h b/include/linux/pid.h
| index d7e98ff..e9aec85 100644
| --- a/include/linux/pid.h
| +++ b/include/linux/pid.h
| @@ -122,6 +122,17 @@ int next_pidmap(struct pid_namespace *pid_ns, int last);
| extern struct pid *alloc_pid(struct pid_namespace *ns);
| extern void free_pid(struct pid *pid);
|
| +/* ns_of_pid returns the pid namespace in which the specified
| + * pid was allocated.
| + */
| +static inline struct pid_namespace *ns_of_pid(struct pid *pid)
| +{
| + struct pid_namespace *ns = NULL;
| + if (pid)
| + ns = pid->numbers[pid->level].ns;
| + return ns;
| +}
| +
| /*
| * the helpers to get the pid's id seen from different namespaces
| *
| --
| 1.5.2.5
|
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace
[not found] ` <20081112064819.GC27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-11-12 16:33 ` Oleg Nesterov
[not found] ` <20081112163339.GD13269-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2008-11-12 16:33 UTC (permalink / raw)
To: Sukadev Bhattiprolu; +Cc: Containers, Eric W. Biederman, Pavel Emelyanov
On 11/11, Sukadev Bhattiprolu wrote:
>
> Subject: [PATCH 3/3] sig: Handle pid namespace crossing when sending signals.
> I add a struct pid sender parameter to __group_send_sig_info, as that is
> the only function called with si_pid != task_tgid_vnr(current). So we can
> correctly handle the sending of a signal to the parent of an arbitrary
> task.
Sukadev, Eric, I am sorry but... and it is very possible I missed something
but... You can't even imagine how I hate these complications ;)
Could you please take another look at the patch I sent
http://marc.info/?l=linux-kernel&m=122634217518183
? It is very simple (but yes, hackish). See also my comment about
in_interrupt() check...
(btw, your another email has a good point, we can't use ->nsproxy
like that patch does).
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -506,7 +506,7 @@ static void __do_notify(struct mqueue_inode_info *info)
> sig_i.si_errno = 0;
> sig_i.si_code = SI_MESGQ;
> sig_i.si_value = info->notify.sigev_value;
> - sig_i.si_pid = task_tgid_vnr(current);
> + sig_i.si_pid = 0; /* Uses default current tgid */
> sig_i.si_uid = current->uid;
Yes __do_notify() (and other pathes I am not aware of) needs attention
too, but I'd suggest a separate patch...
And I personally like the idea to factor out these ".si_pid = current->pid"
but in a separate patch?
> +static void set_sigqueue_pid(struct sigqueue *q, struct task_struct *t,
> + struct pid *sender)
> +{
> + struct pid_namespace *ns;
> +
> + /* Set si_pid to the pid number of sender in the pid namespace of
> + * our destination task for all siginfo types that support it.
> + */
> + switch(q->info.si_code & __SI_MASK) {
> + /* siginfo without si_pid */
> + case __SI_TIMER:
> + case __SI_POLL:
> + case __SI_FAULT:
> + break;
> + /* siginfo with si_pid */
> + case __SI_KILL:
> + case __SI_CHLD:
> + case __SI_RT:
> + case __SI_MESGQ:
> + default:
> + /* si_pid for SI_KERNEL is always 0 */
> + if (q->info.si_code == SI_KERNEL || in_interrupt())
> + break;
> + /* Is current not the sending task? */
> + if (!sender)
> + sender = task_tgid(current);
> + ns = task_active_pid_ns(t);
> + q->info.si_pid = pid_nr_ns(sender, ns);
> + break;
> + }
> +}
Why, why? Just: if from parent ns - clear .si_pid. No?
Oleg.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace
[not found] ` <20081112163339.GD13269-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2008-11-13 3:21 ` Eric W. Biederman
2008-11-14 16:58 ` Oleg Nesterov
2008-11-13 3:26 ` Eric W. Biederman
1 sibling, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2008-11-13 3:21 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Containers, Sukadev Bhattiprolu, Pavel Emelyanov
Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> On 11/11, Sukadev Bhattiprolu wrote:
>>
>> Subject: [PATCH 3/3] sig: Handle pid namespace crossing when sending signals.
>
>> I add a struct pid sender parameter to __group_send_sig_info, as that is
>> the only function called with si_pid != task_tgid_vnr(current). So we can
>> correctly handle the sending of a signal to the parent of an arbitrary
>> task.
>
> Sukadev, Eric, I am sorry but... and it is very possible I missed something
> but... You can't even imagine how I hate these complications ;)
>
> Could you please take another look at the patch I sent
>
> http://marc.info/?l=linux-kernel&m=122634217518183
>
> ? It is very simple (but yes, hackish). See also my comment about
> in_interrupt() check...
>
> (btw, your another email has a good point, we can't use ->nsproxy
> like that patch does).
>
>> --- a/ipc/mqueue.c
>> +++ b/ipc/mqueue.c
>> @@ -506,7 +506,7 @@ static void __do_notify(struct mqueue_inode_info *info)
>> sig_i.si_errno = 0;
>> sig_i.si_code = SI_MESGQ;
>> sig_i.si_value = info->notify.sigev_value;
>> - sig_i.si_pid = task_tgid_vnr(current);
>> + sig_i.si_pid = 0; /* Uses default current tgid */
>> sig_i.si_uid = current->uid;
>
> Yes __do_notify() (and other pathes I am not aware of) needs attention
> too, but I'd suggest a separate patch...
>
> And I personally like the idea to factor out these ".si_pid = current->pid"
> but in a separate patch?
>
>> +static void set_sigqueue_pid(struct sigqueue *q, struct task_struct *t,
>> + struct pid *sender)
>> +{
>> + struct pid_namespace *ns;
>> +
>> + /* Set si_pid to the pid number of sender in the pid namespace of
>> + * our destination task for all siginfo types that support it.
>> + */
>> + switch(q->info.si_code & __SI_MASK) {
>> + /* siginfo without si_pid */
>> + case __SI_TIMER:
>> + case __SI_POLL:
>> + case __SI_FAULT:
>> + break;
>> + /* siginfo with si_pid */
>> + case __SI_KILL:
>> + case __SI_CHLD:
>> + case __SI_RT:
>> + case __SI_MESGQ:
>> + default:
>> + /* si_pid for SI_KERNEL is always 0 */
>> + if (q->info.si_code == SI_KERNEL || in_interrupt())
>> + break;
>> + /* Is current not the sending task? */
>> + if (!sender)
>> + sender = task_tgid(current);
>> + ns = task_active_pid_ns(t);
>> + q->info.si_pid = pid_nr_ns(sender, ns);
>> + break;
>> + }
>> +}
>
> Why, why? Just: if from parent ns - clear .si_pid. No?
We need the switch to know if we are a member of a union that supports
si_pid.
The if (!sender) portion can be handled by just making all of
the callers pass it. I forget but there are a few weird cases
where the current process is not the sender.
The in_interrupt thing is there simply because current is not
useable from an interrrupt context, and there are some
signals that get sent from an interrupt context. Reliably
passing in sender will remove the need for the in_interrupt
check.
Oh. As for the chunk that is:
ns = task_active_pid_ns(t)
q->info.si_pid = pid_nr_ns(sender, ns);
If we are sending from a child to a parent namespace. The name of the
child changes. There is some place F_SETSIG? sigfd? where we have
something that resembles the full general case of processes being able
to send a signal to any other process.
Hopefully this helps. I'm swamped at the moment and don't have time
to do a full general review.
Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace
[not found] ` <20081112163339.GD13269-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-11-13 3:21 ` Eric W. Biederman
@ 2008-11-13 3:26 ` Eric W. Biederman
1 sibling, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2008-11-13 3:26 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Containers, Sukadev Bhattiprolu, Pavel Emelyanov
Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
> On 11/11, Sukadev Bhattiprolu wrote:
>>
>> Subject: [PATCH 3/3] sig: Handle pid namespace crossing when sending signals.
>
>> I add a struct pid sender parameter to __group_send_sig_info, as that is
>> the only function called with si_pid != task_tgid_vnr(current). So we can
>> correctly handle the sending of a signal to the parent of an arbitrary
>> task.
>
> Sukadev, Eric, I am sorry but... and it is very possible I missed something
> but... You can't even imagine how I hate these complications ;)
>
> Could you please take another look at the patch I sent
>
> http://marc.info/?l=linux-kernel&m=122634217518183
>
> ? It is very simple (but yes, hackish). See also my comment about
> in_interrupt() check...
>
> (btw, your another email has a good point, we can't use ->nsproxy
> like that patch does).
>
>> --- a/ipc/mqueue.c
>> +++ b/ipc/mqueue.c
>> @@ -506,7 +506,7 @@ static void __do_notify(struct mqueue_inode_info *info)
>> sig_i.si_errno = 0;
>> sig_i.si_code = SI_MESGQ;
>> sig_i.si_value = info->notify.sigev_value;
>> - sig_i.si_pid = task_tgid_vnr(current);
>> + sig_i.si_pid = 0; /* Uses default current tgid */
>> sig_i.si_uid = current->uid;
>
> Yes __do_notify() (and other pathes I am not aware of) needs attention
> too, but I'd suggest a separate patch...
>
> And I personally like the idea to factor out these ".si_pid = current->pid"
> but in a separate patch?
>
>> +static void set_sigqueue_pid(struct sigqueue *q, struct task_struct *t,
>> + struct pid *sender)
>> +{
>> + struct pid_namespace *ns;
>> +
>> + /* Set si_pid to the pid number of sender in the pid namespace of
>> + * our destination task for all siginfo types that support it.
>> + */
>> + switch(q->info.si_code & __SI_MASK) {
>> + /* siginfo without si_pid */
>> + case __SI_TIMER:
>> + case __SI_POLL:
>> + case __SI_FAULT:
>> + break;
>> + /* siginfo with si_pid */
>> + case __SI_KILL:
>> + case __SI_CHLD:
>> + case __SI_RT:
>> + case __SI_MESGQ:
>> + default:
>> + /* si_pid for SI_KERNEL is always 0 */
>> + if (q->info.si_code == SI_KERNEL || in_interrupt())
>> + break;
>> + /* Is current not the sending task? */
>> + if (!sender)
>> + sender = task_tgid(current);
>> + ns = task_active_pid_ns(t);
>> + q->info.si_pid = pid_nr_ns(sender, ns);
>> + break;
>> + }
>> +}
>
> Why, why? Just: if from parent ns - clear .si_pid. No?
We need the switch to know if we are a member of a union that supports
si_pid.
The if (!sender) portion can be handled by just making all of
the callers pass it. I forget but there are a few weird cases
where the current process is not the sender.
The in_interrupt thing is there simply because current is not
useable from an interrrupt context, and there are some
signals that get sent from an interrupt context. Reliably
passing in sender will remove the need for the in_interrupt
check.
Oh. As for the chunk that is:
ns = task_active_pid_ns(t)
q->info.si_pid = pid_nr_ns(sender, ns);
If we are sending from a child to a parent namespace. The name of the
child changes. There is some place F_SETSIG? sigfd? where we have
something that resembles the full general case of processes being able
to send a signal to any other process.
Hopefully this helps. I'm swamped at the moment and don't have time
to do a full general review.
Eric
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace
2008-11-13 3:21 ` Eric W. Biederman
@ 2008-11-14 16:58 ` Oleg Nesterov
0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2008-11-14 16:58 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Sukadev Bhattiprolu, Daniel Hokka Zakrisson, Pavel Emelyanov,
Containers, linux-kernel
On 11/12, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 11/11, Sukadev Bhattiprolu wrote:
> >>
> >> +static void set_sigqueue_pid(struct sigqueue *q, struct task_struct *t,
> >> + struct pid *sender)
> >> +{
> >> + struct pid_namespace *ns;
> >> +
> >> + /* Set si_pid to the pid number of sender in the pid namespace of
> >> + * our destination task for all siginfo types that support it.
> >> + */
> >> + switch(q->info.si_code & __SI_MASK) {
> >> + /* siginfo without si_pid */
> >> + case __SI_TIMER:
> >> + case __SI_POLL:
> >> + case __SI_FAULT:
> >> + break;
> >> + /* siginfo with si_pid */
> >> + case __SI_KILL:
> >> + case __SI_CHLD:
> >> + case __SI_RT:
> >> + case __SI_MESGQ:
> >> + default:
> >> + /* si_pid for SI_KERNEL is always 0 */
> >> + if (q->info.si_code == SI_KERNEL || in_interrupt())
> >> + break;
> >> + /* Is current not the sending task? */
> >> + if (!sender)
> >> + sender = task_tgid(current);
> >> + ns = task_active_pid_ns(t);
> >> + q->info.si_pid = pid_nr_ns(sender, ns);
> >> + break;
> >> + }
> >> +}
> >
> > Why, why? Just: if from parent ns - clear .si_pid. No?
>
> We need the switch to know if we are a member of a union that supports
> si_pid.
Please look at http://marc.info/?l=linux-kernel&m=122634217518183
If SIG_FROM_USER is set, we know that .si_pid is "valid".
Yes, yes, yes. sys_rt_sigqueueinfo() is a problem, but in that
case we can't trust .si_code anyway.
> The in_interrupt thing is there simply because current is not
> useable from an interrrupt context, and there are some
> signals that get sent from an interrupt context.
Yes sure. But I don't think this check is enough, see other
emails. And this check is not needed once we have SIG_FROM_USER.
> Oh. As for the chunk that is:
> ns = task_active_pid_ns(t)
> q->info.si_pid = pid_nr_ns(sender, ns);
>
> If we are sending from a child to a parent namespace.
The notify_parent() case is fine, afaics (again I assume the "patch"
above which sets SIG_FROM_USER).
> The name of the
> child changes. There is some place F_SETSIG? sigfd? where we have
> something that resembles the full general case of processes being able
> to send a signal to any other process.
Yes, this needs attention too.
Oleg.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-11-14 16:58 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-12 6:41 [RFC][PATCH] Implement ns_of_pid() Sukadev Bhattiprolu
[not found] ` <20081112064139.GA27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-11-12 6:44 ` [RFC][PATCH 2/3] Generalize task_active_pid_ns() Sukadev Bhattiprolu
2008-11-12 6:48 ` [RFC][PATCH 3/3] Set si_pid to 0 for signals from ancestor namespace Sukadev Bhattiprolu
[not found] ` <20081112064819.GC27806-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-11-12 16:33 ` Oleg Nesterov
[not found] ` <20081112163339.GD13269-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-11-13 3:21 ` Eric W. Biederman
2008-11-14 16:58 ` Oleg Nesterov
2008-11-13 3:26 ` Eric W. Biederman
2008-11-12 6:53 ` [RFC][PATCH] Implement ns_of_pid() Sukadev Bhattiprolu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox