From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org,
Oleg Nesterov <oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>,
Pavel Emelianov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Subject: Re: [PATCH] Signal semantics for /sbin/init
Date: Tue, 30 Oct 2007 14:37:10 -0700 [thread overview]
Message-ID: <20071030213710.GA22763@us.ibm.com> (raw)
In-Reply-To: <m1bqahmm25.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
| sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org writes:
|
| > (This is Oleg's patch with my tweaks to compile, Oleg pls sign-off).
| > ---
| >
| > From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
| > Subject: [PATCH 1/3] Signal semantics for /sbin/init
| >
| > Currently, /sbin/init is protected from unhandled signals by the
| > "current == child_reaper(current)" check in get_signal_to_deliver().
| > This is not enough, we have multiple problems:
| >
| > - this doesn't work for multi-threaded inits, and we can't
| > fix this by simply making this check group-wide.
| >
| > - /sbin/init and kernel threads are not protected from
| > handle_stop_signal(). Minor problem, but not good and
| > allows to "steal" SIGCONT or change ->signal->flags.
| >
| > - /sbin/init is not protected from __group_complete_signal(),
| > sig_fatal() can set SIGNAL_GROUP_EXIT and block exec(), kill
| > sub-threads, set ->group_stop_count, etc.
| >
| > Also, with support for multiple pid namespaces, we need an ability to
| > actually kill the sub-namespace's init from the parent namespace. In
| > this case it is not possible (without painful and intrusive changes)
| > to make the "should we honor this signal" decision on the receiver's
| > side.
| >
| > Hopefully this patch (adds 43 bytes to kernel/signal.o) can solve
| > these problems.
| >
| > Notes:
| >
| > - Blocked signals are never ignored, so init still can receive
| > a pending blocked signal after sigprocmask(SIG_UNBLOCK).
| > Easy to fix, but probably we can ignore this issue.
| >
| > - this patch allows us to simplify de_thread() playing games
| > with pid_ns->child_reaper.
| >
| > (Side note: the current behaviour of things like force_sig_info_fault()
| > is not very good, init should not ignore these signals and go to the
| > endless loop. Exit + panic is imho better, easy to chamge)
| >
| > Oleg.
| >
| > Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
|
| I kept thinking about the problem and the ignored signal issue
| really disturbed me, because we did not have a clean definition
| on how things are supposed to work. I do think Oleg was on the right
| track.
|
| If we make the rule:
| When sending a signal to init. The presence of a signal handler
| that is neither SIG_IGN nor SIG_DFL allows the signal to be sent to
| init. If the signal is not sent it is silently dropped, without
| becoming pending. Further if init specifies it's signal handler as
| SIG_IGN or SIG_DFL all pending signals will be dropped.
|
| The only noticeable user space difference from todays init
| is that it no longer needs to worry about signals becoming
| pending when it has them marked as SIG_DFL or SIG_IGN, and
| then it blocks them.
Yes, thats a small change in semantics to blocked signals and
may not matter much in practice.
|
| This change by making the presence of a signal handler effectively
| a permission check allows us to do all of the work before
| we enqueue the signal. Which means that we can now allow
| force_sig_info to send signals to init and that panic the kernel
| instead of going into an infinite busy loop taking an exception
| sending a signal and then retaking the same exception.
|
| This change also makes it possible to easily implement the
| the desired semantics of /sbin/init for pid namespaces where
| outer processes can kill init but processes inside the pid
| namespace can not.
|
| Please take a look and tell me what you think.
Overall, it looks good, couple of questions below. Will port my
patches and test it out.
|
| diff --git a/kernel/signal.c b/kernel/signal.c
| index 4537bdd..79856eb 100644
| --- a/kernel/signal.c
| +++ b/kernel/signal.c
| @@ -546,6 +546,22 @@ static int check_kill_permission(int sig, struct siginfo *info,
| return security_task_kill(t, info, sig, 0);
| }
|
| +static int sig_available(struct task_struct *tsk, int sig)
Hmm. IMHO, 'available' is not immediately obvious/clear.
| +{
| + void __user * handler;
| + int available = 1;
| +
| + if (likely(!is_global_init(tsk)))
| + goto out;
With multiple pid namespaces, I guess, this would become
if (!is_container_init(tsk))
goto out;
/* from parent namespace, don't ignore */
if (!task_in_descendant_ns(tsk))
goto out;
If this is correct, I have a question below re: do_sigaction.
| +
| + handler = tsk->sighand->action[sig-1].sa.sa_handler;
| + available = (handler != SIG_IGN) &&
| + (handler != SIG_DFL);
Can we use sig_user_defined() for the above checks ? sig_available()
looks like an extension of sig_user_defined() for init. How about
sig_init_user_defined() or reverse the logic and use sig_init_ignore()
like Oleg did ?
| +out:
| + return available;
| +}
| +
| +
| /* forward decl */
| static void do_notify_parent_cldstop(struct task_struct *tsk, int why);
|
| @@ -948,6 +964,9 @@ __group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
| int ret = 0;
|
| assert_spin_locked(&p->sighand->siglock);
| + if (!sig_available(p, sig))
| + return ret;
| +
| handle_stop_signal(sig, p);
|
| /* Short-circuit ignored signals. */
| @@ -1379,6 +1398,11 @@ send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
| read_lock(&tasklist_lock);
| /* Since it_lock is held, p->sighand cannot be NULL. */
| spin_lock_irqsave(&p->sighand->siglock, flags);
| + if (!sig_available(p, sig)) {
| + ret = 1;
| + goto out;
| + }
| +
| handle_stop_signal(sig, p);
|
| /* Short-circuit ignored signals. */
Hmm. I see now that Oleg's approach would result in the sig_init_ignore()
check being done twice (once in handle_stop_signal() and once in the following
sig_ignored()).
Is that why you don't fold sig_available() into sig_ignored() ? Or, there
other more important/correctness issues as well ?
| @@ -1860,12 +1884,6 @@ relock:
| if (sig_kernel_ignore(signr)) /* Default is nothing. */
| continue;
|
| - /*
| - * Global init gets no signals it doesn't want.
| - */
| - if (is_global_init(current))
| - continue;
| -
| if (sig_kernel_stop(signr)) {
| /*
| * The default action is to stop all threads in
| @@ -2246,8 +2264,10 @@ static int do_tkill(int tgid, int pid, int sig)
| */
| if (!error && sig && p->sighand) {
| spin_lock_irq(&p->sighand->siglock);
| - handle_stop_signal(sig, p);
| - error = specific_send_sig_info(sig, &info, p);
| + if (sig_available(p, sig)) {
| + handle_stop_signal(sig, p);
| + error = specific_send_sig_info(sig, &info, p);
| + }
| spin_unlock_irq(&p->sighand->siglock);
| }
| }
| @@ -2336,7 +2356,8 @@ int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
| * be discarded, whether or not it is blocked"
| */
| if (act->sa.sa_handler == SIG_IGN ||
| - (act->sa.sa_handler == SIG_DFL && sig_kernel_ignore(sig))) {
| + (act->sa.sa_handler == SIG_DFL && sig_kernel_ignore(sig)) ||
| + !sig_available(current, sig)) {
| struct task_struct *t = current;
| sigemptyset(&mask);
| sigaddset(&mask, sig);
Not sure about this. Consider that we extend the sig_available() as
I mentioned above for container_init(). Then suppose following sequence
occurs:
- container-init receives a fatal signal say SIGUSR1, from parent-ns
i.e the signal is pending.
- before processing the pending signal, the container-init sets
the handler to SIG_DFL (which is to terminate).
Will we then discard the pending SIGUSR1 even though it was from
from parent ns ?
next prev parent reply other threads:[~2007-10-30 21:37 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-27 19:00 [PATCH] Signal semantics for /sbin/init sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20071027190010.GA10397-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-10-29 16:13 ` Dave Hansen
2007-10-29 20:02 ` Eric W. Biederman
2007-10-30 1:24 ` Eric W. Biederman
[not found] ` <m1bqahmm25.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-10-30 17:30 ` Dave Hansen
2007-10-30 18:09 ` Eric W. Biederman
2007-10-30 21:37 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA [this message]
[not found] ` <20071030213710.GA22763-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-10-30 22:25 ` Eric W. Biederman
[not found] ` <m1y7dkjl3k.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-11-02 21:00 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20071102210047.GA8714-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-11-04 4:20 ` Eric W. Biederman
[not found] ` <m1wssybpzu.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-11-17 17:38 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <20071117173822.GA330-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-12-12 0:49 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071030213710.GA22763@us.ibm.com \
--to=sukadev-r/jw6+rmf7hqt0dzr+alfa@public.gmane.org \
--cc=clg-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org \
--cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
--cc=oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org \
--cc=xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox