From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Roland McGrath <roland@redhat.com>
Cc: oleg@redhat.com, ebiederm@xmission.com, daniel@hozac.com,
xemul@openvz.org, containers@lists.osdl.org,
linux-kernel@vger.kernel.org, sukadev@us.ibm.com
Subject: Re: [RFC][PATCH 3/5] Determine if sender is from ancestor ns
Date: Mon, 8 Dec 2008 19:22:07 -0800 [thread overview]
Message-ID: <20081209032207.GA30016@us.ibm.com> (raw)
In-Reply-To: <20081204010636.A8465FC053@magilla.sf.frob.com>
Roland McGrath [roland@redhat.com] wrote:
Thanks for the review and your comments.
>
> I don't quarrel with the intent, but I don't like this approach.
> I think we can do it more cleanly. I don't have it all worked out,
> but a couple of thoughts come to mind.
>
> Firstly, it's no problem to split SIGNAL_UNKILLABLE out into multiple
> flags. It's quite noninvasive, only signal.c looks at those flags. Then
> we can implement the signal magic cleanly keyed on a few separate flags,
> using different flag combinations for global init and container inits.
Ok. Sounds good. Let say for this discussion, container-inits set
SIGNAL_UNKILLABLE_CINIT, global inits continue to set SIGNAL_UNKILLABLE.
>
> I don't mind a feature that lets an unprivileged container init magically
> ignore normal exit signals. That just does something it could do itself
> with sigaction, except its /proc/pid/status "SigIgn:" line will lie.
Yes, but after Bastian mentioned it recently, I was wondering if we can
keep "SigIgn:" honest by tweaking procfs code - i.e have procfs check
the SIGNAL_UNKILLABLE flags and include SIG_DFL exit signals in the
SigIgn: line.
We could include blocked SIG_DFL signals in the SigIgn: line ? Hacky ?
> That's OK. Key that on its own flag and set that in both inits. Just
> check that flag in prepare_signal() and in get_signal_to_deliver().
>
> It's a different story for the signals that cannot be caught, blocked, or
> ignored, SIGKILL and SIGSTOP. At least when sent from outside the
> container, circumventing either of those would be a privilege escalation
> from what's always been understood by admins and so on.
So if SIGKILL or SIGSTOP are sent from inside the container, we drop them
in prepare_signal() if SIGNAL_UNKILLABLE flags are set. If sent from outside
the container, we queue them and get_signal_to_deliver() delivers these
unless SIGNAL_UNKILLABLE is set (i.e the signals are delivered to cinits but
ignored for global init).
>
> It's convenient for the implementation that we can treat these differently
> from other signals, precisely because they can never be caught, blocked, or
> ignored. That is, when we decide in prepare_signal() to queue a SIGKILL or
> SIGSTOP, it can never turn out that we're later going to drop it because it
> went from caught or ignored or blocked to uncaught, unignored, and
> unblocked. (It's only because of that possibility that there is any need
> to check for a suppressed exit signal in get_signal_to_deliver() rather
> than only in prepare_signal().) That means that when the decision hinges
> on the namespace correlation of the sender and receiver, you can check that
> when it's handy (current vs p in prepare_signal) rather than trying to
> reconstruct the answer for a queued signal.
Yes. One thing I am not clear on yet is how we decide in prepare_signal()
if it is safe to check the sender's namespace (since caller of send_signal()
could be an interrupt handler or workqueue).
As I was discussing with Bastian Blank, SI_FROMUSER() is not sufficient
since SI_ASYNCIO appears as an user-space signal.
Now is SI_ASYNCIO really supposed to be a user signal ? Or like SI_MESGQ
and SI_POLL can it be a kernel signal ? If we fix SI_ASYNCIO, can we
safely use SI_FROMUSER() for this or could there be other 'user' signals
from kernel ?
>
> Finally, one more thought. This may be moot for the problem at hand if you
> take the approach I just suggested, but probably should be fixed anyway.
> It seems to me that the si_pid and si_uid fields of siginfo_t ought to be
> translated to the namespaces of the receiver. I think it makes most sense
> to do this on the front end, i.e. in the callers that fill in the siginfo_t
> in the first place (sys_kill et al, or maybe a few layers down?).
> Currently it's inconsistent, but mostly wrong.
Yes, that makes a lot of sense to push that initialization to the callers.
We have been going through and identifying the 'si_pid's that need to be fixed.
Nadia sent out one patch for ipc/mqueue.c
> do_notify_parent() and
> do_notify_parent_cldstop() use the receiver's namespace to compute si_pid,
> but the rest of the signal.c routines do not. A free side effect of doing
> this is that si_pid for a sender whose PID is not visible to the receiver
> (i.e. outside its container) would be distinctively 0 or -1 or something.
> (-1 might be the best choice, since si_[pu]id=0 already arises now in case
> of signal queue exhaustion and the like.) Hence, possibly one could simply
> use si_pid>0 as a "sent from inside the container" check on a queued siginfo_t.
Yes, its probably moot with the other approach, but the only drawback with
relying on the "->si_pid > 0" check in get_signal_to_deliver() was that
allocation of siginfo_t could have failed. In which case it would be
unclear whether signal was from parent or same namespace.
But if we fix si_pid problems and correctly pass in siginfo_t, can we
use "si_pid > 0" check in prepare_signal() to decide whether or not
to queue the signal ?
Sukadev
next prev parent reply other threads:[~2008-12-09 3:22 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-26 3:42 [RFC][PATCH 0/5] Container init signal semantics Sukadev Bhattiprolu
2008-11-26 3:44 ` [RFC][PATCH 1/5] pid: Implement ns_of_pid Sukadev Bhattiprolu
2008-11-27 1:19 ` Bastian Blank
2008-12-01 20:24 ` Sukadev Bhattiprolu
2008-12-02 11:58 ` Bastian Blank
2008-12-02 22:12 ` Sukadev Bhattiprolu
2008-12-03 0:34 ` Valdis.Kletnieks
2008-11-26 3:45 ` [RFC][PATCH 2/5] pid: Generalize task_active_pid_ns Sukadev Bhattiprolu
2008-11-27 1:17 ` Bastian Blank
2008-11-27 21:19 ` Greg Kurz
2008-12-01 21:15 ` Sukadev Bhattiprolu
2008-12-02 11:57 ` Bastian Blank
2008-12-03 7:41 ` Sukadev Bhattiprolu
2008-12-04 12:58 ` Bastian Blank
2008-11-27 13:09 ` Nadia Derbey
2008-12-01 20:38 ` Sukadev Bhattiprolu
2008-11-26 3:46 ` [RFC][PATCH 3/5] Determine if sender is from ancestor ns Sukadev Bhattiprolu
2008-11-27 1:01 ` Bastian Blank
2008-12-01 20:15 ` Sukadev Bhattiprolu
2008-12-02 11:48 ` Bastian Blank
2008-12-02 19:59 ` Sukadev Bhattiprolu
2008-12-04 12:45 ` [RFC][PATCH 3/5] Determine if sender is from ancestor ns+ Bastian Blank
2008-12-02 3:07 ` [RFC][PATCH 3/5] Determine if sender is from ancestor ns Roland McGrath
2008-12-04 1:06 ` Roland McGrath
2008-12-09 3:22 ` Sukadev Bhattiprolu [this message]
2008-11-26 3:46 ` [RFC][PATCH 4/5] Protect cinit from fatal signals Sukadev Bhattiprolu
2008-11-27 1:07 ` Bastian Blank
2008-12-01 20:21 ` Sukadev Bhattiprolu
2008-12-02 12:06 ` Bastian Blank
2008-12-02 20:51 ` Sukadev Bhattiprolu
2008-12-04 12:52 ` Bastian Blank
2008-12-04 18:58 ` Sukadev Bhattiprolu
2008-11-26 3:46 ` [RFC][PATCH 5/5] Clear si_pid for signal from ancestor ns Sukadev Bhattiprolu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081209032207.GA30016@us.ibm.com \
--to=sukadev@linux.vnet.ibm.com \
--cc=containers@lists.osdl.org \
--cc=daniel@hozac.com \
--cc=ebiederm@xmission.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=roland@redhat.com \
--cc=sukadev@us.ibm.com \
--cc=xemul@openvz.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox