* Re: [PATCH] arch_ptrace_stop @ 2007-12-13 17:29 Oleg Nesterov 2007-12-13 17:31 ` Matthew Wilcox 2007-12-13 22:42 ` Roland McGrath 0 siblings, 2 replies; 8+ messages in thread From: Oleg Nesterov @ 2007-12-13 17:29 UTC (permalink / raw) To: Roland McGrath Cc: Petr Tesarik, Tony Luck, Matthew Wilcox, Andrew Morton, linux-kernel Roland McGrath wrote: > > +static int sigkill_pending(struct task_struct *tsk) > +{ > + return ((sigismember(&tsk->pending.signal, SIGKILL) || > + sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) && > + !unlikely(sigismember(&tsk->blocked, SIGKILL))); > +} How is it possible that SIGKILL is blocked? > static void ptrace_stop(int exit_code, int nostop_code, siginfo_t *info) > { > + int killed = 0; > + > + if (arch_ptrace_stop_needed(exit_code, info)) { > + /* > + * The arch code has something special to do before a > + * ptrace stop. This is allowed to block, e.g. for faults > + * on user stack pages. We can't keep the siglock while > + * calling arch_ptrace_stop, so we must release it now. > + * To preserve proper semantics, we must do this before > + * any signal bookkeeping like checking group_stop_count. > + * Meanwhile, a SIGKILL could come in before we retake the > + * siglock. That must prevent us from sleeping in TASK_TRACED. > + * So after regaining the lock, we must check for SIGKILL. > + */ > + spin_unlock_irq(¤t->sighand->siglock); > + arch_ptrace_stop(exit_code, info); > + spin_lock_irq(¤t->sighand->siglock); > + killed = sigkill_pending(current); > + } > + > /* > * If there is a group stop in progress, > * we must participate in the bookkeeping. > @@ -1604,7 +1635,7 @@ static void ptrace_stop(int exit_code, i > spin_unlock_irq(¤t->sighand->siglock); > try_to_freeze(); > read_lock(&tasklist_lock); > - if (may_ptrace_stop()) { > + if (!unlikely(killed) && may_ptrace_stop()) { Could you please explain this change in more details? Currently ptrace_stop() schedules in TASK_TRACED state even if we have a pending SIGKILL. With this patch this is still possible, but unless arch_ptrace_stop_needed() is true and thus we will check sigkill_pending(). Suppose the task was SIGKILL'ed and does ptrace_notify(PTRACE_EVENT_EXIT), now the resulting action depends on arch_ptrace_stop_needed(). I don't claim this is wrong, just trying to understand. Thanks, Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] arch_ptrace_stop 2007-12-13 17:29 [PATCH] arch_ptrace_stop Oleg Nesterov @ 2007-12-13 17:31 ` Matthew Wilcox 2007-12-13 17:39 ` Oleg Nesterov 2007-12-13 22:42 ` Roland McGrath 1 sibling, 1 reply; 8+ messages in thread From: Matthew Wilcox @ 2007-12-13 17:31 UTC (permalink / raw) To: Oleg Nesterov Cc: Roland McGrath, Petr Tesarik, Tony Luck, Matthew Wilcox, Andrew Morton, linux-kernel On Thu, Dec 13, 2007 at 08:29:26PM +0300, Oleg Nesterov wrote: > How is it possible that SIGKILL is blocked? I *think* it's possible that kernel threads may block SIGKILL. And I think init (pid 1) gets SIGKILL blocked. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] arch_ptrace_stop 2007-12-13 17:31 ` Matthew Wilcox @ 2007-12-13 17:39 ` Oleg Nesterov 0 siblings, 0 replies; 8+ messages in thread From: Oleg Nesterov @ 2007-12-13 17:39 UTC (permalink / raw) To: Matthew Wilcox Cc: Roland McGrath, Petr Tesarik, Tony Luck, Matthew Wilcox, Andrew Morton, linux-kernel On 12/13, Matthew Wilcox wrote: > > On Thu, Dec 13, 2007 at 08:29:26PM +0300, Oleg Nesterov wrote: > > How is it possible that SIGKILL is blocked? > > I *think* it's possible that kernel threads may block SIGKILL. > And I think init (pid 1) gets SIGKILL blocked. Yes. But this shouldn't matter because we can't ptrace them. /sbin/init doesn't block SIGKILL, but it can't be ptraced either. Also, SIGKILL is blocked only if kthread was created by kernel_thread() and then it does daemonize(). kthread_create()'ed kthreads don't block SIGKILL but ignore it. The latter behaviour is more correct, but we can't do the same for kernel_thread() threads. Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] arch_ptrace_stop 2007-12-13 17:29 [PATCH] arch_ptrace_stop Oleg Nesterov 2007-12-13 17:31 ` Matthew Wilcox @ 2007-12-13 22:42 ` Roland McGrath 2007-12-16 16:24 ` Oleg Nesterov 1 sibling, 1 reply; 8+ messages in thread From: Roland McGrath @ 2007-12-13 22:42 UTC (permalink / raw) To: Oleg Nesterov Cc: Petr Tesarik, Tony Luck, Matthew Wilcox, Andrew Morton, linux-kernel > How is it possible that SIGKILL is blocked? It is probably not possible here. It's impossible for normal user threads, and kernel threads or ones doing something weird inside the kernel can probably never get here. But I'm just not trying to rely on any fancy guarantees here. In this change I'm being conservative in just matching the low-level logic that affects get_signal_to_deliver directly. I'd prefer to get this in place now, where it's easy to be sure it's correct vis a vis current behavior. Later on we can convince ourselves about the correctness of simplifications to clean the logic up. > Could you please explain this change in more details? I'm glad to. I appreciate the review. > Currently ptrace_stop() schedules in TASK_TRACED state even if we have a > pending SIGKILL. With this patch this is still possible, but unless > arch_ptrace_stop_needed() is true and thus we will check sigkill_pending(). Currently the siglock is always held throughout. The case this change addresses is when no SIGKILL was already pending before we took the lock. Currently, a new SIGKILL cannot come in until we've released the lock, which is after we've set TASK_TRACED. The signal's sender will hold the lock while checking each thread's state, waking up any in TASK_TRACED. When arch_ptrace_stop_needed() is true, we release the siglock for an unknown period (might block, etc). If a SIGKILL is sent there, it becomes pending while we are in TASK_RUNNING or a normal blocked state. Next we finish arch_ptrace_stop() and reacquire the siglock. Now entering TASK_TRACED would leave us unkillable because SIGKILL is already pending and nothing else (except PTRACE_CONT et al) will try to wake us up. This was tested on ia64 by inserting an explicit sleep in arch_ptrace_stop to simulate a very long block in a page fault, and then sending SIGKILL in this interval. It was verified that this left the thread unkillable, and that the sigkill_pending() check fixed that. > Suppose the task was SIGKILL'ed and does ptrace_notify(PTRACE_EVENT_EXIT), > now the resulting action depends on arch_ptrace_stop_needed(). For a SIGKILL that caused the exit to happen, then that SIGKILL was already dequeued and is no longer pending by the time we get here, so nothing is different. For a new SIGKILL racing with an exit already in progress, then either it comes before ptrace_notify(PTRACE_EVENT_EXIT) has taken the siglock, or after it's released it (in ptrace_stop). If it comes after ptrace_stop releases the siglock, then it wakes the thread up from that ptrace stop. If it comes before ptrace_notify takes the siglock, then the thread stops anyway with a pending SIGKILL. (That may be a problem, but it is not new. We can discuss that case separately.) The new third possibility is that it comes after ptrace_stop releases the siglock to call arch_ptrace_stop. Then the new code will notice the pending SIGKILL and skip the stop. The only way this differs from the SIGKILL having arrived immediately after the stop is that the parent SIGCHLD/wait notification with CLD_TRAPPED didn't happen. It's a race for the parent even to notice that, since exit_code and the queued SIGCHLD's siginfo_t will very quickly be replaced by the fresh notification sent for the child's death. > I don't claim this is wrong, just trying to understand. I don't claim the logic is now perfect, just that this change is better for the case it's intended for and not materially worse for others. We can certainly do more to clean things up. But I don't want that complex subject to delay the fixing and cleaning up of ia64 that this change enables. Thanks, Roland ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] arch_ptrace_stop 2007-12-13 22:42 ` Roland McGrath @ 2007-12-16 16:24 ` Oleg Nesterov 0 siblings, 0 replies; 8+ messages in thread From: Oleg Nesterov @ 2007-12-16 16:24 UTC (permalink / raw) To: Roland McGrath Cc: Petr Tesarik, Tony Luck, Matthew Wilcox, Andrew Morton, linux-kernel Roland, I am sorry for delay, On 12/13, Roland McGrath wrote: > > > Currently ptrace_stop() schedules in TASK_TRACED state even if we have a > > pending SIGKILL. With this patch this is still possible, but unless > > arch_ptrace_stop_needed() is true and thus we will check sigkill_pending(). > > Currently the siglock is always held throughout. The case this change > addresses is when no SIGKILL was already pending before we took the lock. > Currently, a new SIGKILL cannot come in until we've released the lock, > which is after we've set TASK_TRACED. The signal's sender will hold the > lock while checking each thread's state, waking up any in TASK_TRACED. > > When arch_ptrace_stop_needed() is true, we release the siglock for an > unknown period (might block, etc). If a SIGKILL is sent there, it becomes > pending while we are in TASK_RUNNING or a normal blocked state. Next we > finish arch_ptrace_stop() and reacquire the siglock. Yes, yes, I see. > Now entering > TASK_TRACED would leave us unkillable because SIGKILL is already pending > and nothing else (except PTRACE_CONT et al) will try to wake us up. But this doesn't differ from the case when SIGKILL was already pending when we enter ptrace_stop, and arch_ptrace_stop_needed() == false, that was my point. Yes, arch_ptrace_stop() can take a long time, might block, etc. But what about TIF_SYSCALL_TRACE ? The task can recieve SIGKILL while executing the syscall which can also block and so on, but do_syscall_trace(entryexit == 1) doesn't check the pending signal. I should clarify my question. What I can't understand is the subtle dependency on the result of arch_ptrace_stop_needed(). This means that it is hard to predict the behaviour. IOW, can't we - ignore the pending SIGKILL (current behaviour) -- OR -- - always check it unconditionally, before setting TASK_TRACED ? This looks a bit more consistent to me. Please also note "before setting TASK_TRACED" above. With this patch we set TASK_TRACED under ->siglock, and then change the ->state to TASK_RUNNING if killed == 1. Minor, but this doesn't look correct, we can fool the tracer which does ptrace_check_attach(). Oleg. ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1196959793.6586.3.camel@elijah.suse.cz>]
* [PATCH] arch_ptrace_stop [not found] <1196959793.6586.3.camel@elijah.suse.cz> @ 2007-12-08 1:11 ` Roland McGrath 2007-12-12 5:51 ` Andrew Morton 0 siblings, 1 reply; 8+ messages in thread From: Roland McGrath @ 2007-12-08 1:11 UTC (permalink / raw) To: Andrew Morton, Linus Torvalds Cc: linux-kernel, linux-ia64, Petr Tesarik, tony.luck This adds support to allow asm/ptrace.h to define two new macros, arch_ptrace_stop_needed and arch_ptrace_stop. These control special machine-specific actions to be done before a ptrace stop. The new code compiles away to nothing when the new macros are not defined. This is the case on all machines to begin with. On ia64, these macros will be defined to solve the long-standing issue of ptrace vs register backing store. Signed-off-by: Roland McGrath <roland@redhat.com> CC: Petr Tesarik <ptesarik@suse.cz> CC: Tony Luck <tony.luck@intel.com> --- include/linux/ptrace.h | 35 +++++++++++++++++++++++++++++++++++ kernel/signal.c | 33 ++++++++++++++++++++++++++++++++- 2 files changed, 67 insertions(+), 1 deletions(-) diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index ae8146a..7168757 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -128,6 +128,41 @@ int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data); #define force_successful_syscall_return() do { } while (0) #endif +#ifndef arch_ptrace_stop_needed +/** + * arch_ptrace_stop_needed - Decide whether arch_ptrace_stop() should be called + * @code: current->exit_code value ptrace will stop with + * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with + * + * This is called with the siglock held, to decide whether or not it's + * necessary to release the siglock and call arch_ptrace_stop() with the + * same @code and @info arguments. It can be defined to a constant if + * arch_ptrace_stop() is never required, or always is. On machines where + * this makes sense, it should be defined to a quick test to optimize out + * calling arch_ptrace_stop() when it would be superfluous. For example, + * if the thread has not been back to user mode since the last stop, the + * thread state might indicate that nothing needs to be done. + */ +#define arch_ptrace_stop_needed(code, info) (0) +#endif + +#ifndef arch_ptrace_stop +/** + * arch_ptrace_stop - Do machine-specific work before stopping for ptrace + * @code: current->exit_code value ptrace will stop with + * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with + * + * This is called with no locks held when arch_ptrace_stop_needed() has + * just returned nonzero. It is allowed to block, e.g. for user memory + * access. The arch can have machine-specific work to be done before + * ptrace stops. On ia64, register backing store gets written back to user + * memory here. Since this can be costly (requires dropping the siglock), + * we only do it when the arch requires it for this particular stop, as + * indicated by arch_ptrace_stop_needed(). + */ +#define arch_ptrace_stop(code, info) do { } while (0) +#endif + #endif #endif diff --git a/kernel/signal.c b/kernel/signal.c index afa4f78..2e85fcb 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1594,6 +1594,17 @@ static inline int may_ptrace_stop(void) } /* + * Return nonzero if there is a SIGKILL that should be waking us up. + * Called with the siglock held. + */ +static int sigkill_pending(struct task_struct *tsk) +{ + return ((sigismember(&tsk->pending.signal, SIGKILL) || + sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) && + !unlikely(sigismember(&tsk->blocked, SIGKILL))); +} + +/* * This must be called with current->sighand->siglock held. * * This should be the path for all ptrace stops. @@ -1606,6 +1617,26 @@ static inline int may_ptrace_stop(void) */ static void ptrace_stop(int exit_code, int nostop_code, siginfo_t *info) { + int killed = 0; + + if (arch_ptrace_stop_needed(exit_code, info)) { + /* + * The arch code has something special to do before a + * ptrace stop. This is allowed to block, e.g. for faults + * on user stack pages. We can't keep the siglock while + * calling arch_ptrace_stop, so we must release it now. + * To preserve proper semantics, we must do this before + * any signal bookkeeping like checking group_stop_count. + * Meanwhile, a SIGKILL could come in before we retake the + * siglock. That must prevent us from sleeping in TASK_TRACED. + * So after regaining the lock, we must check for SIGKILL. + */ + spin_unlock_irq(¤t->sighand->siglock); + arch_ptrace_stop(exit_code, info); + spin_lock_irq(¤t->sighand->siglock); + killed = sigkill_pending(current); + } + /* * If there is a group stop in progress, * we must participate in the bookkeeping. @@ -1621,7 +1652,7 @@ static void ptrace_stop(int exit_code, int nostop_code, siginfo_t *info) spin_unlock_irq(¤t->sighand->siglock); try_to_freeze(); read_lock(&tasklist_lock); - if (may_ptrace_stop()) { + if (!unlikely(killed) && may_ptrace_stop()) { do_notify_parent_cldstop(current, CLD_TRAPPED); read_unlock(&tasklist_lock); schedule(); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] arch_ptrace_stop 2007-12-08 1:11 ` Roland McGrath @ 2007-12-12 5:51 ` Andrew Morton 2007-12-12 22:36 ` Roland McGrath 0 siblings, 1 reply; 8+ messages in thread From: Andrew Morton @ 2007-12-12 5:51 UTC (permalink / raw) To: Roland McGrath Cc: Linus Torvalds, linux-kernel, linux-ia64, Petr Tesarik, tony.luck On Fri, 7 Dec 2007 17:11:52 -0800 (PST) Roland McGrath <roland@redhat.com> wrote: > > This adds support to allow asm/ptrace.h to define two new macros, > arch_ptrace_stop_needed and arch_ptrace_stop. These control special > machine-specific actions to be done before a ptrace stop. The new > code compiles away to nothing when the new macros are not defined. > This is the case on all machines to begin with. > > On ia64, these macros will be defined to solve the long-standing > issue of ptrace vs register backing store. > > Signed-off-by: Roland McGrath <roland@redhat.com> > CC: Petr Tesarik <ptesarik@suse.cz> > CC: Tony Luck <tony.luck@intel.com> > --- > include/linux/ptrace.h | 35 +++++++++++++++++++++++++++++++++++ > kernel/signal.c | 33 ++++++++++++++++++++++++++++++++- > 2 files changed, 67 insertions(+), 1 deletions(-) > > diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h > index ae8146a..7168757 100644 > --- a/include/linux/ptrace.h > +++ b/include/linux/ptrace.h > @@ -128,6 +128,41 @@ int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data); > #define force_successful_syscall_return() do { } while (0) > #endif > > +#ifndef arch_ptrace_stop_needed > +/** > + * arch_ptrace_stop_needed - Decide whether arch_ptrace_stop() should be called > + * @code: current->exit_code value ptrace will stop with > + * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with > + * > + * This is called with the siglock held, to decide whether or not it's > + * necessary to release the siglock and call arch_ptrace_stop() with the > + * same @code and @info arguments. It can be defined to a constant if > + * arch_ptrace_stop() is never required, or always is. On machines where > + * this makes sense, it should be defined to a quick test to optimize out > + * calling arch_ptrace_stop() when it would be superfluous. For example, > + * if the thread has not been back to user mode since the last stop, the > + * thread state might indicate that nothing needs to be done. > + */ > +#define arch_ptrace_stop_needed(code, info) (0) > +#endif > + > +#ifndef arch_ptrace_stop > +/** > + * arch_ptrace_stop - Do machine-specific work before stopping for ptrace > + * @code: current->exit_code value ptrace will stop with > + * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with > + * > + * This is called with no locks held when arch_ptrace_stop_needed() has > + * just returned nonzero. It is allowed to block, e.g. for user memory > + * access. The arch can have machine-specific work to be done before > + * ptrace stops. On ia64, register backing store gets written back to user > + * memory here. Since this can be costly (requires dropping the siglock), > + * we only do it when the arch requires it for this particular stop, as > + * indicated by arch_ptrace_stop_needed(). > + */ > +#define arch_ptrace_stop(code, info) do { } while (0) Mutter. These would be better as static inlines. A macro just invites variable-unused warnings on non-ia64 and outright compilation errors on ia64. Speaking from experience... static inline void arch_ptrace_stop(int exit_code, siginfo_t *info) { } #define arch_ptrace_stop arch_ptrace_stop should work? > /* > + * Return nonzero if there is a SIGKILL that should be waking us up. > + * Called with the siglock held. > + */ > +static int sigkill_pending(struct task_struct *tsk) > +{ > + return ((sigismember(&tsk->pending.signal, SIGKILL) || > + sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) && > + !unlikely(sigismember(&tsk->blocked, SIGKILL))); > +} Could you please take a peek at the infrastructure added by ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/add-lock_page_killable.patch and see if there is exploitable commonality? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] arch_ptrace_stop 2007-12-12 5:51 ` Andrew Morton @ 2007-12-12 22:36 ` Roland McGrath 0 siblings, 0 replies; 8+ messages in thread From: Roland McGrath @ 2007-12-12 22:36 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, linux-kernel, linux-ia64, Petr Tesarik, tony.luck > Mutter. These would be better as static inlines. A macro just invites > variable-unused warnings on non-ia64 and outright compilation errors on > ia64. Speaking from experience... > > static inline void arch_ptrace_stop(int exit_code, siginfo_t *info) > { > } > #define arch_ptrace_stop arch_ptrace_stop > > should work? That's fine with me. I just followed the example of e.g. arch_ptrace_attach. > > /* > > + * Return nonzero if there is a SIGKILL that should be waking us up. > > + * Called with the siglock held. > > + */ > > +static int sigkill_pending(struct task_struct *tsk) > > +{ > > + return ((sigismember(&tsk->pending.signal, SIGKILL) || > > + sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) && > > + !unlikely(sigismember(&tsk->blocked, SIGKILL))); > > +} > > Could you please take a peek at the infrastructure added by > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/add-lock_page_killable.patch > and see if there is exploitable commonality? I haven't reviewed that whole set in detail yet. So I'd rather leave this for a later cleanup. (This ptrace change alone, along with its ia64 pieces, can probably go in much earlier without disrupting anything else.) The fatal_signal_pending function in that patch is not exactly the same as this check, though it might turn out that it's actually equivalent due to higher-level constraints. Thanks, Roland ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-12-16 16:23 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-13 17:29 [PATCH] arch_ptrace_stop Oleg Nesterov
2007-12-13 17:31 ` Matthew Wilcox
2007-12-13 17:39 ` Oleg Nesterov
2007-12-13 22:42 ` Roland McGrath
2007-12-16 16:24 ` Oleg Nesterov
[not found] <1196959793.6586.3.camel@elijah.suse.cz>
2007-12-08 1:11 ` Roland McGrath
2007-12-12 5:51 ` Andrew Morton
2007-12-12 22:36 ` Roland McGrath
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox