All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: oleg@redhat.com, roland@redhat.com, linux-kernel@vger.kernel.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	rjw@sisk.pl, jan.kratochvil@redhat.com
Cc: Tejun Heo <tj@kernel.org>
Subject: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED
Date: Mon,  6 Dec 2010 17:56:58 +0100	[thread overview]
Message-ID: <1291654624-6230-11-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1291654624-6230-1-git-send-email-tj@kernel.org>

Currently, if the task is STOPPED on ptrace attach, it's left alone
and the state is silently changed to TRACED on the next ptrace call.
The behavior breaks the assumption that arch_ptrace_stop() is called
before any task is poked by ptrace and is ugly in that a task
manipulates the state of another task directly.

With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
TRACED can be made clean.  The tracer can use the flag to tell the
tracee to retry stop on attach and detach.  On retry, the tracee will
enter the desired state in the correct way.  The lower 16bits of
task->group_stop is used to remember the signal number which caused
the last group stop.  This is used while retrying for ptrace attach as
the original group_exit_code could have been consumed with wait(2) by
then.

As the real parent may wait(2) and consume the group_exit_code
anytime, the group_exit_code needs to be saved separately so that it
can be used when switching from regular sleep to ptrace_stop().  This
is recorded in the lower 16bits of task->group_stop.

If a task is already stopped and there's no intervening SIGCONT, a
ptrace request immediately following a successful PTRACE_ATTACH should
always succeed even if the tracer doesn't wait(2) for attach
completion; however, with this change, the tracee might still be
TASK_RUNNING trying to enter TASK_TRACED which would cause the
following request to fail with -ESRCH.

This intermediate state is hidden from userland by setting
GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
for it to clear.  Completing the transition or any event which clears
the group stop states of the task clears the bit and wakes up the
ptracer if waiting.

Oleg:

* Spotted a race condition where a task may retry group stop without
  proper bookkeeping.  Fixed by redoing bookkeeping on retry.

* Pointed out the userland visible intermediate state.  Fixed with
  GROUP_STOP_TRAPPING.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
---
 include/linux/sched.h |    2 +
 kernel/ptrace.c       |   63 ++++++++++++++++++++++++++++++++++++++++++------
 kernel/signal.c       |   62 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 112 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c2538dd..7045c34 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1760,8 +1760,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * task->group_stop flags
  */
+#define GROUP_STOP_SIGMASK	0xffff    /* signr of the last group stop */
 #define GROUP_STOP_PENDING	(1 << 16) /* task should stop for group stop */
 #define GROUP_STOP_CONSUME	(1 << 17) /* consume group stop count */
+#define GROUP_STOP_TRAPPING	(1 << 18) /* switching from STOPPED to TRACED */
 
 extern void task_clear_group_stop(struct task_struct *task);
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 99bbaa3..5191301 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -49,14 +49,14 @@ static void ptrace_untrace(struct task_struct *child)
 	spin_lock(&child->sighand->siglock);
 	if (task_is_traced(child)) {
 		/*
-		 * If the group stop is completed or in progress,
-		 * this thread was already counted as stopped.
+		 * If group stop is completed or in progress, it should
+		 * participate in the group stop.  Set GROUP_STOP_PENDING
+		 * before kicking it.
 		 */
 		if (child->signal->flags & SIGNAL_STOP_STOPPED ||
 		    child->signal->group_stop_count)
-			__set_task_state(child, TASK_STOPPED);
-		else
-			signal_wake_up(child, 1);
+			child->group_stop |= GROUP_STOP_PENDING;
+		signal_wake_up(child, 1);
 	}
 	spin_unlock(&child->sighand->siglock);
 }
@@ -79,6 +79,12 @@ void __ptrace_unlink(struct task_struct *child)
 		ptrace_untrace(child);
 }
 
+static int ptrace_wait_trap(void *flags)
+{
+	schedule();
+	return 0;
+}
+
 /*
  * Check that we have indeed attached to the thing..
  */
@@ -93,6 +99,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 	 * we are sure that this is our traced child and that can only
 	 * be changed by us so it's not changing right after this.
 	 */
+relock:
 	read_lock(&tasklist_lock);
 	if ((child->ptrace & PT_PTRACED) && child->parent == current) {
 		ret = 0;
@@ -101,10 +108,30 @@ int ptrace_check_attach(struct task_struct *child, int kill)
 		 * does ptrace_unlink() before __exit_signal().
 		 */
 		spin_lock_irq(&child->sighand->siglock);
-		if (task_is_stopped(child))
-			child->state = TASK_TRACED;
-		else if (!task_is_traced(child) && !kill)
+		if (!task_is_traced(child) && !kill) {
+			/*
+			 * If GROUP_STOP_TRAPPING is set, it is known that
+			 * the tracee will enter either TRACED or the bit
+			 * will be cleared in definite amount of (userland)
+			 * time.  Wait while the bit is set.
+			 *
+			 * This hides PTRACE_ATTACH initiated transition
+			 * from STOPPED to TRACED from userland.
+			 */
+			if (child->group_stop & GROUP_STOP_TRAPPING) {
+				const int bit = ilog2(GROUP_STOP_TRAPPING);
+				DEFINE_WAIT_BIT(wait, &child->group_stop, bit);
+
+				spin_unlock_irq(&child->sighand->siglock);
+				read_unlock(&tasklist_lock);
+
+				wait_on_bit(&child->group_stop, bit,
+					    ptrace_wait_trap,
+					    TASK_UNINTERRUPTIBLE);
+				goto relock;
+			}
 			ret = -ESRCH;
+		}
 		spin_unlock_irq(&child->sighand->siglock);
 	}
 	read_unlock(&tasklist_lock);
@@ -204,6 +231,26 @@ int ptrace_attach(struct task_struct *task)
 	__ptrace_link(task, current);
 	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
 
+	spin_lock(&task->sighand->siglock);
+
+	/*
+	 * If the task is already STOPPED, set GROUP_STOP_PENDING and
+	 * TRAPPING, and kick it so that it transits to TRACED.  TRAPPING
+	 * will be cleared if the child completes the transition or any
+	 * event which clears the group stop states happens.  The bit is
+	 * waited by ptrace_check_attach() to hide the transition from
+	 * userland.
+	 *
+	 * The following is safe as both transitions in and out of STOPPED
+	 * are protected by siglock.
+	 */
+	if (task_is_stopped(task)) {
+		task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
+		signal_wake_up(task, 1);
+	}
+
+	spin_unlock(&task->sighand->siglock);
+
 	retval = 0;
 unlock_tasklist:
 	write_unlock_irq(&tasklist_lock);
diff --git a/kernel/signal.c b/kernel/signal.c
index a6bc4cf..6d93a3f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -224,10 +224,29 @@ static inline void print_dropped_signal(int sig)
 }
 
 /**
+ * task_clear_group_stop_trapping - clear group stop trapping bit
+ * @task: target task
+ *
+ * If GROUP_STOP_TRAPPING is set, it's cleared and wake_up_bit() is called
+ * on the bit.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void task_clear_group_stop_trapping(struct task_struct *task)
+{
+	if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
+		task->group_stop &= ~GROUP_STOP_TRAPPING;
+		wake_up_bit(&task->group_stop, ilog2(GROUP_STOP_TRAPPING));
+	}
+}
+
+/**
  * task_clear_group_stop - clear pending group stop
  * @task: target task
  *
- * Clear group stop states for @task.
+ * Clear group stop pending state for @task.  All group stop states except
+ * for the recorded last stop signal are cleared.
  *
  * CONTEXT:
  * Must be called with @task->sighand->siglock held.
@@ -235,6 +254,7 @@ static inline void print_dropped_signal(int sig)
 void task_clear_group_stop(struct task_struct *task)
 {
 	task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME);
+	task_clear_group_stop_trapping(task);
 }
 
 /**
@@ -1696,6 +1716,14 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
 	}
 
 	/*
+	 * We're committing to trapping.  Clearing GROUP_STOP_TRAPPING and
+	 * transition to TASK_TRACED should be atomic with respect to
+	 * siglock.  Do it after the arch hook as siglock is released and
+	 * regrabbed across it.
+	 */
+	task_clear_group_stop_trapping(current);
+
+	/*
 	 * If @why is CLD_STOPPED, we're trapping to participate in a group
 	 * stop.  Do the bookkeeping.  Note that if SIGCONT was delievered
 	 * while siglock was released for the arch hook, PENDING could be
@@ -1790,6 +1818,9 @@ static int do_signal_stop(int signr)
 		unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
 		struct task_struct *t;
 
+		/* signr will be recorded in task->group_stop for retries */
+		WARN_ON_ONCE(signr & ~GROUP_STOP_SIGMASK);
+
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
 		    unlikely(signal_group_exit(sig)))
 			return 0;
@@ -1799,22 +1830,28 @@ static int do_signal_stop(int signr)
 		 */
 		sig->group_exit_code = signr;
 
-		current->group_stop = gstop;
+		current->group_stop &= ~GROUP_STOP_SIGMASK;
+		current->group_stop |= signr | gstop;
 		sig->group_stop_count = 1;
-		for (t = next_thread(current); t != current; t = next_thread(t))
+		for (t = next_thread(current); t != current;
+		     t = next_thread(t)) {
+			t->group_stop &= ~GROUP_STOP_SIGMASK;
 			/*
 			 * Setting state to TASK_STOPPED for a group
 			 * stop is always done with the siglock held,
 			 * so this check has no races.
 			 */
 			if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
-				t->group_stop = gstop;
+				t->group_stop |= signr | gstop;
 				sig->group_stop_count++;
 				signal_wake_up(t, 0);
-			} else
+			} else {
 				task_clear_group_stop(t);
+				t->group_stop |= signr;
+			}
+		}
 	}
-
+retry:
 	current->exit_code = sig->group_exit_code;
 	__set_current_state(TASK_STOPPED);
 
@@ -1842,7 +1879,18 @@ static int do_signal_stop(int signr)
 
 		spin_lock_irq(&current->sighand->siglock);
 	} else
-		ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+		ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
+			    CLD_STOPPED, 0, NULL);
+
+	/*
+	 * GROUP_STOP_PENDING could be set if another group stop has
+	 * started since being woken up or ptrace wants us to transit
+	 * between TASK_STOPPED and TRACED.  Retry group stop.
+	 */
+	if (current->group_stop & GROUP_STOP_PENDING) {
+		WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
+		goto retry;
+	}
 
 	spin_unlock_irq(&current->sighand->siglock);
 
-- 
1.7.1


  parent reply	other threads:[~2010-12-06 16:59 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-06 16:56 [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Tejun Heo
2010-12-06 16:56 ` [PATCH 01/16] signal: fix SIGCONT notification code Tejun Heo
2010-12-06 16:56 ` [PATCH 02/16] signal: fix CLD_CONTINUED notification target Tejun Heo
2010-12-20 14:58   ` Oleg Nesterov
2010-12-21 16:31     ` Tejun Heo
2010-12-06 16:56 ` [PATCH 03/16] signal: remove superflous try_to_freeze() loop in do_signal_stop() Tejun Heo
2010-12-20 14:59   ` Oleg Nesterov
2010-12-06 16:56 ` [PATCH 04/16] ptrace: kill tracehook_notify_jctl() Tejun Heo
2010-12-20 14:59   ` Oleg Nesterov
2010-12-21 17:00     ` Tejun Heo
2010-12-06 16:56 ` [PATCH 05/16] ptrace: add @why to ptrace_stop() Tejun Heo
2010-12-06 16:56 ` [PATCH 06/16] signal: fix premature completion of group stop when interfered by ptrace Tejun Heo
2010-12-20 15:00   ` Oleg Nesterov
2010-12-21 17:04     ` Tejun Heo
2010-12-06 16:56 ` [PATCH 07/16] signal: use GROUP_STOP_PENDING to stop once for a single group stop Tejun Heo
2010-12-06 16:56 ` [PATCH 08/16] ptrace: participate in group stop from ptrace_stop() iff the task is trapping for " Tejun Heo
2010-12-06 16:56 ` [PATCH 09/16] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced Tejun Heo
2010-12-23 12:26   ` Oleg Nesterov
2010-12-23 13:53     ` Tejun Heo
2010-12-23 16:06       ` Oleg Nesterov
2010-12-23 16:33         ` Tejun Heo
2011-01-17 22:09     ` Roland McGrath
2011-01-27 13:56       ` Tejun Heo
2011-01-28 20:30         ` Roland McGrath
2011-01-31 14:39           ` Tejun Heo
2010-12-06 16:56 ` Tejun Heo [this message]
2010-12-20 15:00   ` [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED and TRACED Oleg Nesterov
2010-12-21 17:31     ` Tejun Heo
2010-12-21 17:32       ` Tejun Heo
2010-12-22 10:54       ` Tejun Heo
2010-12-22 11:39       ` Oleg Nesterov
2010-12-22 15:14         ` Tejun Heo
2010-12-22 16:00           ` Oleg Nesterov
2010-12-22 16:21             ` Tejun Heo
2010-12-06 16:56 ` [PATCH 11/16] signal: prepare for CLD_* notification changes Tejun Heo
2010-12-20 16:21   ` Oleg Nesterov
2010-12-20 16:23     ` Oleg Nesterov
2010-12-21 17:35     ` Tejun Heo
2010-12-06 16:57 ` [PATCH 12/16] ptrace: make group stop notification reliable against ptrace Tejun Heo
2010-12-20 17:34   ` Oleg Nesterov
2010-12-21 17:43     ` Tejun Heo
2010-12-22 11:54       ` Oleg Nesterov
2010-12-22 15:26         ` Tejun Heo
2010-12-22 16:02           ` Oleg Nesterov
2010-12-06 16:57 ` [PATCH 13/16] ptrace: reorganize __ptrace_unlink() and ptrace_untrace() Tejun Heo
2010-12-20 18:15   ` Oleg Nesterov
2010-12-21 17:54     ` Tejun Heo
2010-12-06 16:57 ` [PATCH 14/16] ptrace: make SIGCONT notification reliable against ptrace Tejun Heo
2010-12-20 19:43   ` Oleg Nesterov
2010-12-21 17:48     ` Tejun Heo
2010-12-22 12:16       ` Oleg Nesterov
2010-12-21 17:25   ` Oleg Nesterov
2010-12-22 10:35     ` Tejun Heo
2010-12-06 16:57 ` [PATCH 15/16] ptrace: make sure SIGNAL_NOTIFY_CONT is checked after ptrace_signal() Tejun Heo
2010-12-06 16:57 ` [PATCH 16/16] ptrace: remove the extra wake_up_process() from ptrace_detach() Tejun Heo
2010-12-07  0:10   ` Roland McGrath
2010-12-07 13:43     ` Tejun Heo
2010-12-21 17:54   ` Oleg Nesterov
2010-12-22 10:36     ` Tejun Heo
2010-12-14 17:36 ` [PATCHSET] ptrace,signal: sane interaction between ptrace and job control signals, take#2 Oleg Nesterov
2010-12-14 17:46   ` Tejun Heo
2010-12-22 15:20 ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1291654624-6230-11-git-send-email-tj@kernel.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=jan.kratochvil@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=rjw@sisk.pl \
    --cc=roland@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.