RFC: PTRACE_SEIZE needs API cleanup?

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* RFC: PTRACE_SEIZE needs API cleanup?
@ 2011-09-04 21:11 Denys Vlasenko
  2011-09-05  1:15 ` Indan Zupancic
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-04 21:11 UTC (permalink / raw)
  To: Oleg Nesterov, Tejun Heo; +Cc: linux-kernel, dvlasenk

Hi guys,

I added code to use PTRACE_SEIZE in strace and in the process
had tasted how API looks like from userspace POV.

It is usable, but API feels somewhat quirky.

Consider the following: one of reasons why we added PTRACE_SEIZE
is that existing ptrace API has unnecessary complications
(quirks) such as SIGSTOP on attach, SIGTRAP after execve.

But whoever designed strace did not deliberately designed these quirks in,
he thought it was a good, reasonable design. Only after a second, third,
tenth look it became obvious in retrospect that some things are
not exactly right.

Thankfully, quirks in new PTRACE_SEIZE code mostly have the nature of
"unnecessarily invented entities" as opposed to problems
in trying to use API in real world tasks, but I still think they are
annoying enough to be looked at.

We already have a mechanism how to modify ptrace behavior: ptrace options.
But now we introduce a different mechnism to do the same: by using SEIZE
instead of ATTACH, we magically change some aspects of ptrace.
In effect, SEIZE sets some options. And these "SEIZE options" can't be
set or cleared by SETOPTIONS. This is stupid. Why can't we just add
more options instead of inventing new entities? Why we overloaded
SEIZE with two functions: "attach to, but don't SIGSTOP the tracee"
and "change behaviour of ptrace on this tracee"?

If the argument is that we want to set options immediately at attach,
then I completely agree: yes, we do! Moreover, we want to set some
_ordinary_ options too, such as PTRACE_O_TRACEEXEC, and we can't
do that even now, in improved API! It needs more improving.

So my proposal is:
(a) make SEIZE take a parameter "immediately set these options on attach"
(b) without any options, make SEIZE just do "ATTACH sans SIGSTOP" thing,
    nothing more.
(c) make the following new PTRACE_O_foo options:
   (1) "flag stops with PTRACE_EVENT_STOP event value in waitpid status"
   (2) "enable PTRACE_INTERRUPT. It either causes PTRACE_EVENT_STOP with sig=SIGTRAP
        if (1) is enabled, or creates a group-stop with sig=SIGTRAP otherwise"
       [if the second part is too weird to implement, make (2) require (1)]
   (3) "enable PTRACE_LISTEN. Works on group-stops even without any other options"
   (4) "make auto-attached children stop a-la INTERRUPT, not with SIGSTOP"
   (5) "enable saner error codes"
(d) Remove magic hidden 'I was SEIZEd, not ATTACHed' bit in kernel code.
    Use bits 1,2,3,4 instead of it.
(e) Make these bits settable with SETOPTIONS too, to keep API complete.

In strace, I already implicitly use 1,2,3,4. With this proposal, I'd use
all five, *explicitly*, and also use at least PTRACE_O_TRACEEXEC,
to avoid the possibility SEIZE races with execve and we get that pesky SIGTRAP.
Something along the lines:

	ptrace(PTRACE_SEIZE, pid, 0,
		PTRACE_O_TRACEEXEC
		| PTRACE_O_TRACESTOP      /* (1) */
		| PTRACE_O_TRACEINTERRUPT /* (2) */
		| PTRACE_O_TRACELISTEN    /* (3) */
		| PTRACE_O_TRACECHILDSTOP /* (4) */
		| PTRACE_O_TRACEERRCODES  /* (5) */
	);

This will remove the need to set options on first ptrace stop.

This will remove the asymmetry we just introduced in current SEIZE code:
PTRACE_EVENT_STOP is the only event code which has no corresponding
PTRACE_O_TRACESTOP option - situation which Occam disapproves.

In fact, I am unsure why the new ops (points 2 and 3 above) need enabling.
Maybe they can be simply be always allowed? Let's see...

Re PTRACE_INTERRUPT: "old-style" check for group-stop,
namely, "if ptrace(PTRACE_GETSIGINFO) fails, then it's group-stop",
works just fine, therefore, from userspace POV there is no apparent reason
why PTRACE_INTERRUPT can't work without "flag stops with PTRACE_EVENT_STOP"
thingy: in this case it can create an
"WIFSTOPPED(status) = true, ptrace(PTRACE_GETSIGINFO) fails"
event which is detectable in userspace. PTRACE_EVENT_STOP is not needed for this,
it's just a "nice to have" thing (it saves one ptrace call).

Re PTRACE_LISTEN: similarly to above, we _do_ know when we are in
group-stop, even without (1) "flag stops with PTRACE_EVENT_STOP" thingy.
Therefore we do know when PTRACE_LISTEN makes sense.
Moreover, we can make it so that kernel errors out if PTRACE_LISTEN
is used in a ptrace stop of a wrong kind.

Just to let you see the relevant userspace code which made me think about
the above issues, here is abridged version of main waitpid loop in strace.
Note that it is coded in a way that it works both on SEIZE-enabled
kernels and on old kernels, which have no SEIZE and friends.
"use_seize" variable is set if we are on a SEIZE-enabled kernel:

	while (nprocs != 0) {
		pid = waitpid(-1, &status, __WALL);
		if (pid < 0) ...

		/* Look up 'pid' in our table of known tracees */
		tcp = pid2tcb(pid);
		if (tcp == NULL) {  /* Not found. It's auto-attached child */
			tcp = alloctcb(pid);
// post_attach_sigstop is == 0 if use_seize, or == TCB_IGNORE_ONE_SIGSTOP otherwise
// Basically, it's "do we expect to filter out one SIGSTOP"?
			tcp->flags |= TCB_ATTACHED | TCB_STARTUP | post_attach_sigstop;
		}

		if (WIFSIGNALED(status)) {
			...
			continue;
		}
		if (WIFEXITED(status)) {
			...
			continue;
		}
		if (!WIFSTOPPED(status)) ... /* error, should not happen */

		/* Is this the very first time we see this tracee stopped? */
		if (tcp->flags & TCB_STARTUP) {
			tcp->flags &= ~TCB_STARTUP;
			if (ptrace(PTRACE_SETOPTIONS, tcp->pid, NULL, ptrace_setoptions) < 0)
				if (errno != ESRCH)
					perror_msg_and_die("PTRACE_SETOPTIONS");
		}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
this if() will not be needed if we set options on attach!

		sig = WSTOPSIG(status);

		if (((unsigned)status >> 16) != 0) { /* Ptrace event */
			unsigned ev = (unsigned)status >> 16;
			if (ev == PTRACE_EVENT_STOP && sig != SIGTRAP) {
					stopped = 1;
					goto show_stopsig;
				}
			}
			goto restart_tracee_with_sig_0;
		}

		/* Is this post-attach SIGSTOP?
		 * Interestingly, the process may stop
		 * with STOPSIG equal to some other signal
		 * than SIGSTOP if we happend to attach
		 * just before the process takes a signal.
		 */
		if (sig == SIGSTOP && (tcp->flags & TCB_IGNORE_ONE_SIGSTOP)) {
			tcp->flags &= ~TCB_IGNORE_ONE_SIGSTOP;
			goto restart_tracee_with_sig_0;
		}

		if (sig != syscall_trap_sig) {
			siginfo_t si;

			/* Nonzero (true) if tracee is stopped by signal
			 * (as opposed to "tracee received signal") */
			stopped = ptrace(PTRACE_GETSIGINFO, pid, 0, &si);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
here is how we determine whether it is group-stop or signal-delivery stop
for the "we didn't use SEIZE" case (if we used SEIZE, we would not
come here on group-stop: see ev == PTRACE_EVENT_STOP check
and "goto show_stopsig" above).

 show_stopsig:
			if (cflag != CFLAG_ONLY_STATS && (qual_flags[sig] & QUAL_SIGNAL)) {
				printleader(tcp);
				if (!stopped) {
					tprints("--- ");
					printsiginfo(&si, verbose(tcp));
				} else
					tprintf("--- stopped by %s ---", signame(sig));
				printtrailer();
			}

...and here we decide to use either PTRACE_SYSCALL or PTRACE_LISTEN,
_based on the value of "stopped" variable_. Here we PTRACE_LISTEN only if
use_seize is true, but we use that flag only as indicator "this is a new kernel".
It's easy to see that there are no intrisic reasons
why it can't also work on newer kernels, but after using old-style ATTACH.
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
			if (!stopped)  /* It's signal-delivery-stop. Inject the signal */
				goto restart_tracee;

			/* It's group-stop */
			if (use_seize) {
				/* This ends ptrace-stop, but does _not_ end group-stop */
				if (ptrace_restart(PTRACE_LISTEN, tcp, 0) < 0) ... /* err, doesn't happen */
				continue;
			}
			/* Old kernel, no support for PTRACE_LISTEN */
			goto restart_tracee;
		}

		/* This is syscall entry or exit */
		if (trace_syscall(tcp) < 0 && !tcp->ptrace_errno) {
			...drop this tracee, it died...
			continue;
		}

 restart_tracee_with_sig_0:
		sig = 0;
 restart_tracee:
		if (ptrace_restart(PTRACE_SYSCALL, tcp, sig) < 0) ... /* err, doesn't happen */
	}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-04 21:11 RFC: PTRACE_SEIZE needs API cleanup? Denys Vlasenko
@ 2011-09-05  1:15 ` Indan Zupancic
  2011-09-05  9:24   ` Denys Vlasenko
  2011-09-05 14:54 ` Denys Vlasenko
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Indan Zupancic @ 2011-09-05  1:15 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Oleg Nesterov, Tejun Heo, linux-kernel, dvlasenk

Hello,

Sorry to be a bother again, especially because I didn't get to
read all patches yet (360 unread emails, including resends,
not sure where to start catching up), but first couple series
looked good, up to the PTRACE_SEIZE stuff. Anyway, here's my
feedback, feel free to ignore. Keep in mind this is more about
the new ptrace stuff than just the proposed API improvements,
which are IMHO a big step in the right direction, and that I
only have a partial and hazy (user space) view on these things.

On Sun, September 4, 2011 23:11, Denys Vlasenko wrote:
> Hi guys,
>
> I added code to use PTRACE_SEIZE in strace and in the process
> had tasted how API looks like from userspace POV.
>
> It is usable, but API feels somewhat quirky.

Yes, that was my impression too. To me it seems like most complexity is
caused by not hiding ptrace stops, but instead tangling it up with group
stop signals. This shows as a more complex than needed implementation and
a subtle and quirky API. But I haven't looked at the details enough to
more clearly pinpoint the problems, so feel free to totally ignore my
impression. At least till I come up with better arguments.

> Consider the following: one of reasons why we added PTRACE_SEIZE
> is that existing ptrace API has unnecessary complications
> (quirks) such as SIGSTOP on attach, SIGTRAP after execve.

Not sure if it's still the case, but from reading earlier patches
I got the impression it only got worse, instead of seizing the
opportunity to simplify everything.

> But whoever designed strace did not deliberately designed these quirks in,
> he thought it was a good, reasonable design. Only after a second, third,
> tenth look it became obvious in retrospect that some things are
> not exactly right.
>
> Thankfully, quirks in new PTRACE_SEIZE code mostly have the nature of
> "unnecessarily invented entities" as opposed to problems
> in trying to use API in real world tasks, but I still think they are
> annoying enough to be looked at.
>
>
> We already have a mechanism how to modify ptrace behavior: ptrace options.
> But now we introduce a different mechnism to do the same: by using SEIZE
> instead of ATTACH, we magically change some aspects of ptrace.
> In effect, SEIZE sets some options. And these "SEIZE options" can't be
> set or cleared by SETOPTIONS. This is stupid. Why can't we just add
> more options instead of inventing new entities?

Exactly my feeling!

> Why we overloaded
> SEIZE with two functions: "attach to, but don't SIGSTOP the tracee"
> and "change behaviour of ptrace on this tracee"?
>
> If the argument is that we want to set options immediately at attach,
> then I completely agree: yes, we do! Moreover, we want to set some
> _ordinary_ options too, such as PTRACE_O_TRACEEXEC, and we can't
> do that even now, in improved API! It needs more improving.
>
> So my proposal is:
> (a) make SEIZE take a parameter "immediately set these options on attach"

I think that would be a great improvement.

Better than making SEIZE an option itself, which I wanted to propose, but
never got to.

> (b) without any options, make SEIZE just do "ATTACH sans SIGSTOP" thing,
>     nothing more.
> (c) make the following new PTRACE_O_foo options:
>    (1) "flag stops with PTRACE_EVENT_STOP event value in waitpid status"

Why would anyone use that flag? Most users are unaware of the crazy subtle
SIGSTOP notification madness. It is totally unnecessary, just fix waitid()
to honour the absence of the WUNTRACED flag for traced processes. I can't
think of any useful scenario where calling waitid() with WUNTRACED set on
a traced task makes much sense, as the same information is received via
ptrace.

The other part of a simpler, less crazy ptrace is let PTRACE_CONT not
automatically continue a SIGSTOPped process.

Both changes need a new flag of course, e.g. PTRACE_O_SANE.

>    (2) "enable PTRACE_INTERRUPT. It either causes PTRACE_EVENT_STOP with sig=SIGTRAP
>         if (1) is enabled, or creates a group-stop with sig=SIGTRAP otherwise"
>        [if the second part is too weird to implement, make (2) require (1)]

I don't understand the need for PTRACE_INTERRUPT: Sending some signal, trapping,
and PTRACE_CONT with zero "data" should have the same effect. Or do you mean that
instead of SIGSTOP, SIGTRAP should be sent? If so, yes, that would be much saner.

>    (3) "enable PTRACE_LISTEN. Works on group-stops even without any other options"

This is a symptom of the complexity I whined about above.

If ptrace has no influence on group stops then PTRACE_LISTEN is not needed at all.
The problem seems to be that SIGSTOP state is the same as ptrace trapped state, or
something like that. Group stops and ptrace stops should be totally orthogonal.
Then crazy things like PTRACE_LISTEN are not needed.

Why not just untangle the two concepts? (Via PTRACE_O_SANE.)

>    (4) "make auto-attached children stop a-la INTERRUPT, not with SIGSTOP"

That would be sane behaviour. But this is just a part of untangling ptrace
from group stops, so is a separate option really needed instead of one that
just makes ptrace behave more sanely in general?

>    (5) "enable saner error codes"

Why do you need a new option for this? Ptrace error codes are close to useless,
how can returning better error codes ever cause problems?

> (d) Remove magic hidden 'I was SEIZEd, not ATTACHed' bit in kernel code.
>     Use bits 1,2,3,4 instead of it.
> (e) Make these bits settable with SETOPTIONS too, to keep API complete.
>
> In strace, I already implicitly use 1,2,3,4. With this proposal, I'd use
> all five, *explicitly*, and also use at least PTRACE_O_TRACEEXEC,
> to avoid the possibility SEIZE races with execve and we get that pesky SIGTRAP.
> Something along the lines:
>
> 	ptrace(PTRACE_SEIZE, pid, 0,
> 		PTRACE_O_TRACEEXEC
> 		| PTRACE_O_TRACESTOP      /* (1) */
> 		| PTRACE_O_TRACEINTERRUPT /* (2) */
> 		| PTRACE_O_TRACELISTEN    /* (3) */
> 		| PTRACE_O_TRACECHILDSTOP /* (4) */
> 		| PTRACE_O_TRACEERRCODES  /* (5) */
> 	);
>
> This will remove the need to set options on first ptrace stop.
>
> This will remove the asymmetry we just introduced in current SEIZE code:
> PTRACE_EVENT_STOP is the only event code which has no corresponding
> PTRACE_O_TRACESTOP option - situation which Occam disapproves.
>
>
> In fact, I am unsure why the new ops (points 2 and 3 above) need enabling.
> Maybe they can be simply be always allowed? Let's see...

My advise is to untangle all ptrace and group stop state from each other
with one PTRACE_O_SANE option. That would replace options 1 to 4. This
gives perhaps less choice, but considering how subtle it already is, I
don't think that simplifying everything is a bad thing.

> Re PTRACE_INTERRUPT: "old-style" check for group-stop,
> namely, "if ptrace(PTRACE_GETSIGINFO) fails, then it's group-stop",

BTW, no one is going to use and know this, except perhaps ten people in the
whole world. Everyone else just thinks that ptrace doesn't pass on SIGSTOP,
as the manpage says, and that the double SIGSTOP event is just a quirk.

> works just fine, therefore, from userspace POV there is no apparent reason
> why PTRACE_INTERRUPT can't work without "flag stops with PTRACE_EVENT_STOP"
> thingy: in this case it can create an
> "WIFSTOPPED(status) = true, ptrace(PTRACE_GETSIGINFO) fails"
> event which is detectable in userspace. PTRACE_EVENT_STOP is not needed for this,
> it's just a "nice to have" thing (it saves one ptrace call).
>
> Re PTRACE_LISTEN: similarly to above, we _do_ know when we are in
> group-stop, even without (1) "flag stops with PTRACE_EVENT_STOP" thingy.
> Therefore we do know when PTRACE_LISTEN makes sense.

It never makes sense...

> Moreover, we can make it so that kernel errors out if PTRACE_LISTEN
> is used in a ptrace stop of a wrong kind.
>
>
> Just to let you see the relevant userspace code which made me think about
> the above issues, here is abridged version of main waitpid loop in strace.
> Note that it is coded in a way that it works both on SEIZE-enabled
> kernels and on old kernels, which have no SEIZE and friends.
> "use_seize" variable is set if we are on a SEIZE-enabled kernel:
>
> 	while (nprocs != 0) {
> 		pid = waitpid(-1, &status, __WALL);
> 		if (pid < 0) ...
>
> 		/* Look up 'pid' in our table of known tracees */
> 		tcp = pid2tcb(pid);
> 		if (tcp == NULL) {  /* Not found. It's auto-attached child */
> 			tcp = alloctcb(pid);
> // post_attach_sigstop is == 0 if use_seize, or == TCB_IGNORE_ONE_SIGSTOP otherwise
> // Basically, it's "do we expect to filter out one SIGSTOP"?
> 			tcp->flags |= TCB_ATTACHED | TCB_STARTUP | post_attach_sigstop;
> 		}
>
> 		if (WIFSIGNALED(status)) {
> 			...
> 			continue;
> 		}
> 		if (WIFEXITED(status)) {
> 			...
> 			continue;
> 		}
> 		if (!WIFSTOPPED(status)) ... /* error, should not happen */

It may happen if you're tracing a child process and are getting a normal SIGCHLD?

>
> 		/* Is this the very first time we see this tracee stopped? */
> 		if (tcp->flags & TCB_STARTUP) {
> 			tcp->flags &= ~TCB_STARTUP;
> 			if (ptrace(PTRACE_SETOPTIONS, tcp->pid, NULL, ptrace_setoptions) < 0)
> 				if (errno != ESRCH)
> 					perror_msg_and_die("PTRACE_SETOPTIONS");
> 		}
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> this if() will not be needed if we set options on attach!
>
> 		sig = WSTOPSIG(status);
>
> 		if (((unsigned)status >> 16) != 0) { /* Ptrace event */
> 			unsigned ev = (unsigned)status >> 16;
> 			if (ev == PTRACE_EVENT_STOP && sig != SIGTRAP) {
> 					stopped = 1;
> 					goto show_stopsig;
> 				}
> 			}
> 			goto restart_tracee_with_sig_0;
> 		}
>
> 		/* Is this post-attach SIGSTOP?
> 		 * Interestingly, the process may stop
> 		 * with STOPSIG equal to some other signal
> 		 * than SIGSTOP if we happend to attach
> 		 * just before the process takes a signal.
> 		 */
> 		if (sig == SIGSTOP && (tcp->flags & TCB_IGNORE_ONE_SIGSTOP)) {
> 			tcp->flags &= ~TCB_IGNORE_ONE_SIGSTOP;
> 			goto restart_tracee_with_sig_0;
> 		}
>
> 		if (sig != syscall_trap_sig) {
> 			siginfo_t si;
>
> 			/* Nonzero (true) if tracee is stopped by signal
> 			 * (as opposed to "tracee received signal") */
> 			stopped = ptrace(PTRACE_GETSIGINFO, pid, 0, &si);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> here is how we determine whether it is group-stop or signal-delivery stop
> for the "we didn't use SEIZE" case (if we used SEIZE, we would not
> come here on group-stop: see ev == PTRACE_EVENT_STOP check
> and "goto show_stopsig" above).
>
>  show_stopsig:
> 			if (cflag != CFLAG_ONLY_STATS && (qual_flags[sig] & QUAL_SIGNAL)) {
> 				printleader(tcp);
> 				if (!stopped) {
> 					tprints("--- ");
> 					printsiginfo(&si, verbose(tcp));
> 				} else
> 					tprintf("--- stopped by %s ---", signame(sig));
> 				printtrailer();
> 			}
>
> ...and here we decide to use either PTRACE_SYSCALL or PTRACE_LISTEN,
> _based on the value of "stopped" variable_. Here we PTRACE_LISTEN only if
> use_seize is true, but we use that flag only as indicator "this is a new kernel".
> It's easy to see that there are no intrisic reasons
> why it can't also work on newer kernels, but after using old-style ATTACH.

So you're emulating normal group stop behaviour in the tracer, adding band-aids
that make that possible, instead of avoiding these problems by making ptrace
totally transparent for group stop state? Wouldn't it all be much simpler if
ptrace didn't mess with group stop state?

> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> 			if (!stopped)  /* It's signal-delivery-stop. Inject the signal */
> 				goto restart_tracee;
>
> 			/* It's group-stop */
> 			if (use_seize) {
> 				/* This ends ptrace-stop, but does _not_ end group-stop */
> 				if (ptrace_restart(PTRACE_LISTEN, tcp, 0) < 0) ... /* err, doesn't happen */
> 				continue;
> 			}
> 			/* Old kernel, no support for PTRACE_LISTEN */
> 			goto restart_tracee;
> 		}
>
> 		/* This is syscall entry or exit */
> 		if (trace_syscall(tcp) < 0 && !tcp->ptrace_errno) {
> 			...drop this tracee, it died...
> 			continue;
> 		}
>
>  restart_tracee_with_sig_0:
> 		sig = 0;
>  restart_tracee:
> 		if (ptrace_restart(PTRACE_SYSCALL, tcp, sig) < 0) ... /* err, doesn't happen */
> 	}
>

Unrelated to this, could someone take a look at "ptrace() behaviour when doing
PTRACE_KILL": https://lkml.org/lkml/2011/8/28/48

I can reproduce the behaviour with on a 2.6.39 kernel, it seems like a ptrace bug.
Basically a tracee manages to execute a kill(2) after the tracer did a PTRACE_KILL
at system call entry.

Greetings,

Indan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-05  1:15 ` Indan Zupancic
@ 2011-09-05  9:24   ` Denys Vlasenko
  2011-09-05 13:08     ` Indan Zupancic
  0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-05  9:24 UTC (permalink / raw)
  To: Indan Zupancic; +Cc: Oleg Nesterov, Tejun Heo, linux-kernel, dvlasenk

On Monday 05 September 2011 03:15, Indan Zupancic wrote:
> > We already have a mechanism how to modify ptrace behavior: ptrace options.
> > But now we introduce a different mechnism to do the same: by using SEIZE
> > instead of ATTACH, we magically change some aspects of ptrace.
> > In effect, SEIZE sets some options. And these "SEIZE options" can't be
> > set or cleared by SETOPTIONS. This is stupid. Why can't we just add
> > more options instead of inventing new entities?
> 
> Exactly my feeling!
> 
> > Why we overloaded
> > SEIZE with two functions: "attach to, but don't SIGSTOP the tracee"
> > and "change behaviour of ptrace on this tracee"?
> >
> > If the argument is that we want to set options immediately at attach,
> > then I completely agree: yes, we do! Moreover, we want to set some
> > _ordinary_ options too, such as PTRACE_O_TRACEEXEC, and we can't
> > do that even now, in improved API! It needs more improving.
> >
> > So my proposal is:
> > (a) make SEIZE take a parameter "immediately set these options on attach"
> 
> I think that would be a great improvement.
> 
> Better than making SEIZE an option itself, which I wanted to propose, but
> never got to.

Can't do. In pre-existing ptrace API, options can be set only after attach.
Therefore, it's not possible to set an option which affects ATTACH behavior.


> > (b) without any options, make SEIZE just do "ATTACH sans SIGSTOP" thing,
> >     nothing more.
> > (c) make the following new PTRACE_O_foo options:
> >    (1) "flag stops with PTRACE_EVENT_STOP event value in waitpid status"
> 
> Why would anyone use that flag?

Well, it's nice to not require GETSIGINFO in order to know whether this stop
is group-stop.

> Most users are unaware of the crazy subtle SIGSTOP notification madness.
> It is totally unnecessary, just fix waitid() 
> to honour the absence of the WUNTRACED flag for traced processes.

Can't do. Existing programs depend on current ptrace behavior.

> I can't 
> think of any useful scenario where calling waitid() with WUNTRACED set on
> a traced task makes much sense, as the same information is received via
> ptrace.

Can't do. Existing programs depend on current ptrace behavior.


> The other part of a simpler, less crazy ptrace is let PTRACE_CONT not
> automatically continue a SIGSTOPped process.

Can't do. gdb depends on current ptrace behavior that PTRACE_CONT
restarts tracee after SIGSTOP.


> >    (2) "enable PTRACE_INTERRUPT. It either causes PTRACE_EVENT_STOP with sig=SIGTRAP
> >         if (1) is enabled, or creates a group-stop with sig=SIGTRAP otherwise"
> >        [if the second part is too weird to implement, make (2) require (1)]
> 
> I don't understand the need for PTRACE_INTERRUPT: Sending some signal, trapping,
> and PTRACE_CONT with zero "data" should have the same effect.

Except that sending a signal will race with someone else sendng the same signal.
It's nicer when this can be avoided.


> Or do you mean that 
> instead of SIGSTOP, SIGTRAP should be sent? If so, yes, that would be much saner.

Yes, that's how it works in 3.1 already.


> >    (3) "enable PTRACE_LISTEN. Works on group-stops even without any other options"
> 
> This is a symptom of the complexity I whined about above.
> 
> If ptrace has no influence on group stops then PTRACE_LISTEN is not needed at all.
> The problem seems to be that SIGSTOP state is the same as ptrace trapped state, or
> something like that. Group stops and ptrace stops should be totally orthogonal.
> Then crazy things like PTRACE_LISTEN are not needed.
> 
> Why not just untangle the two concepts? (Via PTRACE_O_SANE.)

You are a bit late to the party. Thre was a LONG discussion how to implement
proper group-stop handling. You need to write a much more detailed proposal
than this vague suggestion "let's add one flag which makes everythng better".
How EXACTLY this flag will change ptrace bhavior?


> >    (5) "enable saner error codes"
> 
> Why do you need a new option for this? Ptrace error codes are close to useless,
> how can returning better error codes ever cause problems?

Now ESRCH means "no such process, or process is not attached to us, or process
is not stopped (it is running), or process has exited (it is a zombie now)".
I'd like to make it into four different error codes.


> > In fact, I am unsure why the new ops (points 2 and 3 above) need enabling.
> > Maybe they can be simply be always allowed? Let's see...
> 
> My advise is to untangle all ptrace and group stop state from each other
> with one PTRACE_O_SANE option. That would replace options 1 to 4. This
> gives perhaps less choice, but considering how subtle it already is, I
> don't think that simplifying everything is a bad thing.

Again, please write a document how proposed PTRACE_O_SANE affects ptrace.
*In details*.


> > Re PTRACE_INTERRUPT: "old-style" check for group-stop,
> > namely, "if ptrace(PTRACE_GETSIGINFO) fails, then it's group-stop",
> 
> BTW, no one is going to use and know this, except perhaps ten people in the
> whole world.

It is documented behavior. Latest released strace already uses it.


> Everyone else just thinks that ptrace doesn't pass on SIGSTOP, 
> as the manpage says, and that the double SIGSTOP event is just a quirk.

We have an updated manpage. We are trying to push it into manpages git.


> > 		if (!WIFSTOPPED(status)) ... /* error, should not happen */
> 
> It may happen if you're tracing a child process and are getting a normal SIGCHLD?

What?


> > ...and here we decide to use either PTRACE_SYSCALL or PTRACE_LISTEN,
> > _based on the value of "stopped" variable_. Here we PTRACE_LISTEN only if
> > use_seize is true, but we use that flag only as indicator "this is a new kernel".
> > It's easy to see that there are no intrisic reasons
> > why it can't also work on newer kernels, but after using old-style ATTACH.
> 
> So you're emulating normal group stop behaviour in the tracer, adding band-aids
> that make that possible, instead of avoiding these problems by making ptrace
> totally transparent for group stop state? Wouldn't it all be much simpler if
> ptrace didn't mess with group stop state?

As I said, gdb already depends on current ptrace behavior. We can't break it.

-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-05  9:24   ` Denys Vlasenko
@ 2011-09-05 13:08     ` Indan Zupancic
  2011-09-05 14:06       ` Denys Vlasenko
  0 siblings, 1 reply; 28+ messages in thread
From: Indan Zupancic @ 2011-09-05 13:08 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Oleg Nesterov, Tejun Heo, linux-kernel, dvlasenk

Hi,

On Mon, September 5, 2011 11:24, Denys Vlasenko wrote:
> On Monday 05 September 2011 03:15, Indan Zupancic wrote:
>> > So my proposal is:
>> > (a) make SEIZE take a parameter "immediately set these options on attach"
>>
>> I think that would be a great improvement.
>>
>> Better than making SEIZE an option itself, which I wanted to propose, but
>> never got to.
>
> Can't do. In pre-existing ptrace API, options can be set only after attach.
> Therefore, it's not possible to set an option which affects ATTACH behavior.

Your way is better anyway, so it's not important.

But what I meant is an option that does the actual attach, so it wouldn't
really be an option, more an action. Like I said, your way is better because
it makes much more sense, even though the behaviour is exactly the same.

>
>> > (b) without any options, make SEIZE just do "ATTACH sans SIGSTOP" thing,
>> >     nothing more.
>> > (c) make the following new PTRACE_O_foo options:
>> >    (1) "flag stops with PTRACE_EVENT_STOP event value in waitpid status"
>>
>> Why would anyone use that flag?
>
> Well, it's nice to not require GETSIGINFO in order to know whether this stop
> is group-stop.

Yes, but that's theory. But in practice..? Try to take a step back and
look at the big picture.

Most ptrace users aren't interested in group stops and just treat the two
events the same (because they don't expect the SIGSTOP notification when
WUNTRACED isn't set). And group stop doesn't work on traced tasks and
can't with the old behaviour, so group stop isn't interesting (as it's
broken), so why do a GETSIGINFO either? I totally agree that if you do
a GETSIGINFO now, this new option makes it easier, but I'm trying to
say that almost no user will do the GETSIGINFO either.

The ptrace users who do want group stop to work usually don't want to
interfere with it, they just want to know about it. PTRACE_EVENT_STOP
makes the knowing slightly easier, but it doesn't fix group stop.

>> Most users are unaware of the crazy subtle SIGSTOP notification madness.
>> It is totally unnecessary, just fix waitid()
>> to honour the absence of the WUNTRACED flag for traced processes.
>
> Can't do. Existing programs depend on current ptrace behavior.

But this whole thing is about changing current ptrace behaviour!
That's what the new options and PTRACE_SEIZE etc. are for.

Existing programs that depend on the current behaviour won't use them.
The ones that do use it know about the different behaviour and would
prefer a simpler and less quirky behaviour.

>> I can't
>> think of any useful scenario where calling waitid() with WUNTRACED set on
>> a traced task makes much sense, as the same information is received via
>> ptrace.
>
> Can't do. Existing programs depend on current ptrace behavior.

Existing programs won't use any option to change the current behaviour
if they depend on the current behaviour.

>
>> The other part of a simpler, less crazy ptrace is let PTRACE_CONT not
>> automatically continue a SIGSTOPped process.
>
> Can't do. gdb depends on current ptrace behavior that PTRACE_CONT
> restarts tracee after SIGSTOP.

That is a non-argument, see above. Either gdb uses the old behaviour and
doesn't set any new options, or it does and is aware of any behavioural
changes, including this.

If ptrace didn't interfere with group stop then gdb had no need to restart
the tracee with PTRACE_CONT. It wouldn't have to treat SIGSTOP any differently
than any other signals.

>
>> >    (2) "enable PTRACE_INTERRUPT. It either causes PTRACE_EVENT_STOP with sig=SIGTRAP
>> >         if (1) is enabled, or creates a group-stop with sig=SIGTRAP otherwise"
>> >        [if the second part is too weird to implement, make (2) require (1)]
>>
>> I don't understand the need for PTRACE_INTERRUPT: Sending some signal, trapping,
>> and PTRACE_CONT with zero "data" should have the same effect.
>
> Except that sending a signal will race with someone else sendng the same signal.
> It's nicer when this can be avoided.

That race is not important, because the signal you send you will block (data=0),
so the tracee will only see one signal. It doesn't matter which signal makes it
through, as they are indentical (I use SIGTERM) and the tracee only sees one.
Just don't use SIGIO or something like that.

>
>> Or do you mean that
>> instead of SIGSTOP, SIGTRAP should be sent? If so, yes, that would be much saner.
>
> Yes, that's how it works in 3.1 already.

But don't existing programs depend on current ptrace behavior?

See what I mean? Improving ptrace can be done.

>
>> >    (3) "enable PTRACE_LISTEN. Works on group-stops even without any other options"
>>
>> This is a symptom of the complexity I whined about above.
>>
>> If ptrace has no influence on group stops then PTRACE_LISTEN is not needed at all.
>> The problem seems to be that SIGSTOP state is the same as ptrace trapped state, or
>> something like that. Group stops and ptrace stops should be totally orthogonal.
>> Then crazy things like PTRACE_LISTEN are not needed.
>>
>> Why not just untangle the two concepts? (Via PTRACE_O_SANE.)
>
> You are a bit late to the party. Thre was a LONG discussion how to implement
> proper group-stop handling. You need to write a much more detailed proposal
> than this vague suggestion "let's add one flag which makes everythng better".
> How EXACTLY this flag will change ptrace bhavior?

I know! I'm fully aware of this. But I was too late for that long discussion,
and after that I couldn't keep up with all patchsets sent out. It makes little
sense to give detailed feedback when things might have totally changed again.
To be convincing at all I'd probably need to write patches too, so show the
advantages. But I never got to it. And at the time it was less clear to me
what the real behaviour regarding group stop is, as you guys kindly explained
to me.

----------------------------

To summarize, the flag would make ptrace have no effect on group stop state,
making SIGSTOP behaviour like any other signal behaves with ptrace. To get
there, when PTRACE_O_SANE is set, the user space behaviour changes seen are
as follows:

- In cases that ptrace sends a SIGSTOP, SIGTRAP is sent instead.

  (Your 4th proposal). So that ptrace never changes the group stop state of a task.

- waitid() honours the absence of the WUNTRACED flag.

  This prevents duplicate SIGSTOP events and the need to distinguish them
  from each other. No need for C.1 (PTRACE_EVENT_STOP). But if you still
  want it, it should be enabled by PTRACE_O_SANE too.

- PTRACE_CONT does not continue a stopped task.

  Tracer does not need to keep track of group stop state anymore, SIGSTOP and
  SIGCONT just work for traced tasks. No need for PTRACE_LISTEN. Tracer has
  still full control over stopped state of tracee because it can block SIGSTOP
  and SIGCONT signals, and send them itself.

- Unrelated, but to make things easier: Mark when a system call event is pre
  or post syscall.

----------------------------

I might have forgotten something, but I think this is pretty much all that's needed.
It would make everything simpler for everyone, IMHO. Can you think of any use case
that isn't solved by the above changes? All behavioural changes are group stop
related, something that is known to be broken, so not really anyone can much depend
on. And if they do, they don't have to enable any new option.

>
>> >    (5) "enable saner error codes"
>>
>> Why do you need a new option for this? Ptrace error codes are close to useless,
>> how can returning better error codes ever cause problems?
>
> Now ESRCH means "no such process, or process is not attached to us, or process
> is not stopped (it is running), or process has exited (it is a zombie now)".
> I'd like to make it into four different error codes.

That would make sense, yes. But assuming the program is bugfree, the only
case that can happen is "process has exited", and "no such process" for
ATTACH. Which one it is is clear from the context. To me it seems perfectly
save to change the error code for the two other cases, as that are program
bugs. I think you should just do it for those two cases and not bother with
a new option.

>
>> > In fact, I am unsure why the new ops (points 2 and 3 above) need enabling.
>> > Maybe they can be simply be always allowed? Let's see...
>>
>> My advise is to untangle all ptrace and group stop state from each other
>> with one PTRACE_O_SANE option. That would replace options 1 to 4. This
>> gives perhaps less choice, but considering how subtle it already is, I
>> don't think that simplifying everything is a bad thing.
>
> Again, please write a document how proposed PTRACE_O_SANE affects ptrace.
> *In details*.

See above. That's pretty much it, as far as user space behavioural changes.
Implementing it in the kernel is the tricky part, because ptrace is scattered
around between syscall entry/exit, signal handling and who knows where. (I
haven't pieced it all together yet.)

>
>> > Re PTRACE_INTERRUPT: "old-style" check for group-stop,
>> > namely, "if ptrace(PTRACE_GETSIGINFO) fails, then it's group-stop",
>>
>> BTW, no one is going to use and know this, except perhaps ten people in the
>> whole world.
>
> It is documented behavior. Latest released strace already uses it.

Where is it documented? Keep in mind that a wrong description of what's
happening was in the manpage for years and years, when %99 of all current
ptrace using code was written.

And it's in strace because one of you added it, I guess. But strace has
no use for it, it already knows that a task is stopped when the tracee
gets a SIGSTOP. Or in other words, strace doesn't want to use it, but it
has to because it gets multiple events for one SIGSTOP. And strace would
much prefer it if a SIGCONT on the tracee would continue it again. Now it
doesn't. To get there, strace code has to be made even more complicated
by using the PTRACE_LISTEN. That's not improving things.

You can't do anything much with the extra info, the only extra choice the
tracer has with it is to not call PTRACE_CONT on the SIGSTOP notification.
But then the tracee is forever stuck because it can't receive a SIGCONT
and strace doesn't know when it has to be continued. Ergo, why bother
distinguishing the two events?

>
>> Everyone else just thinks that ptrace doesn't pass on SIGSTOP,
>> as the manpage says, and that the double SIGSTOP event is just a quirk.
>
> We have an updated manpage. We are trying to push it into manpages git.
>
>
>> > 		if (!WIFSTOPPED(status)) ... /* error, should not happen */
>>
>> It may happen if you're tracing a child process and are getting a normal SIGCHLD?
>
> What?

Going into that if case. I think I've seen it happen once, but now I'm not sure.
I think I got confused with something else and it just can't happen.

>
>> > ...and here we decide to use either PTRACE_SYSCALL or PTRACE_LISTEN,
>> > _based on the value of "stopped" variable_. Here we PTRACE_LISTEN only if
>> > use_seize is true, but we use that flag only as indicator "this is a new kernel".
>> > It's easy to see that there are no intrisic reasons
>> > why it can't also work on newer kernels, but after using old-style ATTACH.
>>
>> So you're emulating normal group stop behaviour in the tracer, adding band-aids
>> that make that possible, instead of avoiding these problems by making ptrace
>> totally transparent for group stop state? Wouldn't it all be much simpler if
>> ptrace didn't mess with group stop state?
>
> As I said, gdb already depends on current ptrace behavior. We can't break it.

Then it won't use all those new flags either, and adding anything new makes
no sense.

The behaviour you're defending is generally subtle and unexpected behaviour
that most ptrace users don't expect. Gdb can't really work around it to make
SIGSTOP/SIGCONT work either, whatever it does, it's "broken" already. Giving
ptrace users a choice to pick simpler, more predicatble behaviour which solves
all group stop related problems looks like a good idea and I don't see why it
would break gdb.

All the workarounds for the current ptrace can just stay in place, but won't
be triggered or are harmless. To make group stop working gdb just has to do
exactly nothing special, when PTRACE_O_SANE is set. If it wants the double
SIGSTOP notification, it can set WUNTRACED. It already has to handle the new
SIGTRAP instead of SIGSTOP case, and you had no problem with that. Only thing
left is PTRACE_CONT not continuing a group stopped task. I don't know why gdb
would want to continue a task that someone else stopped with SIGSTOP, that's
interfering in the task's normal behaviour, something gdb usually doesn't want
to do. But if it wants, it can still do that by blocking/sending SIGSTOP/SIGCONT.

Best regards,

Indan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-05 13:08     ` Indan Zupancic
@ 2011-09-05 14:06       ` Denys Vlasenko
  2011-09-05 17:21         ` Indan Zupancic
  2011-09-05 17:44         ` Indan Zupancic
  0 siblings, 2 replies; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-05 14:06 UTC (permalink / raw)
  To: Indan Zupancic; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

On Mon, 2011-09-05 at 15:08 +0200, Indan Zupancic wrote:
> >> > (b) without any options, make SEIZE just do "ATTACH sans SIGSTOP" thing,
> >> >     nothing more.
> >> > (c) make the following new PTRACE_O_foo options:
> >> >    (1) "flag stops with PTRACE_EVENT_STOP event value in waitpid status"
> >>
> >> Why would anyone use that flag?
> >
> > Well, it's nice to not require GETSIGINFO in order to know whether this stop
> > is group-stop.
> 
> Yes, but that's theory. But in practice..? Try to take a step back and
> look at the big picture.
> 
> Most ptrace users aren't interested in group stops and just treat the two
> events the same (because they don't expect the SIGSTOP notification when
> WUNTRACED isn't set).

Which users? strace? gdb? Something else?

> And group stop doesn't work on traced tasks

Group-stop works, as in "ptraced task does stop on SIGSTOP".
The problem is that we couldn't make it wait for SIGCONT without making
it run.

> group stop isn't interesting (as it's
> broken),

Why do you think so?

>  so why do a GETSIGINFO either? I totally agree that if you do
> a GETSIGINFO now, this new option makes it easier,

It is not a new behavior.

>  but I'm trying to
> say that almost no user will do the GETSIGINFO either.

strace does. I don't know about gdb.

> The ptrace users who do want group stop to work usually don't want to
> interfere with it, they just want to know about it.

The point is, they can't "not interfere" with it. __WALL
implicitly reports group-stops.

>  PTRACE_EVENT_STOP
> makes the knowing slightly easier, but it doesn't fix group stop.

Correct. PTRACE_LISTEN fixes it.

> >> >    (2) "enable PTRACE_INTERRUPT. It either causes PTRACE_EVENT_STOP with sig=SIGTRAP
> >> >         if (1) is enabled, or creates a group-stop with sig=SIGTRAP otherwise"
> >> >        [if the second part is too weird to implement, make (2) require (1)]
> >>
> >> I don't understand the need for PTRACE_INTERRUPT: Sending some signal, trapping,
> >> and PTRACE_CONT with zero "data" should have the same effect.
> >
> > Except that sending a signal will race with someone else sendng the same signal.
> > It's nicer when this can be avoided.
> 
> That race is not important, because the signal you send you will block (data=0),
> so the tracee will only see one signal.

Signals don't queue. If you send SIGSTOP, and someone else sends SIGSTOP
too at the same time, only one SIGSTOP will be delivered.
If you suppress it, you lose that second SIGSTOP.

> >> Or do you mean that
> >> instead of SIGSTOP, SIGTRAP should be sent? If so, yes, that would be much saner.
> >
> > Yes, that's how it works in 3.1 already.
> 
> But don't existing programs depend on current ptrace behavior?

Existing programs don't use PTRACE_INTERRUPT.

> >> >    (3) "enable PTRACE_LISTEN. Works on group-stops even without any other options"
> >>
> >> This is a symptom of the complexity I whined about above.
> >>
> >> If ptrace has no influence on group stops then PTRACE_LISTEN is not needed at all.
> >> The problem seems to be that SIGSTOP state is the same as ptrace trapped state, or
> >> something like that. Group stops and ptrace stops should be totally orthogonal.
> >> Then crazy things like PTRACE_LISTEN are not needed.
> >>
> >> Why not just untangle the two concepts? (Via PTRACE_O_SANE.)
> >
> > You are a bit late to the party. Thre was a LONG discussion how to implement
> > proper group-stop handling. You need to write a much more detailed proposal
> > than this vague suggestion "let's add one flag which makes everythng better".
> > How EXACTLY this flag will change ptrace bhavior?
> 
> I know! I'm fully aware of this. But I was too late for that long discussion,
> and after that I couldn't keep up with all patchsets sent out. It makes little
> sense to give detailed feedback when things might have totally changed again.
> To be convincing at all I'd probably need to write patches too, so show the
> advantages. But I never got to it. And at the time it was less clear to me
> what the real behaviour regarding group stop is, as you guys kindly explained
> to me.
> 
> ----------------------------
> 
> To summarize, the flag would make ptrace have no effect on group stop state,

Wait a second. How and when would you set this new flag?

> making SIGSTOP behaviour like any other signal behaves with ptrace. To get
> there, when PTRACE_O_SANE is set, the user space behaviour changes seen are
> as follows:
> 
> - In cases that ptrace sends a SIGSTOP, SIGTRAP is sent instead.
> 
>   (Your 4th proposal). So that ptrace never changes the group stop state of a task.

Problem. Now we interfere with SIGSTRAPs. Yes, there are users who want
to be able to see real SIGTRAPs they send to the program,
or ones generated by, say, int3 instruction on x86.

> - waitid() honours the absence of the WUNTRACED flag.

(WUNTRACED is a horrible name. I prefer to use its alias WSTOPPED)

"honours the absence of the WUNTRACED flag" per-tracee?
Or do you want to set ptrace flag *on the tracer*? That would be new.

How will strace or gdb show that process has stopped, if it doesn't know
it? With SIGSTOP it's not really important (user can infer that), but
what about SIGTSTP? If strace says "SIGTSTP was delivered", is process
stopped now, or is it looping in the SIGTSTP handler?

>   This prevents duplicate SIGSTOP events and the need to distinguish them
>   from each other. No need for C.1 (PTRACE_EVENT_STOP). But if you still
>   want it, it should be enabled by PTRACE_O_SANE too.

I'm confused now. Does PTRACE_O_SANE disable or enable group-stop
notifications?

> - PTRACE_CONT does not continue a stopped task.

You just said we will not get group-stops. How we can PTRACE_CONT from
group-stop if we don't get group-stop?

In case you meant that "if we request group-stop notifications by using
__WALL | WSTOPPED, and we get group-stop notification, and we do
PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
or death)", then we have a problem: gdb can't use this interface, it
needs to be able to restart the thread (only one thread, not all of
them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
weird, but it's the real requirement from gdb users.

>   Tracer does not need to keep track of group stop state anymore, SIGSTOP and
>   SIGCONT just work for traced tasks.

But if it *wants to*?

>  No need for PTRACE_LISTEN. Tracer has
>   still full control over stopped state of tracee because it can block SIGSTOP
>   and SIGCONT signals, and send them itself.

SIGCONT's side effect of waking up from group-stop can't be blocked.
SIGCONT always wakes up all threads in thread group.
Using SIGCONT to control tracee will race with SIGCONTs from other
sources.

This makes SIGCONT a too coarse instrument for the job.

> >> > Re PTRACE_INTERRUPT: "old-style" check for group-stop,
> >> > namely, "if ptrace(PTRACE_GETSIGINFO) fails, then it's group-stop",
> >>
> >> BTW, no one is going to use and know this, except perhaps ten people in the
> >> whole world.
> >
> > It is documented behavior. Latest released strace already uses it.
> 
> Where is it documented? Keep in mind that a wrong description of what's
> happening was in the manpage for years and years, when %99 of all current
> ptrace using code was written.
>
> And it's in strace because one of you added it, I guess. But strace has
> no use for it, it already knows that a task is stopped when the tracee
> gets a SIGSTOP.

No, it does not know it for the other three stopping signals,
since they can have non-default handlers.

> Or in other words, strace doesn't want to use it,

I think it should show group-stops. It's useful information.

>  but it
> has to because it gets multiple events for one SIGSTOP. And strace would
> much prefer it if a SIGCONT on the tracee would continue it again.

Yes. But gdb prefers otherwise. That's why we need PTRACE_LISTEN for
strace, and PTRACE_CONT for gdb.

> You can't do anything much with the extra info,

strace can print it, a-la:

--- stopped by SIGTTIN ---

> >> > ...and here we decide to use either PTRACE_SYSCALL or PTRACE_LISTEN,
> >> > _based on the value of "stopped" variable_. Here we PTRACE_LISTEN only if
> >> > use_seize is true, but we use that flag only as indicator "this is a new kernel".
> >> > It's easy to see that there are no intrisic reasons
> >> > why it can't also work on newer kernels, but after using old-style ATTACH.
> >>
> >> So you're emulating normal group stop behaviour in the tracer, adding band-aids
> >> that make that possible, instead of avoiding these problems by making ptrace
> >> totally transparent for group stop state? Wouldn't it all be much simpler if
> >> ptrace didn't mess with group stop state?
> >
> > As I said, gdb already depends on current ptrace behavior. We can't break it.
> 
> Then it won't use all those new flags either, and adding anything new makes
> no sense.

Yes... until gdb will want to give user a choice after SIGSTOP: continue
to sit in group-stop until SIGCONT (wasn't possible until
PTRACE_LISTEN), or continue executing (gdb's current behavior if user
uses "continue" command). Therefore, gdb needs a way to do both.

> The behaviour you're defending is generally subtle and unexpected behaviour
> that most ptrace users don't expect.

What behavior do I defend? That we get both signal-stops and
group-stops? I don't "defend" it, it's just what we _already have_. We
need to do minimal amount of changes.

In short, you propose to make it possible to switch off group-stop
notification. This makes it necessary to decide whether you want to
continue or to stop on SIGSTOP *beforehand*, when you enter waitpid().

My points are:
(1) in strace, this would deprive us of a useful bit of information
about the process.
(2) in gdb case, this may be too constraining for us: we want to be able
to decide what to do on group-stop *after* group-stop has happened!

-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-05 14:06       ` Denys Vlasenko
@ 2011-09-05 17:21         ` Indan Zupancic
  2011-09-06  0:59           ` Denys Vlasenko
  2011-09-05 17:44         ` Indan Zupancic
  1 sibling, 1 reply; 28+ messages in thread
From: Indan Zupancic @ 2011-09-05 17:21 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

Hello,

On Mon, September 5, 2011 16:06, Denys Vlasenko wrote:
> On Mon, 2011-09-05 at 15:08 +0200, Indan Zupancic wrote:
>> >> > (b) without any options, make SEIZE just do "ATTACH sans SIGSTOP" thing,
>> >> >     nothing more.
>> >> > (c) make the following new PTRACE_O_foo options:
>> >> >    (1) "flag stops with PTRACE_EVENT_STOP event value in waitpid status"
>> >>
>> >> Why would anyone use that flag?
>> >
>> > Well, it's nice to not require GETSIGINFO in order to know whether this stop
>> > is group-stop.
>>
>> Yes, but that's theory. But in practice..? Try to take a step back and
>> look at the big picture.
>>
>> Most ptrace users aren't interested in group stops and just treat the two
>> events the same (because they don't expect the SIGSTOP notification when
>> WUNTRACED isn't set).
>
> Which users? strace? gdb? Something else?

Anyone using ptrace for whatever reason, including strace and gdb, until
they got "fixed" to know about this detail.

>> And group stop doesn't work on traced tasks
>
> Group-stop works, as in "ptraced task does stop on SIGSTOP".
> The problem is that we couldn't make it wait for SIGCONT without making
> it run.

Okay, group start and stop doesn't work then. Same thing, stopping something
while not being able to know when to continue it is not useful.

>
>> group stop isn't interesting (as it's
>> broken),
>
> Why do you think so?

For the reason you give above. It doesn't currently work, having it
half-working is not useful, so it's not that useful to know if a SIGSTOP
is a notification or a ptrace SIGSTOP event. In either case the task has
to be continued, or else it would be stuck forever. At best you can use
it to inform users in more detail what happens, in the case of strace
and gdb. It becomes slightly more interesting when group start and stop
both work, I agree.

>
>>  so why do a GETSIGINFO either? I totally agree that if you do
>> a GETSIGINFO now, this new option makes it easier,
>
> It is not a new behavior.
>
>
>>  but I'm trying to
>> say that almost no user will do the GETSIGINFO either.
>
> strace does. I don't know about gdb.

But how long does it do it, and how long did it work fine not doing it?
Strace 4.5.16 only uses it to print out signal address and pc if possible,
it doesn't use it for SIGSTOP. I haven't checked newer strace source,
that's the newest version on my hd.

>
>> The ptrace users who do want group stop to work usually don't want to
>> interfere with it, they just want to know about it.
>
> The point is, they can't "not interfere" with it. __WALL
> implicitly reports group-stops.

That can be changed. That is not documented behaviour of __WALL,
__WALL is supposed to tell for which tasks to give notifications,
not what kind. At least that's the impression after reading the
manpage.

If people want group-stop reports, they set WUNTRACED/WSTOPPED.
If they don't want it, they don't set it: But this choice is taken
away when ptracing.

>
>>  PTRACE_EVENT_STOP
>> makes the knowing slightly easier, but it doesn't fix group stop.
>
> Correct. PTRACE_LISTEN fixes it.

In a very convoluted way. It fixes the symptomps, but it doesn't fix
the problem, it works around it.

>
>> >> >    (2) "enable PTRACE_INTERRUPT. It either causes PTRACE_EVENT_STOP with
>> sig=SIGTRAP
>> >> >         if (1) is enabled, or creates a group-stop with sig=SIGTRAP otherwise"
>> >> >        [if the second part is too weird to implement, make (2) require (1)]
>> >>
>> >> I don't understand the need for PTRACE_INTERRUPT: Sending some signal, trapping,
>> >> and PTRACE_CONT with zero "data" should have the same effect.
>> >
>> > Except that sending a signal will race with someone else sendng the same signal.
>> > It's nicer when this can be avoided.
>>
>> That race is not important, because the signal you send you will block (data=0),
>> so the tracee will only see one signal.
>
> Signals don't queue. If you send SIGSTOP, and someone else sends SIGSTOP
> too at the same time, only one SIGSTOP will be delivered.
> If you suppress it, you lose that second SIGSTOP.

Ah, yes, didn't think about that. Then use a queuing rt signal instead.

>
>> >> Or do you mean that
>> >> instead of SIGSTOP, SIGTRAP should be sent? If so, yes, that would be much saner.
>> >
>> > Yes, that's how it works in 3.1 already.
>>
>> But don't existing programs depend on current ptrace behavior?
>
> Existing programs don't use PTRACE_INTERRUPT.

So it's only for PTRACE_INTERRUPT, not also for new childs?
Did PTRACE_INTERRUPT make it into the standard kernel already?
Damn, I thought the API changes would wait.

>> >> >    (3) "enable PTRACE_LISTEN. Works on group-stops even without any other
>> options"
>> >>
>> >> This is a symptom of the complexity I whined about above.
>> >>
>> >> If ptrace has no influence on group stops then PTRACE_LISTEN is not needed at all.
>> >> The problem seems to be that SIGSTOP state is the same as ptrace trapped state, or
>> >> something like that. Group stops and ptrace stops should be totally orthogonal.
>> >> Then crazy things like PTRACE_LISTEN are not needed.
>> >>
>> >> Why not just untangle the two concepts? (Via PTRACE_O_SANE.)
>> >
>> > You are a bit late to the party. Thre was a LONG discussion how to implement
>> > proper group-stop handling. You need to write a much more detailed proposal
>> > than this vague suggestion "let's add one flag which makes everythng better".
>> > How EXACTLY this flag will change ptrace bhavior?
>>
>> I know! I'm fully aware of this. But I was too late for that long discussion,
>> and after that I couldn't keep up with all patchsets sent out. It makes little
>> sense to give detailed feedback when things might have totally changed again.
>> To be convincing at all I'd probably need to write patches too, so show the
>> advantages. But I never got to it. And at the time it was less clear to me
>> what the real behaviour regarding group stop is, as you guys kindly explained
>> to me.
>>
>> ----------------------------
>>
>> To summarize, the flag would make ptrace have no effect on group stop state,
>
> Wait a second. How and when would you set this new flag?

Well, preferably with the new nifty PTRACE_SEIZE, or via PTRACE_SETOPTIONS.

But now you mention it, it would be very nice if PTRACE_TRACEME would also
accept options, just like PTRACE_SEIZE. (Or PTRACE_SEIZE with 0 pid would
work as PTRACE_TRACEME with options, but that's probably too confusing.)

>
>> making SIGSTOP behaviour like any other signal behaves with ptrace. To get
>> there, when PTRACE_O_SANE is set, the user space behaviour changes seen are
>> as follows:
>>
>> - In cases that ptrace sends a SIGSTOP, SIGTRAP is sent instead.
>>
>>   (Your 4th proposal). So that ptrace never changes the group stop state of a task.
>
> Problem. Now we interfere with SIGSTRAPs. Yes, there are users who want
> to be able to see real SIGTRAPs they send to the program,
> or ones generated by, say, int3 instruction on x86.

But SIGTRAP is ours, ptrace already sends SIGTRAPS at execve. Only change is
that it also sends one for new childs instead of SIGSTOP.

In this case we don't interfere because the tracer can know if it is a ptrace
trap by checking the extra bits. Except for the signal race and SIGTRAP not
queueing...

In any case, this is not a new problem. Changing SIGSTOP removes the group
stop interference, and increases the SIGTRAP interference in case of races.
But in the normal case non-ptrace traps aren't seen by tracees, if the tracer
takes some care (if it doesn't, then current program don't get their SIGTRAP
either).

>
>
>> - waitid() honours the absence of the WUNTRACED flag.
>
> (WUNTRACED is a horrible name. I prefer to use its alias WSTOPPED)
>
> "honours the absence of the WUNTRACED flag" per-tracee?
> Or do you want to set ptrace flag *on the tracer*? That would be new.

Err, maybe, that seems the most obvious way of doing it. But you can also
do it per tracer, it doesn't really matter. There must be ptrace specific
logic in waitid() ignoring WSTOPPED for tracees. Instead of blindly checking
that, it could also check this new flag.

In practice the effect is the same because tracers set the same options
on all tasks they trace, and tasks can be only traced by one tracer. If
all flags are moved to the tracer, it would be easy to implement multiple
tracers on the same task.

> How will strace or gdb show that process has stopped, if it doesn't know
> it? With SIGSTOP it's not really important (user can infer that), but
> what about SIGTSTP? If strace says "SIGTSTP was delivered", is process
> stopped now, or is it looping in the SIGTSTP handler?

Never heard of SIGTSTP, I don't know TTY black magic.

Tracers that want to get group stop notifications will set WSTOPPED and get
that information that way. But as ptrace won't generate any SIGSTOPs, they
don't have to use GETSIGINFO to know if it came from ptrace or not: It never
does.

>
>
>>   This prevents duplicate SIGSTOP events and the need to distinguish them
>>   from each other. No need for C.1 (PTRACE_EVENT_STOP). But if you still
>>   want it, it should be enabled by PTRACE_O_SANE too.
>
> I'm confused now. Does PTRACE_O_SANE disable or enable group-stop
> notifications?

Neither, it makes waitid() honour the presence and absence of the WSTOPPED
flag. It gives that choice back to the tracer.

>
>> - PTRACE_CONT does not continue a stopped task.
>
> You just said we will not get group-stops. How we can PTRACE_CONT from
> group-stop if we don't get group-stop?

Why would you want to do that? I think I don't understand you.

> In case you meant that "if we request group-stop notifications by using
> __WALL | WSTOPPED, and we get group-stop notification, and we do
> PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
> or death)",

Exactly.

A group-stop notification is not really a ptrace event (maybe it is now),
so PTRACE_CONT wouldn't be needed. It's just a group stop notification,
not a freeze, report and wait event, like signals. But this is in my
mind, reality might be different.

> then we have a problem: gdb can't use this interface, it
> needs to be able to restart the thread (only one thread, not all of
> them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
> weird, but it's the real requirement from gdb users.

Is that true? Isn't a SIGCONT sent to the TID only for that thread instead
of the whole group? That's slightly inconvenient indeed. Perhaps this
limitation can be fixed? Might be troublesome for the main thread.

But this is for a group stop initiated by gdb, I suppose? In that case gdb
can just let the threads hang in the signal delivery path, and continue them
one by one, like it does now. Any group signal would work for this, doesn't
have to be SIGSTOP. E.g. a queueing rt signal that is blocked and never seen
by the tracee.

Would that work?

Because IMHO ptrace isn't the proper way of handling normal group stops,
signals are.

>
>>   Tracer does not need to keep track of group stop state anymore, SIGSTOP and
>>   SIGCONT just work for traced tasks.
>
> But if it *wants to*?

Then it still can, by setting WSTOPPED.

>
>>  No need for PTRACE_LISTEN. Tracer has
>>   still full control over stopped state of tracee because it can block SIGSTOP
>>   and SIGCONT signals, and send them itself.
>
> SIGCONT's side effect of waking up from group-stop can't be blocked.
> SIGCONT always wakes up all threads in thread group.
> Using SIGCONT to control tracee will race with SIGCONTs from other
> sources.
>
> This makes SIGCONT a too coarse instrument for the job.

No kidding. Perhaps the solution is to not use group stops for this in
the first place.

But PTRACE_CONT results in one traced task running while the rest is still
stopped, that could be called an often unwanted side effect too.

>
>> >> > Re PTRACE_INTERRUPT: "old-style" check for group-stop,
>> >> > namely, "if ptrace(PTRACE_GETSIGINFO) fails, then it's group-stop",
>> >>
>> >> BTW, no one is going to use and know this, except perhaps ten people in the
>> >> whole world.
>> >
>> > It is documented behavior. Latest released strace already uses it.
>>
>> Where is it documented? Keep in mind that a wrong description of what's
>> happening was in the manpage for years and years, when %99 of all current
>> ptrace using code was written.
>>
>> And it's in strace because one of you added it, I guess. But strace has
>> no use for it, it already knows that a task is stopped when the tracee
>> gets a SIGSTOP.
>
> No, it does not know it for the other three stopping signals,
> since they can have non-default handlers.
>
>> Or in other words, strace doesn't want to use it,
>
> I think it should show group-stops. It's useful information.

Then let it do so, there is no reason it can't with my proposal.

>>  but it
>> has to because it gets multiple events for one SIGSTOP. And strace would
>> much prefer it if a SIGCONT on the tracee would continue it again.
>
> Yes. But gdb prefers otherwise. That's why we need PTRACE_LISTEN for
> strace, and PTRACE_CONT for gdb.

Or gdb can stop using group stops to achieve what it wants. Gdb is the
strange use case, strace-like ptrace usage is the more common one. So
make the more common case simpler instead of complicating everything
for a corner gdb use case (however important it is).

Gdb can't want to have both transparent group stop behaviour, and also
hijack group stops for thread running control. Letting it hang in ptrace
is fine for gdb, that's what it does now. Let it keep doing it, it can
even let it hang in SIGSTOP delivery path, it doesn't really matter.
Only change is that tasks are in trapped state instead of actually
stopped. That is an improvement, because then unrelated SIGCONT's can't
interfere with gdb's debugging. But with PTRACE_CONT not continuing group
stopped tasks it gives gdb the ability to transparently debug externally
stopped and started tasks. It still can stop all tasks and continue them
one by one as it does now.

>> You can't do anything much with the extra info,
>
> strace can print it, a-la:
>
> --- stopped by SIGTTIN ---

True. That stays true though.

>
>> >> > ...and here we decide to use either PTRACE_SYSCALL or PTRACE_LISTEN,
>> >> > _based on the value of "stopped" variable_. Here we PTRACE_LISTEN only if
>> >> > use_seize is true, but we use that flag only as indicator "this is a new kernel".
>> >> > It's easy to see that there are no intrisic reasons
>> >> > why it can't also work on newer kernels, but after using old-style ATTACH.
>> >>
>> >> So you're emulating normal group stop behaviour in the tracer, adding band-aids
>> >> that make that possible, instead of avoiding these problems by making ptrace
>> >> totally transparent for group stop state? Wouldn't it all be much simpler if
>> >> ptrace didn't mess with group stop state?
>> >
>> > As I said, gdb already depends on current ptrace behavior. We can't break it.
>>
>> Then it won't use all those new flags either, and adding anything new makes
>> no sense.
>
> Yes... until gdb will want to give user a choice after SIGSTOP: continue
> to sit in group-stop until SIGCONT (wasn't possible until
> PTRACE_LISTEN), or continue executing (gdb's current behavior if user
> uses "continue" command). Therefore, gdb needs a way to do both.

But are users expected to send SIGSTOPs manually when debugging, instead
of hitting ctlr+z or whatever?

Anyway, gdb can give this choice when it receives the SIGSTOP:
If users want it to go into group stop, gdb does PTRACE_CONT.
If users want to selectively continue running, gdb continues
select threads while blocking the SIGSTOP delivery.

Would this work for gdb, or is there some problem with this solution?

>
>> The behaviour you're defending is generally subtle and unexpected behaviour
>> that most ptrace users don't expect.
>
> What behavior do I defend? That we get both signal-stops and
> group-stops? I don't "defend" it, it's just what we _already have_. We
> need to do minimal amount of changes.

The group stop interference behaviour. I don't think that adding
PTRACE_INTERRUPT, PTRACE_LISTEN and all those new options is the
minimal amount of change, especially not when including the user
space changes needed to make use of this.

> In short, you propose to make it possible to switch off group-stop
> notification.

That is just part of the proposal, the main this is to not let PTRACE_CONT
continue a stopped task. Switching off group-stop notifications makes it
only easier to use ptrace when not specifically interested in group stops.

> This makes it necessary to decide whether you want to
> continue or to stop on SIGSTOP *beforehand*, when you enter waitpid().

No, only if you want to get the group notification. That would be true
if PTRACE_CONT continues a task, but if that's not true, then it's pure
a choice whether to get those notifications or not.

At waitpid() time you can do whatever you want, including the old behaviour.

> My points are:
> (1) in strace, this would deprive us of a useful bit of information
> about the process.

It's a choice, if you want group notifications, then ask for them.
Current ptrace takes that choice away, that's all.

> (2) in gdb case, this may be too constraining for us: we want to be able
> to decide what to do on group-stop *after* group-stop has happened!

See above, that should be still possible. Or am I missing something?

Greetings,

Indan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-05 17:21         ` Indan Zupancic
@ 2011-09-06  0:59           ` Denys Vlasenko
  2011-09-06 17:08             ` Indan Zupancic
  0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-06  0:59 UTC (permalink / raw)
  To: Indan Zupancic; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

On Monday 05 September 2011 19:21, Indan Zupancic wrote:
> >> The ptrace users who do want group stop to work usually don't want to
> >> interfere with it, they just want to know about it.
> >
> > The point is, they can't "not interfere" with it. __WALL
> > implicitly reports group-stops.
> 
> That can be changed. That is not documented behaviour of __WALL,
> __WALL is supposed to tell for which tasks to give notifications,
> not what kind. At least that's the impression after reading the
> manpage.
> 
> If people want group-stop reports, they set WUNTRACED/WSTOPPED.

Well, that is your interpretation. The fact is, existing programs
either deliberately use __WALL in order to see (among other things)
group-stops of tracees; or use __WALL and don't really care about
group-stops they are getting - but they were debugged and tested
on the kernels where group-stops were delivered on __WALL,
thus they might break if that will stop happening.



> >>  PTRACE_EVENT_STOP
> >> makes the knowing slightly easier, but it doesn't fix group stop.
> >
> > Correct. PTRACE_LISTEN fixes it.
> 
> In a very convoluted way. It fixes the symptomps, but it doesn't fix
> the problem, it works around it.

Well, as far as I am concerned, PTRACE_LISTEN allows me to achieve what I need.

If you don't like it just on the conceptual grounds, it may be
a matter of taste.

Do you see an actual problem with it, as in "something doesn't work right"?


> >> making SIGSTOP behaviour like any other signal behaves with ptrace. To get
> >> there, when PTRACE_O_SANE is set, the user space behaviour changes seen are
> >> as follows:
> >>
> >> - In cases that ptrace sends a SIGSTOP, SIGTRAP is sent instead.
> >>
> >>   (Your 4th proposal). So that ptrace never changes the group stop state of a task.
> >
> > Problem. Now we interfere with SIGSTRAPs. Yes, there are users who want
> > to be able to see real SIGTRAPs they send to the program,
> > or ones generated by, say, int3 instruction on x86.
> 
> But SIGTRAP is ours, ptrace already sends SIGTRAPS at execve.

...whcih causes problems, and therefore we have PTRACE_O_TRACEEXEC to suppress
this idiotic post-execve SIGTRAP.

> Only change is 
> that it also sends one for new childs instead of SIGSTOP.

Racing with user's SIGTRAP does not fix the problem, it merely moves it
to a different signal. PTRACE_EVENT_STOP thingy fixes the problem once
and for all.


> In any case, this is not a new problem.

That the problem is old doesn't mean we can ignore it.

> But in the normal case non-ptrace traps aren't seen by tracees, if the tracer
> takes some care (if it doesn't, then current program don't get their SIGTRAP
> either).

I don't understand this part.


> > How will strace or gdb show that process has stopped, if it doesn't know
> > it? With SIGSTOP it's not really important (user can infer that), but
> > what about SIGTSTP? If strace says "SIGTSTP was delivered", is process
> > stopped now, or is it looping in the SIGTSTP handler?
> 
> Never heard of SIGTSTP, I don't know TTY black magic.
> 
> Tracers that want to get group stop notifications will set WSTOPPED and get
> that information that way. But as ptrace won't generate any SIGSTOPs, they
> don't have to use GETSIGINFO to know if it came from ptrace or not: It never
> does.

Drats. In your proposal, if I'd set WSTOPPED, I will get group-stops, right?

How in your proposed solution can I "restart and cancel group-stop" after
group-stop? And how can I "restart and wait for a SIGCONT"?


> >>   This prevents duplicate SIGSTOP events and the need to distinguish them
> >>   from each other. No need for C.1 (PTRACE_EVENT_STOP). But if you still
> >>   want it, it should be enabled by PTRACE_O_SANE too.
> >
> > I'm confused now. Does PTRACE_O_SANE disable or enable group-stop
> > notifications?
> 
> Neither, it makes waitid() honour the presence and absence of the WSTOPPED
> flag. It gives that choice back to the tracer.
> 
> >
> >> - PTRACE_CONT does not continue a stopped task.
> >
> > You just said we will not get group-stops. How we can PTRACE_CONT from
> > group-stop if we don't get group-stop?
> 
> Why would you want to do that? I think I don't understand you.

Today both strace and gdb want to know about group-stops.
If you think they should not, well, tough luck: their maintainers
think otherwise.


> > In case you meant that "if we request group-stop notifications by using
> > __WALL | WSTOPPED, and we get group-stop notification, and we do
> > PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
> > or death)",
> 
> Exactly.
> 
> A group-stop notification is not really a ptrace event (maybe it is now),
> so PTRACE_CONT wouldn't be needed. It's just a group stop notification,
> not a freeze, report and wait event, like signals.

Yes, this would be a reasonable API model. PTRACE_CONT cancels group-stop,
and *doing nothing* leaves task in the group-stop. One problem with this:
many ptrace ops (such as GETREGS) are only allowed in ptrace stops.
If you don't consider group-stop to be a ptrace-stop, then no such
ops are allowed. In fact, even restart ops need stopped tracee, so PTRACE_CONT
is illegal too. IOW: in this model, kernel doesn't know when we decided
to go down *do nothing* route. From kernel POW, it is confused. Maybe we do
want to PTRACE_CONT, but just didn't get around to it?


> > then we have a problem: gdb can't use this interface, it
> > needs to be able to restart the thread (only one thread, not all of
> > them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
> > weird, but it's the real requirement from gdb users.
> 
> Is that true? Isn't a SIGCONT sent to the TID only for that thread instead
> of the whole group? That's slightly inconvenient indeed. Perhaps this
> limitation can be fixed? Might be troublesome for the main thread.

This is how stopping/starting works. It's per-process, not per-thread.
Regardless of the thread it is sent to, stop signals stop all threads,
and SIGCONT restarts all of them.


> But this is for a group stop initiated by gdb, I suppose?

It is true for any king of group-stop.

> In that case gdb 
> can just let the threads hang in the signal delivery path, and continue them
> one by one, like it does now.

It can't, since it doesn't know about signal handlers.

> Any group signal would work for this, doesn't 
> have to be SIGSTOP. E.g. a queueing rt signal that is blocked and never seen
> by the tracee.

I repeat: it is not about signals sent by gdb. It is about siganls sent by anyone.


> >>  No need for PTRACE_LISTEN. Tracer has
> >>   still full control over stopped state of tracee because it can block SIGSTOP
> >>   and SIGCONT signals, and send them itself.
> >
> > SIGCONT's side effect of waking up from group-stop can't be blocked.
> > SIGCONT always wakes up all threads in thread group.
> > Using SIGCONT to control tracee will race with SIGCONTs from other
> > sources.
> >
> > This makes SIGCONT a too coarse instrument for the job.
> 
> No kidding. Perhaps the solution is to not use group stops for this in
> the first place.
> 
> But PTRACE_CONT results in one traced task running while the rest is still
> stopped, that could be called an often unwanted side effect too.

Yes. But gdb people want that. I and Oleg tried to persuade them to stop
wanting that. The end result is that they persuaded us that it's needed.


> >>  but it
> >> has to because it gets multiple events for one SIGSTOP. And strace would
> >> much prefer it if a SIGCONT on the tracee would continue it again.
> >
> > Yes. But gdb prefers otherwise. That's why we need PTRACE_LISTEN for
> > strace, and PTRACE_CONT for gdb.
> 
> Or gdb can stop using group stops to achieve what it wants. Gdb is the 
> strange use case, strace-like ptrace usage is the more common one. So
> make the more common case simpler instead of complicating everything
> for a corner gdb use case (however important it is).
> 
> Gdb can't want to have both transparent group stop behaviour, and also
> hijack group stops for thread running control. Letting it hang in ptrace
> is fine for gdb, that's what it does now. Let it keep doing it, it can
> even let it hang in SIGSTOP delivery path, it doesn't really matter.
> Only change is that tasks are in trapped state instead of actually
> stopped. That is an improvement, because then unrelated SIGCONT's can't
> interfere with gdb's debugging. But with PTRACE_CONT not continuing group
> stopped tasks it gives gdb the ability to transparently debug externally
> stopped and started tasks. It still can stop all tasks and continue them
> one by one as it does now.

In short: "screw gdb needs, the elegance of my idea is more important
that their real world needs". Sorry, but we tried that, and it leads nowhere.


> >> > As I said, gdb already depends on current ptrace behavior. We can't break it.
> >>
> >> Then it won't use all those new flags either, and adding anything new makes
> >> no sense.
> >
> > Yes... until gdb will want to give user a choice after SIGSTOP: continue
> > to sit in group-stop until SIGCONT (wasn't possible until
> > PTRACE_LISTEN), or continue executing (gdb's current behavior if user
> > uses "continue" command). Therefore, gdb needs a way to do both.
> 
> But are users expected to send SIGSTOPs manually when debugging, instead
> of hitting ctlr+z or whatever?

We don't know. We should handle the widest possible set of scenarios:
^Z, manual signals, signals sent by kernel on bg I/O (such as SIGTTIN)...


> Anyway, gdb can give this choice when it receives the SIGSTOP:
> If users want it to go into group stop, gdb does PTRACE_CONT.
> If users want to selectively continue running, gdb continues
> select threads while blocking the SIGSTOP delivery.

Here we go again. If gdb would block signal delivery, HOW DOES
IT KNOW THAT THIS SIGNAL IS GOING TO CAUSE GROUP-STOP?
Think about receiving SIGTSTP. Will it cause group-stop?


> >> The behaviour you're defending is generally subtle and unexpected behaviour
> >> that most ptrace users don't expect.
> >
> > What behavior do I defend? That we get both signal-stops and
> > group-stops? I don't "defend" it, it's just what we _already have_. We
> > need to do minimal amount of changes.
> 
> The group stop interference behaviour. I don't think that adding
> PTRACE_INTERRUPT, PTRACE_LISTEN and all those new options is the
> minimal amount of change, especially not when including the user
> space changes needed to make use of this.

Current group-stop fix is EXACTLY one new ptrace command: PTRACE_LISTEN.
How you can make it more minimal, I have no idea.


> > In short, you propose to make it possible to switch off group-stop
> > notification.
> 
> That is just part of the proposal, the main this is to not let PTRACE_CONT
> continue a stopped task.

This will break gdb. I told this several times already. Not acceptable.


> Switching off group-stop notifications makes it 
> only easier to use ptrace when not specifically interested in group stops.

Solution in the search of a problem.


> > (2) in gdb case, this may be too constraining for us: we want to be able
> > to decide what to do on group-stop *after* group-stop has happened!
> 
> See above, that should be still possible. Or am I missing something?

Yes, you miss the fact that inferring group-stop on signal delivery
is not reliable. The correct way to do it is to not *infer* it, but
*observe* it when it really happens.

-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-06  0:59           ` Denys Vlasenko
@ 2011-09-06 17:08             ` Indan Zupancic
  2011-09-07  2:34               ` Denys Vlasenko
  0 siblings, 1 reply; 28+ messages in thread
From: Indan Zupancic @ 2011-09-06 17:08 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

Hello,

On Tue, September 6, 2011 02:59, Denys Vlasenko wrote:
> On Monday 05 September 2011 19:21, Indan Zupancic wrote:
>> >> The ptrace users who do want group stop to work usually don't want to
>> >> interfere with it, they just want to know about it.
>> >
>> > The point is, they can't "not interfere" with it. __WALL
>> > implicitly reports group-stops.
>>
>> That can be changed. That is not documented behaviour of __WALL,
>> __WALL is supposed to tell for which tasks to give notifications,
>> not what kind. At least that's the impression after reading the
>> manpage.
>>
>> If people want group-stop reports, they set WUNTRACED/WSTOPPED.
>
> Well, that is your interpretation. The fact is, existing programs
> either deliberately use __WALL in order to see (among other things)
> group-stops of tracees; or use __WALL and don't really care about
> group-stops they are getting - but they were debugged and tested
> on the kernels where group-stops were delivered on __WALL,
> thus they might break if that will stop happening.

It's very unlike that something that doesn't want group stops breaks
when it stops getting them. Especially not when asking for new behaviour
with some new option. Keep in mind that this group stop thing is not
documented in any manpage!

>> >>  PTRACE_EVENT_STOP
>> >> makes the knowing slightly easier, but it doesn't fix group stop.
>> >
>> > Correct. PTRACE_LISTEN fixes it.
>>
>> In a very convoluted way. It fixes the symptomps, but it doesn't fix
>> the problem, it works around it.
>
> Well, as far as I am concerned, PTRACE_LISTEN allows me to achieve what I need.

I hope to convince you that there is a better way.

> If you don't like it just on the conceptual grounds, it may be
> a matter of taste.

Of course it is a matter of taste, we're talking about ABI here!

My goal is to simplify ptrace usage for most use cases. And that
includes the ptrace users' code, not only the kernel/API changes.

> Do you see an actual problem with it, as in "something doesn't work right"?

I'll have to take a better look at PTRACE_LISTEN to be sure.

I just think it's the wrong way around of doing things.

>> > Problem. Now we interfere with SIGSTRAPs. Yes, there are users who want
>> > to be able to see real SIGTRAPs they send to the program,
>> > or ones generated by, say, int3 instruction on x86.
>>
>> But SIGTRAP is ours, ptrace already sends SIGTRAPS at execve.
>
> ...whcih causes problems, and therefore we have PTRACE_O_TRACEEXEC to suppress
> this idiotic post-execve SIGTRAP.

The SIGTRAP is still there, there is just an extra bit to distinguish
it from normal SIGTRAPs. Sending another SIGTRAP with another bit set,
just as with PTRACE_O_TRACEEXEC, solves this issue.

>> Only change is
>> that it also sends one for new childs instead of SIGSTOP.
>
> Racing with user's SIGTRAP does not fix the problem, it merely moves it
> to a different signal. PTRACE_EVENT_STOP thingy fixes the problem once
> and for all.

It doesn't really matter if it's SIGSTOP with an extra flag, or SIGTRAP
with an extra flag set. But SIGTRAP is already used this way for fork
and exec, so using it always instead of sometimes SIGSTOP would be more
consistent. It's also clearer what to do with the SIGTRAP as tracer: Block
it so the tracee doesn't see it. With SIGSTOP it's less clear what to do.
In current trace it doesn't matter what you do, in practice, and that's also
what the manpage says. So using SIGTRAP with an extra bit set is cleaner.

>
>> In any case, this is not a new problem.
>
> That the problem is old doesn't mean we can ignore it.
>
>> But in the normal case non-ptrace traps aren't seen by tracees, if the tracer
>> takes some care (if it doesn't, then current program don't get their SIGTRAP
>> either).
>
> I don't understand this part.

"Tracer taking some care" means checking those ptrace bits when handling
a SIGTRAP: PTRACE_EVENT_EXEC, etc.

If the tracer doesn't use PTRACE_O_TRACEEXEC etc. then it will just block
all SIGTRAPS blindly, and real SIGTRAP don't make it to the tracee.

>
>> > How will strace or gdb show that process has stopped, if it doesn't know
>> > it? With SIGSTOP it's not really important (user can infer that), but
>> > what about SIGTSTP? If strace says "SIGTSTP was delivered", is process
>> > stopped now, or is it looping in the SIGTSTP handler?
>>
>> Never heard of SIGTSTP, I don't know TTY black magic.
>>
>> Tracers that want to get group stop notifications will set WSTOPPED and get
>> that information that way. But as ptrace won't generate any SIGSTOPs, they
>> don't have to use GETSIGINFO to know if it came from ptrace or not: It never
>> does.
>
> Drats. In your proposal, if I'd set WSTOPPED, I will get group-stops, right?

Group stop notifications, yes, that was the idea.

> How in your proposed solution can I "restart and cancel group-stop" after
> group-stop? And how can I "restart and wait for a SIGCONT"?

Not sure what you mean with restart, just send another SIGSTOP?

Blocking a group stop happens by blocking the SIGSTOP delivery to the tracee.
Each tracee gets a SIGSTOP ptrace event, it's not like only one gets it and
that controls the state of the whole group. Same for SIGCONT. I'll have to
double check this, but I think this is how it works.

To restart, send a SIGCONT.

> Today both strace and gdb want to know about group-stops.
> If you think they should not, well, tough luck: their maintainers
> think otherwise.

I'm not saying that. I just want an option to not get them, so that strace
behaves less unexpected.

>> A group-stop notification is not really a ptrace event (maybe it is now),
>> so PTRACE_CONT wouldn't be needed. It's just a group stop notification,
>> not a freeze, report and wait event, like signals.
>
> Yes, this would be a reasonable API model. PTRACE_CONT cancels group-stop,
> and *doing nothing* leaves task in the group-stop. One problem with this:
> many ptrace ops (such as GETREGS) are only allowed in ptrace stops.
> If you don't consider group-stop to be a ptrace-stop, then no such
> ops are allowed. In fact, even restart ops need stopped tracee, so PTRACE_CONT
> is illegal too. IOW: in this model, kernel doesn't know when we decided
> to go down *do nothing* route. From kernel POW, it is confused. Maybe we do
> want to PTRACE_CONT, but just didn't get around to it?

It's probably best to not change the current model for group stop notifications
(when requested via WSTOPPED), as that is confusing, even if the new behaviour
is better (which it probably isn't).

You're right that if a tracee requests group stop notifications, it probably
wants to be able to poke around too sometimes, and then a ptrace slightly
entwined with group stops is preferred. But it's not elegant. :-/

>> > then we have a problem: gdb can't use this interface, it
>> > needs to be able to restart the thread (only one thread, not all of
>> > them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
>> > weird, but it's the real requirement from gdb users.
>>
>> Is that true? Isn't a SIGCONT sent to the TID only for that thread instead
>> of the whole group? That's slightly inconvenient indeed. Perhaps this
>> limitation can be fixed? Might be troublesome for the main thread.
>
> This is how stopping/starting works. It's per-process, not per-thread.
> Regardless of the thread it is sent to, stop signals stop all threads,
> and SIGCONT restarts all of them.

Yes, but every thread gets a SIGCONT and the tracee will get a ptrace
event for each thread. It can let the threads it doesn't want to continue
hang in the ptrace handling, and continue only one with PTRACE_CONT.

Any reason why this can't work?

This way there is a clear distinction between trapped and group stops.
When traced tasks are stopped, they're in a group stop. When gdb is poking
around, they're in trapped state.

>
>> But this is for a group stop initiated by gdb, I suppose?
>
> It is true for any king of group-stop.
>
>> In that case gdb
>> can just let the threads hang in the signal delivery path, and continue them
>> one by one, like it does now.
>
> It can't, since it doesn't know about signal handlers.

The SIGTSTP thing, right? That is indeed a problem of using SIGSTOP.

Is the same true for SIGCONT? Is there any other way to continue a stopped task?
If not, then do what I proposed after a SIGCONT sent by gdb instead.

That has the great advantage that normal group stops just work and gdb doesn't
have to ask at group stop time what the user wants. Instead, it just waits till
the user tells gdb whether to continue all threads or just one, and then do the
appropriate action at SIGCONT time.

The reason we all forget SIGCONT is because in current ptrace you never see it.

> I repeat: it is not about signals sent by gdb. It is about siganls sent by anyone.

Okay.

>
>> >>  No need for PTRACE_LISTEN. Tracer has
>> >>   still full control over stopped state of tracee because it can block SIGSTOP
>> >>   and SIGCONT signals, and send them itself.
>> >
>> > SIGCONT's side effect of waking up from group-stop can't be blocked.
>> > SIGCONT always wakes up all threads in thread group.
>> > Using SIGCONT to control tracee will race with SIGCONTs from other
>> > sources.
>> >
>> > This makes SIGCONT a too coarse instrument for the job.
>>
>> No kidding. Perhaps the solution is to not use group stops for this in
>> the first place.
>>
>> But PTRACE_CONT results in one traced task running while the rest is still
>> stopped, that could be called an often unwanted side effect too.
>
> Yes. But gdb people want that. I and Oleg tried to persuade them to stop
> wanting that. The end result is that they persuaded us that it's needed.

I agree that the gdb people should be happy with any changes.

> In short: "screw gdb needs, the elegance of my idea is more important
> that their real world needs". Sorry, but we tried that, and it leads nowhere.

No! If gdb can't do what it needs then it's no good.

> We don't know. We should handle the widest possible set of scenarios:
> ^Z, manual signals, signals sent by kernel on bg I/O (such as SIGTTIN)...

Okay.

>
>> Anyway, gdb can give this choice when it receives the SIGSTOP:
>> If users want it to go into group stop, gdb does PTRACE_CONT.
>> If users want to selectively continue running, gdb continues
>> select threads while blocking the SIGSTOP delivery.
>
> Here we go again. If gdb would block signal delivery, HOW DOES
> IT KNOW THAT THIS SIGNAL IS GOING TO CAUSE GROUP-STOP?
> Think about receiving SIGTSTP. Will it cause group-stop?

Sorry. It's painfully clear now.

>
>> >> The behaviour you're defending is generally subtle and unexpected behaviour
>> >> that most ptrace users don't expect.
>> >
>> > What behavior do I defend? That we get both signal-stops and
>> > group-stops? I don't "defend" it, it's just what we _already have_. We
>> > need to do minimal amount of changes.
>>
>> The group stop interference behaviour. I don't think that adding
>> PTRACE_INTERRUPT, PTRACE_LISTEN and all those new options is the
>> minimal amount of change, especially not when including the user
>> space changes needed to make use of this.
>
> Current group-stop fix is EXACTLY one new ptrace command: PTRACE_LISTEN.
> How you can make it more minimal, I have no idea.

But look at the example code you sent that is needed to make normal
group stop/continuation work. The minimal change is to make that
work by default when setting a new option, without taking away the
means to selectively continue single threads.

>
>
>> > In short, you propose to make it possible to switch off group-stop
>> > notification.
>>
>> That is just part of the proposal, the main this is to not let PTRACE_CONT
>> continue a stopped task.
>
> This will break gdb. I told this several times already. Not acceptable.

It probably can use SIGCONT to achieve what it wants. Letting PTRACE_CONT
not resume stopped tasks would fix normal group stops without any changes.
Only programs that want to resume single threads while holding the rest
need to change their code slightly to let the tasks hang in SIGCONT instead
of group stop notification.

>
>> Switching off group-stop notifications makes it
>> only easier to use ptrace when not specifically interested in group stops.
>
> Solution in the search of a problem.

You're probably right that it's not worth the trouble. I'm fine if tracers
always get a group stop notification, but if so, there should be an option
that makes PTRACE_CONT not continue group stopped tasks.

What I want to avoid is adding extra specific code in the tracer when all
it wants is to be transparent to group stops. (Except the setting of an
extra option, of course.)

One way is to not get the group stop notifications, so the tracer doesn't
accidentally continues the stopped tasks. To me this seems the simplest
solution, but it has the problem that it only works for tracers not
interested in group stop. To also make it work with group stops, my idea
was to let PTRACE_CONT not continue stopped tasks. Instead of trapping
tasks at the beginning of group stop, they can trap the tasks at group
stop exit time, which is much more natural anyway (because it is group
continuation that gdb wants to prevent, not group stop).

With PTRACE_LISTEN the tracee has to handle group stops specially, then,
instead of doing nothing for group stop notifications, it has to use
PTRACE_LISTEN to emulate group stop and continuation for the tracee. No
one is going to bother to add code for this when they're not interested
in group stops, especially not if the code is portable at all. It's just
not worth the trouble. The behaviour is too strange and different with
PTRACE_LISTEN.

So yes, you fix the problem with PTRACE_LISTEN, but in practice the problem
won't be fixed and group stop will be still broken. Read the decription of
PTRACE_LISTEN! How big and complicated do you want to make the ptrace manpage?

Compare that to:

 "PTRACE_O_FOO makes PTRACE_CONT/PTRACE_SYSCALL/etc. not automatically
  continue a group stopped process, so that SIGSTOP and SIGCONT work for
  ptraced processes. Tracers that want to control thread continuation
  can do that at SIGCONT time."

>
>> > (2) in gdb case, this may be too constraining for us: we want to be able
>> > to decide what to do on group-stop *after* group-stop has happened!
>>
>> See above, that should be still possible. Or am I missing something?
>
> Yes, you miss the fact that inferring group-stop on signal delivery
> is not reliable. The correct way to do it is to not *infer* it, but
> *observe* it when it really happens.

You're right about that. You've convinced me that group stop notification
is often needed.

As gdb actually wants to control thread continuation, not group stopping
(as you said, that can happen in many ways), it can use SIGCONT to control
which thread to run, at thread continuation time when it knows what the user
wants. That way normal, external group stops and continuations keep working,
but gdb still has the means to take total control and only run one thread,
without external SIGCONT messing it up. Changes to gdb are minimal, it just
has to do what it does for group stops now at SIGCONT time instead. And all
others don't have to emulate group stops in user space with PTRACE_LISTEN.

I'm probably still missing something, but not as much as before.

Greetings,

Indan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-06 17:08             ` Indan Zupancic
@ 2011-09-07  2:34               ` Denys Vlasenko
  2011-09-07 17:15                 ` Indan Zupancic
  0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-07  2:34 UTC (permalink / raw)
  To: Indan Zupancic; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

On Tuesday 06 September 2011 19:08, Indan Zupancic wrote:
> >> > Problem. Now we interfere with SIGSTRAPs. Yes, there are users who want
> >> > to be able to see real SIGTRAPs they send to the program,
> >> > or ones generated by, say, int3 instruction on x86.
> >>
> >> But SIGTRAP is ours, ptrace already sends SIGTRAPS at execve.
> >
> > ...whcih causes problems, and therefore we have PTRACE_O_TRACEEXEC to suppress
> > this idiotic post-execve SIGTRAP.
> 
> The SIGTRAP is still there, there is just an extra bit to distinguish
> it from normal SIGTRAPs.

No. SIGTRAP is no longer 'there'. The PTRACE_EVENT_EXEC happens in a different
place: after syscall-entry, but before syscall-exit. Also, it can't be blocked
by signal mask. Old-style post-execve SIGTRAP happens after syscall exit,
and can be blocked by signal mask.

IOW: PTRACE_EVENT_EXEC is not a signal at all. It just happens to have 
WSTOPSIG(status) == SIGTRAP.

> Sending another SIGTRAP with another bit set, 
> just as with PTRACE_O_TRACEEXEC, solves this issue.


> >> Only change is
> >> that it also sends one for new childs instead of SIGSTOP.
> >
> > Racing with user's SIGTRAP does not fix the problem, it merely moves it
> > to a different signal. PTRACE_EVENT_STOP thingy fixes the problem once
> > and for all.
> 
> It doesn't really matter if it's SIGSTOP with an extra flag, or SIGTRAP
> with an extra flag set. But SIGTRAP is already used this way for fork
> and exec,

SIGTRAP is used for fork? I don't think so.

> so using it always instead of sometimes SIGSTOP would be more 
> consistent. It's also clearer what to do with the SIGTRAP as tracer: Block
> it so the tracee doesn't see it. With SIGSTOP it's less clear what to do.
> In current trace it doesn't matter what you do, in practice, and that's also
> what the manpage says. So using SIGTRAP with an extra bit set is cleaner.



> >> Tracers that want to get group stop notifications will set WSTOPPED and get
> >> that information that way. But as ptrace won't generate any SIGSTOPs, they
> >> don't have to use GETSIGINFO to know if it came from ptrace or not: It never
> >> does.
> >
> > Drats. In your proposal, if I'd set WSTOPPED, I will get group-stops, right?
> 
> Group stop notifications, yes, that was the idea.
> 
> > How in your proposed solution can I "restart and cancel group-stop" after
> > group-stop? And how can I "restart and wait for a SIGCONT"?
> 
> Not sure what you mean with restart, just send another SIGSTOP?
> 
> Blocking a group stop happens by blocking the SIGSTOP delivery to the tracee.

Here we go again.

I will write it in bold.

YOU MUST NOT BLOCK DELIVERY OF STOPPING SIGNALS! IT'S ***WRONG***!

I already told you numerous times that SIGTSTP, SISTTIN and SIGTTOU
can have non-default handlers. Do you understand what this means?
I guess you don't, since you never replied with the explanation how you
propose to with this problem.

I also told you that SIGSTOP/SIGCONT are *process-wide*. Meaning
that if you stop on SIGSTOP delivery to one thread of a multithreaded process
and will not restart it via PTRACE_CONT, then THIS PARTICULAL THREAD
will stop running, yes, but all other threads WILL NOT STOP.

> Each tracee gets a SIGSTOP ptrace event, it's not like only one gets it and
> that controls the state of the whole group.

It's exactly the opposite. SIGSTOP signal-delivery-stop happens on *one* thread,
and if tracer does not suppress it, then all threads go into group-stop.

> Same for SIGCONT. I'll have to 
> double check this, but I think this is how it works.

Please do double-check.


> >> A group-stop notification is not really a ptrace event (maybe it is now),
> >> so PTRACE_CONT wouldn't be needed. It's just a group stop notification,
> >> not a freeze, report and wait event, like signals.
> >
> > Yes, this would be a reasonable API model. PTRACE_CONT cancels group-stop,
> > and *doing nothing* leaves task in the group-stop. One problem with this:
> > many ptrace ops (such as GETREGS) are only allowed in ptrace stops.
> > If you don't consider group-stop to be a ptrace-stop, then no such
> > ops are allowed. In fact, even restart ops need stopped tracee, so PTRACE_CONT
> > is illegal too. IOW: in this model, kernel doesn't know when we decided
> > to go down *do nothing* route. From kernel POW, it is confused. Maybe we do
> > want to PTRACE_CONT, but just didn't get around to it?
> 
> It's probably best to not change the current model for group stop notifications
> (when requested via WSTOPPED), as that is confusing, even if the new behaviour
> is better (which it probably isn't).
> 
> You're right that if a tracee requests group stop notifications, it probably
> wants to be able to poke around too sometimes, and then a ptrace slightly
> entwined with group stops is preferred. But it's not elegant. :-/

How kernel knows that tracer is done with poking around, and now it's ok to
wake up on SIGCONT?


> >> > then we have a problem: gdb can't use this interface, it
> >> > needs to be able to restart the thread (only one thread, not all of
> >> > them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
> >> > weird, but it's the real requirement from gdb users.
> >>
> >> Is that true? Isn't a SIGCONT sent to the TID only for that thread instead
> >> of the whole group? That's slightly inconvenient indeed. Perhaps this
> >> limitation can be fixed? Might be troublesome for the main thread.
> >
> > This is how stopping/starting works. It's per-process, not per-thread.
> > Regardless of the thread it is sent to, stop signals stop all threads,
> > and SIGCONT restarts all of them.
> 
> Yes, but every thread gets a SIGCONT

No, not every thread. Only one thread gets it.

> and the tracee will get a ptrace 
> event for each thread. It can let the threads it doesn't want to continue
> hang in the ptrace handling, and continue only one with PTRACE_CONT.
> 
> Any reason why this can't work?

Gosh......... I already told you why it won't work.


> >> > In short, you propose to make it possible to switch off group-stop
> >> > notification.
> >>
> >> That is just part of the proposal, the main this is to not let PTRACE_CONT
> >> continue a stopped task.
> >
> > This will break gdb. I told this several times already. Not acceptable.
> 
> It probably can use SIGCONT to achieve what it wants.

SIGCONT is magic. It wakes all group-stopped threads, and this effect
can't be blocked *even via ptrace*. This means that restarting only one
thread via SIGCONT is impossible.


> Letting PTRACE_CONT 
> not resume stopped tasks would fix normal group stops without any changes.

Exactly our line of thought before gdb people shot down our idea.

> Only programs that want to resume single threads while holding the rest
> need to change their code slightly to let the tasks hang in SIGCONT instead
> of group stop notification.

> >> Switching off group-stop notifications makes it
> >> only easier to use ptrace when not specifically interested in group stops.
> >
> > Solution in the search of a problem.
> 
> You're probably right that it's not worth the trouble. I'm fine if tracers
> always get a group stop notification, but if so, there should be an option
> that makes PTRACE_CONT not continue group stopped tasks.

Yes. Exactly. That "option" is called PTRACE_LISTEN.


> What I want to avoid is adding extra specific code in the tracer when all
> it wants is to be transparent to group stops.

You mean, you want to avoid this atrocious bloat? -

-	ptrace(PTRACE_CONT, pid, 0, sig);
+	int is_stopped = .... determine whether it's a group_stop...
+	if (is_stopped) ptrace(PTRACE_LISTEN, pid, 0, 0);
+	else            ptrace(PTRACE_CONT, pid, 0, sig);

Somehow this doesn't look like big problem to me.


> One way is to not get the group stop notifications, so the tracer doesn't
> accidentally continues the stopped tasks. To me this seems the simplest
> solution, but it has the problem that it only works for tracers not
> interested in group stop. To also make it work with group stops, my idea
> was to let PTRACE_CONT not continue stopped tasks. Instead of trapping
> tasks at the beginning of group stop, they can trap the tasks at group
> stop exit time, which is much more natural anyway (because it is group
> continuation that gdb wants to prevent, not group stop).
> 
> With PTRACE_LISTEN the tracee has to handle group stops specially, then,
> instead of doing nothing for group stop notifications, it has to use
> PTRACE_LISTEN to emulate group stop and continuation for the tracee. No
> one is going to bother to add code for this when they're not interested
> in group stops, especially not if the code is portable at all.

Portable code using ptrace? I pity those who need to maintain that...


> It's just not worth the trouble. The behaviour is too strange and different with
> PTRACE_LISTEN.

I disagree. See code fragment above. It's simple.


> Compare that to:
> 
>  "PTRACE_O_FOO makes PTRACE_CONT/PTRACE_SYSCALL/etc. not automatically
>   continue a group stopped process, so that SIGSTOP and SIGCONT work for
>   ptraced processes. Tracers that want to control thread continuation
>   can do that at SIGCONT time."

This doesn't allow to make a decision whether to honor or ignore group-stop
*after* it happened. 


-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-07  2:34               ` Denys Vlasenko
@ 2011-09-07 17:15                 ` Indan Zupancic
  0 siblings, 0 replies; 28+ messages in thread
From: Indan Zupancic @ 2011-09-07 17:15 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

Hello,

The end of this thread is in sight.

On Wed, September 7, 2011 04:34, Denys Vlasenko wrote:
> On Tuesday 06 September 2011 19:08, Indan Zupancic wrote:
>> >> > Problem. Now we interfere with SIGSTRAPs. Yes, there are users who want
>> >> > to be able to see real SIGTRAPs they send to the program,
>> >> > or ones generated by, say, int3 instruction on x86.
>> >>
>> >> But SIGTRAP is ours, ptrace already sends SIGTRAPS at execve.
>> >
>> > ...whcih causes problems, and therefore we have PTRACE_O_TRACEEXEC to suppress
>> > this idiotic post-execve SIGTRAP.
>>
>> The SIGTRAP is still there, there is just an extra bit to distinguish
>> it from normal SIGTRAPs.
>
> No. SIGTRAP is no longer 'there'. The PTRACE_EVENT_EXEC happens in a different
> place: after syscall-entry, but before syscall-exit. Also, it can't be blocked
> by signal mask. Old-style post-execve SIGTRAP happens after syscall exit,
> and can be blocked by signal mask.

That would be inconvenient, to say the least.

> IOW: PTRACE_EVENT_EXEC is not a signal at all. It just happens to have
> WSTOPSIG(status) == SIGTRAP.

Is the new fake SIGTRAP ever passed on to the tracee? Or is "data" ignored
for PTRACE_EVENT_EXEC? Or is SIGTRAP passed on if set in data? Just curious,
it doesn't really matter.

More generic question is if setting data to anything can generate a new
signal, or if it is only used to block, pass or change existing signals?

>> It doesn't really matter if it's SIGSTOP with an extra flag, or SIGTRAP
>> with an extra flag set. But SIGTRAP is already used this way for fork
>> and exec,
>
> SIGTRAP is used for fork? I don't think so.

Well, the thing that looks like a SIGTRAP but isn't, which you get when
setting PTRACE_O_TRACEFORK. The manpage is confusing because it says the
child gets the SIGTRAP | PTRACE_EVENT_FORK << 8, but it means the parent
pracee that does the fork. The child tracee gets a SIGSTOP. It would be
more consistent if the child would get a SIGTRAP | PTRACE_EVENT_FORK << 8
or variant too. Otherwise the tracer can't know for sure whether to pass
on the SIGSTOP or not. (Now it doesn't matter, but it does if group stops
get fixed.)

>> so using it always instead of sometimes SIGSTOP would be more
>> consistent. It's also clearer what to do with the SIGTRAP as tracer: Block
>> it so the tracee doesn't see it. With SIGSTOP it's less clear what to do.
>> In current trace it doesn't matter what you do, in practice, and that's also
>> what the manpage says. So using SIGTRAP with an extra bit set is cleaner.
>
>
>
>> >> Tracers that want to get group stop notifications will set WSTOPPED and get
>> >> that information that way. But as ptrace won't generate any SIGSTOPs, they
>> >> don't have to use GETSIGINFO to know if it came from ptrace or not: It never
>> >> does.
>> >
>> > Drats. In your proposal, if I'd set WSTOPPED, I will get group-stops, right?
>>
>> Group stop notifications, yes, that was the idea.
>>
>> > How in your proposed solution can I "restart and cancel group-stop" after
>> > group-stop? And how can I "restart and wait for a SIGCONT"?
>>
>> Not sure what you mean with restart, just send another SIGSTOP?
>>
>> Blocking a group stop happens by blocking the SIGSTOP delivery to the tracee.
>
> Here we go again.
>
> I will write it in bold.
>
> YOU MUST NOT BLOCK DELIVERY OF STOPPING SIGNALS! IT'S ***WRONG***!

Sorry, relax! I meant the SIGSTOP that isn't a SIGSTOP, but looks like a
SIGSTOP the tracer gets as group notification. Not the SIGSTOP causing
the group stop. I hope. Either that, or I should have read your whole
email before replying, or going back to this bit to fix it, when I finally
understood the problem of trying to manage group stops at SIGSTOP time.
I can't remember, it's getting a bit fuzzy.

> I already told you numerous times that SIGTSTP, SISTTIN and SIGTTOU
> can have non-default handlers. Do you understand what this means?
> I guess you don't, since you never replied with the explanation how you
> propose to with this problem.

I thought I admitted that using SIGSTOP wasn't sufficient and that group
stop notifications are indeed often necessary, exactly because of these
issues. But perhaps I should have said explicitly, that yes, I get it:
Using anything else than group stop notifications is unreliable and
doesn't always work.

Keep in mind that I never proposed to get rid of group stop, only that
programs should have an option to not get them, so group stops and
continuations just work for them.

> I also told you that SIGSTOP/SIGCONT are *process-wide*. Meaning
> that if you stop on SIGSTOP delivery to one thread of a multithreaded process
> and will not restart it via PTRACE_CONT, then THIS PARTICULAL THREAD
> will stop running, yes, but all other threads WILL NOT STOP.

Okay, now it's my time to use caps:

THAT DOES NOT MATTER _IF_ ALL THREADS GET A SIGSTOP/SIGCONT.

If that would be the case, then it doesn't matter if it's process-wide or
not, because the tracer would get a chance to grab all threads. But sadly,
this isn't the case, as I mistakenly thought with my sloppy testing.

>> Each tracee gets a SIGSTOP ptrace event, it's not like only one gets it and
>> that controls the state of the whole group.
>
> It's exactly the opposite. SIGSTOP signal-delivery-stop happens on *one* thread,
> and if tracer does not suppress it, then all threads go into group-stop.
>
>> Same for SIGCONT. I'll have to
>> double check this, but I think this is how it works.
>
> Please do double-check.

I did. You are right, I confused the SIGSTOP after fork/clone my program gets
because it sets PTRACE_O_TRACEFORK/CLONE for a normal SIGSTOP notification,
so I always saw one SIGSTOP for every thread and forked task (I tested both).

And in that case, yes of course it's idiotic to try to control group stop
my misguided way.

>> It's probably best to not change the current model for group stop notifications
>> (when requested via WSTOPPED), as that is confusing, even if the new behaviour
>> is better (which it probably isn't).
>>
>> You're right that if a tracee requests group stop notifications, it probably
>> wants to be able to poke around too sometimes, and then a ptrace slightly
>> entwined with group stops is preferred. But it's not elegant. :-/
>
> How kernel knows that tracer is done with poking around, and now it's ok to
> wake up on SIGCONT?

Well, if you don't change the model, the same as now, of course.

But if I'd ignore backward compatibility and would create a new API,
I'd probably make ptrace trap any task when doing a ptrace operation
on it, so you could issue a ptrace at any time instead of only on
specific moments. Once grabbed, you have to let it go with PTRACE_CONT.
So to answer this hypotetical question, in that case, if no ptrace
operation was done, no action would be needed. For the kernel there
would be a clear distinction between group-stopped and ptrace-trapped.
It's too late for any such changes now though.

>> Any reason why this can't work?
>
> Gosh......... I already told you why it won't work.

Keep in mind I saw SIGSTOPs for all threads in my tests, so thinking
that all threads get a SIGCONT too (when stopped) isn't that stupid.

> SIGCONT is magic. It wakes all group-stopped threads, and this effect
> can't be blocked *even via ptrace*. This means that restarting only one
> thread via SIGCONT is impossible.

Magic indeed. I guess I got sucked too much into "a thread is just a process"
thing. I assumed the group effect was implemented by automatically propagating
group-wide signals when needed in a semi-hidden way.

>> Letting PTRACE_CONT
>> not resume stopped tasks would fix normal group stops without any changes.
>
> Exactly our line of thought before gdb people shot down our idea.

Sad, and I'm running out of alternatives. I've only two left.

>> Only programs that want to resume single threads while holding the rest
>> need to change their code slightly to let the tasks hang in SIGCONT instead
>> of group stop notification.
>
>> >> Switching off group-stop notifications makes it
>> >> only easier to use ptrace when not specifically interested in group stops.
>> >
>> > Solution in the search of a problem.
>>
>> You're probably right that it's not worth the trouble. I'm fine if tracers
>> always get a group stop notification, but if so, there should be an option
>> that makes PTRACE_CONT not continue group stopped tasks.
>
> Yes. Exactly. That "option" is called PTRACE_LISTEN.

Hm, but PTRACE_LISTEN isn't a fire and forget thing, it seems too complicated
and stateful to use easily. I'll take another look at it.

I want to put the burden on gdb and other users who want to mess with group
stop state.

>
>> What I want to avoid is adding extra specific code in the tracer when all
>> it wants is to be transparent to group stops.
>
> You mean, you want to avoid this atrocious bloat? -
>
> -	ptrace(PTRACE_CONT, pid, 0, sig);
> +	int is_stopped = .... determine whether it's a group_stop...
> +	if (is_stopped) ptrace(PTRACE_LISTEN, pid, 0, 0);
> +	else            ptrace(PTRACE_CONT, pid, 0, sig);
>
> Somehow this doesn't look like big problem to me.

If that was all, then yes, it would be fine.

Okay, silly question:

If SIGCONT on one thread continues all, then you have no guarantee that
the thread you did PTRACE_LISTEN on will get the SIGCONT. Hence you have
no chance to do a PTRACE_SYSCALL or other variant of PTRACE_CONT before
the thread runs.

But reading the patch description it seems PTRACE_LISTEN is more subtle,
it doesn't really run a task, it puts it in quasi running stat, and the
tracer is supposed to keep track of all tasks in the group and continue
the PTRACE_LISTENed task later on or something?!

If the thread runs after a SIGCONT on another task, then PTRACE_LISTEN
is broken. If the task doesn't run ("quasi running" state), then what
is needed to make it run and how the hell does the tracer know when to
press the magic button? (Keep in mind that the task getting the SIGCONT
may not be traced.) As far as I know there are no group continuation
notification events. If there are, then those could be used instead of
SIGCONT in my previous silly proposal, and unlike those, it would work.

So all in all it seems that instead of PTRACE_LISTEN, an extra argument
for PTRACE_CONT/PTRACE_SYSCALL/PTRACE_EMU/etc is much better. This flag
would always work, so no need for if checks and so on. The flag would
say to not continue a group stopped task, e.g. PTRACE_MAYSTOP. This would
probably have to bet set in "addr".

Then it would become:

-	ptrace(PTRACE_CONT, pid, 0, sig);
+	ptrace(PTRACE_CONT, pid, PTRACE_MAYSTOP, sig);

Advantage is that this would be truly the only code change needed, no
messing around with state, except if you want to take full control
like gdb needs sometimes. (But not more than currently.)

>
>> One way is to not get the group stop notifications, so the tracer doesn't
>> accidentally continues the stopped tasks. To me this seems the simplest
>> solution, but it has the problem that it only works for tracers not
>> interested in group stop. To also make it work with group stops, my idea
>> was to let PTRACE_CONT not continue stopped tasks. Instead of trapping
>> tasks at the beginning of group stop, they can trap the tasks at group
>> stop exit time, which is much more natural anyway (because it is group
>> continuation that gdb wants to prevent, not group stop).
>>
>> With PTRACE_LISTEN the tracee has to handle group stops specially, then,
>> instead of doing nothing for group stop notifications, it has to use
>> PTRACE_LISTEN to emulate group stop and continuation for the tracee. No
>> one is going to bother to add code for this when they're not interested
>> in group stops, especially not if the code is portable at all.
>
> Portable code using ptrace? I pity those who need to maintain that...

Tell me about it, I saw strace's code. That's why I limit my ptrace code to
only Linux, and that's hard enough already. (Idea being that a 2.6 compiled
binary works on a 2.4 kernel.)

>
>> It's just not worth the trouble. The behaviour is too strange and different with
>> PTRACE_LISTEN.
>
> I disagree. See code fragment above. It's simple.

It seems more complicated than that, but I hope I'm missing something again.

Actually, what I'm missing is WCONTINUED. Why the hell is gdb not using that
to control thread running? If it did, it won't have to ask the user what
to do at group stop time, it could automatically do the right thing at group
continuation time. Or is the tracee not in trapped state when WCONTINUED
happens, but it is when WSTOPPED happens? If gdb got WCONTINUED, I don't see
a reason why gdb would ever need to control group stop state via PTRACE_CONT.

What was the gdb's people argument to shoot down this idea?

>
>> Compare that to:
>>
>>  "PTRACE_O_FOO makes PTRACE_CONT/PTRACE_SYSCALL/etc. not automatically
>>   continue a group stopped process, so that SIGSTOP and SIGCONT work for
>>   ptraced processes. Tracers that want to control thread continuation
>>   can do that at SIGCONT time."
>
> This doesn't allow to make a decision whether to honor or ignore group-stop
> *after* it happened.

Gdb also wants to control group stop beginning? Does it ever wants to block
group stop from happening, and should it have the right if it doesn't trace
all tasks in the group? SIGSTOP was supposed to be not blockable.

Currently the tracer gets a group stop notification, which means, I assume,
that a group stop happened. How can gdb prevent a group stop from happening?

I think all it can do is either end the group stop or let the tasks hang in
trapped state. With the new options it would also be able to let them go into
a real group stop.

So all in all I propose the following:

- Add the PTRACE_MAYSTOP argument to 'addr' for continuating operations like
  PTRACE_CONT/PTRACE_SYSCALL/etc. When set, ptrace won't end a group stop
  and the task can go from trapped into stopped state.

This has the same function as PTRACE_LISTEN, except it's much simpler to use
and explain. If 'addr' is currently ignored, then programs can set this argument
without worrying about ptrace returning an error and it's backward compatible.

An alternative is to add an option to make PTRACE_MAYSTOP the default. And add
a PTRACE_DO_CONT instead, if WCONTINUED isn't good enough for gdb. Or say that
a SIGCONT passed at group notification time restores the old behaviour.

What do you think?

Greetings,

Indan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-05 14:06       ` Denys Vlasenko
  2011-09-05 17:21         ` Indan Zupancic
@ 2011-09-05 17:44         ` Indan Zupancic
  2011-09-06  1:05           ` Denys Vlasenko
  1 sibling, 1 reply; 28+ messages in thread
From: Indan Zupancic @ 2011-09-05 17:44 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

On Mon, September 5, 2011 16:06, Denys Vlasenko wrote:
> In case you meant that "if we request group-stop notifications by using
> __WALL | WSTOPPED, and we get group-stop notification, and we do
> PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
> or death)", then we have a problem: gdb can't use this interface, it
> needs to be able to restart the thread (only one thread, not all of
> them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
> weird, but it's the real requirement from gdb users.
[...]
> SIGCONT's side effect of waking up from group-stop can't be blocked.
> SIGCONT always wakes up all threads in thread group.
> Using SIGCONT to control tracee will race with SIGCONTs from other
> sources.
>
> This makes SIGCONT a too coarse instrument for the job.
[...]
> Yes... until gdb will want to give user a choice after SIGSTOP: continue
> to sit in group-stop until SIGCONT (wasn't possible until
> PTRACE_LISTEN), or continue executing (gdb's current behavior if user
> uses "continue" command). Therefore, gdb needs a way to do both.

Having thought a bit more about this, I think this is less of a problem
than it seems, because for a group stop we get a ptrace event for each
task, and this should be true for SIGCONT as well. So gdb could also
always let the group stop happen, and only when prompted to do so by
a user, continue one thread by sending SIGCONT and letting all the other
threads hang in trapped state. So the above choice doesn't have to be
given, it can be made explicitly by the user at SIGCONT time (continue
all threads, or just one).

Regards,

Indan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-05 17:44         ` Indan Zupancic
@ 2011-09-06  1:05           ` Denys Vlasenko
  2011-09-06 17:19             ` Indan Zupancic
  0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-06  1:05 UTC (permalink / raw)
  To: Indan Zupancic; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

On Monday 05 September 2011 19:44, Indan Zupancic wrote:
> On Mon, September 5, 2011 16:06, Denys Vlasenko wrote:
> > In case you meant that "if we request group-stop notifications by using
> > __WALL | WSTOPPED, and we get group-stop notification, and we do
> > PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
> > or death)", then we have a problem: gdb can't use this interface, it
> > needs to be able to restart the thread (only one thread, not all of
> > them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
> > weird, but it's the real requirement from gdb users.
> [...]
> > SIGCONT's side effect of waking up from group-stop can't be blocked.
> > SIGCONT always wakes up all threads in thread group.
> > Using SIGCONT to control tracee will race with SIGCONTs from other
> > sources.
> >
> > This makes SIGCONT a too coarse instrument for the job.
> [...]
> > Yes... until gdb will want to give user a choice after SIGSTOP: continue
> > to sit in group-stop until SIGCONT (wasn't possible until
> > PTRACE_LISTEN), or continue executing (gdb's current behavior if user
> > uses "continue" command). Therefore, gdb needs a way to do both.
> 
> Having thought a bit more about this, I think this is less of a problem
> than it seems, because for a group stop we get a ptrace event for each
> task, and this should be true for SIGCONT as well. So gdb could also
> always let the group stop happen, and only when prompted to do so by
> a user, continue one thread by sending SIGCONT and letting all the other
> threads hang in trapped state.

Won't work. SIGCONT unpauses all threads in the thread group,
and _then_ it is delivered to one of the threads. You can block
or ignore it, yes, but it is too late: the unpausing already happened,
and blocking/ignoring will only affect SIGCONT handler execution,
if the program has one.

-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-06  1:05           ` Denys Vlasenko
@ 2011-09-06 17:19             ` Indan Zupancic
  2011-09-07  2:47               ` Denys Vlasenko
  0 siblings, 1 reply; 28+ messages in thread
From: Indan Zupancic @ 2011-09-06 17:19 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

Hi,

On Tue, September 6, 2011 03:05, Denys Vlasenko wrote:
> On Monday 05 September 2011 19:44, Indan Zupancic wrote:
>> On Mon, September 5, 2011 16:06, Denys Vlasenko wrote:
>> > In case you meant that "if we request group-stop notifications by using
>> > __WALL | WSTOPPED, and we get group-stop notification, and we do
>> > PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
>> > or death)", then we have a problem: gdb can't use this interface, it
>> > needs to be able to restart the thread (only one thread, not all of
>> > them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
>> > weird, but it's the real requirement from gdb users.
>> [...]
>> > SIGCONT's side effect of waking up from group-stop can't be blocked.
>> > SIGCONT always wakes up all threads in thread group.
>> > Using SIGCONT to control tracee will race with SIGCONTs from other
>> > sources.
>> >
>> > This makes SIGCONT a too coarse instrument for the job.
>> [...]
>> > Yes... until gdb will want to give user a choice after SIGSTOP: continue
>> > to sit in group-stop until SIGCONT (wasn't possible until
>> > PTRACE_LISTEN), or continue executing (gdb's current behavior if user
>> > uses "continue" command). Therefore, gdb needs a way to do both.
>>
>> Having thought a bit more about this, I think this is less of a problem
>> than it seems, because for a group stop we get a ptrace event for each
>> task, and this should be true for SIGCONT as well. So gdb could also
>> always let the group stop happen, and only when prompted to do so by
>> a user, continue one thread by sending SIGCONT and letting all the other
>> threads hang in trapped state.
>
> Won't work. SIGCONT unpauses all threads in the thread group,
> and _then_ it is delivered to one of the threads.

No, it is delivered to _all_ threads. With current ptrace you never see a
SIGCONT, but it should behave the same as SIGSTOP, which results in one
ptrace event for each thread.

> You can block
> or ignore it, yes, but it is too late: the unpausing already happened,
> and blocking/ignoring will only affect SIGCONT handler execution,
> if the program has one.

Not doing PTRACE_CONT will keep the thread hanging in trapped state.
All threads get a SIGCONT, not only one, so you can pause all threads
this way.

Greetings,

Indan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-06 17:19             ` Indan Zupancic
@ 2011-09-07  2:47               ` Denys Vlasenko
  2011-09-07 14:24                 ` Indan Zupancic
  0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-07  2:47 UTC (permalink / raw)
  To: Indan Zupancic; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

On Tuesday 06 September 2011 19:19, Indan Zupancic wrote:
> >> > In case you meant that "if we request group-stop notifications by using
> >> > __WALL | WSTOPPED, and we get group-stop notification, and we do
> >> > PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
> >> > or death)", then we have a problem: gdb can't use this interface, it
> >> > needs to be able to restart the thread (only one thread, not all of
> >> > them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
> >> > weird, but it's the real requirement from gdb users.
> >> [...]
> >> > SIGCONT's side effect of waking up from group-stop can't be blocked.
> >> > SIGCONT always wakes up all threads in thread group.
> >> > Using SIGCONT to control tracee will race with SIGCONTs from other
> >> > sources.
> >> >
> >> > This makes SIGCONT a too coarse instrument for the job.
> >> [...]
> >> > Yes... until gdb will want to give user a choice after SIGSTOP: continue
> >> > to sit in group-stop until SIGCONT (wasn't possible until
> >> > PTRACE_LISTEN), or continue executing (gdb's current behavior if user
> >> > uses "continue" command). Therefore, gdb needs a way to do both.
> >>
> >> Having thought a bit more about this, I think this is less of a problem
> >> than it seems, because for a group stop we get a ptrace event for each
> >> task, and this should be true for SIGCONT as well. So gdb could also
> >> always let the group stop happen, and only when prompted to do so by
> >> a user, continue one thread by sending SIGCONT and letting all the other
> >> threads hang in trapped state.
> >
> > Won't work. SIGCONT unpauses all threads in the thread group,
> > and _then_ it is delivered to one of the threads.
> 
> No, it is delivered to _all_ threads.

Wrong.

> With current ptrace you never see a SIGCONT

Wrong. Even rather old strace 4.5.9 does show it.



#include <stdlib.h>
#include <pthread.h>
static void *threadfunc(void *p)
{
        sleep(10);
        exit(0);
}
int main()
{
        printf("%d\n", getpid());
        pthread_t thread;
        pthread_create(&thread, NULL, threadfunc, NULL);
        sleep(10);
        exit(0);
}


$ gcc -Os -lpthread t.c -ot
$ strace -V
strace -- version 4.5.9
$ strace -oLOG -s999 -tt -f ./t
umovestr: Input/output error
9590
umovestr: Input/output error
ptrace: umoven: Input/output error

In other terminal: "kill -CONT 9590"

LOG:

9590  04:41:13.984640 clone(...) = 9591
9590  04:41:13.984712 rt_sigprocmask(SIG_BLOCK, [CHLD],  <unfinished ...>
...
9591  04:41:13.984972 <... rt_sigaction resumed> {SIG_DFL}, 8) = 0
9590  04:41:13.984993 nanosleep({10, 0},  <unfinished ...>
9591  04:41:13.985015 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
9591  04:41:13.985056 nanosleep({10, 0},  <unfinished ...>
9590  04:41:19.969687 <... nanosleep resumed> 0xff9e6fa4) = ? ERESTART_RESTARTBLOCK (To be restarted)
9590  04:41:19.969762 --- SIGCONT (Continued) @ 0 (0) ---
9590  04:41:19.969791 setup()           = 0
9591  04:41:23.985155 <... nanosleep resumed> {10, 0}) = 0
9590  04:41:23.985201 exit_group(0)     = ?
9591  04:41:23.985231 exit_group(0)     = ?


Take a good look. There was no SIGCONT delivery to thread 9591.


> > You can block
> > or ignore it, yes, but it is too late: the unpausing already happened,
> > and blocking/ignoring will only affect SIGCONT handler execution,
> > if the program has one.
> 
> Not doing PTRACE_CONT will keep the thread hanging in trapped state.
> All threads get a SIGCONT, not only one, so you can pause all threads
> this way.

As I said, you are wrong about SIGCONT.

-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-07  2:47               ` Denys Vlasenko
@ 2011-09-07 14:24                 ` Indan Zupancic
  0 siblings, 0 replies; 28+ messages in thread
From: Indan Zupancic @ 2011-09-07 14:24 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Oleg Nesterov, Tejun Heo, linux-kernel

On Wed, September 7, 2011 04:47, Denys Vlasenko wrote:
> On Tuesday 06 September 2011 19:19, Indan Zupancic wrote:
>> >> > In case you meant that "if we request group-stop notifications by using
>> >> > __WALL | WSTOPPED, and we get group-stop notification, and we do
>> >> > PTRACE_CONT, then task does not run (it sits in group-stop until SIGCONT
>> >> > or death)", then we have a problem: gdb can't use this interface, it
>> >> > needs to be able to restart the thread (only one thread, not all of
>> >> > them, so sending SIGCONT is not ok!) from the group-stop. Yes, it's
>> >> > weird, but it's the real requirement from gdb users.
>> >> [...]
>> >> > SIGCONT's side effect of waking up from group-stop can't be blocked.
>> >> > SIGCONT always wakes up all threads in thread group.
>> >> > Using SIGCONT to control tracee will race with SIGCONTs from other
>> >> > sources.
>> >> >
>> >> > This makes SIGCONT a too coarse instrument for the job.
>> >> [...]
>> >> > Yes... until gdb will want to give user a choice after SIGSTOP: continue
>> >> > to sit in group-stop until SIGCONT (wasn't possible until
>> >> > PTRACE_LISTEN), or continue executing (gdb's current behavior if user
>> >> > uses "continue" command). Therefore, gdb needs a way to do both.
>> >>
>> >> Having thought a bit more about this, I think this is less of a problem
>> >> than it seems, because for a group stop we get a ptrace event for each
>> >> task, and this should be true for SIGCONT as well. So gdb could also
>> >> always let the group stop happen, and only when prompted to do so by
>> >> a user, continue one thread by sending SIGCONT and letting all the other
>> >> threads hang in trapped state.
>> >
>> > Won't work. SIGCONT unpauses all threads in the thread group,
>> > and _then_ it is delivered to one of the threads.
>>
>> No, it is delivered to _all_ threads.
>
> Wrong.

Argh, indeed! I always confused the SIGSTOP after fork/clone with the SIGSTOP
I sent, thinking the SIGSTOP was sent to all tasks.

Well, so much for that idea then.

>> With current ptrace you never see a SIGCONT
>
> Wrong. Even rather old strace 4.5.9 does show it.

I meant a resuming SIGCONT, not just any SIGCONT. But that a normal SIGCONT isn't
seen by all threads should have been a hint for me that perhaps the same isn't true
for SIGSTOPs, and that perhaps I should take a better look instead of repeating the
same test and mistake.

>> Not doing PTRACE_CONT will keep the thread hanging in trapped state.
>> All threads get a SIGCONT, not only one, so you can pause all threads
>> this way.
>
> As I said, you are wrong about SIGCONT.

Indeed, I was wrong.

Greetings,

Indan



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: PTRACE_SEIZE needs API cleanup?
  2011-09-04 21:11 RFC: PTRACE_SEIZE needs API cleanup? Denys Vlasenko
  2011-09-05  1:15 ` Indan Zupancic
@ 2011-09-05 14:54 ` Denys Vlasenko
  2011-09-05 16:51 ` [PATCH 1/2] Fix pollution of task->ptrace if PTRACE_SETOPTIONS fails Denys Vlasenko
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-05 14:54 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Oleg Nesterov, Tejun Heo, linux-kernel

On Sun, 2011-09-04 at 23:11 +0200, Denys Vlasenko wrote:
> Hi guys,
> 
> I added code to use PTRACE_SEIZE in strace and in the process
> had tasted how API looks like from userspace POV.
> 
> It is usable, but API feels somewhat quirky.
> 
> Consider the following: one of reasons why we added PTRACE_SEIZE
> is that existing ptrace API has unnecessary complications
> (quirks) such as SIGSTOP on attach, SIGTRAP after execve.
> 
> But whoever designed strace did not deliberately designed these quirks in,
> he thought it was a good, reasonable design. Only after a second, third,
> tenth look it became obvious in retrospect that some things are
> not exactly right.
> 
> Thankfully, quirks in new PTRACE_SEIZE code mostly have the nature of
> "unnecessarily invented entities" as opposed to problems
> in trying to use API in real world tasks, but I still think they are
> annoying enough to be looked at.
> 
> 
> We already have a mechanism how to modify ptrace behavior: ptrace options.
> But now we introduce a different mechnism to do the same: by using SEIZE
> instead of ATTACH, we magically change some aspects of ptrace.
> In effect, SEIZE sets some options. And these "SEIZE options" can't be
> set or cleared by SETOPTIONS. This is stupid. Why can't we just add
> more options instead of inventing new entities? Why we overloaded
> SEIZE with two functions: "attach to, but don't SIGSTOP the tracee"
> and "change behaviour of ptrace on this tracee"?
> 
> If the argument is that we want to set options immediately at attach,
> then I completely agree: yes, we do! Moreover, we want to set some
> _ordinary_ options too, such as PTRACE_O_TRACEEXEC, and we can't
> do that even now, in improved API! It needs more improving.
> 
> So my proposal is:
> (a) make SEIZE take a parameter "immediately set these options on attach"
> (b) without any options, make SEIZE just do "ATTACH sans SIGSTOP" thing,
>     nothing more.
> (c) make the following new PTRACE_O_foo options:
>    (1) "flag stops with PTRACE_EVENT_STOP event value in waitpid status"
>    (2) "enable PTRACE_INTERRUPT. It either causes PTRACE_EVENT_STOP with sig=SIGTRAP
>         if (1) is enabled, or creates a group-stop with sig=SIGTRAP otherwise"
>        [if the second part is too weird to implement, make (2) require (1)]
>    (3) "enable PTRACE_LISTEN. Works on group-stops even without any other options"
>    (4) "make auto-attached children stop a-la INTERRUPT, not with SIGSTOP"
>    (5) "enable saner error codes"


A crude patch is below. It rolls points 1-4 from above into a single
new option, PTRACE_O_TRACESTOP.

PTRACE_SEIZE does not assume PTRACE_O_TRACESTOP, but it allows all
PTRACE_O_opts to be set at attach time (they are passed in data param) -
including PTRACE_O_TRACESTOP, of course.

All formerly PTRACE_SEIZE-enabled behavior is now enabled by
PTRACE_O_TRACESTOP instead. PT_SEIZED bit is removed.

While at it, I cleaned up the following:

Exchanged PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which made
PT_option bits contiguous and therefore made ptrace_setoptions much
simpler.

If ptrace_setoptions fails, options are not affected.

PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is better.

PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).

Not compile tested.

-- 
vda



diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 800f113..e2ba2dd 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -62,8 +62,9 @@
 #define PTRACE_O_TRACEEXEC	0x00000010
 #define PTRACE_O_TRACEVFORKDONE	0x00000020
 #define PTRACE_O_TRACEEXIT	0x00000040
+#define PTRACE_O_TRACESTOP	0x00000080
 
-#define PTRACE_O_MASK		0x0000007f
+#define PTRACE_O_MASK		0x000000ff
 
 /* Wait extended result codes for the above trace options.  */
 #define PTRACE_EVENT_FORK	1
@@ -85,24 +86,21 @@
  * flags.  When the a task is stopped the ptracer owns task->ptrace.
  */
 
-#define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
-#define PT_TRACESYSGOOD	0x00000004
-#define PT_PTRACE_CAP	0x00000008	/* ptracer can follow suid-exec */
+#define PT_PTRACE_CAP	0x00000004	/* ptracer can follow suid-exec */
 
+#define PT_OPT_FLAG_SHIFT	3
+#define PT_TRACESYSGOOD	0x00000008	/* must be directly before PT_TRACE_event bits! */
 /* PT_TRACE_* event enable flags */
-#define PT_EVENT_FLAG_SHIFT	4
-#define PT_EVENT_FLAG(event)	(1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
-
+#define PT_EVENT_FLAG(event)	(1 << (PT_OPT_FLAG_SHIFT + (event)))
 #define PT_TRACE_FORK		PT_EVENT_FLAG(PTRACE_EVENT_FORK)
 #define PT_TRACE_VFORK		PT_EVENT_FLAG(PTRACE_EVENT_VFORK)
 #define PT_TRACE_CLONE		PT_EVENT_FLAG(PTRACE_EVENT_CLONE)
 #define PT_TRACE_EXEC		PT_EVENT_FLAG(PTRACE_EVENT_EXEC)
 #define PT_TRACE_VFORK_DONE	PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
 #define PT_TRACE_EXIT		PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
-
-#define PT_TRACE_MASK	0x000003f4
+#define PT_TRACE_STOP		PT_EVENT_FLAG(PTRACE_EVENT_STOP)
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
@@ -228,7 +226,7 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
 		child->ptrace = current->ptrace;
 		__ptrace_link(child, current->parent);
 
-		if (child->ptrace & PT_SEIZED)
+		if (child->ptrace & PTRACE_EVENT_STOP)
 			task_set_jobctl_pending(child, JOBCTL_TRAP_STOP);
 		else
 			sigaddset(&child->pending.signal, SIGSTOP);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9de3ecf..0841969 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -219,19 +219,23 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * SEIZE will enable new ptrace behaviors which will be implemented
-	 * gradually.  SEIZE_DEVEL is used to prevent applications
+	 * gradually.  SEIZE_DEVEL bit is used to prevent applications
 	 * expecting full SEIZE behaviors trapping on kernel commits which
 	 * are still in the process of implementing them.
 	 *
 	 * Only test programs for new ptrace behaviors being implemented
 	 * should set SEIZE_DEVEL.  If unset, SEIZE will fail with -EIO.
 	 *
-	 * Once SEIZE behaviors are completely implemented, this flag and
-	 * the following test will be removed.
+	 * Once SEIZE behaviors are completely implemented, this flag
+	 * will be removed.
 	 */
 	retval = -EIO;
-	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
-		goto out;
+	if (seize) {
+		if ((flags & ~(long)PTRACE_O_MASK) != PTRACE_SEIZE_DEVEL)
+			goto out;
+		flags &= ~PTRACE_SEIZE_DEVEL;
+	} else
+		flags = 0;
 
 	audit_ptrace(task);
 
@@ -263,9 +267,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (task->ptrace)
 		goto unlock_tasklist;
 
-	task->ptrace = PT_PTRACED;
-	if (seize)
-		task->ptrace |= PT_SEIZED;
+	task->ptrace = PT_PTRACED | (flags << PT_OPT_FLAG_SHIFT);
 	if (task_ns_capable(task, CAP_SYS_PTRACE))
 		task->ptrace |= PT_PTRACE_CAP;
 
@@ -509,30 +511,13 @@ int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long ds
 
 static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 {
-	child->ptrace &= ~PT_TRACE_MASK;
-
-	if (data & PTRACE_O_TRACESYSGOOD)
-		child->ptrace |= PT_TRACESYSGOOD;
-
-	if (data & PTRACE_O_TRACEFORK)
-		child->ptrace |= PT_TRACE_FORK;
-
-	if (data & PTRACE_O_TRACEVFORK)
-		child->ptrace |= PT_TRACE_VFORK;
-
-	if (data & PTRACE_O_TRACECLONE)
-		child->ptrace |= PT_TRACE_CLONE;
-
-	if (data & PTRACE_O_TRACEEXEC)
-		child->ptrace |= PT_TRACE_EXEC;
-
-	if (data & PTRACE_O_TRACEVFORKDONE)
-		child->ptrace |= PT_TRACE_VFORK_DONE;
+	if (data & ~(long)PTRACE_O_MASK)
+		return -EINVAL;
 
-	if (data & PTRACE_O_TRACEEXIT)
-		child->ptrace |= PT_TRACE_EXIT;
+	child->ptrace &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
+	child->ptrace |= (data << PT_OPT_FLAG_SHIFT);
 
-	return (data & ~PTRACE_O_MASK) ? -EINVAL : 0;
+	return 0;
 }
 
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
@@ -666,7 +651,7 @@ static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
 int ptrace_request(struct task_struct *child, long request,
 		   unsigned long addr, unsigned long data)
 {
-	bool seized = child->ptrace & PT_SEIZED;
+	bool stop_events_enabled = child->ptrace & PT_TRACE_STOP;
 	int ret = -EIO;
 	siginfo_t siginfo, *si;
 	void __user *datavp = (void __user *) data;
@@ -715,7 +700,7 @@ int ptrace_request(struct task_struct *child, long request,
 		 * The actual trap might not be PTRACE_EVENT_STOP trap but
 		 * the pending condition is cleared regardless.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!stop_events_enabled || !lock_task_sighand(child, &flags)))
 			break;
 
 		/*
@@ -740,7 +725,7 @@ int ptrace_request(struct task_struct *child, long request,
 		 * again.  Alternatively, ptracer can issue INTERRUPT to
 		 * finish listening and re-trap tracee into STOP.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!stop_events_enabled || !lock_task_sighand(child, &flags)))
 			break;
 
 		si = child->last_siginfo;
diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..11bae20 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -823,8 +823,8 @@ static int check_kill_permission(int sig, struct siginfo *info,
  * @t: tracee wanting to notify tracer
  *
  * This function schedules sticky ptrace trap which is cleared on the next
- * TRAP_STOP to notify ptracer of an event.  @t must have been seized by
- * ptracer.
+ * TRAP_STOP to notify ptracer of an event.  @t must have PTRACE_O_TRACESTOP
+ * option active.
  *
  * If @t is running, STOP trap will be taken.  If trapped for STOP and
  * ptracer is listening for events, tracee is woken up so that it can
@@ -837,7 +837,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
  */
 static void ptrace_trap_notify(struct task_struct *t)
 {
-	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
+	WARN_ON_ONCE(!(t->ptrace & PT_TRACE_STOP));
 	assert_spin_locked(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
@@ -882,7 +882,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		do {
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_TRACE_STOP)))
 				wake_up_state(t, __TASK_STOPPED);
 			else
 				ptrace_trap_notify(t);
@@ -2004,7 +2004,7 @@ static bool do_signal_stop(int signr)
 			if (!task_is_stopped(t) &&
 			    task_set_jobctl_pending(t, signr | gstop)) {
 				sig->group_stop_count++;
-				if (likely(!(t->ptrace & PT_SEIZED)))
+				if (likely(!(t->ptrace & PT_TRACE_STOP)))
 					signal_wake_up(t, 0);
 				else
 					ptrace_trap_notify(t);
@@ -2057,13 +2057,13 @@ static bool do_signal_stop(int signr)
 /**
  * do_jobctl_trap - take care of ptrace jobctl traps
  *
- * When PT_SEIZED, it's used for both group stop and explicit
+ * When PT_TRACE_STOP is on, it's used for both group stop and explicit
  * SEIZE/INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap with
  * accompanying siginfo.  If stopped, lower eight bits of exit_code contain
  * the stop signal; otherwise, %SIGTRAP.
  *
- * When !PT_SEIZED, it's used only for group stop trap with stop signal
- * number as exit_code and no siginfo.
+ * When PT_TRACE_STOP is off, it's used only for group stop trap
+ * with stop signal number as exit_code and no siginfo.
  *
  * CONTEXT:
  * Must be called with @current->sighand->siglock held, which may be
@@ -2074,7 +2074,7 @@ static void do_jobctl_trap(void)
 	struct signal_struct *signal = current->signal;
 	int signr = current->jobctl & JOBCTL_STOP_SIGMASK;
 
-	if (current->ptrace & PT_SEIZED) {
+	if (current->ptrace & PT_TRACE_STOP) {
 		if (!signal->group_stop_count &&
 		    !(signal->flags & SIGNAL_STOP_STOPPED))
 			signr = SIGTRAP;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 1/2] Fix pollution of task->ptrace if PTRACE_SETOPTIONS fails.
  2011-09-04 21:11 RFC: PTRACE_SEIZE needs API cleanup? Denys Vlasenko
  2011-09-05  1:15 ` Indan Zupancic
  2011-09-05 14:54 ` Denys Vlasenko
@ 2011-09-05 16:51 ` Denys Vlasenko
  2011-09-05 17:01 ` [PATCH 2/2] Denys Vlasenko
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-05 16:51 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Oleg Nesterov, Tejun Heo, linux-kernel

> A crude patch is below. It rolls points 1-4 from above into a single
> new option, PTRACE_O_TRACESTOP.


For easier review, split it into two parts. Part 1 follows.
(It is still not even compile-tested).
-- 
vda


Fix pollution of task->ptrace if PTRACE_SETOPTIONS fails.


If PTRACE_SETOPTIONS fails, options should not be affected.

While at it, simplified ptrace_setoptions():

Exchanged PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which made
PT_option bits contiguous and therefore made ptrace_setoptions much
simpler.

PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>


diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 800f113..9aa669c 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -88,13 +88,12 @@
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
-#define PT_TRACESYSGOOD	0x00000004
-#define PT_PTRACE_CAP	0x00000008	/* ptracer can follow suid-exec */
+#define PT_PTRACE_CAP	0x00000004	/* ptracer can follow suid-exec */
 
+#define PT_OPT_FLAG_SHIFT	3
+#define PT_TRACESYSGOOD	0x00000008	/* must be directly before PT_TRACE_event bits! */
 /* PT_TRACE_* event enable flags */
-#define PT_EVENT_FLAG_SHIFT	4
-#define PT_EVENT_FLAG(event)	(1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
-
+#define PT_EVENT_FLAG(event)	(1 << (PT_OPT_FLAG_SHIFT + (event)))
 #define PT_TRACE_FORK		PT_EVENT_FLAG(PTRACE_EVENT_FORK)
 #define PT_TRACE_VFORK		PT_EVENT_FLAG(PTRACE_EVENT_VFORK)
 #define PT_TRACE_CLONE		PT_EVENT_FLAG(PTRACE_EVENT_CLONE)
@@ -102,8 +101,6 @@
 #define PT_TRACE_VFORK_DONE	PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
 #define PT_TRACE_EXIT		PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
 
-#define PT_TRACE_MASK	0x000003f4
-
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
 #define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
@@ -228,7 +225,7 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
 		child->ptrace = current->ptrace;
 		__ptrace_link(child, current->parent);
 
-		if (child->ptrace & PT_SEIZED)
+		if (child->ptrace & PTRACE_EVENT_STOP)
 			task_set_jobctl_pending(child, JOBCTL_TRAP_STOP);
 		else
 			sigaddset(&child->pending.signal, SIGSTOP);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9de3ecf..9c3536d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -219,19 +219,23 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * SEIZE will enable new ptrace behaviors which will be implemented
-	 * gradually.  SEIZE_DEVEL is used to prevent applications
+	 * gradually.  SEIZE_DEVEL bit is used to prevent applications
 	 * expecting full SEIZE behaviors trapping on kernel commits which
 	 * are still in the process of implementing them.
 	 *
 	 * Only test programs for new ptrace behaviors being implemented
 	 * should set SEIZE_DEVEL.  If unset, SEIZE will fail with -EIO.
 	 *
-	 * Once SEIZE behaviors are completely implemented, this flag and
-	 * the following test will be removed.
+	 * Once SEIZE behaviors are completely implemented, this flag
+	 * will be removed.
 	 */
 	retval = -EIO;
-	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
-		goto out;
+	if (seize) {
+		if ((flags & ~(long)PTRACE_O_MASK) != PTRACE_SEIZE_DEVEL)
+			goto out;
+		flags &= ~PTRACE_SEIZE_DEVEL;
+	} else
+		flags = 0;
 
 	audit_ptrace(task);
 
@@ -243,7 +247,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * Protect exec's credential calculations against our interference;
-	 * interference; SUID, SGID and LSM creds get determined differently
+	 * SUID, SGID and LSM creds get determined differently
 	 * under ptrace.
 	 */
 	retval = -ERESTARTNOINTR;
@@ -509,30 +513,13 @@ int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long ds
 
 static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 {
-	child->ptrace &= ~PT_TRACE_MASK;
-
-	if (data & PTRACE_O_TRACESYSGOOD)
-		child->ptrace |= PT_TRACESYSGOOD;
-
-	if (data & PTRACE_O_TRACEFORK)
-		child->ptrace |= PT_TRACE_FORK;
-
-	if (data & PTRACE_O_TRACEVFORK)
-		child->ptrace |= PT_TRACE_VFORK;
-
-	if (data & PTRACE_O_TRACECLONE)
-		child->ptrace |= PT_TRACE_CLONE;
-
-	if (data & PTRACE_O_TRACEEXEC)
-		child->ptrace |= PT_TRACE_EXEC;
-
-	if (data & PTRACE_O_TRACEVFORKDONE)
-		child->ptrace |= PT_TRACE_VFORK_DONE;
+	if (data & ~(long)PTRACE_O_MASK)
+		return -EINVAL;
 
-	if (data & PTRACE_O_TRACEEXIT)
-		child->ptrace |= PT_TRACE_EXIT;
+	child->ptrace &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
+	child->ptrace |= (data << PT_OPT_FLAG_SHIFT);
 
-	return (data & ~PTRACE_O_MASK) ? -EINVAL : 0;
+	return 0;
 }
 
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)




^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2]
  2011-09-04 21:11 RFC: PTRACE_SEIZE needs API cleanup? Denys Vlasenko
                   ` (2 preceding siblings ...)
  2011-09-05 16:51 ` [PATCH 1/2] Fix pollution of task->ptrace if PTRACE_SETOPTIONS fails Denys Vlasenko
@ 2011-09-05 17:01 ` Denys Vlasenko
  2011-09-05 17:06 ` [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior Denys Vlasenko
  2011-09-06 16:52 ` [PATCH v2] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails Denys Vlasenko
  5 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-05 17:01 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Oleg Nesterov, Tejun Heo, linux-kernel

On Mon, 2011-09-05 at 18:51 +0200, Denys Vlasenko wrote:
> > A crude patch is below. It rolls points 1-4 from above into a single
> > new option, PTRACE_O_TRACESTOP.
> 
> For easier review, split it into two parts. Part 1 follows.
> (It is still not even compile-tested).

Part 2.
-- 
vda


Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.

Introduce new ptrace option, PTRACE_O_TRACESTOP. This makes API
more symmetric: every PTRACE_EVENT_event has corresponding PTRACE_O_TRACEevent now,
as it used to have before PTRACE_SEIZE was introduced.

PTRACE_SEIZE does not assume PTRACE_O_TRACESTOP, but with this patch
it allows any PTRACE_O_opts to be set at attach time (they are passed in data param)
- including PTRACE_O_TRACESTOP, of course. Without any options,
PTRACE_SEIZE is equivalent to PTRACE_ATTACH. With only PTRACE_O_TRACESTOP option,
PTRACE_SEIZE behavior will be the same as PTRACE_SEIZE behavior before this patch.

This opens up two new possibilities: ptrace options can be set on attach
(can be used to close a few corner cases in strace);
PTRACE_LISTEN, PTRACE_INTERRUPT commands and PTRACE_EVENT_STOP event
can be enabled with PTRACE_SETOPTIONS with PTRACE_O_TRACESTOP
(not a big deal, but IMO this makes API more symmetric).

All formerly PTRACE_SEIZE-enabled behavior is now enabled by
PTRACE_O_TRACESTOP instead (by PT_TRACE_STOP bit). PT_SEIZED bit is removed.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>



diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 800f113..e2ba2dd 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -62,8 +62,9 @@
 #define PTRACE_O_TRACEEXEC	0x00000010
 #define PTRACE_O_TRACEVFORKDONE	0x00000020
 #define PTRACE_O_TRACEEXIT	0x00000040
+#define PTRACE_O_TRACESTOP	0x00000080
 
-#define PTRACE_O_MASK		0x0000007f
+#define PTRACE_O_MASK		0x000000ff
 
 /* Wait extended result codes for the above trace options.  */
 #define PTRACE_EVENT_FORK	1
@@ -85,24 +86,21 @@
  * flags.  When the a task is stopped the ptracer owns task->ptrace.
  */
 
-#define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
-#define PT_TRACESYSGOOD	0x00000004
-#define PT_PTRACE_CAP	0x00000008	/* ptracer can follow suid-exec */
+#define PT_PTRACE_CAP	0x00000004	/* ptracer can follow suid-exec */
 
+#define PT_OPT_FLAG_SHIFT	3
+#define PT_TRACESYSGOOD	0x00000008	/* must be directly before PT_TRACE_event bits! */
 /* PT_TRACE_* event enable flags */
-#define PT_EVENT_FLAG_SHIFT	4
-#define PT_EVENT_FLAG(event)	(1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
-
+#define PT_EVENT_FLAG(event)	(1 << (PT_OPT_FLAG_SHIFT + (event)))
 #define PT_TRACE_FORK		PT_EVENT_FLAG(PTRACE_EVENT_FORK)
 #define PT_TRACE_VFORK		PT_EVENT_FLAG(PTRACE_EVENT_VFORK)
 #define PT_TRACE_CLONE		PT_EVENT_FLAG(PTRACE_EVENT_CLONE)
 #define PT_TRACE_EXEC		PT_EVENT_FLAG(PTRACE_EVENT_EXEC)
 #define PT_TRACE_VFORK_DONE	PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
 #define PT_TRACE_EXIT		PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
-
-#define PT_TRACE_MASK	0x000003f4
+#define PT_TRACE_STOP		PT_EVENT_FLAG(PTRACE_EVENT_STOP)
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
@@ -228,7 +226,7 @@ static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
 		child->ptrace = current->ptrace;
 		__ptrace_link(child, current->parent);
 
-		if (child->ptrace & PT_SEIZED)
+		if (child->ptrace & PTRACE_EVENT_STOP)
 			task_set_jobctl_pending(child, JOBCTL_TRAP_STOP);
 		else
 			sigaddset(&child->pending.signal, SIGSTOP);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9de3ecf..0bf3d74 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -219,19 +219,23 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * SEIZE will enable new ptrace behaviors which will be implemented
-	 * gradually.  SEIZE_DEVEL is used to prevent applications
+	 * gradually.  SEIZE_DEVEL bit is used to prevent applications
 	 * expecting full SEIZE behaviors trapping on kernel commits which
 	 * are still in the process of implementing them.
 	 *
 	 * Only test programs for new ptrace behaviors being implemented
 	 * should set SEIZE_DEVEL.  If unset, SEIZE will fail with -EIO.
 	 *
-	 * Once SEIZE behaviors are completely implemented, this flag and
-	 * the following test will be removed.
+	 * Once SEIZE behaviors are completely implemented, this flag
+	 * will be removed.
 	 */
 	retval = -EIO;
-	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
-		goto out;
+	if (seize) {
+		if ((flags & ~(long)PTRACE_O_MASK) != PTRACE_SEIZE_DEVEL)
+			goto out;
+		flags &= ~PTRACE_SEIZE_DEVEL;
+	} else
+		flags = 0;
 
 	audit_ptrace(task);
 
@@ -243,7 +247,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * Protect exec's credential calculations against our interference;
-	 * interference; SUID, SGID and LSM creds get determined differently
+	 * SUID, SGID and LSM creds get determined differently
 	 * under ptrace.
 	 */
 	retval = -ERESTARTNOINTR;
@@ -263,9 +267,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (task->ptrace)
 		goto unlock_tasklist;
 
-	task->ptrace = PT_PTRACED;
-	if (seize)
-		task->ptrace |= PT_SEIZED;
+	task->ptrace = PT_PTRACED | (flags << PT_OPT_FLAG_SHIFT);
 	if (task_ns_capable(task, CAP_SYS_PTRACE))
 		task->ptrace |= PT_PTRACE_CAP;
 
@@ -509,30 +511,13 @@ int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long ds
 
 static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 {
-	child->ptrace &= ~PT_TRACE_MASK;
-
-	if (data & PTRACE_O_TRACESYSGOOD)
-		child->ptrace |= PT_TRACESYSGOOD;
-
-	if (data & PTRACE_O_TRACEFORK)
-		child->ptrace |= PT_TRACE_FORK;
-
-	if (data & PTRACE_O_TRACEVFORK)
-		child->ptrace |= PT_TRACE_VFORK;
-
-	if (data & PTRACE_O_TRACECLONE)
-		child->ptrace |= PT_TRACE_CLONE;
-
-	if (data & PTRACE_O_TRACEEXEC)
-		child->ptrace |= PT_TRACE_EXEC;
-
-	if (data & PTRACE_O_TRACEVFORKDONE)
-		child->ptrace |= PT_TRACE_VFORK_DONE;
+	if (data & ~(long)PTRACE_O_MASK)
+		return -EINVAL;
 
-	if (data & PTRACE_O_TRACEEXIT)
-		child->ptrace |= PT_TRACE_EXIT;
+	child->ptrace &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
+	child->ptrace |= (data << PT_OPT_FLAG_SHIFT);
 
-	return (data & ~PTRACE_O_MASK) ? -EINVAL : 0;
+	return 0;
 }
 
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
@@ -666,7 +651,7 @@ static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
 int ptrace_request(struct task_struct *child, long request,
 		   unsigned long addr, unsigned long data)
 {
-	bool seized = child->ptrace & PT_SEIZED;
+	bool stop_events_enabled = child->ptrace & PT_TRACE_STOP;
 	int ret = -EIO;
 	siginfo_t siginfo, *si;
 	void __user *datavp = (void __user *) data;
@@ -715,7 +700,7 @@ int ptrace_request(struct task_struct *child, long request,
 		 * The actual trap might not be PTRACE_EVENT_STOP trap but
 		 * the pending condition is cleared regardless.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!stop_events_enabled || !lock_task_sighand(child, &flags)))
 			break;
 
 		/*
@@ -740,7 +725,7 @@ int ptrace_request(struct task_struct *child, long request,
 		 * again.  Alternatively, ptracer can issue INTERRUPT to
 		 * finish listening and re-trap tracee into STOP.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!stop_events_enabled || !lock_task_sighand(child, &flags)))
 			break;
 
 		si = child->last_siginfo;
diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..9248600 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -823,8 +823,8 @@ static int check_kill_permission(int sig, struct siginfo *info,
  * @t: tracee wanting to notify tracer
  *
  * This function schedules sticky ptrace trap which is cleared on the next
- * TRAP_STOP to notify ptracer of an event.  @t must have been seized by
- * ptracer.
+ * TRAP_STOP to notify ptracer of an event.  @t must have PTRACE_O_TRACESTOP
+ * option active.
  *
  * If @t is running, STOP trap will be taken.  If trapped for STOP and
  * ptracer is listening for events, tracee is woken up so that it can
@@ -837,7 +837,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
  */
 static void ptrace_trap_notify(struct task_struct *t)
 {
-	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
+	WARN_ON_ONCE(!(t->ptrace & PT_TRACE_STOP));
 	assert_spin_locked(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
@@ -882,7 +882,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		do {
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_TRACE_STOP)))
 				wake_up_state(t, __TASK_STOPPED);
 			else
 				ptrace_trap_notify(t);
@@ -2004,7 +2004,7 @@ static bool do_signal_stop(int signr)
 			if (!task_is_stopped(t) &&
 			    task_set_jobctl_pending(t, signr | gstop)) {
 				sig->group_stop_count++;
-				if (likely(!(t->ptrace & PT_SEIZED)))
+				if (likely(!(t->ptrace & PT_TRACE_STOP)))
 					signal_wake_up(t, 0);
 				else
 					ptrace_trap_notify(t);
@@ -2057,13 +2057,13 @@ static bool do_signal_stop(int signr)
 /**
  * do_jobctl_trap - take care of ptrace jobctl traps
  *
- * When PT_SEIZED, it's used for both group stop and explicit
- * SEIZE/INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap with
+ * When PT_TRACE_STOP is on, it's used for both group stop and explicit
+ * INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap with
  * accompanying siginfo.  If stopped, lower eight bits of exit_code contain
  * the stop signal; otherwise, %SIGTRAP.
  *
- * When !PT_SEIZED, it's used only for group stop trap with stop signal
- * number as exit_code and no siginfo.
+ * When PT_TRACE_STOP is off, it's used only for group stop trap
+ * with stop signal number as exit_code and no siginfo.
  *
  * CONTEXT:
  * Must be called with @current->sighand->siglock held, which may be
@@ -2074,7 +2074,7 @@ static void do_jobctl_trap(void)
 	struct signal_struct *signal = current->signal;
 	int signr = current->jobctl & JOBCTL_STOP_SIGMASK;
 
-	if (current->ptrace & PT_SEIZED) {
+	if (current->ptrace & PT_TRACE_STOP) {
 		if (!signal->group_stop_count &&
 		    !(signal->flags & SIGNAL_STOP_STOPPED))
 			signr = SIGTRAP;



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.
  2011-09-04 21:11 RFC: PTRACE_SEIZE needs API cleanup? Denys Vlasenko
                   ` (3 preceding siblings ...)
  2011-09-05 17:01 ` [PATCH 2/2] Denys Vlasenko
@ 2011-09-05 17:06 ` Denys Vlasenko
  2011-09-06 20:08   ` Oleg Nesterov
  2011-09-06 16:52 ` [PATCH v2] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails Denys Vlasenko
  5 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-05 17:06 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Oleg Nesterov, Tejun Heo, linux-kernel

On Mon, 2011-09-05 at 18:51 +0200, Denys Vlasenko wrote:
> > A crude patch is below. It rolls points 1-4 from above into a single
> > new option, PTRACE_O_TRACESTOP.
> 
> For easier review, split it into two parts. Part 1 follows.
> (It is still not even compile-tested).

Part 2. (ignore untitled one sent a few mins ago)
-- 
vda


Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.

Introduce new ptrace option, PTRACE_O_TRACESTOP. This makes API
more symmetric: every PTRACE_EVENT_event has corresponding PTRACE_O_TRACEevent now,
as it used to have before PTRACE_SEIZE was introduced.

PTRACE_SEIZE does not assume PTRACE_O_TRACESTOP, but with this patch
it allows any PTRACE_O_opts to be set at attach time (they are passed in data param)
- including PTRACE_O_TRACESTOP, of course. Without any options,
PTRACE_SEIZE is equivalent to PTRACE_ATTACH. With only PTRACE_O_TRACESTOP option,
PTRACE_SEIZE behavior will be the same as PTRACE_SEIZE behavior before this patch.

This opens up two new possibilities:
(1) Ptrace options can be set on attach. Can be used to close a few corner cases
in strace where we get unwanted behavior before we have a chance to set options,
and removes the need to track "did we set opts for this task" state in strace
internals.
(2) PTRACE_LISTEN, PTRACE_INTERRUPT commands and PTRACE_EVENT_STOP event
can be enabled with PTRACE_SETOPTIONS with PTRACE_O_TRACESTOP.
Not a big deal, but IMO this makes API more symmetric.

All formerly PTRACE_SEIZE-enabled behavior is now enabled by
PTRACE_O_TRACESTOP instead (by PT_TRACE_STOP bit). PT_SEIZED bit is removed.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>


diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 9aa669c..e2ba2dd 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -62,8 +62,9 @@
 #define PTRACE_O_TRACEEXEC	0x00000010
 #define PTRACE_O_TRACEVFORKDONE	0x00000020
 #define PTRACE_O_TRACEEXIT	0x00000040
+#define PTRACE_O_TRACESTOP	0x00000080
 
-#define PTRACE_O_MASK		0x0000007f
+#define PTRACE_O_MASK		0x000000ff
 
 /* Wait extended result codes for the above trace options.  */
 #define PTRACE_EVENT_FORK	1
@@ -85,7 +86,6 @@
  * flags.  When the a task is stopped the ptracer owns task->ptrace.
  */
 
-#define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
 #define PT_PTRACE_CAP	0x00000004	/* ptracer can follow suid-exec */
@@ -100,6 +100,7 @@
 #define PT_TRACE_EXEC		PT_EVENT_FLAG(PTRACE_EVENT_EXEC)
 #define PT_TRACE_VFORK_DONE	PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
 #define PT_TRACE_EXIT		PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
+#define PT_TRACE_STOP		PT_EVENT_FLAG(PTRACE_EVENT_STOP)
 
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9c3536d..0bf3d74 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -267,9 +267,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (task->ptrace)
 		goto unlock_tasklist;
 
-	task->ptrace = PT_PTRACED;
-	if (seize)
-		task->ptrace |= PT_SEIZED;
+	task->ptrace = PT_PTRACED | (flags << PT_OPT_FLAG_SHIFT);
 	if (task_ns_capable(task, CAP_SYS_PTRACE))
 		task->ptrace |= PT_PTRACE_CAP;
 
@@ -653,7 +651,7 @@ static int ptrace_regset(struct task_struct *task, int req, unsigned int type,
 int ptrace_request(struct task_struct *child, long request,
 		   unsigned long addr, unsigned long data)
 {
-	bool seized = child->ptrace & PT_SEIZED;
+	bool stop_events_enabled = child->ptrace & PT_TRACE_STOP;
 	int ret = -EIO;
 	siginfo_t siginfo, *si;
 	void __user *datavp = (void __user *) data;
@@ -702,7 +700,7 @@ int ptrace_request(struct task_struct *child, long request,
 		 * The actual trap might not be PTRACE_EVENT_STOP trap but
 		 * the pending condition is cleared regardless.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!stop_events_enabled || !lock_task_sighand(child, &flags)))
 			break;
 
 		/*
@@ -727,7 +725,7 @@ int ptrace_request(struct task_struct *child, long request,
 		 * again.  Alternatively, ptracer can issue INTERRUPT to
 		 * finish listening and re-trap tracee into STOP.
 		 */
-		if (unlikely(!seized || !lock_task_sighand(child, &flags)))
+		if (unlikely(!stop_events_enabled || !lock_task_sighand(child, &flags)))
 			break;
 
 		si = child->last_siginfo;
diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..9248600 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -823,8 +823,8 @@ static int check_kill_permission(int sig, struct siginfo *info,
  * @t: tracee wanting to notify tracer
  *
  * This function schedules sticky ptrace trap which is cleared on the next
- * TRAP_STOP to notify ptracer of an event.  @t must have been seized by
- * ptracer.
+ * TRAP_STOP to notify ptracer of an event.  @t must have PTRACE_O_TRACESTOP
+ * option active.
  *
  * If @t is running, STOP trap will be taken.  If trapped for STOP and
  * ptracer is listening for events, tracee is woken up so that it can
@@ -837,7 +837,7 @@ static int check_kill_permission(int sig, struct siginfo *info,
  */
 static void ptrace_trap_notify(struct task_struct *t)
 {
-	WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
+	WARN_ON_ONCE(!(t->ptrace & PT_TRACE_STOP));
 	assert_spin_locked(&t->sighand->siglock);
 
 	task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
@@ -882,7 +882,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
 		do {
 			task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
 			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
-			if (likely(!(t->ptrace & PT_SEIZED)))
+			if (likely(!(t->ptrace & PT_TRACE_STOP)))
 				wake_up_state(t, __TASK_STOPPED);
 			else
 				ptrace_trap_notify(t);
@@ -2004,7 +2004,7 @@ static bool do_signal_stop(int signr)
 			if (!task_is_stopped(t) &&
 			    task_set_jobctl_pending(t, signr | gstop)) {
 				sig->group_stop_count++;
-				if (likely(!(t->ptrace & PT_SEIZED)))
+				if (likely(!(t->ptrace & PT_TRACE_STOP)))
 					signal_wake_up(t, 0);
 				else
 					ptrace_trap_notify(t);
@@ -2057,13 +2057,13 @@ static bool do_signal_stop(int signr)
 /**
  * do_jobctl_trap - take care of ptrace jobctl traps
  *
- * When PT_SEIZED, it's used for both group stop and explicit
- * SEIZE/INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap with
+ * When PT_TRACE_STOP is on, it's used for both group stop and explicit
+ * INTERRUPT traps.  Both generate PTRACE_EVENT_STOP trap with
  * accompanying siginfo.  If stopped, lower eight bits of exit_code contain
  * the stop signal; otherwise, %SIGTRAP.
  *
- * When !PT_SEIZED, it's used only for group stop trap with stop signal
- * number as exit_code and no siginfo.
+ * When PT_TRACE_STOP is off, it's used only for group stop trap
+ * with stop signal number as exit_code and no siginfo.
  *
  * CONTEXT:
  * Must be called with @current->sighand->siglock held, which may be
@@ -2074,7 +2074,7 @@ static void do_jobctl_trap(void)
 	struct signal_struct *signal = current->signal;
 	int signr = current->jobctl & JOBCTL_STOP_SIGMASK;
 
-	if (current->ptrace & PT_SEIZED) {
+	if (current->ptrace & PT_TRACE_STOP) {
 		if (!signal->group_stop_count &&
 		    !(signal->flags & SIGNAL_STOP_STOPPED))
 			signr = SIGTRAP;




^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.
  2011-09-05 17:06 ` [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior Denys Vlasenko
@ 2011-09-06 20:08   ` Oleg Nesterov
  2011-09-06 23:06     ` Tejun Heo
  2011-09-07  4:55     ` Denys Vlasenko
  0 siblings, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2011-09-06 20:08 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Tejun Heo, linux-kernel

On 09/05, Denys Vlasenko wrote:
>
> Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.
>
> Introduce new ptrace option, PTRACE_O_TRACESTOP. This makes API
> more symmetric: every PTRACE_EVENT_event has corresponding PTRACE_O_TRACEevent now,
> as it used to have before PTRACE_SEIZE was introduced.
>
> PTRACE_SEIZE does not assume PTRACE_O_TRACESTOP, but with this patch
> it allows any PTRACE_O_opts to be set at attach time

Well. This assumes that the only difference with PTRACE_SEIZE is the
new stop/interrupt behaviour. I am not sure this is "safe" to assume.

Tejun, what do you think?

>From the correctness pov, the patch is mostly correct. but you forgot
to update ptrace_init_task(). I bet you didn't try to test the patch ;)

>  int ptrace_request(struct task_struct *child, long request,
>  		   unsigned long addr, unsigned long data)
>  {
> -	bool seized = child->ptrace & PT_SEIZED;
> +	bool stop_events_enabled = child->ptrace & PT_TRACE_STOP;

May be ptrace_event_enabled(child, PTRACE_EVENT_STOP) looks better...
The same about other PT_TRACE_STOP checks, although this is cosmetic.


And. Given that you can set/clear PT_TRACE_STOP in ptrace_setoptions(),
you need the locking.

Just for example. do_signal_stop() calls ptrace_trap_notify() and hits
WARN_ON_ONCE(!PT_TRACE_STOP) because it was cleared in between.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.
  2011-09-06 20:08   ` Oleg Nesterov
@ 2011-09-06 23:06     ` Tejun Heo
  2011-09-07  4:55     ` Denys Vlasenko
  1 sibling, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2011-09-06 23:06 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Denys Vlasenko, Denys Vlasenko, linux-kernel

Hello,

On Tue, Sep 06, 2011 at 10:08:18PM +0200, Oleg Nesterov wrote:
> On 09/05, Denys Vlasenko wrote:
> > PTRACE_SEIZE does not assume PTRACE_O_TRACESTOP, but with this patch
> > it allows any PTRACE_O_opts to be set at attach time
> 
> Well. This assumes that the only difference with PTRACE_SEIZE is the
> new stop/interrupt behaviour. I am not sure this is "safe" to assume.
> 
> Tejun, what do you think?

The biggest problem would be locking against the tracee.  Currently,
it's assumed that the behavior is determined at seize/attach time and
stay that way.  Tracee behaves differently to maintain compatibility
or show the new behavior.  If the state changes on the fly, we need to
change how tracee behaves.  Maybe we'll need some re-trap trickery or
can just get away with making tracee always behave in the new way and
somehow present it differently if !TRACESTOP.

That said, I can't see much point in this excercise.  Why does this
even matter?  This is almost purely cosmetic and any effect on the
usability of the API is so too.  The gained easiness of the API change
is in the realm of a few if statements.  There's nothing to be gained
by allowing flipping TRACESTOP.  Why would any program which is aware
of the new behavior turn it off and if not why would we want to
complicate things by supporting a feature which wouldn't be useful
risking higher chance of breakage (both from increased complexity and
lack of usage)?

ptrace is an ugly interface.  Some of that is inherent but probably
most of it is from how it was designed and has evolved over time.  As
far as I'm concerned, the goals are making it feature-complete and not
deviating too much from what's already there.

ptrace is painful to use with or without PTRACE_O_TRACESTOP.  IMHO, if
we want to address this issue, a better approach would be implementing
a wrapper library which hides the silliness of the kernel interface
and provides useable set of interface and information.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.
  2011-09-06 20:08   ` Oleg Nesterov
  2011-09-06 23:06     ` Tejun Heo
@ 2011-09-07  4:55     ` Denys Vlasenko
  2011-09-07 16:37       ` Oleg Nesterov
  1 sibling, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-07  4:55 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Denys Vlasenko, Tejun Heo, linux-kernel

On Tuesday 06 September 2011 22:08, Oleg Nesterov wrote:
> On 09/05, Denys Vlasenko wrote:
> >
> > Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.
> >
> > Introduce new ptrace option, PTRACE_O_TRACESTOP. This makes API
> > more symmetric: every PTRACE_EVENT_event has corresponding PTRACE_O_TRACEevent now,
> > as it used to have before PTRACE_SEIZE was introduced.
> >
> > PTRACE_SEIZE does not assume PTRACE_O_TRACESTOP, but with this patch
> > it allows any PTRACE_O_opts to be set at attach time
> 
> Well. This assumes that the only difference with PTRACE_SEIZE is the
> new stop/interrupt behaviour. I am not sure this is "safe" to assume.

I'd say that with this change, PTRACE_SEIZE is just PTRACE_ATTACH
with the possibility to set options on attach. Nothing more.

> Tejun, what do you think?


> >  int ptrace_request(struct task_struct *child, long request,
> >  		   unsigned long addr, unsigned long data)
> >  {
> > -	bool seized = child->ptrace & PT_SEIZED;
> > +	bool stop_events_enabled = child->ptrace & PT_TRACE_STOP;
> 
> May be ptrace_event_enabled(child, PTRACE_EVENT_STOP) looks better...
> The same about other PT_TRACE_STOP checks, although this is cosmetic.

Good idea. I will send a new patch a bit later.

> And. Given that you can set/clear PT_TRACE_STOP in ptrace_setoptions(),
> you need the locking.
>
> Just for example. do_signal_stop() calls ptrace_trap_notify() and hits
> WARN_ON_ONCE(!PT_TRACE_STOP) because it was cleared in between.

PTRACE_SETOPTIONS can be used only on stopped tracees. Can do_signal_stop()
run on a tracee while it is stopped?

-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior.
  2011-09-07  4:55     ` Denys Vlasenko
@ 2011-09-07 16:37       ` Oleg Nesterov
  0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2011-09-07 16:37 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Tejun Heo, linux-kernel

On 09/07, Denys Vlasenko wrote:
>
> On Tuesday 06 September 2011 22:08, Oleg Nesterov wrote:
>
> > And. Given that you can set/clear PT_TRACE_STOP in ptrace_setoptions(),
> > you need the locking.
> >
> > Just for example. do_signal_stop() calls ptrace_trap_notify() and hits
> > WARN_ON_ONCE(!PT_TRACE_STOP) because it was cleared in between.
>
> PTRACE_SETOPTIONS can be used only on stopped tracees. Can do_signal_stop()
> run on a tracee while it is stopped?

Sure. A tracee's sub-thread can do this. Or SIGCONT can trigger trap_notify.

And this is the "theoretical" problem sith PT_TRACE_STOP, in some sense.
Unlike other bits, it used by the "asynchronous" code. And it has the
meaning outside of the resume-stop path.

For example. You can't assume that PTRACE_LISTEN will work after you
set PT_TRACE_STOP, and this is not the implementation bug (although
of course I am not saying this is not possible to fix). But this needs
more changes, and _personally_ I see no point.

Oleg.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails
  2011-09-04 21:11 RFC: PTRACE_SEIZE needs API cleanup? Denys Vlasenko
                   ` (4 preceding siblings ...)
  2011-09-05 17:06 ` [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior Denys Vlasenko
@ 2011-09-06 16:52 ` Denys Vlasenko
  2011-09-06 18:43   ` Oleg Nesterov
  5 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-06 16:52 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Tejun Heo, linux-kernel, Denys Vlasenko

Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails.

If PTRACE_SETOPTIONS fails, options should not be affected.
This patch makes it so.

While at it, simplify a few things:

Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in ptrace_setoptions()
much simpler.

PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
---
 include/linux/ptrace.h |   12 +++++-------
 kernel/ptrace.c        |   43 +++++++++++++++----------------------------
 2 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 800f113..3b05d96 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -88,13 +88,13 @@
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
-#define PT_TRACESYSGOOD	0x00000004
-#define PT_PTRACE_CAP	0x00000008	/* ptracer can follow suid-exec */
+#define PT_PTRACE_CAP	0x00000004	/* ptracer can follow suid-exec */
 
+#define PT_OPT_FLAG_SHIFT	3
+/* must be directly before PT_TRACE_event bits */
+#define PT_TRACESYSGOOD	0x00000008
 /* PT_TRACE_* event enable flags */
-#define PT_EVENT_FLAG_SHIFT	4
-#define PT_EVENT_FLAG(event)	(1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
-
+#define PT_EVENT_FLAG(event)	(1 << (PT_OPT_FLAG_SHIFT + (event)))
 #define PT_TRACE_FORK		PT_EVENT_FLAG(PTRACE_EVENT_FORK)
 #define PT_TRACE_VFORK		PT_EVENT_FLAG(PTRACE_EVENT_VFORK)
 #define PT_TRACE_CLONE		PT_EVENT_FLAG(PTRACE_EVENT_CLONE)
@@ -102,8 +102,6 @@
 #define PT_TRACE_VFORK_DONE	PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
 #define PT_TRACE_EXIT		PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
 
-#define PT_TRACE_MASK	0x000003f4
-
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
 #define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9de3ecf..9c3536d 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -219,19 +219,23 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * SEIZE will enable new ptrace behaviors which will be implemented
-	 * gradually.  SEIZE_DEVEL is used to prevent applications
+	 * gradually.  SEIZE_DEVEL bit is used to prevent applications
 	 * expecting full SEIZE behaviors trapping on kernel commits which
 	 * are still in the process of implementing them.
 	 *
 	 * Only test programs for new ptrace behaviors being implemented
 	 * should set SEIZE_DEVEL.  If unset, SEIZE will fail with -EIO.
 	 *
-	 * Once SEIZE behaviors are completely implemented, this flag and
-	 * the following test will be removed.
+	 * Once SEIZE behaviors are completely implemented, this flag
+	 * will be removed.
 	 */
 	retval = -EIO;
-	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
-		goto out;
+	if (seize) {
+		if ((flags & ~(long)PTRACE_O_MASK) != PTRACE_SEIZE_DEVEL)
+			goto out;
+		flags &= ~PTRACE_SEIZE_DEVEL;
+	} else
+		flags = 0;
 
 	audit_ptrace(task);
 
@@ -243,7 +247,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * Protect exec's credential calculations against our interference;
-	 * interference; SUID, SGID and LSM creds get determined differently
+	 * SUID, SGID and LSM creds get determined differently
 	 * under ptrace.
 	 */
 	retval = -ERESTARTNOINTR;
@@ -509,30 +513,13 @@ int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long ds
 
 static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 {
-	child->ptrace &= ~PT_TRACE_MASK;
-
-	if (data & PTRACE_O_TRACESYSGOOD)
-		child->ptrace |= PT_TRACESYSGOOD;
-
-	if (data & PTRACE_O_TRACEFORK)
-		child->ptrace |= PT_TRACE_FORK;
-
-	if (data & PTRACE_O_TRACEVFORK)
-		child->ptrace |= PT_TRACE_VFORK;
-
-	if (data & PTRACE_O_TRACECLONE)
-		child->ptrace |= PT_TRACE_CLONE;
-
-	if (data & PTRACE_O_TRACEEXEC)
-		child->ptrace |= PT_TRACE_EXEC;
-
-	if (data & PTRACE_O_TRACEVFORKDONE)
-		child->ptrace |= PT_TRACE_VFORK_DONE;
+	if (data & ~(long)PTRACE_O_MASK)
+		return -EINVAL;
 
-	if (data & PTRACE_O_TRACEEXIT)
-		child->ptrace |= PT_TRACE_EXIT;
+	child->ptrace &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
+	child->ptrace |= (data << PT_OPT_FLAG_SHIFT);
 
-	return (data & ~PTRACE_O_MASK) ? -EINVAL : 0;
+	return 0;
 }
 
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
-- 
1.7.6



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails
  2011-09-06 16:52 ` [PATCH v2] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails Denys Vlasenko
@ 2011-09-06 18:43   ` Oleg Nesterov
  2011-09-07  4:44     ` Denys Vlasenko
  2011-09-07  4:45     ` [PATCH v3] " Denys Vlasenko
  0 siblings, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2011-09-06 18:43 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Tejun Heo, linux-kernel, Denys Vlasenko

On 09/06, Denys Vlasenko wrote:
>
> Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
> PT_option bits contiguous and therefore makes code in ptrace_setoptions()
> much simpler.

Agreed, this probably makes sense. Although I don't think "contiguous"
really helps, even if I agree it looks better. What does mater is that
PTRACE_O_TRACESYSGOOD << PT_OPT_FLAG_SHIFT becomes PT_TRACESYSGOOD.

But,

> +#define PT_OPT_FLAG_SHIFT	3
> +/* must be directly before PT_TRACE_event bits */
> +#define PT_TRACESYSGOOD	0x00000008

This probably means PT_TRACESYSGOOD should be also defined as PT_EVENT_FLAG(0),

>  /* PT_TRACE_* event enable flags */
> -#define PT_EVENT_FLAG_SHIFT	4
> -#define PT_EVENT_FLAG(event)	(1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
> -
> +#define PT_EVENT_FLAG(event)	(1 << (PT_OPT_FLAG_SHIFT + (event)))

And ptrace_setoptions() does

	child->ptrace |= (data << PT_OPT_FLAG_SHIFT);

Now we should verify that

	PTRACE_O_XXX == 1 << PTRACE_EVENT_XXX;

for every XXX... Looks correct. But perhaps it makes sense to do this
explicitely and redefine PTRACE_O_* via PTRACE_EVENT_*.

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -219,19 +219,23 @@ static int ptrace_attach(struct task_struct *task, long request,
>  
>  	/*
>  	 * SEIZE will enable new ptrace behaviors which will be implemented
> -	 * gradually.  SEIZE_DEVEL is used to prevent applications
> +	 * gradually.  SEIZE_DEVEL bit is used to prevent applications
>  	 * expecting full SEIZE behaviors trapping on kernel commits which
>  	 * are still in the process of implementing them.
>  	 *
>  	 * Only test programs for new ptrace behaviors being implemented
>  	 * should set SEIZE_DEVEL.  If unset, SEIZE will fail with -EIO.
>  	 *
> -	 * Once SEIZE behaviors are completely implemented, this flag and
> -	 * the following test will be removed.
> +	 * Once SEIZE behaviors are completely implemented, this flag
> +	 * will be removed.
>  	 */
>  	retval = -EIO;
> -	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
> -		goto out;
> +	if (seize) {
> +		if ((flags & ~(long)PTRACE_O_MASK) != PTRACE_SEIZE_DEVEL)
> +			goto out;
> +		flags &= ~PTRACE_SEIZE_DEVEL;
> +	} else
> +		flags = 0;
>  
>  	audit_ptrace(task);

This chunk looks completely off-topic, why it is needed in this patch?

>  static int ptrace_setoptions(struct task_struct *child, unsigned long data)
>  {
> -	child->ptrace &= ~PT_TRACE_MASK;
> -
> -	if (data & PTRACE_O_TRACESYSGOOD)
> -		child->ptrace |= PT_TRACESYSGOOD;
> -
> -	if (data & PTRACE_O_TRACEFORK)
> -		child->ptrace |= PT_TRACE_FORK;
> -
> -	if (data & PTRACE_O_TRACEVFORK)
> -		child->ptrace |= PT_TRACE_VFORK;
> -
> -	if (data & PTRACE_O_TRACECLONE)
> -		child->ptrace |= PT_TRACE_CLONE;
> -
> -	if (data & PTRACE_O_TRACEEXEC)
> -		child->ptrace |= PT_TRACE_EXEC;
> -
> -	if (data & PTRACE_O_TRACEVFORKDONE)
> -		child->ptrace |= PT_TRACE_VFORK_DONE;
> +	if (data & ~(long)PTRACE_O_MASK)
> +		return -EINVAL;

Oh, yes, I always hated this logic. We change ->ptrace first, then
return -EINVAL if data is wrong.

But. Denys, I think this needs a separate patch. And of course, of
course this can break things. Say, a poor application passes the
unsupported bit along with the valid bits, and doesn't check the result.
This works before this patch.

Honestly, I'd prefer to keep this oddity. But if you send a separate
patch... I do not know ;)

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails
  2011-09-06 18:43   ` Oleg Nesterov
@ 2011-09-07  4:44     ` Denys Vlasenko
  2011-09-07  4:45     ` [PATCH v3] " Denys Vlasenko
  1 sibling, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-07  4:44 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Denys Vlasenko, Tejun Heo, linux-kernel

On Tuesday 06 September 2011 20:43, Oleg Nesterov wrote:
> > +#define PT_OPT_FLAG_SHIFT	3
> > +/* must be directly before PT_TRACE_event bits */
> > +#define PT_TRACESYSGOOD	0x00000008
> 
> This probably means PT_TRACESYSGOOD should be also defined as PT_EVENT_FLAG(0)

Good idea.

> >  /* PT_TRACE_* event enable flags */
> > -#define PT_EVENT_FLAG_SHIFT	4
> > -#define PT_EVENT_FLAG(event)	(1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
> > -
> > +#define PT_EVENT_FLAG(event)	(1 << (PT_OPT_FLAG_SHIFT + (event)))
> 
> And ptrace_setoptions() does
> 
> 	child->ptrace |= (data << PT_OPT_FLAG_SHIFT);
> 
> Now we should verify that
> 
> 	PTRACE_O_XXX == 1 << PTRACE_EVENT_XXX;
> 
> for every XXX... Looks correct. But perhaps it makes sense to do this
> explicitely and redefine PTRACE_O_* via PTRACE_EVENT_*.

Also good idea.
 
> > -	if (seize && !(flags & PTRACE_SEIZE_DEVEL))
> > -		goto out;
> > +	if (seize) {
> > +		if ((flags & ~(long)PTRACE_O_MASK) != PTRACE_SEIZE_DEVEL)
> > +			goto out;
> > +		flags &= ~PTRACE_SEIZE_DEVEL;
> > +	} else
> > +		flags = 0;
> >  
> >  	audit_ptrace(task);
> 
> This chunk looks completely off-topic, why it is needed in this patch?

It isn't, it wasn't supposed to be there :(


> >  static int ptrace_setoptions(struct task_struct *child, unsigned long data)
> >  {
> > -	child->ptrace &= ~PT_TRACE_MASK;
> > -
> > -	if (data & PTRACE_O_TRACESYSGOOD)
> > -		child->ptrace |= PT_TRACESYSGOOD;
> > -
> > -	if (data & PTRACE_O_TRACEFORK)
> > -		child->ptrace |= PT_TRACE_FORK;
> > -
> > -	if (data & PTRACE_O_TRACEVFORK)
> > -		child->ptrace |= PT_TRACE_VFORK;
> > -
> > -	if (data & PTRACE_O_TRACECLONE)
> > -		child->ptrace |= PT_TRACE_CLONE;
> > -
> > -	if (data & PTRACE_O_TRACEEXEC)
> > -		child->ptrace |= PT_TRACE_EXEC;
> > -
> > -	if (data & PTRACE_O_TRACEVFORKDONE)
> > -		child->ptrace |= PT_TRACE_VFORK_DONE;
> > +	if (data & ~(long)PTRACE_O_MASK)
> > +		return -EINVAL;
> 
> Oh, yes, I always hated this logic. We change ->ptrace first, then
> return -EINVAL if data is wrong.
> 
> But. Denys, I think this needs a separate patch. And of course, of
> course this can break things. Say, a poor application passes the
> unsupported bit along with the valid bits, and doesn't check the result.
> This works before this patch.

This is really a gross bug, I think we should just bite the bullet
and fix it.

I have hard time imagining how application managed to *inadvertently*
invent a non-existing PTRACE_O_BOGUSFLAG and pass it
to PTRACE_SETOPTIONS call. In what header did they fing PTRACE_O_BOGUSFLAG?

I think this can only happen if they do this on purpose,
but *what* purpose? To get options cleared? Can't imagine anyone doing that,
option clearing can be done without resort to undocumented kernel bugs -
ptrace(PTRACE_SETOPTIONS, pid, 0, 0) does it, rigth?

Sending patch v3 in separate mail.

-- 
vda

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails
  2011-09-06 18:43   ` Oleg Nesterov
  2011-09-07  4:44     ` Denys Vlasenko
@ 2011-09-07  4:45     ` Denys Vlasenko
  2011-09-07 20:35       ` Oleg Nesterov
  1 sibling, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2011-09-07  4:45 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Denys Vlasenko, Tejun Heo, linux-kernel

Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails.
    
If PTRACE_SETOPTIONS fails, options should not be affected.
This patch makes it so.
    
Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
instead of using explicit numeric constants, to ensure we don't
mess up relationship between bit positions and event ids.
    
While at it, simplify a few things:
    
Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in ptrace_setoptions()
much simpler.
    
PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.
    
PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
    
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 800f113..0911100 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -54,17 +54,6 @@
 /* flags in @data for PTRACE_SEIZE */
 #define PTRACE_SEIZE_DEVEL	0x80000000 /* temp flag for development */
 
-/* options set using PTRACE_SETOPTIONS */
-#define PTRACE_O_TRACESYSGOOD	0x00000001
-#define PTRACE_O_TRACEFORK	0x00000002
-#define PTRACE_O_TRACEVFORK	0x00000004
-#define PTRACE_O_TRACECLONE	0x00000008
-#define PTRACE_O_TRACEEXEC	0x00000010
-#define PTRACE_O_TRACEVFORKDONE	0x00000020
-#define PTRACE_O_TRACEEXIT	0x00000040
-
-#define PTRACE_O_MASK		0x0000007f
-
 /* Wait extended result codes for the above trace options.  */
 #define PTRACE_EVENT_FORK	1
 #define PTRACE_EVENT_VFORK	2
@@ -74,6 +63,17 @@
 #define PTRACE_EVENT_EXIT	6
 #define PTRACE_EVENT_STOP	7
 
+/* options set using PTRACE_SETOPTIONS */
+#define PTRACE_O_TRACESYSGOOD	1
+#define PTRACE_O_TRACEFORK	(1 << PTRACE_EVENT_FORK)
+#define PTRACE_O_TRACEVFORK	(1 << PTRACE_EVENT_VFORK)
+#define PTRACE_O_TRACECLONE	(1 << PTRACE_EVENT_CLONE)
+#define PTRACE_O_TRACEEXEC	(1 << PTRACE_EVENT_EXEC)
+#define PTRACE_O_TRACEVFORKDONE	(1 << PTRACE_EVENT_VFORK_DONE)
+#define PTRACE_O_TRACEEXIT	(1 << PTRACE_EVENT_EXIT)
+
+#define PTRACE_O_MASK		0x0000007f
+
 #include <asm/ptrace.h>
 
 #ifdef __KERNEL__
@@ -88,13 +88,12 @@
 #define PT_SEIZED	0x00010000	/* SEIZE used, enable new behavior */
 #define PT_PTRACED	0x00000001
 #define PT_DTRACE	0x00000002	/* delayed trace (used on m68k, i386) */
-#define PT_TRACESYSGOOD	0x00000004
-#define PT_PTRACE_CAP	0x00000008	/* ptracer can follow suid-exec */
+#define PT_PTRACE_CAP	0x00000004	/* ptracer can follow suid-exec */
 
+#define PT_OPT_FLAG_SHIFT	3
 /* PT_TRACE_* event enable flags */
-#define PT_EVENT_FLAG_SHIFT	4
-#define PT_EVENT_FLAG(event)	(1 << (PT_EVENT_FLAG_SHIFT + (event) - 1))
-
+#define PT_EVENT_FLAG(event)	(1 << (PT_OPT_FLAG_SHIFT + (event)))
+#define PT_TRACESYSGOOD		PT_EVENT_FLAG(0)
 #define PT_TRACE_FORK		PT_EVENT_FLAG(PTRACE_EVENT_FORK)
 #define PT_TRACE_VFORK		PT_EVENT_FLAG(PTRACE_EVENT_VFORK)
 #define PT_TRACE_CLONE		PT_EVENT_FLAG(PTRACE_EVENT_CLONE)
@@ -102,8 +101,6 @@
 #define PT_TRACE_VFORK_DONE	PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE)
 #define PT_TRACE_EXIT		PT_EVENT_FLAG(PTRACE_EVENT_EXIT)
 
-#define PT_TRACE_MASK	0x000003f4
-
 /* single stepping state bits (used on ARM and PA-RISC) */
 #define PT_SINGLESTEP_BIT	31
 #define PT_SINGLESTEP		(1<<PT_SINGLESTEP_BIT)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 9de3ecf..665ee93 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -243,7 +243,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 
 	/*
 	 * Protect exec's credential calculations against our interference;
-	 * interference; SUID, SGID and LSM creds get determined differently
+	 * SUID, SGID and LSM creds get determined differently
 	 * under ptrace.
 	 */
 	retval = -ERESTARTNOINTR;
@@ -509,30 +509,13 @@ int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long ds
 
 static int ptrace_setoptions(struct task_struct *child, unsigned long data)
 {
-	child->ptrace &= ~PT_TRACE_MASK;
-
-	if (data & PTRACE_O_TRACESYSGOOD)
-		child->ptrace |= PT_TRACESYSGOOD;
-
-	if (data & PTRACE_O_TRACEFORK)
-		child->ptrace |= PT_TRACE_FORK;
-
-	if (data & PTRACE_O_TRACEVFORK)
-		child->ptrace |= PT_TRACE_VFORK;
-
-	if (data & PTRACE_O_TRACECLONE)
-		child->ptrace |= PT_TRACE_CLONE;
-
-	if (data & PTRACE_O_TRACEEXEC)
-		child->ptrace |= PT_TRACE_EXEC;
-
-	if (data & PTRACE_O_TRACEVFORKDONE)
-		child->ptrace |= PT_TRACE_VFORK_DONE;
+	if (data & ~(long)PTRACE_O_MASK)
+		return -EINVAL;
 
-	if (data & PTRACE_O_TRACEEXIT)
-		child->ptrace |= PT_TRACE_EXIT;
+	child->ptrace &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
+	child->ptrace |= (data << PT_OPT_FLAG_SHIFT);
 
-	return (data & ~PTRACE_O_MASK) ? -EINVAL : 0;
+	return 0;
 }
 
 static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails
  2011-09-07  4:45     ` [PATCH v3] " Denys Vlasenko
@ 2011-09-07 20:35       ` Oleg Nesterov
  0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2011-09-07 20:35 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Denys Vlasenko, Tejun Heo, linux-kernel

On 09/07, Denys Vlasenko wrote:
>
> Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails.
>
> If PTRACE_SETOPTIONS fails, options should not be affected.
> This patch makes it so.

Heh, OK. You won ;)

> While at it, simplify a few things:

No, no, Denys. Please make a separate patch which changes the
current behaviour, please. Then cleanups.

Otherwise looks correct.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2011-09-07 20:39 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-04 21:11 RFC: PTRACE_SEIZE needs API cleanup? Denys Vlasenko
2011-09-05  1:15 ` Indan Zupancic
2011-09-05  9:24   ` Denys Vlasenko
2011-09-05 13:08     ` Indan Zupancic
2011-09-05 14:06       ` Denys Vlasenko
2011-09-05 17:21         ` Indan Zupancic
2011-09-06  0:59           ` Denys Vlasenko
2011-09-06 17:08             ` Indan Zupancic
2011-09-07  2:34               ` Denys Vlasenko
2011-09-07 17:15                 ` Indan Zupancic
2011-09-05 17:44         ` Indan Zupancic
2011-09-06  1:05           ` Denys Vlasenko
2011-09-06 17:19             ` Indan Zupancic
2011-09-07  2:47               ` Denys Vlasenko
2011-09-07 14:24                 ` Indan Zupancic
2011-09-05 14:54 ` Denys Vlasenko
2011-09-05 16:51 ` [PATCH 1/2] Fix pollution of task->ptrace if PTRACE_SETOPTIONS fails Denys Vlasenko
2011-09-05 17:01 ` [PATCH 2/2] Denys Vlasenko
2011-09-05 17:06 ` [PATCH 2/2] Add new PTRACE_O_TRACESTOP option, make it control new ptrace behavior Denys Vlasenko
2011-09-06 20:08   ` Oleg Nesterov
2011-09-06 23:06     ` Tejun Heo
2011-09-07  4:55     ` Denys Vlasenko
2011-09-07 16:37       ` Oleg Nesterov
2011-09-06 16:52 ` [PATCH v2] Fix clearing of task->ptrace if PTRACE_SETOPTIONS fails Denys Vlasenko
2011-09-06 18:43   ` Oleg Nesterov
2011-09-07  4:44     ` Denys Vlasenko
2011-09-07  4:45     ` [PATCH v3] " Denys Vlasenko
2011-09-07 20:35       ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).