* ptrace problem with 2.6.25 on Itanium
@ 2008-04-24 10:39 stephane eranian
2008-04-24 12:04 ` Petr Tesarik
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: stephane eranian @ 2008-04-24 10:39 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 1788 bytes --]
Hello everyone,
I am running into a new problem with perfmon on Itanium and 2.6.25.
The pfmon tool is able to monitor across fork(). For that it relies on
ptrace() to receive notifications on fork. This works fine on X86 with 2.6.25;
however, it is currently broken on IA-64.
Normally, on fork(), the ptracing parent (here pfmon) receives 2 notifications:
1. SIGTRAP with event PTRACE_EVENT_FORK to indicate a new process
is being created. The new pid is extracted via PTRACE_GETEVENTMSG.
2. SIGSTOP for the new pid, indicating that the child is ready to
execute its first instruction.
The first message allows the tool to create the data structure for the
new process; the second marks the point where a perfmon context can
actually be attached.
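The two notifications can be told apart from the wait status alone. The following is a minimal sketch, assuming the Linux encoding of ptrace stops in the wait status (event code in the bits above the stop signal); the helper names ptrace_event_of() and is_sigstop() are hypothetical and are not part of pfmon.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Return the ptrace event code carried by a wait status, or 0 if the
 * status is not a SIGTRAP stop. On Linux the event code sits in the
 * bits above the stop signal, hence the shift by 16.
 */
static int
ptrace_event_of(int status)
{
	if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
		return 0;
	return status >> 16;
}

/* Non-zero if the status is a plain SIGSTOP stop, as seen for a new child. */
static int
is_sigstop(int status)
{
	return WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;
}
```

A wait loop would call ptrace_event_of() first and compare the result with PTRACE_EVENT_FORK, then fall back to is_sigstop() for the new child's initial stop.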
With 2.6.25 on Itanium, the notifications are received out of order,
i.e., the SIGSTOP first and the FORK notification next. Of course, the
tool is confused, because until it sees the FORK event, it does not know
about the new process.
This situation never happens on X86 with the same kernel.
To demonstrate the problem, I have attached a simple test program. You need
to pass the name of a command that creates child processes. Look at the order
between the FORK and SIGSTOP notifications. There is a forktest program in
pfmon/tests.
I don't have time to track this down. However, I am highly suspicious of the
new TIF_RESTORE_RSE and the arch_ptrace_stop_needed() code. The do_fork()
routine does indeed set SIGSTOP before it calls ptrace_notify(). But this does
not impact X86, which, by the way, does not define arch_ptrace_stop_needed().
I don't have an older kernel handy to run the test. Hopefully someone
on this list will try this on 2.6.24 or older.
I am not on this mailing list anymore, so please CC me on your reply.
[-- Attachment #2: task_ptrace.c --]
[-- Type: application/octet-stream, Size: 3911 bytes --]
#include <sys/types.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
/*
* This belongs to some LIBC header files for 2.6
*/
#ifndef PTRACE_SETOPTIONS
/* 0x4200-0x4300 are reserved for architecture-independent additions. */
#define PTRACE_SETOPTIONS 0x4200
#define PTRACE_GETEVENTMSG 0x4201
#define PTRACE_GETSIGINFO 0x4202
#define PTRACE_SETSIGINFO 0x4203
/* options set using PTRACE_SETOPTIONS */
#define PTRACE_O_TRACESYSGOOD 0x00000001
#define PTRACE_O_TRACEFORK 0x00000002
#define PTRACE_O_TRACEVFORK 0x00000004
#define PTRACE_O_TRACECLONE 0x00000008
#define PTRACE_O_TRACEEXEC 0x00000010
#define PTRACE_O_TRACEVFORKDONE 0x00000020
#define PTRACE_O_TRACEEXIT 0x00000040
/* Wait extended result codes for the above trace pt_options. */
#define PTRACE_EVENT_FORK 1
#define PTRACE_EVENT_VFORK 2
#define PTRACE_EVENT_CLONE 3
#define PTRACE_EVENT_EXEC 4
#define PTRACE_EVENT_VFORK_DONE 5
#define PTRACE_EVENT_EXIT 6
#endif /* PTRACE_SETOPTIONS */
static void fatal_error(char *fmt,...) __attribute__((noreturn));

static void
fatal_error(char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	exit(1);
}

int
child(char **arg)
{
	/*
	 * will cause the program to stop before executing the first
	 * user level instruction. We can only attach (load) a context
	 * if the task is in the STOPPED state.
	 */
	ptrace(PTRACE_TRACEME, 0, NULL, NULL);
	/*
	 * execute the requested command
	 */
	execvp(arg[0], arg);
	fatal_error("cannot exec: %s\n", arg[0]);
	/* not reached */
}

int
parent(char **arg)
{
	unsigned long ptrace_flags = 0, sig;
	/* PTRACE_GETEVENTMSG stores an unsigned long, so pid_t is too small on 64-bit */
	unsigned long new_pid;
	int event, status, ret, wait_type;
	pid_t pid;

	/*
	 * Create the child task
	 */
	pid = fork();
	switch(pid) {
	case -1:
		fatal_error("Cannot fork process\n");
	case 0:
		exit(child(arg));
	}
	/*
	 * wait for the child to exec
	 */
	waitpid(pid, &status, WUNTRACED);
	/*
	 * check if process exited early
	 */
	if (WIFEXITED(status))
		fatal_error("command %s exited too early with status %d\n", arg[0], WEXITSTATUS(status));

	ptrace_flags |= PTRACE_O_TRACEEXEC;
	ptrace_flags |= PTRACE_O_TRACEFORK;
	ret = ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)ptrace_flags);
	if (ret)
		fatal_error("ptrace setoptions=%d\n", errno);

	ret = ptrace(PTRACE_CONT, pid, NULL, NULL);
	if (ret)
		fatal_error("ptrace cont=%d\n", errno);

	/* WNOHANG makes this loop poll: wait4() returns 0 when nothing changed */
	wait_type = WUNTRACED|WNOHANG|__WALL;
	for (;;) {
		pid = wait4(-1, &status, wait_type, NULL);
		if (pid == 0)
			continue;
		if (pid < 1)
			break;
		printf("pid=%d errno=%d exited=%d stopped=%d signaled=%d stopsig=%-2d\n",
			pid, errno,
			WIFEXITED(status),
			WIFSTOPPED(status),
			WIFSIGNALED(status),
			WSTOPSIG(status));
		if (WIFEXITED(status) || WIFSIGNALED(status)) {
			printf("EXITED [%d]\n", pid);
			continue;
		}
		sig = WSTOPSIG(status);
		if (sig == SIGTRAP) {
			/* do not forward the SIGTRAP; the event code is in the upper bits */
			sig = 0;
			event = status >> 16;
			switch(event) {
			case PTRACE_EVENT_FORK:
				ret = ptrace(PTRACE_GETEVENTMSG, pid, NULL, (void *)&new_pid);
				if (ret)
					fatal_error("ptrace getmsg=%d\n", errno);
				printf("FORK new_pid [%lu]\n", new_pid);
				ret = ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)ptrace_flags);
				if (ret)
					fatal_error("ptrace options newpid=%d\n", errno);
				break;
			default:
				printf("unexpected event %d\n", event);
				break;
			}
		} else if (sig == SIGSTOP) {
			printf("SIGSTOP from [%d]\n", pid);
			sig = 0;
		}
		/* any other stop signal is passed on to the task unchanged */
		ret = ptrace(PTRACE_CONT, pid, NULL, (void *)sig);
		if (ret)
			fatal_error("ptrace cont=%d\n", errno);
	}
	/*
	 * simply wait for completion
	 */
	waitpid(pid, &status, 0);
	return 0;
}

int
main(int argc, char **argv)
{
	if (argc < 2) {
		fatal_error("You must specify a command to execute\n");
	}
	return parent(argv+1);
}
* Re: ptrace problem with 2.6.25 on Itanium
2008-04-24 10:39 ptrace problem with 2.6.25 on Itanium stephane eranian
@ 2008-04-24 12:04 ` Petr Tesarik
2008-04-24 12:14 ` stephane eranian
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Petr Tesarik @ 2008-04-24 12:04 UTC (permalink / raw)
To: linux-ia64
On Thu, 2008-04-24 at 12:39 +0200, stephane eranian wrote:
> Hello everyone,
>
> I am running into a new problem with perfmon on Itanium and 2.6.25.
>
> The pfmon tool is able to monitor across fork(). For that it relies on
> ptrace() to receive notifications on fork. This works fine on X86 and 2.6.25
> however it is currently broken on IA-64.
>
> Normally, on fork(), the ptracing parent (here pfmon) receives 2 notifications:
>
> 1. SIGTRAP with event PTRACE_EVENT_FORK to indicate a new process
> is being created. New pid is extracted via PTRACE_GETEVENTMSG
>
> 2. SIGSTOP with for new pid indicating that child is ready to
> execute its first
> instruction
>
>
> The first message allow the tool to create the data structure to for
> new process,
> the second marks the point where a perfmon context can actually be attached.
>
> With 2.6.25 on Itanium, the notifications are received out of order,
> i.e., the SIGTOP
> first and the FORK notification next. Of course, the tool is confused
> because until
> it sees the FORK event, it does not know the new process.
>
> This situation never happens on X86 with the same kernel.
>
> To demonstrate the problem, I have attached a simple test program. You need
> to pass the name of a command that creates child processes. Look at the order
> between the FORK and SIGSTOP notifications. There is a forktest program in
> pfmon/tests.
>
> I don't have time to track this down. However, I am highly suspicious of this
> new TIF_RESTORE_RSE and the arch_ptrace_stop_needed() code. The do_fork()
> routine does indeed set SIGSTOP, before it call ptrace_notify(). But this does
> not impact X86, which, by the way, does not define arch_ptrace_stop_needed().
> I don't have an older kernel handy to run the test. Hopefully someone
> on this list
> will try this on 2.6.24 or older.
I tried it on SLES10, which is basically a 2.6.16 with a simplified
version of the patch (one which only uses arch_ptrace_stop, but not
TIF_RESTORE_RSE) and it works as expected:
glass:~/ptrace-wrong-notify # ./task_ptrace_attach ./forktest 10 10
creating 10 additional process(es)
10 iterations
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6199]
pid=6199 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6199]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6200]
pid=6200 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6200]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6201]
pid=6199 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6199]
pid=6200 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6200]
pid=6201 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6201]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6201 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6201]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6202]
pid=6202 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6202]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6203]
pid=6202 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6202]
pid=6203 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6203]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6203 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6203]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6204]
pid=6204 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6204]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6205]
pid=6204 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6204]
pid=6205 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6205]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6206]
pid=6205 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6205]
pid=6206 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6206]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6207]
pid=6206 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6206]
pid=6207 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6207]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [6208]
pid=6207 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6207]
pid=6208 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [6208]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6208 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6208]
pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=6198 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [6198]
So, if something is broken, it must be the TIF_RESTORE_RSE part of the
patch, or an unexpected side effect of switching to the generic
sys_ptrace. I plan to have a look at mainline later today...
Kind regards,
Petr Tesarik
* Re: ptrace problem with 2.6.25 on Itanium
2008-04-24 10:39 ptrace problem with 2.6.25 on Itanium stephane eranian
2008-04-24 12:04 ` Petr Tesarik
@ 2008-04-24 12:14 ` stephane eranian
2008-04-24 12:27 ` Petr Tesarik
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: stephane eranian @ 2008-04-24 12:14 UTC (permalink / raw)
To: linux-ia64
Petr,
Thanks for checking, I am pretty sure this is a problem introduced recently.
The only thing related to this that I can think of is the TIF_RESTORE_RSE
and the associated TIF_NOTIFY_RESUME.
When I try the same test on 2.6.25:
$ ./task_ptrace ~/perfmon/pfmon/tests/forktest 10 10
creating 10 additional process(es)
10 iterations
pid=7540 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [7541]
pid=7541 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [7541]
pid=7541 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [7541]
pid=7542 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
>>SIGSTOP from [7542]
pid=7542 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [7542]
pid=7540 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
>>FORK new_pid [7542]
pid=7540 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
pid=7540 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [7543]
pid=7543 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [7543]
pid=7543 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [7543]
pid=7544 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
>>SIGSTOP from [7544]
pid=7540 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
>>FORK new_pid [7544]
On Thu, Apr 24, 2008 at 2:04 PM, Petr Tesarik <ptesarik@suse.cz> wrote:
>
> On Thu, 2008-04-24 at 12:39 +0200, stephane eranian wrote:
> > Hello everyone,
> >
> > I am running into a new problem with perfmon on Itanium and 2.6.25.
> >
> > The pfmon tool is able to monitor across fork(). For that it relies on
> > ptrace() to receive notifications on fork. This works fine on X86 and 2.6.25
> > however it is currently broken on IA-64.
> >
> > Normally, on fork(), the ptracing parent (here pfmon) receives 2 notifications:
> >
> > 1. SIGTRAP with event PTRACE_EVENT_FORK to indicate a new process
> > is being created. New pid is extracted via PTRACE_GETEVENTMSG
> >
> > 2. SIGSTOP with for new pid indicating that child is ready to
> > execute its first
> > instruction
> >
> >
> > The first message allow the tool to create the data structure to for
> > new process,
> > the second marks the point where a perfmon context can actually be attached.
> >
> > With 2.6.25 on Itanium, the notifications are received out of order,
> > i.e., the SIGTOP
> > first and the FORK notification next. Of course, the tool is confused
> > because until
> > it sees the FORK event, it does not know the new process.
> >
> > This situation never happens on X86 with the same kernel.
> >
> > To demonstrate the problem, I have attached a simple test program. You need
> > to pass the name of a command that creates child processes. Look at the order
> > between the FORK and SIGSTOP notifications. There is a forktest program in
> > pfmon/tests.
> >
> > I don't have time to track this down. However, I am highly suspicious of this
> > new TIF_RESTORE_RSE and the arch_ptrace_stop_needed() code. The do_fork()
> > routine does indeed set SIGSTOP, before it call ptrace_notify(). But this does
> > not impact X86, which, by the way, does not define arch_ptrace_stop_needed().
> > I don't have an older kernel handy to run the test. Hopefully someone
> > on this list
> > will try this on 2.6.24 or older.
>
> I tried it on SLES10, which is basically a 2.6.16 with a simplified
> version of the patch (one which only uses arch_ptrace_stop, but not
> TIF_RESTORE_RSE) and it works as expected:
>
> glass:~/ptrace-wrong-notify # ./task_ptrace_attach ./forktest 10 10
> creating 10 additional process(es)
> 10 iterations
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6199]
> pid=6199 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6199]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6200]
> pid=6200 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6200]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6201]
> pid=6199 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6199]
> pid=6200 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6200]
> pid=6201 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6201]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6201 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6201]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6202]
> pid=6202 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6202]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6203]
> pid=6202 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6202]
> pid=6203 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6203]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6203 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6203]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6204]
> pid=6204 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6204]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6205]
> pid=6204 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6204]
> pid=6205 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6205]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6206]
> pid=6205 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6205]
> pid=6206 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6206]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6207]
> pid=6206 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6206]
> pid=6207 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6207]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
> FORK new_pid [6208]
> pid=6207 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6207]
> pid=6208 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
> SIGSTOP from [6208]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6208 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6208]
> pid=6198 errno=0 exited=0 stopped=1 signaled=0 stopsig=17
> pid=6198 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
> EXITED [6198]
>
> So, if something is broken, it must be the TIF_RESTORE_RSE part of the
> patch, or an unexpected side effect of switching to the generic
> sys_ptrace. I plan to have a look at mainline later today...
>
> Kind regards,
> Petr Tesarik
>
>
* Re: ptrace problem with 2.6.25 on Itanium
2008-04-24 10:39 ptrace problem with 2.6.25 on Itanium stephane eranian
2008-04-24 12:04 ` Petr Tesarik
2008-04-24 12:14 ` stephane eranian
@ 2008-04-24 12:27 ` Petr Tesarik
2008-04-28 2:30 ` Roland McGrath
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Petr Tesarik @ 2008-04-24 12:27 UTC (permalink / raw)
To: linux-ia64
On Thu, 2008-04-24 at 14:14 +0200, stephane eranian wrote:
> Petr,
>
> Thanks for checking, I am pretty sure this is a problem introduced recently.
> The only thing related to this that I can think of is the TIF_RESTORE_RSE
> and the associated TIF_NOTIFY_RESUME.
>
> When I try the same test on 2.6.25:
Yes, this is consistent with what I can see on 2.6.25-rc3 (which I still
had lying around on the test host):
glass:~/ptrace-wrong-notify # ./task_ptrace_attach ./forktest 10 10
creating 10 additional process(es)
10 iterations
pid=3006 errno=0 exited=0 stopped=1 signaled=0 stopsig=19
SIGSTOP from [3006]
pid=3006 errno=0 exited=1 stopped=0 signaled=0 stopsig=0
EXITED [3006]
pid=3005 errno=0 exited=0 stopped=1 signaled=0 stopsig=5
FORK new_pid [3006]
and so on... Ok, I'm going to play with git now. ;]
Petr Tesarik
* Re: ptrace problem with 2.6.25 on Itanium
2008-04-24 10:39 ptrace problem with 2.6.25 on Itanium stephane eranian
` (2 preceding siblings ...)
2008-04-24 12:27 ` Petr Tesarik
@ 2008-04-28 2:30 ` Roland McGrath
2008-04-28 10:01 ` Petr Tesarik
2008-04-30 19:32 ` stephane eranian
5 siblings, 0 replies; 7+ messages in thread
From: Roland McGrath @ 2008-04-28 2:30 UTC (permalink / raw)
To: linux-ia64
Sorry to complicate your life, but this one is officially Your Problem.
There is no kernel bug here. The semantics have not changed, only the
timing. (You are not the first to assume some ordering constraint was
provided in the ptrace interface that in fact has never been guaranteed
at all.)
It's not surprising that the TIF_RESTORE_RSE/arch_ptrace_stop() changes
precipitated your first experience seeing this. It may very well be that
this order of the reports never ever happened before even once in real
life. But, it really truly has never been guaranteed (on any arch).
There is not going to be any new guarantee. You'll just have to adapt to
what the actual rules have always been. Sorry.
The new child is started running (so as to immediately deliver its
SIGSTOP) before the parent's ptrace_notify. This has always been so.
It's probably true that for the child to get far enough to stop before
the parent did, in the past, could only have happened through an
extraordinary preemption situation. Now that both parent and child do
the arch_ptrace_stop() logic before they complete their stops, there
are many more factors of nondeterminism involved in the common case.
On every arch, in every older kernel, if you have enough SMP, enough
preemption load (and preemption enabled), HZ high enough to drive up
frequency of preemption, relative to how long the particular CPU takes
to complete the ptrace_notify work, you will eventually manage to see
intermittent nondeterminism in the order of these two ptrace reports.
A robust userland application just has to cope with it.
This is not so hard to deal with. If you get a report for a new pid you
have never heard of, then you know it must be a new child whose parent's
fork/clone event you have yet to see. (Note it won't always be a SIGSTOP
that you see. It could be a death by SIGKILL, or it could be a stop for
a different signal that was dequeued before SIGSTOP, having just been
posted in a quick race right after the birth of the child.) In that
event, you can be sure that the parent will be very quickly reporting
too. So you can do synchronous waits until you see the parent clone
report whose eventmsg matches the spontaneous child pid. (Or you can
just keep track of the partial child in your data structures and go back
to your normal wait loop, which is probably a better way to write your
application.)
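The "keep track of the partial child" approach can be sketched as a small state table that accepts the two reports in either order. This is only an illustration under stated assumptions: the fixed-size table and the names task_get(), on_fork_event() and on_child_stop() are hypothetical, and are not taken from pfmon or any kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-task bookkeeping for a ptrace-based tool. */
enum task_state {
	TASK_UNUSED = 0,
	TASK_FORK_SEEN,		/* parent's fork event seen, child not yet reported */
	TASK_STOP_SEEN,		/* child reported first, parent's event still pending */
	TASK_READY		/* both reports seen, in either order */
};

#define MAX_TASKS 64

struct task {
	pid_t pid;
	enum task_state state;
};

static struct task tasks[MAX_TASKS];

/* Find the entry for pid, or claim a free slot. Returns NULL if the table is full. */
static struct task *
task_get(pid_t pid)
{
	struct task *free_slot = NULL;
	size_t i;

	assert(pid > 0);
	for (i = 0; i < MAX_TASKS; i++) {
		if (tasks[i].state != TASK_UNUSED && tasks[i].pid == pid)
			return &tasks[i];
		if (tasks[i].state == TASK_UNUSED && free_slot == NULL)
			free_slot = &tasks[i];
	}
	if (free_slot != NULL)
		free_slot->pid = pid;	/* caller sets a non-UNUSED state */
	return free_slot;
}

/* Call when the parent's fork/clone event names new_pid via its eventmsg. */
static enum task_state
on_fork_event(pid_t new_pid)
{
	struct task *t = task_get(new_pid);

	if (t == NULL)
		return TASK_UNUSED;
	if (t->state == TASK_STOP_SEEN || t->state == TASK_READY)
		t->state = TASK_READY;
	else
		t->state = TASK_FORK_SEEN;
	return t->state;
}

/* Call when a stop is reported for pid, whether or not pid is already known. */
static enum task_state
on_child_stop(pid_t pid)
{
	struct task *t = task_get(pid);

	if (t == NULL)
		return TASK_UNUSED;
	if (t->state == TASK_FORK_SEEN || t->state == TASK_READY)
		t->state = TASK_READY;
	else
		t->state = TASK_STOP_SEEN;
	return t->state;
}
```

The normal wait loop calls whichever handler matches the report and acts on the new task only once the returned state is TASK_READY. A real tool would also have to handle the other first reports mentioned above, such as death by SIGKILL, and free entries on exit.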
Thanks,
Roland
* Re: ptrace problem with 2.6.25 on Itanium
2008-04-24 10:39 ptrace problem with 2.6.25 on Itanium stephane eranian
` (3 preceding siblings ...)
2008-04-28 2:30 ` Roland McGrath
@ 2008-04-28 10:01 ` Petr Tesarik
2008-04-30 19:32 ` stephane eranian
5 siblings, 0 replies; 7+ messages in thread
From: Petr Tesarik @ 2008-04-28 10:01 UTC (permalink / raw)
To: linux-ia64
On Sun, 2008-04-27 at 19:30 -0700, Roland McGrath wrote:
> Sorry to complicate your life, but this one is officially Your Problem.
> There is no kernel bug here. The semantics have not changed, only the
> timing. (You are not the first to assume some ordering constraint was
> provided in the ptrace interface that in fact has never been guaranteed
> at all.)
I was just wondering why the timing was changed in such a way that the
signals *consistently* arrive in wrong order, and why only on IA64. I
think I know the answer:
do_fork() calls ptrace_notify() from the parent before the parent
process is rescheduled. This usually happens before the child process is
scheduled - BTW it happens always if both processes run on the same CPU
and the kernel is not preemptive.
Now, what happens after applying the RSE patch is that sending the
notification from the parent involves synchronizing the parent's RSE to
user space. Because this takes some time (and may sleep), the child
process is selected by the scheduler and runs first (note that the
notification is sent _after_ waking the child).
On other architectures, the overhead with enqueueing the notification
signal is much smaller, so the parent usually finishes sending the
signal before the child even gets to sending its own notification. But
this is racy, as pointed out by Roland, and any program which relies on
it will fail one day. In a way, we should be glad that ia64 now tends to
send the signals in a non-deterministic order, as more people will hit
the race and (hopefully) fix their programs. ;)
Kind regards,
Petr Tesarik
* Re: ptrace problem with 2.6.25 on Itanium
2008-04-24 10:39 ptrace problem with 2.6.25 on Itanium stephane eranian
` (4 preceding siblings ...)
2008-04-28 10:01 ` Petr Tesarik
@ 2008-04-30 19:32 ` stephane eranian
5 siblings, 0 replies; 7+ messages in thread
From: stephane eranian @ 2008-04-30 19:32 UTC (permalink / raw)
To: linux-ia64
Roland,
On Mon, Apr 28, 2008 at 4:30 AM, Roland McGrath <roland@redhat.com> wrote:
> Sorry to complicate your life, but this one is officially Your Problem.
> There is no kernel bug here. The semantics have not changed, only the
> timing. (You are not the first to assume some ordering constraint was
> provided in the ptrace interface that in fact has never been guaranteed
> at all.)
>
I suspected you were going to say that. I have now fixed pfmon and
released the new version yesterday. The trick I use is that if I get the SIGSTOP
notification first, I keep the child stopped until I get the FORK notification
from the parent. This way, the child cannot exit before pfmon gets the
FORK notification (I have seen this happen).
Thanks for checking on this.
So far, I have never seen the inversion on X86.
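For illustration, the "keep the child stopped" trick can be sketched as a list of held pids. This is a rough sketch and not the actual pfmon code: the names hold_child() and release_if_held() and the fixed-size array are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_HELD 64

/* Children whose SIGSTOP arrived before the parent's FORK event; 0 = empty slot. */
static pid_t held[MAX_HELD];

/*
 * Remember a child that must stay stopped. The caller simply does not
 * issue PTRACE_CONT for it. Returns 0 on success, -1 if the list is full.
 */
static int
hold_child(pid_t pid)
{
	size_t i;

	assert(pid > 0);
	for (i = 0; i < MAX_HELD; i++) {
		if (held[i] == 0) {
			held[i] = pid;
			return 0;
		}
	}
	return -1;
}

/*
 * Call on the parent's FORK event with the pid from PTRACE_GETEVENTMSG.
 * Returns 1 if that child was being held and the caller should now
 * resume it (for example with PTRACE_CONT), 0 otherwise.
 */
static int
release_if_held(pid_t new_pid)
{
	size_t i;

	assert(new_pid > 0);
	for (i = 0; i < MAX_HELD; i++) {
		if (held[i] == new_pid) {
			held[i] = 0;
			return 1;
		}
	}
	return 0;
}
```

In the wait loop, a SIGSTOP from an unknown pid would go through hold_child() with no PTRACE_CONT, and the later FORK event would call release_if_held() and resume the child only when it returns 1.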
> It's not surprising that the TIF_RESTORE_RSE/arch_ptrace_stop() changes
> precipitated your first experience seeing this. It may very well be that
> this order of the reports never ever happened before even once in real
> life. But, it really truly has never been guaranteed (on any arch).
> There is not going to be any new guarantee. You'll just have to adapt to
> what the actual rules have always been. Sorry.
>
> The new child is started running (so as to immediately deliver its
> SIGSTOP) before the parent's ptrace_notify. This has always been so.
> It's probably true that for the child to get far enough to stop before
> the parent did, in the past, could only have happened through an
> extraordinary preemption situation. Now that both parent and child do
> the arch_ptrace_stop() logic before they complete their stops, there
> are many more factors of nondeterminism involved in the common case.
>
> On every arch, in every older kernel, if you have enough SMP, enough
> preemption load (and preemption enabled), HZ high enough to drive up
> frequency of preemption, relative to how long the particular CPU takes
> to complete the ptrace_notify work, you will eventually manage to see
> intermittent nondeterminism in the order of these two ptrace reports.
> A robust userland application just has to cope with it.
>
> This is not so hard to deal with. If you get a report for a new pid you
> have never heard of, then you know it must be a new child whose parent's
> fork/clone event you have yet to see. (Note it won't always be a SIGSTOP
> that you see. It could be a death by SIGKILL, or it could be a stop for
> a different signal that was dequeued before SIGSTOP, having just been
> posted in a quick race right after the birth of the child.) In that
> event, you can be sure that the parent will be very quickly reporting
> too. So you can do synchronous waits until you see the parent clone
> report whose eventmsg matches the spontaneous child pid. (Or you can
> just keep track of the partial child in your data structures and go back
> to your normal wait loop, which is probably a better way to write your
> application.)
>
>
> Thanks,
> Roland
>