* Question about execve.
@ 2010-03-28 3:26 Carlos O'Donell
2010-03-28 15:42 ` John David Anglin
0 siblings, 1 reply; 23+ messages in thread
From: Carlos O'Donell @ 2010-03-28 3:26 UTC (permalink / raw)
To: Helge Deller, Kyle McMartin, linux-parisc
[-- Attachment #1: Type: text/plain, Size: 419 bytes --]
Helge,
On PARISC I'm seeing the following reproducible behvaiour:
* Parent calls vfork()
* Child of vfork() calls execve()
* Child returns from execve() and starts corrupting parent state
eventually leading to a segmentation fault.
* New process (as a result of execve) runs to completion.
What code in the Linux kernel prevents the child, which calls
execve(), from returning?
Test case attached.
Cheers,
Carlos.
[-- Attachment #2: build.sh --]
[-- Type: application/x-sh, Size: 40 bytes --]
[-- Attachment #3: pt-vfork.S --]
[-- Type: application/octet-stream, Size: 1675 bytes --]
.text !
.align 4 !
.export __vfork !
.type __vfork,@function !
__vfork: !
.PROC !
.CALLINFO FRAME=64,CALLS,SAVE_RP,ENTRY_GR=3 !
.ENTRY ! !
/* Save return pointer. */
stw %rp, -20(%sr0,%sp) !
/* Get 64 bytes on the stack, save sp, and PIC register. */
stwm %r3, 64(%sp)
stw %sp, -4(%sp)
stw %r19, -32(%sp)
/* Save/restore PIC register around syscall. */
copy %r19, %r25
/* Load thread register. */
mfctl %cr27, %r26 !
/* Load cached parent PID. */
ldw -1044(%r26),%r1 !
/* Negate it, such that the child runs with
a negative PID and no functions work until
the execve. */
sub %r0,%r1,%r1 !
/* Store it back. */
stw %r1,-1044(%r26) !
/* Call vfork. */
ble 0x100(%sr2,%r0)
ldi (0 + 113),%r20
/* If this is the child jump to thread_start */
cmpb,=,n %r0,%ret0,.Lthread_start !
/* This is the parent. */
/* Load thread register. */
mfctl %cr27, %r26 !
/* Load cached parent PID */
ldw -1044(%r26),%r1 !
/* Negate it (restoring it) */
sub %r0,%r1,%r1 !
/* Save it back. */
stw %r1,-1044(%r26) !
ldi -4096,%r1
/* Unsigned compre = Was ret0 between -1 and -4096. */
comclr,>>= %r1,%ret0,%r0
b,n .Lerror
.Lthread_start: !
/* Load return pointer and restore stack. */
ldw -84(%sp), %rp
bv %r0(%rp)
ldwm -64(%sp), %r3
.Lerror:
/* Negate error code. */
sub %r0,%ret0,%r3
.import __errno_location,code ! !
/* Get address of errno. */
bl __errno_location,%rp !
copy %r25, %r19
/* Save errno. */
stw %r3, 0(%ret0)
/* vfork returns -1 on error. */
ldi -1, %ret0
/* Return. */
ldw -84(%sp), %rp
bv %r0(%rp)
ldwm -64(%sp), %r3
.EXIT ! .PROCEND ! .size __vfork, .-__vfork !
.weak vfork ! vfork = __vfork
[-- Attachment #4: vfork.c --]
[-- Type: text/x-csrc, Size: 589 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#define CALL_EXIT 0
int main (void)
{
pid_t child;
char *cmd[] = { "bash", "-c", "echo In child $$;", (char *)0 };
char *env[] = { "HOME=/tmp", (char *)0 };
int ret;
child = vfork();
if (child == 0)
{
ret = execve("/bin/bash", cmd, env);
printf ("ret = %d\n", ret);
#if CALL_EXIT == 1
_exit(1);
#endif
}
else
{
printf("child != 0\n");
}
printf("parent is %d\n", (unsigned int)getpid());
printf("child is %d\n", (unsigned int)child);
return 0;
}
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: Question about execve. 2010-03-28 3:26 Question about execve Carlos O'Donell @ 2010-03-28 15:42 ` John David Anglin 2010-03-28 16:03 ` John David Anglin ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: John David Anglin @ 2010-03-28 15:42 UTC (permalink / raw) To: Carlos O'Donell; +Cc: Helge Deller, Kyle McMartin, linux-parisc On Sat, 27 Mar 2010, Carlos O'Donell wrote: > Helge, > > On PARISC I'm seeing the following reproducible behvaiour: > > * Parent calls vfork() > * Child of vfork() calls execve() > * Child returns from execve() and starts corrupting parent state > eventually leading to a segmentation fault. > * New process (as a result of execve) runs to completion. > > What code in the Linux kernel prevents the child, which calls > execve(), from returning? All the execute a file commands can return if an error occurs. Thus, I question the fact that the execve wrapper doesn't save registers. The vfork child runs using the parent's stack. Thus, even the call to execve potentially corrupts the parent's stack, and depending on scheduling, this might cause a segmentation fault in either parent or child. Here's a couple of excerpts from the vfork man page for HP-UX: Series 800 Process times for the parent and child processes within the [vfork,exec] window may be inaccurate. * Parent and child processes share the same stack space within the [vfork,exec] window. If the size of the stack has been changed within this window by the child process (return from or call to a function, for example), it is likely that the parent and child processes will be killed with signal SIGSEGV or SIGBUS. The [vfork,exec] window begins at the vfork() call and ends when the child completes its exec() call. The Open Group man page says: The use of vfork() for any purpose except as a prelude to an immediate call to a function from the exec family, or to _exit(), is not advised. On HP-UX, it's not supported. I see that this is not a new issue for linux: http://lkml.indiana.edu/hypermail/linux/kernel/9902.0/0256.html It is ok to treat vfork() identically to fork(), and the HP-UX manpage indicates that this may be the case for certain HP-UX implementations (maybe Series 700). > Test case attached. I was not able to save your pt-fork.S file. mutt didn't understand the encoding. I tried the vfork.c test case on my c3750 with 32-bit kernel. It didn't segv in a limited number of runs. However, I did notice that getpid() is broken after vfork(). The printf call in the child path probably isn't legit. It's also not legit to simply return if execve fails. _exit must be called. My guess would be that the parent is not blocked leading to corruption. > #include <stdio.h> > #include <stdlib.h> > #include <errno.h> > #include <unistd.h> > > #define CALL_EXIT 0 > > int main (void) > { > pid_t child; > char *cmd[] = { "bash", "-c", "echo In child $$;", (char *)0 }; > char *env[] = { "HOME=/tmp", (char *)0 }; > int ret; > > child = vfork(); > > if (child == 0) > { > ret = execve("/bin/bash", cmd, env); > printf ("ret = %d\n", ret); > #if CALL_EXIT == 1 > _exit(1); > #endif > } > else > { > printf("child != 0\n"); > } > > printf("parent is %d\n", (unsigned int)getpid()); > printf("child is %d\n", (unsigned int)child); > > return 0; > } Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 15:42 ` John David Anglin @ 2010-03-28 16:03 ` John David Anglin 2010-03-28 17:26 ` Carlos O'Donell 2010-03-28 17:32 ` John David Anglin 2 siblings, 0 replies; 23+ messages in thread From: John David Anglin @ 2010-03-28 16:03 UTC (permalink / raw) To: dave.anglin; +Cc: carlos, deller, kyle, linux-parisc > My guess would be that the parent is not blocked leading to corruption. I did a quick test and the child always seems to run first. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 15:42 ` John David Anglin 2010-03-28 16:03 ` John David Anglin @ 2010-03-28 17:26 ` Carlos O'Donell 2010-03-28 18:00 ` John David Anglin 2010-03-28 17:32 ` John David Anglin 2 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-28 17:26 UTC (permalink / raw) To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc [-- Attachment #1: Type: text/plain, Size: 2619 bytes --] On Sun, Mar 28, 2010 at 11:42 AM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: >> Test case attached. > > I was not able to save your pt-fork.S file. mutt didn't understand > the encoding. Attached as vforktest.tgz. > I tried the vfork.c test case on my c3750 with 32-bit kernel. It > didn't segv in a limited number of runs. However, I did notice that > getpid() is broken after vfork(). Broken how? A small note about getpid(): Since the parent and child share the same memory space, the parent temporarily negates the cached PID value just before the vfork. Only the parent, after the vfork, restores the cached PID. This is on purpose to prevent any functions like PID-related functions from working in the child() during the window bewteen vfork and execve. > The printf call in the child path probably isn't legit. It's also not > legit to simply return if execve fails. _exit must be called. The printf call in the child path is ABSOLUTELY wrong, but I wanted to see what the return was. You could equally well insert: #define ABORT_INSTRUCTION asm ("iitlbp %r0,(%sr0, %r0)") And cause the child to abort with an invalid instruction trap, and inspect r28 at the time of the fault to see what execve returned. In fact I use this in my testing now to avoid printf. > My guess would be that the parent is not blocked leading to corruption. What I see is that the child is returning from the kernel execve, and continuing to execute. If I use ABORT_INSTRUCTION in the child right after the execve I see this: carlos@firin:~/fsrc/vforkfailure/test$ ./vfork Illegal instruction carlos@firin:~/fsrc/vforkfailure/test$ In child 28734 This makes *no* sense. What I don't understand is this: * vfork'd child shares memory space with parent, therefore space id has to be the same? * Does execve give the process a new space id or does it simply overwrite the entire process image? So I had the test case try to exec a binary that just calls ABORT_INSTRUCTION. What I noticed is the following: carlos@firin:~/fsrc/vforkfailure/test$ ./vfork parent sr3 is 0x3428000 Illegal instruction There are 2 entries in dmesg: vfork (pid 28964): Illegal instruction (code 8) at 00000000000106a7 main (pid 28965): Illegal instruction (code 8) at 00000000000104af * The child returning from execve has the same space id as the parent e.g. 0x3418000 * The process run via the execve has a different space id e.g. 0x3418800 What is supposed to stop the child, which called execve, from returning to the parent? Cheers, Carlos. [-- Attachment #2: vforktest.tgz --] [-- Type: application/x-gzip, Size: 18051 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 17:26 ` Carlos O'Donell @ 2010-03-28 18:00 ` John David Anglin 2010-03-28 18:47 ` Carlos O'Donell 2010-03-28 19:07 ` John David Anglin 0 siblings, 2 replies; 23+ messages in thread From: John David Anglin @ 2010-03-28 18:00 UTC (permalink / raw) To: Carlos O'Donell Cc: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc On Sun, 28 Mar 2010, Carlos O'Donell wrote: > Broken how? See other mail. > A small note about getpid(): > > Since the parent and child share the same memory space, the parent > temporarily negates the cached PID value just before the vfork. Only > the parent, after the vfork, restores the cached PID. This is on > purpose to prevent any functions like PID-related functions from > working in the child() during the window bewteen vfork and execve. Ok, it might be glibc that's broken (i.e., it doesn't restore correct dave). > The printf call in the child path is ABSOLUTELY wrong, but I wanted to > see what the return was. You could equally well insert: > > #define ABORT_INSTRUCTION asm ("iitlbp %r0,(%sr0, %r0)") Tried that and it didn't trigger. > > My guess would be that the parent is not blocked leading to corruption. > > What I see is that the child is returning from the kernel execve, and > continuing to execute. Hmmm, I'm not seeing that, but I do have the patch installed that reloads %r16 after calling schedule. %r16 isn't clobbered by the call but I believe %cr30 can change. As a result, we may run a task with the wrong registers with the current code. See <http://article.gmane.org/gmane.linux.ports.parisc/2687>. > If I use ABORT_INSTRUCTION in the child right after the execve I see this: Already tried this. > carlos@firin:~/fsrc/vforkfailure/test$ ./vfork > Illegal instruction > carlos@firin:~/fsrc/vforkfailure/test$ In child 28734 > > This makes *no* sense. > > What I don't understand is this: > > * vfork'd child shares memory space with parent, therefore space id > has to be the same? Sounds right. > * Does execve give the process a new space id or does it simply > overwrite the entire process image? > > So I had the test case try to exec a binary that just calls ABORT_INSTRUCTION. > > What I noticed is the following: > > carlos@firin:~/fsrc/vforkfailure/test$ ./vfork > parent sr3 is 0x3428000 > Illegal instruction > > There are 2 entries in dmesg: > vfork (pid 28964): Illegal instruction (code 8) at 00000000000106a7 > main (pid 28965): Illegal instruction (code 8) at 00000000000104af > > * The child returning from execve has the same space id as the parent > e.g. 0x3418000 > * The process run via the execve has a different space id e.g. 0x3418800 That's what I would have expected for execve. > What is supposed to stop the child, which called execve, from > returning to the parent? It's only supposed to happen if there is an error in the execve call. I think our task scheduling is broken ;( Could you try my change and see if it helps? Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 18:00 ` John David Anglin @ 2010-03-28 18:47 ` Carlos O'Donell 2010-03-28 19:07 ` John David Anglin 1 sibling, 0 replies; 23+ messages in thread From: Carlos O'Donell @ 2010-03-28 18:47 UTC (permalink / raw) To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc [-- Attachment #1: Type: text/plain, Size: 1891 bytes --] On Sun, Mar 28, 2010 at 2:00 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: > Ok, it might be glibc that's broken (i.e., it doesn't restore correct > dave). I don't think this is broken. >> The printf call in the child path is ABSOLUTELY wrong, but I wanted to >> see what the return was. You could equally well insert: >> >> #define ABORT_INSTRUCTION asm ("iitlbp %r0,(%sr0, %r0)") > > Tried that and it didn't trigger. OK, your kernel is perhaps fixed? >> > My guess would be that the parent is not blocked leading to corruption. >> >> What I see is that the child is returning from the kernel execve, and >> continuing to execute. > > Hmmm, I'm not seeing that, but I do have the patch installed that > reloads %r16 after calling schedule. %r16 isn't clobbered by the > call but I believe %cr30 can change. As a result, we may run a task > with the wrong registers with the current code. When schedule returns you are either still in the current process, or have switched to another process via arch/parisc/kernel/entry.S (_switch_to), so yes %cr30 will have been changed and must be reloaded (kernel threads IIUC don't have their own register state, you are simply just switching %cr30 around and reloading the appropriate pt_regs). Where are the new space ids loaded into the space registers for the new process? > See <http://article.gmane.org/gmane.linux.ports.parisc/2687>. Applied. > It's only supposed to happen if there is an error in the execve call. > I think our task scheduling is broken ;( > > Could you try my change and see if it helps? It doesn't help. Just so we are on the same page, please find attached vforktest.tgz. Unpack it. Run ./build.sh Run ./vfork What are the results on your machine? I'm running a 64-bit kernel 2.6.32 (kyle's tree) SMP two PA8700's at 650MHz. Cheers, Carlos. [-- Attachment #2: vforktest.tgz --] [-- Type: application/x-gzip, Size: 21529 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 18:00 ` John David Anglin 2010-03-28 18:47 ` Carlos O'Donell @ 2010-03-28 19:07 ` John David Anglin 2010-03-28 19:31 ` John David Anglin 1 sibling, 1 reply; 23+ messages in thread From: John David Anglin @ 2010-03-28 19:07 UTC (permalink / raw) To: dave.anglin; +Cc: carlos, deller, kyle, linux-parisc > > A small note about getpid(): > > > > Since the parent and child share the same memory space, the parent > > temporarily negates the cached PID value just before the vfork. Only > > the parent, after the vfork, restores the cached PID. This is on > > purpose to prevent any functions like PID-related functions from > > working in the child() during the window bewteen vfork and execve. > > Ok, it might be glibc that's broken (i.e., it doesn't restore correct > dave). The bad value is stored here: 0x0003f578 <fork+700>: ldil L%7b800,r26 0x0003f57c <fork+704>: b,l 0x35d74 <_IO_list_unlock>,rp 0x0003f580 <fork+708>: stw r19,-414(r6) (gdb) p/x $r19 $2 = 0x10101364 Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 19:07 ` John David Anglin @ 2010-03-28 19:31 ` John David Anglin 2010-03-28 19:40 ` Carlos O'Donell 0 siblings, 1 reply; 23+ messages in thread From: John David Anglin @ 2010-03-28 19:31 UTC (permalink / raw) To: John David Anglin; +Cc: dave.anglin, carlos, deller, kyle, linux-parisc > The bad value is stored here: > > 0x0003f578 <fork+700>: ldil L%7b800,r26 > 0x0003f57c <fork+704>: b,l 0x35d74 <_IO_list_unlock>,rp > 0x0003f580 <fork+708>: stw r19,-414(r6) > (gdb) p/x $r19 > $2 = 0x10101364 %r19 is clobbered by the clone syscall: (gdb) disass 0x0003f44c 0x0003f458 Dump of assembler code from 0x3f44c to 0x3f458: 0x0003f44c <fork+400>: be,l 100(sr2,r0),sr0,r31 0x0003f450 <fork+404>: ldi 78,r20 0x0003f454 <fork+408>: nop Breakpoint 5, 0x0003f44c in fork () (gdb) p/x $r19 $6 = 0x0 Breakpoint 4, 0x0003f454 in fork () (gdb) p/x $r19 $7 = 0x10101364 Same value for %r19 is returned for both parent and child. This is using application linked with -static (i.e., libc.a). Probably, shared lib is different. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 19:31 ` John David Anglin @ 2010-03-28 19:40 ` Carlos O'Donell [not found] ` <20100328202239.162785145@hiauly1.hia.nrc.ca> 0 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-28 19:40 UTC (permalink / raw) To: John David Anglin; +Cc: dave.anglin, deller, kyle, linux-parisc On Sun, Mar 28, 2010 at 3:31 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: >> The bad value is stored here: >> >> 0x0003f578 <fork+700>: =A0 =A0 =A0 =A0ldil L%7b800,r26 >> 0x0003f57c <fork+704>: =A0 =A0 =A0 =A0b,l 0x35d74 <_IO_list_unlock>,= rp >> 0x0003f580 <fork+708>: =A0 =A0 =A0 =A0stw r19,-414(r6) >> (gdb) p/x $r19 >> $2 =3D 0x10101364 > > %r19 is clobbered by the clone syscall: This is a known issue with the kernel assembly, fork, vfork, and clone all clobber r19, which is why glibc saves/restore r19 across syscalls... but only in the PIC case. ARCH_FORK -> INLINE_SYSCALL (clone, 5, = \ CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, = \ NULL, NULL, NULL, &THREAD_SELF->tid) INLINE_SYSCALL does: asm volatile( = \ SAVE_ASM_PIC = \ " ble 0x100(%%sr2, %%r0)\n" = \ " ldi %1, %%r20\n" = \ LOAD_ASM_PIC = \ : "=3Dr" (__res) = \ : "i" (SYS_ify(name)) PIC_REG_USE ASM_ARGS_##nr= \ : "memory", CALL_CLOB_REGS CLOB_ARGS_##nr = \ ); = \ So r19 is saved/restored around the clone syscall, but *ONLY* if the code is PIC. Therefore in your non-PIC case it will corrupt r19. Actually, looking at the fork/vfork/clone wrappers in entry.S they clobber two registers: /* These are call-clobbered registers and therefore also syscall-clobbered (we hope). */ STREG %r2,PT_GR19(%r1) /* save for child */ STREG %r30,PT_GR21(%r1) It would appear that r21 is also clobbered. Does this mean that we need to either (a) fix the kernel to stop clobbering r19 and r21 or fix userspace? I'm inclined to think the kernel is wrong, and lazy here, and pt_regs should have another PT_GR2_CHILD and PT_GR30_CHILD entries for the case above. Comments? > (gdb) disass 0x0003f44c 0x0003f458 > Dump of assembler code from 0x3f44c to 0x3f458: > 0x0003f44c <fork+400>: =A0be,l 100(sr2,r0),sr0,r31 > 0x0003f450 <fork+404>: =A0ldi 78,r20 > 0x0003f454 <fork+408>: =A0nop > > Breakpoint 5, 0x0003f44c in fork () > (gdb) p/x $r19 > $6 =3D 0x0 > > Breakpoint 4, 0x0003f454 in fork () > (gdb) p/x $r19 > $7 =3D 0x10101364 > > Same value for %r19 is returned for both parent and child. > > This is using application linked with -static (i.e., libc.a). =A0Prob= ably, > shared lib is different. I hope my explanation above shows why r19 is clobbered. I hadn't thought about the static case and a non-PIC register use of r1= 9. Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20100328202239.162785145@hiauly1.hia.nrc.ca>]
* Re: Question about execve. [not found] ` <20100328202239.162785145@hiauly1.hia.nrc.ca> @ 2010-03-28 20:39 ` Carlos O'Donell [not found] ` <20100328210134.3D6005145@hiauly1.hia.nrc.ca> 0 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-28 20:39 UTC (permalink / raw) To: John David Anglin; +Cc: dave.anglin, deller, kyle, linux-parisc On Sun, Mar 28, 2010 at 4:22 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: >> Does this mean that we need to either (a) fix the kernel to stop >> clobbering r19 and r21 or fix userspace? > > r21 is an syscall argument register and it looks like CLOB_ARGS_* > handles it. =A0So, I doubt we need to worry about about r21. Right. >> I'm inclined to think the kernel is wrong, and lazy here, and pt_reg= s >> should have another PT_GR2_CHILD and PT_GR30_CHILD entries for the >> case above. > > Fixing the kernel will fix broken userspace code. =A0Fixing glibc to > save and restore r19 in non-PIC code will allow new code to work > with broken kernels. =A0Thus, I tend to think both should be fixed. Agreed, I'll get a fix for this into glibc. > Instead of using PT_GR19 and PT_GR21, can't we use PT_SR4 and PT_SR5 > to store r2 and r30 for the child. =A0We don't seem to ever actually > save sr4 or sr5 in the syscall path and we restore fixed values. > This is a hack, but avoids increasing the size of the task structure. I don't like hacks like this. They are confusing and hard to maintain. Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20100328210134.3D6005145@hiauly1.hia.nrc.ca>]
* Re: Question about execve. [not found] ` <20100328210134.3D6005145@hiauly1.hia.nrc.ca> @ 2010-03-28 21:04 ` Carlos O'Donell [not found] ` <20100328212110.03F224E77@hiauly1.hia.nrc.ca> 0 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-28 21:04 UTC (permalink / raw) To: John David Anglin; +Cc: dave.anglin, deller, kyle, linux-parisc On Sun, Mar 28, 2010 at 5:01 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: >> > Instead of using PT_GR19 and PT_GR21, can't we use PT_SR4 and PT_S= R5 >> > to store r2 and r30 for the child. =A0We don't seem to ever actual= ly >> > save sr4 or sr5 in the syscall path and we restore fixed values. >> > This is a hack, but avoids increasing the size of the task structu= re. >> >> I don't like hacks like this. They are confusing and hard to maintai= n. > > However, it says in ptrace.h: > > =A0* N.B. gdb/strace care about the size and offsets within this > =A0* structure. If you change things, you may break object compatibil= ity > =A0* for those applications. > > There is one free slot in pt_regs: > > =A0 =A0 =A0 =A0unsigned long pad0; =A0 =A0 /* available for other use= s */ > > Given that this struct leaks to userspace, I don't think we should > mess with it. Maybe. I would certainly test this out before adding new entries. If I provide you with a new glibc with the r19 fix would you be able to test it out? Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20100328212110.03F224E77@hiauly1.hia.nrc.ca>]
* Re: Question about execve. [not found] ` <20100328212110.03F224E77@hiauly1.hia.nrc.ca> @ 2010-03-28 23:12 ` Carlos O'Donell 2010-03-28 23:59 ` John David Anglin 0 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-28 23:12 UTC (permalink / raw) To: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc On Sun, Mar 28, 2010 at 5:21 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: >> If I provide you with a new glibc with the r19 fix would you be able >> to test it out? > > Yes. I know it's easier to debug a static application for these cases, so I think a fixed glibc will help. I have an unstable libc6 currently building with the changes to save/restore r19 in all cases. We still aren't any closer to explaining why my shared case doesn't work. I still don't have a good answer to this question: * What prevents the execve from returning to userspace? Cheers, Carlos. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 23:12 ` Carlos O'Donell @ 2010-03-28 23:59 ` John David Anglin 2010-03-29 0:24 ` Carlos O'Donell 0 siblings, 1 reply; 23+ messages in thread From: John David Anglin @ 2010-03-28 23:59 UTC (permalink / raw) To: Carlos O'Donell; +Cc: Helge Deller, Kyle McMartin, linux-parisc On Sun, 28 Mar 2010, Carlos O'Donell wrote: > On Sun, Mar 28, 2010 at 5:21 PM, John David Anglin > <dave@hiauly1.hia.nrc.ca> wrote: > >> If I provide you with a new glibc with the r19 fix would you be able > >> to test it out? > > > > Yes. > > I know it's easier to debug a static application for these cases, so I > think a fixed glibc will help. Yes. It's proving more difficult than I thought to fix the kernel and I'm not sure I understand how this works: LDREG TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SIZE(%r30), %r1 LDREG TASK_PT_GR19(%r1),%r2 b wrapper_exit copy %r0,%r28 It appears the child has to access the parent's pt_regs struct to load the return for the child. > I have an unstable libc6 currently building with the changes to > save/restore r19 in all cases. Great. > We still aren't any closer to explaining why my shared case doesn't work. > > I still don't have a good answer to this question: > > * What prevents the execve from returning to userspace? I presume you mean the caller's userspace. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 23:59 ` John David Anglin @ 2010-03-29 0:24 ` Carlos O'Donell 2010-03-29 2:38 ` John David Anglin 0 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-29 0:24 UTC (permalink / raw) To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc On Sun, Mar 28, 2010 at 7:59 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: >> I know it's easier to debug a static application for these cases, so= I >> think a fixed glibc will help. > > Yes. > > It's proving more difficult than I thought to fix the kernel and > I'm not sure I understand how this works: > > =A0 =A0 =A0 =A0LDREG =A0 TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SIZE= (%r30), %r1 > =A0 =A0 =A0 =A0LDREG =A0 TASK_PT_GR19(%r1),%r2 > =A0 =A0 =A0 =A0b =A0 =A0 =A0 wrapper_exit > =A0 =A0 =A0 =A0copy =A0 =A0%r0,%r28 > > It appears the child has to access the parent's pt_regs struct > to load the return for the child. I see that copy_process in do_fork will give the child a copy of the entire pt_reg structure so it has access to these values anyway? >> * What prevents the execve from returning to userspace? > > I presume you mean the caller's userspace. Yes, the process that calls execve. Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-29 0:24 ` Carlos O'Donell @ 2010-03-29 2:38 ` John David Anglin 2010-03-29 12:11 ` Carlos O'Donell 0 siblings, 1 reply; 23+ messages in thread From: John David Anglin @ 2010-03-29 2:38 UTC (permalink / raw) To: Carlos O'Donell Cc: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc On Sun, 28 Mar 2010, Carlos O'Donell wrote: > On Sun, Mar 28, 2010 at 7:59 PM, John David Anglin > <dave@hiauly1.hia.nrc.ca> wrote: > >> I know it's easier to debug a static application for these cases, = so I > >> think a fixed glibc will help. > > > > Yes. > > > > It's proving more difficult than I thought to fix the kernel and > > I'm not sure I understand how this works: > > > > =A0 =A0 =A0 =A0LDREG =A0 TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SI= ZE(%r30), %r1 > > =A0 =A0 =A0 =A0LDREG =A0 TASK_PT_GR19(%r1),%r2 > > =A0 =A0 =A0 =A0b =A0 =A0 =A0 wrapper_exit > > =A0 =A0 =A0 =A0copy =A0 =A0%r0,%r28 > > > > It appears the child has to access the parent's pt_regs struct > > to load the return for the child. >=20 > I see that copy_process in do_fork will give the child a copy of the > entire pt_reg structure so it has access to these values anyway? I got a kernel to boot which shouldn't clobber r19. Turned out to be a typo. This uses PT_SR4 for the kernel return for the child. Will test with gcc build. Will try again using the pad0 field later. Dave --=20 J. David Anglin dave.anglin@nrc-cnrc.g= c.ca National Research Council of Canada (613) 990-0752 (FAX: 9= 52-6602) -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-29 2:38 ` John David Anglin @ 2010-03-29 12:11 ` Carlos O'Donell 2010-03-29 14:02 ` John David Anglin 0 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-29 12:11 UTC (permalink / raw) To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc On Sun, Mar 28, 2010 at 10:38 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: > On Sun, 28 Mar 2010, Carlos O'Donell wrote: > >> On Sun, Mar 28, 2010 at 7:59 PM, John David Anglin >> <dave@hiauly1.hia.nrc.ca> wrote: >> >> I know it's easier to debug a static application for these cases,= so I >> >> think a fixed glibc will help. >> > >> > Yes. >> > >> > It's proving more difficult than I thought to fix the kernel and >> > I'm not sure I understand how this works: >> > >> > =A0 =A0 =A0 =A0LDREG =A0 TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_S= IZE(%r30), %r1 >> > =A0 =A0 =A0 =A0LDREG =A0 TASK_PT_GR19(%r1),%r2 >> > =A0 =A0 =A0 =A0b =A0 =A0 =A0 wrapper_exit >> > =A0 =A0 =A0 =A0copy =A0 =A0%r0,%r28 >> > >> > It appears the child has to access the parent's pt_regs struct >> > to load the return for the child. >> >> I see that copy_process in do_fork will give the child a copy of the >> entire pt_reg structure so it has access to these values anyway? > > I got a kernel to boot which shouldn't clobber r19. =A0Turned out to > be a typo. =A0This uses PT_SR4 for the kernel return for the child. > Will test with gcc build. > > Will try again using the pad0 field later. Sorry, what turned out to be a typo? I guess any static binary that calls fork/vfork/clone is susceptible to corruption of r19 in this way. I don't know how much that effects userspace, since most is shared, *however* the dynamic linker is technically a static binary :-) Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-29 12:11 ` Carlos O'Donell @ 2010-03-29 14:02 ` John David Anglin 2010-03-31 13:55 ` Carlos O'Donell 0 siblings, 1 reply; 23+ messages in thread From: John David Anglin @ 2010-03-29 14:02 UTC (permalink / raw) To: Carlos O'Donell Cc: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc On Mon, 29 Mar 2010, Carlos O'Donell wrote: > > I got a kernel to boot which shouldn't clobber r19. =A0Turned out t= o > > be a typo. =A0This uses PT_SR4 for the kernel return for the child. > > Will test with gcc build. > > > > Will try again using the pad0 field later. >=20 > Sorry, what turned out to be a typo? I used PT_SR4 instead of TASK_PT_SR4 in child_return ;( > I guess any static binary that calls fork/vfork/clone is susceptible > to corruption of r19 in this way. >=20 > I don't know how much that effects userspace, since most is shared, > *however* the dynamic linker is technically a static binary :-) The change may not fix much, but there's no technical reason that the kernel needs to clobber r19. Dave --=20 J. David Anglin dave.anglin@nrc-cnrc.g= c.ca National Research Council of Canada (613) 990-0752 (FAX: 9= 52-6602) -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-29 14:02 ` John David Anglin @ 2010-03-31 13:55 ` Carlos O'Donell 2010-04-01 10:54 ` Helge Deller 0 siblings, 1 reply; 23+ messages in thread From: Carlos O'Donell @ 2010-03-31 13:55 UTC (permalink / raw) To: John David Anglin; +Cc: Helge Deller, Kyle McMartin, linux-parisc On Mon, Mar 29, 2010 at 10:02 AM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: > The change may not fix much, but there's no technical reason that > the kernel needs to clobber r19. Rebuilt 2.10.2-6 here with r19 save for all cases. http://www.parisc-linux.org/~carlos/saver19/ Two regressions during build, but they appear unrelated. Could you test this in a chroot and tell me if it works for you? Cheers, Carlos. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-31 13:55 ` Carlos O'Donell @ 2010-04-01 10:54 ` Helge Deller 2010-04-01 13:33 ` John David Anglin 0 siblings, 1 reply; 23+ messages in thread From: Helge Deller @ 2010-04-01 10:54 UTC (permalink / raw) To: Carlos O'Donell, dave.anglin; +Cc: linux-parisc, kyle > Rebuilt 2.10.2-6 here with r19 save for all cases. >=20 > http://www.parisc-linux.org/~carlos/saver19/ Hi Carlos, I installed this libc on one of my servers. I didn't ran any bigger tests yet, but at least it seems to have fixed = the issue that an ssh into the machine often didn't let me in the first= time. Now it just works... Thanks! Helge --=20 NEU: GMX DSL f=FCr 19,99 EUR/mtl. und ohne Mindest-Laufzeit! http://portal.gmx.net/de/go/dsl02 -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-04-01 10:54 ` Helge Deller @ 2010-04-01 13:33 ` John David Anglin 2010-04-01 18:55 ` Helge Deller 0 siblings, 1 reply; 23+ messages in thread From: John David Anglin @ 2010-04-01 13:33 UTC (permalink / raw) To: Helge Deller; +Cc: linux-parisc, kyle, carlos, dave.anglin Hi Helge, > I installed this libc on one of my servers. > I didn't ran any bigger tests yet, but at least it seems to have fixed the issue that an ssh into the machine often didn't let me in the first time. Now it just works... I think this might be another issue. Strangely, I used to see this one my c3750 and it' now gone away. I originally noticed a problem with Carlos's vfork testcase when compiled with -static. The parent pid printed by the parent was corrupted (always same value) by the vfork call (r19 was not saved across syscall). If you check that it now works, the problem is probably fixed in glibc. I have attached below a kernel fix for this bug. This allows old user code to work correctly. Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) Signed-off-by: John Daqvid Anglin <dave.anglin@nrc-cnrc.gc.ca> diff --git a/arch/parisc/kernel/asm-offsets.c b/arch/parisc/kernel/asm-offsets.c index ec787b4..b2f35b2 100644 --- a/arch/parisc/kernel/asm-offsets.c +++ b/arch/parisc/kernel/asm-offsets.c @@ -137,6 +137,7 @@ int main(void) DEFINE(TASK_PT_IAOQ0, offsetof(struct task_struct, thread.regs.iaoq[0])); DEFINE(TASK_PT_IAOQ1, offsetof(struct task_struct, thread.regs.iaoq[1])); DEFINE(TASK_PT_CR27, offsetof(struct task_struct, thread.regs.cr27)); + DEFINE(TASK_PT_SYSCALL_RP, offsetof(struct task_struct, thread.regs.pad0)); DEFINE(TASK_PT_ORIG_R28, offsetof(struct task_struct, thread.regs.orig_r28)); DEFINE(TASK_PT_KSP, offsetof(struct task_struct, thread.regs.ksp)); DEFINE(TASK_PT_KPC, offsetof(struct task_struct, thread.regs.kpc)); @@ -225,6 +226,7 @@ int main(void) DEFINE(PT_IAOQ0, offsetof(struct pt_regs, iaoq[0])); DEFINE(PT_IAOQ1, offsetof(struct pt_regs, iaoq[1])); DEFINE(PT_CR27, offsetof(struct pt_regs, cr27)); + DEFINE(PT_SYSCALL_RP, offsetof(struct pt_regs, pad0)); DEFINE(PT_ORIG_R28, offsetof(struct pt_regs, orig_r28)); DEFINE(PT_KSP, offsetof(struct pt_regs, ksp)); DEFINE(PT_KPC, offsetof(struct pt_regs, kpc)); diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S index 3a44f7f..5772cba 100644 --- a/arch/parisc/kernel/entry.S +++ b/arch/parisc/kernel/entry.S @@ -1772,9 +1744,9 @@ ENTRY(sys_fork_wrapper) ldo -16(%r30),%r29 /* Reference param save area */ #endif - /* These are call-clobbered registers and therefore - also syscall-clobbered (we hope). */ - STREG %r2,PT_GR19(%r1) /* save for child */ + STREG %r2,PT_SYSCALL_RP(%r1) + + /* WARNING - Clobbers r21, userspace must save! */ STREG %r30,PT_GR21(%r1) LDREG PT_GR30(%r1),%r25 @@ -1804,7 +1776,7 @@ ENTRY(child_return) nop LDREG TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SIZE(%r30), %r1 - LDREG TASK_PT_GR19(%r1),%r2 + LDREG TASK_PT_SYSCALL_RP(%r1),%r2 b wrapper_exit copy %r0,%r28 ENDPROC(child_return) @@ -1823,8 +1795,9 @@ ENTRY(sys_clone_wrapper) ldo -16(%r30),%r29 /* Reference param save area */ #endif - /* WARNING - Clobbers r19 and r21, userspace must save these! */ - STREG %r2,PT_GR19(%r1) /* save for child */ + STREG %r2,PT_SYSCALL_RP(%r1) + + /* WARNING - Clobbers r21, userspace must save! */ STREG %r30,PT_GR21(%r1) BL sys_clone,%r2 copy %r1,%r24 @@ -1847,7 +1820,9 @@ ENTRY(sys_vfork_wrapper) ldo -16(%r30),%r29 /* Reference param save area */ #endif - STREG %r2,PT_GR19(%r1) /* save for child */ + STREG %r2,PT_SYSCALL_RP(%r1) + + /* WARNING - Clobbers r21, userspace must save! */ STREG %r30,PT_GR21(%r1) BL sys_vfork,%r2 diff --git a/arch/parisc/hpux/wrappers.S b/arch/parisc/hpux/wrappers.S index 58c53c8..bdcea33 100644 --- a/arch/parisc/hpux/wrappers.S +++ b/arch/parisc/hpux/wrappers.S @@ -88,7 +88,7 @@ ENTRY(hpux_fork_wrapper) STREG %r2,-20(%r30) ldo 64(%r30),%r30 - STREG %r2,PT_GR19(%r1) ;! save for child + STREG %r2,PT_SYSCALL_RP(%r1) ;! save for child STREG %r30,PT_GR21(%r1) ;! save for child LDREG PT_GR30(%r1),%r25 @@ -132,7 +132,7 @@ ENTRY(hpux_child_return) bl,n schedule_tail, %r2 #endif - LDREG TASK_PT_GR19-TASK_SZ_ALGN-128(%r30),%r2 + LDREG TASK_PT_SYSCALL_RP-TASK_SZ_ALGN-128(%r30),%r2 b fork_return copy %r0,%r28 ENDPROC(hpux_child_return) ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-04-01 13:33 ` John David Anglin @ 2010-04-01 18:55 ` Helge Deller 0 siblings, 0 replies; 23+ messages in thread From: Helge Deller @ 2010-04-01 18:55 UTC (permalink / raw) To: John David Anglin; +Cc: linux-parisc, kyle, carlos, dave.anglin On 04/01/2010 03:33 PM, John David Anglin wrote: > Hi Helge, > >> I installed this libc on one of my servers. >> I didn't ran any bigger tests yet, but at least it seems to have fixed the issue that an ssh into the machine often didn't let me in the first time. Now it just works... > > I think this might be another issue. Strangely, I used to see this > one my c3750 and it' now gone away. Sadly I was wrong and answered too fast. The ssh-issue came back at some point, so the new glibc didn't fixed it.... :-( > I originally noticed a problem with Carlos's vfork testcase when compiled > with -static. The parent pid printed by the parent was corrupted (always > same value) by the vfork call (r19 was not saved across syscall). If you > check that it now works, the problem is probably fixed in glibc. > > I have attached below a kernel fix for this bug. This allows old user > code to work correctly. Thanks for the work you and Carlos do! Helge ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 15:42 ` John David Anglin 2010-03-28 16:03 ` John David Anglin 2010-03-28 17:26 ` Carlos O'Donell @ 2010-03-28 17:32 ` John David Anglin 2010-03-28 18:20 ` Carlos O'Donell 2 siblings, 1 reply; 23+ messages in thread From: John David Anglin @ 2010-03-28 17:32 UTC (permalink / raw) To: dave.anglin; +Cc: carlos, deller, kyle, linux-parisc > I tried the vfork.c test case on my c3750 with 32-bit kernel. It > didn't segv in a limited number of runs. However, I did notice that > getpid() is broken after vfork(). The vfork (clone) syscall corrupts (i.e., inserts wrong value) the parent tid. In the following, I disabled the printf's and execve call in Carlos's testcase. The child just does _exit(). The fast path through getpid is: Dump of assembler code for function getpid: 0x0001ad2c <getpid+0>: stw rp,-14(sp) 0x0001ad30 <getpid+4>: mfctl tr3,r20 0x0001ad34 <getpid+8>: ldw -414(r20),r19 0x0001ad38 <getpid+12>: cmpib,>= 0,r19,0x1ad48 <getpid+28> 0x0001ad3c <getpid+16>: copy r19,ret0 0x0001ad40 <getpid+20>: ldw -14(sp),rp 0x0001ad44 <getpid+24>: bv,n r0(rp) Breakpoint 3, 0x0001ad34 in getpid () (gdb) del 2 (gdb) p/x $r20 $6 = 0x9a480 Breakpoint 4, main () at vfork.c:17 17 child = vfork(); (gdb) x/x 0x9a480 - 0x414 0x9a06c: 0x00000000 (gdb) c Continuing. Breakpoint 3, 0x0001ad34 in getpid () (gdb) x/x 0x9a480 - 0x414 0x9a06c: 0x10101364 Breakpoint 4, main () at vfork.c:17 17 child = vfork(); (gdb) x/64x 0x9a040 0x9a040: 0x00000000 0x00000000 0x00000000 0x00000000 0x9a050: 0x00000000 0x00000000 0x00000000 0x00000000 0x9a060: 0x00000000 0x00000000 0x00000000 0x00000000 0x9a070: 0x00000000 0x00000000 0x00000000 0x00000000 0x9a080: 0xc0521150 0x00000000 0x00000000 0x00000000 Breakpoint 3, 0x0001ad34 in getpid () (gdb) x/64x 0x9a040 0x9a040: 0x00000000 0x00000000 0x00000000 0x00000000 0x9a050: 0x00000000 0x00000000 0x00000000 0x00000000 0x9a060: 0x00000000 0x00000000 0x00000000 0x10101364 0x9a070: 0x00000000 0x00000000 0x00000000 0x00000000 0x9a080: 0xc03ae150 0x00000000 0x00000000 0x00000000 So, the only location changed by vfork is the parent tid. dave@hiauly6:~$ strace ./vfork execve("./vfork", ["./vfork"], [/* 16 vars */]) = 0 newuname({sys="Linux", node="hiauly6", ...}) = 0 brk(0) = 0x9a000 brk(0x9acb4) = 0x9acb4 brk(0xbbcb4) = 0xbbcb4 brk(0xbc000) = 0xbc000 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x9a068) = 6212 fstat64(0x1, 0xc0258a08) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000 write(1, "parent is 269488996\n", 20parent is 269488996 ) = 20 exit_group(0) = ? (gdb) p/x 269488996 $1 = 0x10101364 I believe this is the "getpid" implementation: static inline __attribute__((always_inline)) pid_t really_getpid (pid_t oldval) { if (__builtin_expect (oldval == 0, 1)) { pid_t selftid = THREAD_GETMEM (THREAD_SELF, tid); if (__builtin_expect (selftid != 0, 1)) return selftid; } INTERNAL_SYSCALL_DECL (err); pid_t result = INTERNAL_SYSCALL (getpid, err, 0); /* We do not set the PID field in the TID here since we might be called from a signal handler while the thread executes fork. */ if (oldval == 0) THREAD_SETMEM (THREAD_SELF, tid, result); return result; } As a side issue, gdb can't single step over mfctl instruction ;( Dave -- J. David Anglin dave.anglin@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Question about execve. 2010-03-28 17:32 ` John David Anglin @ 2010-03-28 18:20 ` Carlos O'Donell 0 siblings, 0 replies; 23+ messages in thread From: Carlos O'Donell @ 2010-03-28 18:20 UTC (permalink / raw) To: John David Anglin, Helge Deller, Kyle McMartin, linux-parisc On Sun, Mar 28, 2010 at 1:32 PM, John David Anglin <dave@hiauly1.hia.nrc.ca> wrote: >> I tried the vfork.c test case on my c3750 with 32-bit kernel. =A0It >> didn't segv in a limited number of runs. =A0However, I did notice th= at >> getpid() is broken after vfork(). > > The vfork (clone) syscall corrupts (i.e., inserts wrong value) > the parent tid. =A0In the following, I disabled the printf's and > execve call in Carlos's testcase. =A0The child just does _exit(). The vfork syscall goes through process.c (sys_vfork) which doesn't tell the kernel about the parent_tidptr, or child_tidptr (passes NULL for both). The kernel shouldn't be touching the parent tid pointer at all. The vfork wrapper in glibc *does* negate the cached PID value, such that the child doesn't see a valid PID value until after execve completes. > The fast path through getpid is: > > Dump of assembler code for function getpid: > 0x0001ad2c <getpid+0>: =A0stw rp,-14(sp) > 0x0001ad30 <getpid+4>: =A0mfctl tr3,r20 > 0x0001ad34 <getpid+8>: =A0ldw -414(r20),r19 > 0x0001ad38 <getpid+12>: cmpib,>=3D 0,r19,0x1ad48 <getpid+28> > 0x0001ad3c <getpid+16>: copy r19,ret0 > 0x0001ad40 <getpid+20>: ldw -14(sp),rp > 0x0001ad44 <getpid+24>: bv,n r0(rp) > > Breakpoint 3, 0x0001ad34 in getpid () > (gdb) del 2 > (gdb) p/x $r20 > $6 =3D 0x9a480 > > Breakpoint 4, main () at vfork.c:17 > 17 =A0 =A0 =A0 =A0child =3D vfork(); > (gdb) x/x 0x9a480 - 0x414 > 0x9a06c: =A0 =A0 =A0 =A00x00000000 > (gdb) c > Continuing. This is the PID of the parent, not the TID. They are actually two different fields. nptl/descr.h ~~~ /* Thread ID - which is also a 'is this thread descriptor (and therefore stack) used' flag. */ pid_t tid; /* Process ID - thread group ID in kernel speak. */ pid_t pid; ~~~ The PID of all the threads in a process group is the same. Each thread has a unique TID, which is During the vfork the parent does this: ~~~ /* Load thread register. */ mfctl %cr27, %r26 ! /* Load cached parent PID. */ ldw -1044(%r26),%r1 ! /* Negate it, such that the child runs with a negative PID and no functions work until the execve. */ sub %r0,%r1,%r1 ! /* Store it back. */ stw %r1,-1044(%r26) ! ~~~ > Breakpoint 3, 0x0001ad34 in getpid () > (gdb) x/x 0x9a480 - 0x414 > 0x9a06c: =A0 =A0 =A0 =A00x10101364 I don't see how this is the negative of 0x0, it should just be 0x0. I wonder what changed it. > Breakpoint 4, main () at vfork.c:17 > 17 =A0 =A0 =A0 =A0child =3D vfork(); > (gdb) x/64x 0x9a040 > 0x9a040: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > 0x9a050: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > 0x9a060: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > 0x9a070: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > 0x9a080: =A0 =A0 =A0 =A00xc0521150 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > > Breakpoint 3, 0x0001ad34 in getpid () > (gdb) x/64x 0x9a040 > 0x9a040: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > 0x9a050: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > 0x9a060: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x10101364 > 0x9a070: =A0 =A0 =A0 =A00x00000000 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > 0x9a080: =A0 =A0 =A0 =A00xc03ae150 =A0 =A0 =A00x00000000 =A0 =A0 =A00= x00000000 =A0 =A0 =A00x00000000 > > So, the only location changed by vfork is the parent tid. s/tid/pid/g. > dave@hiauly6:~$ strace ./vfork > execve("./vfork", ["./vfork"], [/* 16 vars */]) =3D 0 > newuname({sys=3D"Linux", node=3D"hiauly6", ...}) =3D 0 > brk(0) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0=3D 0x9a000 > brk(0x9acb4) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D= 0x9acb4 > brk(0xbbcb4) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D= 0xbbcb4 > brk(0xbc000) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D= 0xbc000 > clone(child_stack=3D0, flags=3DCLONE_CHILD_CLEARTID|CLONE_CHILD_SETTI= D|SIGCHLD, child_tidptr=3D0x9a068) =3D 6212 > fstat64(0x1, 0xc0258a08) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=3D 0 > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,= 0) =3D 0x40000000 > write(1, "parent is 269488996\n", 20parent is 269488996 > ) =A0 =3D 20 > exit_group(0) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =3D= ? > > (gdb) p/x 269488996 > $1 =3D 0x10101364 > > I believe this is the "getpid" implementation: > > static inline __attribute__((always_inline)) pid_t > really_getpid (pid_t oldval) > { > =A0if (__builtin_expect (oldval =3D=3D 0, 1)) > =A0 =A0{ > =A0 =A0 =A0pid_t selftid =3D THREAD_GETMEM (THREAD_SELF, tid); > =A0 =A0 =A0if (__builtin_expect (selftid !=3D 0, 1)) > =A0 =A0 =A0 =A0return selftid; > =A0 =A0 } > > =A0INTERNAL_SYSCALL_DECL (err); > =A0pid_t result =3D INTERNAL_SYSCALL (getpid, err, 0); > > =A0/* We do not set the PID field in the TID here since we might be > =A0 =A0 called from a signal handler while the thread executes fork. = =A0*/ > =A0if (oldval =3D=3D 0) > =A0 =A0THREAD_SETMEM (THREAD_SELF, tid, result); > =A0return result; > } > > As a side issue, gdb can't single step over mfctl instruction ;( We'll fix gdb next :-) Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc"= in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2010-04-01 18:55 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-28 3:26 Question about execve Carlos O'Donell
2010-03-28 15:42 ` John David Anglin
2010-03-28 16:03 ` John David Anglin
2010-03-28 17:26 ` Carlos O'Donell
2010-03-28 18:00 ` John David Anglin
2010-03-28 18:47 ` Carlos O'Donell
2010-03-28 19:07 ` John David Anglin
2010-03-28 19:31 ` John David Anglin
2010-03-28 19:40 ` Carlos O'Donell
[not found] ` <20100328202239.162785145@hiauly1.hia.nrc.ca>
2010-03-28 20:39 ` Carlos O'Donell
[not found] ` <20100328210134.3D6005145@hiauly1.hia.nrc.ca>
2010-03-28 21:04 ` Carlos O'Donell
[not found] ` <20100328212110.03F224E77@hiauly1.hia.nrc.ca>
2010-03-28 23:12 ` Carlos O'Donell
2010-03-28 23:59 ` John David Anglin
2010-03-29 0:24 ` Carlos O'Donell
2010-03-29 2:38 ` John David Anglin
2010-03-29 12:11 ` Carlos O'Donell
2010-03-29 14:02 ` John David Anglin
2010-03-31 13:55 ` Carlos O'Donell
2010-04-01 10:54 ` Helge Deller
2010-04-01 13:33 ` John David Anglin
2010-04-01 18:55 ` Helge Deller
2010-03-28 17:32 ` John David Anglin
2010-03-28 18:20 ` Carlos O'Donell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.