* Bug: retry of clone() on Alpha can result in zeroed process thread pointer @ 2014-07-23 8:52 Michael Cree 2014-07-24 18:19 ` Richard Henderson 2014-07-30 19:30 ` Richard Henderson 0 siblings, 2 replies; 4+ messages in thread From: Michael Cree @ 2014-07-23 8:52 UTC (permalink / raw) To: linux-alpha; +Cc: linux-kernel, Richard Henderson I am seeing a bug in clone() on the Alpha architecture. Reported to Debian as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755397 The test suite of glibc sometimes fails in the nptl/tst-eintr3 test with a segmentation fault. I have tracked it down to the thread pointer returned by the rduniq PALcall is occasionally zero when it should point to the TLS. I have only ever seen this occur when running a SMP kernel. Running strace on nptl/tst-eintr3 reveals that the clone() syscall is retried by the kernel if an ERESTARTNOINTR error occurs. At $syscall_error in arch/alpha/kernel/entry.S the kernel handles the error and in doing that it writes to 72(sp) which is where the value of the a3 CPU register on entry to the kernel is stored. Then the kernel retries the clone() function. But the alpha specific code for copy_thread() in arch/alpha/kernel/process.c does not use the passed a3 cpu register (the argument tls), instead it goes to the saved stack to get the value of the a3 register, which on the second call to clone() has been modified to no longer be the value of the a3 cpu register on entry to the kernel. And a latent bomb is laid for userspace in the form of an incorrect process unique value (which is the thread pointer) in the PCB. Am I correct in my analysis and, if so, can we get a fix for this please. Cheers Michael. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug: retry of clone() on Alpha can result in zeroed process thread pointer 2014-07-23 8:52 Bug: retry of clone() on Alpha can result in zeroed process thread pointer Michael Cree @ 2014-07-24 18:19 ` Richard Henderson 2014-07-24 19:30 ` Michael Cree 2014-07-30 19:30 ` Richard Henderson 1 sibling, 1 reply; 4+ messages in thread From: Richard Henderson @ 2014-07-24 18:19 UTC (permalink / raw) To: Michael Cree, linux-alpha, linux-kernel On 07/22/2014 10:52 PM, Michael Cree wrote: > Running strace on nptl/tst-eintr3 reveals that the clone() syscall > is retried by the kernel if an ERESTARTNOINTR error occurs. At > $syscall_error in arch/alpha/kernel/entry.S the kernel handles the > error and in doing that it writes to 72(sp) which is where the value > of the a3 CPU register on entry to the kernel is stored. Then the > kernel retries the clone() function. But the alpha specific code > for copy_thread() in arch/alpha/kernel/process.c does not use the > passed a3 cpu register (the argument tls), instead it goes to the > saved stack to get the value of the a3 register, which on the > second call to clone() has been modified to no longer be the value > of the a3 cpu register on entry to the kernel. And a latent bomb > is laid for userspace in the form of an incorrect process unique > value (which is the thread pointer) in the PCB. > > Am I correct in my analysis and, if so, can we get a fix for this > please. Well... let me start with the assumption that we can't possibly restart unless the syscall fails with -ERESTART*. Before we clobber 72($sp), $syscall_error saves the old value in $19. This is the r19 parameter to do_work_pending, and is passed all the way down to syscall_restart where we do restore the original value of a3 for ERESTARTNOINTR. So if there's a path that leads to restart, but doesn't save a3 before clobbering, I don't see it. Do you have an strace dump that shows this? r~ ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug: retry of clone() on Alpha can result in zeroed process thread pointer 2014-07-24 18:19 ` Richard Henderson @ 2014-07-24 19:30 ` Michael Cree 0 siblings, 0 replies; 4+ messages in thread From: Michael Cree @ 2014-07-24 19:30 UTC (permalink / raw) To: Richard Henderson; +Cc: linux-alpha, linux-kernel On Thu, Jul 24, 2014 at 08:19:52AM -1000, Richard Henderson wrote: > On 07/22/2014 10:52 PM, Michael Cree wrote: > > Running strace on nptl/tst-eintr3 reveals that the clone() syscall > > is retried by the kernel if an ERESTARTNOINTR error occurs. At > > $syscall_error in arch/alpha/kernel/entry.S the kernel handles the > > error and in doing that it writes to 72(sp) which is where the value > > of the a3 CPU register on entry to the kernel is stored. Then the > > kernel retries the clone() function. But the alpha specific code > > for copy_thread() in arch/alpha/kernel/process.c does not use the > > passed a3 cpu register (the argument tls), instead it goes to the > > saved stack to get the value of the a3 register, which on the > > second call to clone() has been modified to no longer be the value > > of the a3 cpu register on entry to the kernel. And a latent bomb > > is laid for userspace in the form of an incorrect process unique > > value (which is the thread pointer) in the PCB. > > > > Am I correct in my analysis and, if so, can we get a fix for this > > please. > > Well... let me start with the assumption that we can't possibly restart unless > the syscall fails with -ERESTART*. > > Before we clobber 72($sp), $syscall_error saves the old value in $19. This is > the r19 parameter to do_work_pending, and is passed all the way down to > syscall_restart where we do restore the original value of a3 for ERESTARTNOINTR. > > So if there's a path that leads to restart, but doesn't save a3 before > clobbering, I don't see it. Do you have an strace dump that shows this? Yes. This is an example of a run of nptl/tst-eintr3 that fails after cutting off quite a bit of stuff at the start to get to the relevant section: clone(child_stack=0x2000121eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2000121f2c0, tls=0x2000121f8e0, child_tidptr=0x2000121f2c0) = ? ERESTARTNOINTR (To be restarted) --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=20086, si_uid=1000} --- write(1, ".", 1.) = 1 sigreturn() (mask []) = -1 ERRNO_312 (Unknown error 312) clone(child_stack=0x2000121eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2000121f2c0, tls=0, child_tidptr=0x2000121f2c0) = 20089 +++ killed by SIGSEGV +++ Note that the retry of clone() has zero for the tls argument. Examining the resultant core dump reveals that tst-eintr3 segfaulted when trying to access a thread local variable and that register v0, used in calculating the TLS location and set up by the rduniq PALcall, is zero. Cheers Michael. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Bug: retry of clone() on Alpha can result in zeroed process thread pointer 2014-07-23 8:52 Bug: retry of clone() on Alpha can result in zeroed process thread pointer Michael Cree 2014-07-24 18:19 ` Richard Henderson @ 2014-07-30 19:30 ` Richard Henderson 1 sibling, 0 replies; 4+ messages in thread From: Richard Henderson @ 2014-07-30 19:30 UTC (permalink / raw) To: Michael Cree, linux-alpha, linux-kernel Ok, I think I've actually found it this time. It's here: > 281 childregs->r20 = 1; /* OSF/1 has some strange fork() semantics. */ > 282 regs->r20 = 0; We need to delay this r20 silliness until after restarts or something. Or just kill it -- it's not like glibc uses this return value. r~ ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-07-30 19:30 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-23 8:52 Bug: retry of clone() on Alpha can result in zeroed process thread pointer Michael Cree 2014-07-24 18:19 ` Richard Henderson 2014-07-24 19:30 ` Michael Cree 2014-07-30 19:30 ` Richard Henderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox