* [PATCH] fix for futex_wait signal stack corruption
@ 2007-12-04 20:57 Steven Rostedt
2007-12-04 21:09 ` Linus Torvalds
0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2007-12-04 20:57 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner
Cc: LKML, David Holmes - Sun Microsystems, Andrew Morton,
Linus Torvalds
David Holmes found a bug in the RT patch with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too. The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.
Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.
Here's the relevant code from kernel/futex.c: (not in order in the file)
[...]
smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
struct timespec __user *utime, u32 __user *uaddr2,
u32 val3)
{
struct timespec ts;
ktime_t t, *tp = NULL;
u32 val2 = 0;
int cmd = op & FUTEX_CMD_MASK;
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
return -EINVAL;
t = timespec_to_ktime(ts);
if (cmd == FUTEX_WAIT)
t = ktime_add(ktime_get(), t);
tp = &t;
}
[...]
return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}
[...]
long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3)
{
int ret;
int cmd = op & FUTEX_CMD_MASK;
struct rw_semaphore *fshared = NULL;
if (!(op & FUTEX_PRIVATE_FLAG))
fshared = ¤t->mm->mmap_sem;
switch (cmd) {
case FUTEX_WAIT:
ret = futex_wait(uaddr, fshared, val, timeout);
[...]
static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
u32 val, ktime_t *abs_time)
{
[...]
struct restart_block *restart;
restart = ¤t_thread_info()->restart_block;
restart->fn = futex_wait_restart;
restart->arg0 = (unsigned long)uaddr;
restart->arg1 = (unsigned long)val;
restart->arg2 = (unsigned long)abs_time;
restart->arg3 = 0;
if (fshared)
restart->arg3 |= ARG3_SHARED;
return -ERESTART_RESTARTBLOCK;
[...]
static long futex_wait_restart(struct restart_block *restart)
{
u32 __user *uaddr = (u32 __user *)restart->arg0;
u32 val = (u32)restart->arg1;
ktime_t *abs_time = (ktime_t *)restart->arg2;
struct rw_semaphore *fshared = NULL;
restart->fn = do_no_restart_syscall;
if (restart->arg3 & ARG3_SHARED)
fshared = ¤t->mm->mmap_sem;
return (long)futex_wait(uaddr, fshared, val, abs_time);
}
So when the futex_wait is interrupt by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK. The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.
This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.
I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.
The solution I did to solve this is to allocate a temporary buffer when
setting up the block and free it in futex_wait_restart. This patch
allows David's test program to actually pass.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/kernel/futex.c b/kernel/futex.c
index 9dc591a..74be1cb 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1288,11 +1288,27 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
return -ERESTARTSYS;
else {
struct restart_block *restart;
+ ktime_t *tmp_time = NULL;
+
+ /*
+ * abs_time is on the stack and is not a parameter.
+ * If we save it, it will be overridden on return and
+ * what is sent to futex_wait_restart will be corrupted.
+ */
+ if (abs_time) {
+ tmp_time = kmalloc(sizeof(*tmp_time), GFP_KERNEL);
+ if (unlikely(!tmp_time))
+ /* Ahh, what else can we do?? */
+ printk(KERN_WARNING
+ "Can't allocate temp timer storage\n");
+ else
+ *tmp_time = *abs_time;
+ }
restart = ¤t_thread_info()->restart_block;
restart->fn = futex_wait_restart;
restart->arg0 = (unsigned long)uaddr;
restart->arg1 = (unsigned long)val;
- restart->arg2 = (unsigned long)abs_time;
+ restart->arg2 = (unsigned long)tmp_time;
restart->arg3 = 0;
if (fshared)
restart->arg3 |= ARG3_SHARED;
@@ -1312,13 +1328,19 @@ static long futex_wait_restart(struct restart_block *restart)
{
u32 __user *uaddr = (u32 __user *)restart->arg0;
u32 val = (u32)restart->arg1;
- ktime_t *abs_time = (ktime_t *)restart->arg2;
+ ktime_t *tmp_time = (ktime_t *)restart->arg2;
+ ktime_t t;
struct rw_semaphore *fshared = NULL;
+ if (tmp_time) {
+ t = *tmp_time;
+ kfree(tmp_time);
+ tmp_time = &t;
+ }
restart->fn = do_no_restart_syscall;
if (restart->arg3 & ARG3_SHARED)
fshared = ¤t->mm->mmap_sem;
- return (long)futex_wait(uaddr, fshared, val, abs_time);
+ return (long)futex_wait(uaddr, fshared, val, tmp_time);
}
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] fix for futex_wait signal stack corruption 2007-12-04 20:57 [PATCH] fix for futex_wait signal stack corruption Steven Rostedt @ 2007-12-04 21:09 ` Linus Torvalds 2007-12-04 21:39 ` Steven Rostedt 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2007-12-04 21:09 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton On Tue, 4 Dec 2007, Steven Rostedt wrote: > > The solution I did to solve this is to allocate a temporary buffer when > setting up the block and free it in futex_wait_restart. This patch > allows David's test program to actually pass. No. Unacceptable. This is a memory leak in case nobody retries it. It's basically not how you can do this thing. The *only* thing you can pass for a system call restart is the argument block register state. If that is not enough, then you cannot restart it. It's that simple. Andrew, please do *not* put this in any queues. It's fundamentally broken, and cannot be fixed as is. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fix for futex_wait signal stack corruption 2007-12-04 21:09 ` Linus Torvalds @ 2007-12-04 21:39 ` Steven Rostedt 2007-12-04 22:43 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Steven Rostedt @ 2007-12-04 21:39 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton On Tue, Dec 04, 2007 at 01:09:36PM -0800, Linus Torvalds wrote: > > > On Tue, 4 Dec 2007, Steven Rostedt wrote: > > > > The solution I did to solve this is to allocate a temporary buffer when > > setting up the block and free it in futex_wait_restart. This patch > > allows David's test program to actually pass. > > No. Unacceptable. This is a memory leak in case nobody retries it. It's > basically not how you can do this thing. Fair enough. I was misguided, that the return func had to be called. > > The *only* thing you can pass for a system call restart is the argument > block register state. If that is not enough, then you cannot restart it. > It's that simple. > > Andrew, please do *not* put this in any queues. It's fundamentally broken, > and cannot be fixed as is. Yep, trash it. Seems that arg3 is not used here and since the timer is 64 bits, we can store the bottom 32 bits in arg2 and the top in arg3 (this will work for both 32 and 64 bit archs). This will eliminate the need for kmalloc (I didn't like that solution anyway). New patch on its way (after I get some food to eat). Thanks, -- Steve ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fix for futex_wait signal stack corruption 2007-12-04 21:39 ` Steven Rostedt @ 2007-12-04 22:43 ` Linus Torvalds 2007-12-05 1:23 ` Steven Rostedt 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2007-12-04 22:43 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton On Tue, 4 Dec 2007, Steven Rostedt wrote: > > Seems that arg3 is not used here and since the timer is 64 bits, we can > store the bottom 32 bits in arg2 and the top in arg3 (this will work for > both 32 and 64 bit archs). Yes. That should work fine. The restart logic sometimes results in odd calling conventions, and quite frankly, we could just change how "restart_block" looks too. There is nothing that says that it has to be unsigned long arg0, arg1, arg2, arg3 and that particular layout was just picked on a whim. The only issue is: - we don't want the restart block to be *too* large, since it's part of the thread info. - but we need to have enough room for all the system calls that want to use the restart block, and preferably in a reasonable format. So far, using "unsigned long" has been good enough, in that it's big enough for a pointer and all normal arguments, but if something really deeply wants another format or a guaranteed 64-bit word regardless of architecture, we could make one or more of the arguments be "u64" instead. But in this case, since there is already unused argument space, I think that doing the "32 high bits + 32 low bits" is probably the best option. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fix for futex_wait signal stack corruption 2007-12-04 22:43 ` Linus Torvalds @ 2007-12-05 1:23 ` Steven Rostedt 2007-12-05 1:46 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Steven Rostedt @ 2007-12-05 1:23 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton On Tue, 4 Dec 2007, Linus Torvalds wrote: > On Tue, 4 Dec 2007, Steven Rostedt wrote: > > > > Seems that arg3 is not used here and since the timer is 64 bits, we can > > store the bottom 32 bits in arg2 and the top in arg3 (this will work for > > both 32 and 64 bit archs). > > Yes. That should work fine. Unfortunately, looking closer at the code, arg3 _is_ used. arg3 is used to set a flag whether or not the mmap_sem is shared. (I blame lack of food for not noticing this in the first place). [...] > So far, using "unsigned long" has been good enough, in that it's big > enough for a pointer and all normal arguments, but if something really > deeply wants another format or a guaranteed 64-bit word regardless of > architecture, we could make one or more of the arguments be "u64" instead. > > But in this case, since there is already unused argument space, I think > that doing the "32 high bits + 32 low bits" is probably the best option. Since arg3 is out, which do you prefer? Creating an arg4 (and perhaps more) in the block or having a u64 arg? Changing all the args to u64 may be the best. This way we don't need to worry about passing 64bit arguments, and we don't waste more space on 64 bit archs by adding more args of unsigned long. Thoughts? -- Steve ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] fix for futex_wait signal stack corruption 2007-12-05 1:23 ` Steven Rostedt @ 2007-12-05 1:46 ` Linus Torvalds 2007-12-05 3:17 ` [PATCH -v2] " Steven Rostedt 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2007-12-05 1:46 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton On Tue, 4 Dec 2007, Steven Rostedt wrote: > > Since arg3 is out, which do you prefer? Creating an arg4 (and perhaps > more) in the block or having a u64 arg? Changing all the args to u64 may > be the best. I suspect that the best option is probably to make that thing a unnamed union of the actual different types the different restart cases needs, which also allows you to name things appropriately and not have any wasted space. Leave the arg0-3 ones around as one of the unions, both to avoid having to change other things and to have a "generic" one for stuff that simply doesn't much care (in order to not have tons and tons of substructures to the union when most users really don't need any fancy types). I think we already depend on recent-enough gcc's that unnamed unions are ok and we don't need to play games with naming. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 1:46 ` Linus Torvalds @ 2007-12-05 3:17 ` Steven Rostedt 2007-12-05 3:41 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Steven Rostedt @ 2007-12-05 3:17 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton David Holmes found a bug in the RT patch with respect to pthread_cond_timedwait. After trying his test program on the latest git from mainline, I found the bug was there too. The bug he was seeing that his test program showed, was that if one were to do a "Ctrl-Z" on a process that was in the pthread_cond_timedwait, and then did a "bg" on that process, it would return with a "-ETIMEDOUT" but early. That is, the timer would go off early. Looking into this, I found the source of the problem. And it is a rather nasty bug at that. Here's the relevant code from kernel/futex.c: (not in order in the file) [...] smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val, struct timespec __user *utime, u32 __user *uaddr2, u32 val3) { struct timespec ts; ktime_t t, *tp = NULL; u32 val2 = 0; int cmd = op & FUTEX_CMD_MASK; if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) { if (copy_from_user(&ts, utime, sizeof(ts)) != 0) return -EFAULT; if (!timespec_valid(&ts)) return -EINVAL; t = timespec_to_ktime(ts); if (cmd == FUTEX_WAIT) t = ktime_add(ktime_get(), t); tp = &t; } [...] return do_futex(uaddr, op, val, tp, uaddr2, val2, val3); } [...] long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout, u32 __user *uaddr2, u32 val2, u32 val3) { int ret; int cmd = op & FUTEX_CMD_MASK; struct rw_semaphore *fshared = NULL; if (!(op & FUTEX_PRIVATE_FLAG)) fshared = ¤t->mm->mmap_sem; switch (cmd) { case FUTEX_WAIT: ret = futex_wait(uaddr, fshared, val, timeout); [...] static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared, u32 val, ktime_t *abs_time) { [...] struct restart_block *restart; restart = ¤t_thread_info()->restart_block; restart->fn = futex_wait_restart; restart->arg0 = (unsigned long)uaddr; restart->arg1 = (unsigned long)val; restart->arg2 = (unsigned long)abs_time; restart->arg3 = 0; if (fshared) restart->arg3 |= ARG3_SHARED; return -ERESTART_RESTARTBLOCK; [...] static long futex_wait_restart(struct restart_block *restart) { u32 __user *uaddr = (u32 __user *)restart->arg0; u32 val = (u32)restart->arg1; ktime_t *abs_time = (ktime_t *)restart->arg2; struct rw_semaphore *fshared = NULL; restart->fn = do_no_restart_syscall; if (restart->arg3 & ARG3_SHARED) fshared = ¤t->mm->mmap_sem; return (long)futex_wait(uaddr, fshared, val, abs_time); } So when the futex_wait is interrupt by a signal we break out of the hrtimer code and set up or return from signal. This code does not return back to userspace, so we set up a RESTARTBLOCK. The bug here is that we save the "abs_time" which is a pointer to the stack variable "ktime_t t" from sys_futex. This returns and unwinds the stack before we get to call our signal. On return from the signal we go to futex_wait_restart, where we update all the parameters for futex_wait and call it. But here we have a problem where abs_time is no longer valid. I verified this with print statements, and sure enough, what abs_time was set to ends up being garbage when we get to futex_wait_restart. The solution I did to solve this (with input from Linus Torvalds) was to add unions to the restart_block to allow system calls to use the restart with specific parameters. This way the futex code now saves the time in a 64bit value in the restart block instead of storing it on the stack. Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64 in thread_info.h, when there's a #ifdef __KERNEL__ just below that. Not sure what that is there for. If this turns out to be a problem, I've tested this with using "unsigned int" for u32 and "unsigned long long" for u64 and it worked just the same. I'm using u32 and u64 just to be consistent with what the futex code uses. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> include/linux/thread_info.h | 15 ++++++++++++++- kernel/futex.c | 25 +++++++++++++------------ 2 files changed, 27 insertions(+), 13 deletions(-) diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index 1c4eb41..d97c874 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -7,12 +7,25 @@ #ifndef _LINUX_THREAD_INFO_H #define _LINUX_THREAD_INFO_H +#include <linux/types.h> + /* * System call restart block. */ struct restart_block { long (*fn)(struct restart_block *); - unsigned long arg0, arg1, arg2, arg3; + union { + struct { + unsigned long arg0, arg1, arg2, arg3; + }; + /* For futex_wait */ + struct { + u32 *uaddr; + u32 val; + u32 flags; + u64 time; + } fu; + }; }; extern long do_no_restart_syscall(struct restart_block *parm); diff --git a/kernel/futex.c b/kernel/futex.c index 9dc591a..ad3b6e3 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1149,9 +1149,9 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, /* * In case we must use restart_block to restart a futex_wait, - * we encode in the 'arg3' shared capability + * we encode in the 'flags' shared capability */ -#define ARG3_SHARED 1 +#define FLAGS_SHARED 1 static long futex_wait_restart(struct restart_block *restart); @@ -1290,12 +1290,13 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared, struct restart_block *restart; restart = ¤t_thread_info()->restart_block; restart->fn = futex_wait_restart; - restart->arg0 = (unsigned long)uaddr; - restart->arg1 = (unsigned long)val; - restart->arg2 = (unsigned long)abs_time; - restart->arg3 = 0; + restart->fu.uaddr = (u32*)uaddr; + restart->fu.val = val; + restart->fu.time = abs_time->tv64; + restart->fu.flags = 0; + if (fshared) - restart->arg3 |= ARG3_SHARED; + restart->fu.flags |= FLAGS_SHARED; return -ERESTART_RESTARTBLOCK; } @@ -1310,15 +1311,15 @@ static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared, static long futex_wait_restart(struct restart_block *restart) { - u32 __user *uaddr = (u32 __user *)restart->arg0; - u32 val = (u32)restart->arg1; - ktime_t *abs_time = (ktime_t *)restart->arg2; + u32 __user *uaddr = (u32 __user *)restart->fu.uaddr; struct rw_semaphore *fshared = NULL; + ktime_t t; + t.tv64 = restart->fu.time; restart->fn = do_no_restart_syscall; - if (restart->arg3 & ARG3_SHARED) + if (restart->fu.flags & FLAGS_SHARED) fshared = ¤t->mm->mmap_sem; - return (long)futex_wait(uaddr, fshared, val, abs_time); + return (long)futex_wait(uaddr, fshared, restart->fu.val, &t); } ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 3:17 ` [PATCH -v2] " Steven Rostedt @ 2007-12-05 3:41 ` Linus Torvalds 2007-12-05 3:47 ` Randy Dunlap ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Linus Torvalds @ 2007-12-05 3:41 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton Patch looks fine to me. On Tue, 4 Dec 2007, Steven Rostedt wrote: > > Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64 > in thread_info.h, when there's a #ifdef __KERNEL__ just below that. > Not sure what that is there for. Hmm. I'd not expect user-mode headers to ever include <linux/thread-info.h>, and if they do, they'd already get get totally invalid namespace pollution ("struct restart_block" at a minimum) along with stuff that simply isn't sensible in user-space at all, so I think this part is fine. And I guess somebody will scream if it bites them ;) Anyway, my gut feel is that this is potentially a real problem, and we should fix it asap (ie it should go into 2.6.24 even at this late stage in the game), but it would be nice to know if the problem actually hit any actual real program, and not just a test-setup. So here's a question for David Holmes: What caused you to actually notice this behaviour? Can this actually be seen in real life usage? Anyway, at a minimum, here's an Acked-by: Linus Torvalds <torvalds@linux-foundation.org> and I suspect I should just apply it directly. Any comments from anybody else? Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 3:41 ` Linus Torvalds @ 2007-12-05 3:47 ` Randy Dunlap 2007-12-05 3:53 ` Steven Rostedt 2007-12-05 5:33 ` David Holmes - Sun Microsystems 2007-12-05 5:54 ` Thomas Gleixner 2 siblings, 1 reply; 15+ messages in thread From: Randy Dunlap @ 2007-12-05 3:47 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton On Tue, 4 Dec 2007 19:41:32 -0800 (PST) Linus Torvalds wrote: > > Patch looks fine to me. > > On Tue, 4 Dec 2007, Steven Rostedt wrote: > > > > Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64 > > in thread_info.h, when there's a #ifdef __KERNEL__ just below that. > > Not sure what that is there for. > > Hmm. I'd not expect user-mode headers to ever include > <linux/thread-info.h>, and if they do, they'd already get get totally > invalid namespace pollution ("struct restart_block" at a minimum) along > with stuff that simply isn't sensible in user-space at all, so I think > this part is fine. > > And I guess somebody will scream if it bites them ;) > > Anyway, my gut feel is that this is potentially a real problem, and we > should fix it asap (ie it should go into 2.6.24 even at this late stage in > the game), but it would be nice to know if the problem actually hit any > actual real program, and not just a test-setup. Steve/David, where can I find the test case, please? > So here's a question for David Holmes: What caused you to actually notice > this behaviour? Can this actually be seen in real life usage? > > Anyway, at a minimum, here's an > > Acked-by: Linus Torvalds <torvalds@linux-foundation.org> > > and I suspect I should just apply it directly. Any comments from anybody > else? --- ~Randy Features and documentation: http://lwn.net/Articles/260136/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 3:47 ` Randy Dunlap @ 2007-12-05 3:53 ` Steven Rostedt 0 siblings, 0 replies; 15+ messages in thread From: Steven Rostedt @ 2007-12-05 3:53 UTC (permalink / raw) To: Randy Dunlap Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, LKML, David Holmes - Sun Microsystems, Andrew Morton On Tue, 4 Dec 2007, Randy Dunlap wrote: > > Steve/David, > > where can I find the test case, please? David attached it to the RH Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=408321 -- Steve ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 3:41 ` Linus Torvalds 2007-12-05 3:47 ` Randy Dunlap @ 2007-12-05 5:33 ` David Holmes - Sun Microsystems 2007-12-05 6:06 ` Linus Torvalds 2007-12-05 5:54 ` Thomas Gleixner 2 siblings, 1 reply; 15+ messages in thread From: David Holmes - Sun Microsystems @ 2007-12-05 5:33 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, LKML, Andrew Morton Linus Torvalds said the following on 5/12/07 01:41 PM: > So here's a question for David Holmes: What caused you to actually notice > this behaviour? Can this actually be seen in real life usage? We observed an application "hang" that turned out to be caused by a clock mismatch between that used with the pthread_cond_t and that used to convert a relative wait time to an absolute one. When the program ran in the foreground and hung I used ctrl-Z to suspend it then "bg" to background it. As soon as I did that the application became unstuck. While this was observed with process control signals, my concern was that other signals might cause pthread_cond_timedwait to return immediately in the same way. The test program allows for SIGUSR1 and SIGRTMIN testing as well, but these other signals did not cause the immediate return. But it would seem from Steven's analysis that this is just a fortuitous result. If I understand things correctly, any interruption of pthread_cond_timedwait by a signal, could result in waiting until an arbitrary time - depending on how the stack value was corrupted. Is that correct? Thanks, David Holmes Senior Java Technologist Java SE VM Real-time and Embedded Group --------------------------------------- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 5:33 ` David Holmes - Sun Microsystems @ 2007-12-05 6:06 ` Linus Torvalds 2007-12-05 6:14 ` David Holmes - Sun Microsystems 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2007-12-05 6:06 UTC (permalink / raw) To: David Holmes - Sun Microsystems Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, LKML, Andrew Morton On Wed, 5 Dec 2007, David Holmes - Sun Microsystems wrote: > > While this was observed with process control signals, my concern was that > other signals might cause pthread_cond_timedwait to return immediately in the > same way. The test program allows for SIGUSR1 and SIGRTMIN testing as well, > but these other signals did not cause the immediate return. But it would seem > from Steven's analysis that this is just a fortuitous result. If I understand > things correctly, any interruption of pthread_cond_timedwait by a signal, > could result in waiting until an arbitrary time - depending on how the stack > value was corrupted. Is that correct? No, very few things can actually cause the restart_block path to be taken. An actual signal execution would turn that into an EINTR, the only case that should ever trigger this is a signal that causes some kernel action (ie the system call *is* interrupted), but does not actually result in any user-visible state changes. The classic case is ^Z + bg, but iirc you can trigger it with ptrace too. And I think two threads racing to pick up the same signal can cause it too, for that matter (ie one thread takes the signal, the other one got interrupted but there's nothing there, so it just causes a system call restart). There's basically two different system call restart mechanisms in the kernel: - returning -ERESTARTNOHAND will cause the system call to be restarted with the *original* arguments if no signal handler was actually invoked. This has been around for a long time, and is used by a lot of system calls. It's fine for things that are idempotent, ie the argument meaning doesn't change over time (things like a "read()" system call, for example) - the "restart_block" model that returns -ERESTARTBLOCK, which will cause the system call to be restarted with the arguments specified in the system call restart block. This is for system calls that are *not* idempotent, ie the argument might be a relative timeout or something like that, where we need to actually behave *differently* when restarting. The latter case is "new" (it's been around for a while, but relative to the ERESTARTNOHAND one), and it relies on the system call itself setting up its restart point and the argument save area. And each such system call can obviously screw it up by saving/restoring the arguments with the incorrect semantics. So this bug was really (a) specific to that particular futex restart mechanism, and (b) only triggers for the (rather unusual) case where the system call gets interrupted by a signal, but no signal handler actually happens. In practice, ^Z is the most common case by far (other signals are either ignored and don't even cause an interrupt event in the first place, or they are "real" signals, and cause a signal handler to be invoked). Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 6:06 ` Linus Torvalds @ 2007-12-05 6:14 ` David Holmes - Sun Microsystems 0 siblings, 0 replies; 15+ messages in thread From: David Holmes - Sun Microsystems @ 2007-12-05 6:14 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, LKML, Andrew Morton Thanks for clarifying that Linus. Regards, David Holmes Linus Torvalds said the following on 5/12/07 04:06 PM: > > On Wed, 5 Dec 2007, David Holmes - Sun Microsystems wrote: >> While this was observed with process control signals, my concern was that >> other signals might cause pthread_cond_timedwait to return immediately in the >> same way. The test program allows for SIGUSR1 and SIGRTMIN testing as well, >> but these other signals did not cause the immediate return. But it would seem >> from Steven's analysis that this is just a fortuitous result. If I understand >> things correctly, any interruption of pthread_cond_timedwait by a signal, >> could result in waiting until an arbitrary time - depending on how the stack >> value was corrupted. Is that correct? > > No, very few things can actually cause the restart_block path to be taken. > An actual signal execution would turn that into an EINTR, the only case > that should ever trigger this is a signal that causes some kernel action > (ie the system call *is* interrupted), but does not actually result in any > user-visible state changes. > > The classic case is ^Z + bg, but iirc you can trigger it with ptrace too. > And I think two threads racing to pick up the same signal can cause it > too, for that matter (ie one thread takes the signal, the other one got > interrupted but there's nothing there, so it just causes a system call > restart). > > There's basically two different system call restart mechanisms in the > kernel: > > - returning -ERESTARTNOHAND will cause the system call to be restarted > with the *original* arguments if no signal handler was actually > invoked. This has been around for a long time, and is used by a lot of > system calls. It's fine for things that are idempotent, ie the argument > meaning doesn't change over time (things like a "read()" system call, > for example) > > - the "restart_block" model that returns -ERESTARTBLOCK, which will cause > the system call to be restarted with the arguments specified in the > system call restart block. This is for system calls that are *not* > idempotent, ie the argument might be a relative timeout or something > like that, where we need to actually behave *differently* when > restarting. > > The latter case is "new" (it's been around for a while, but relative to > the ERESTARTNOHAND one), and it relies on the system call itself setting > up its restart point and the argument save area. And each such system call > can obviously screw it up by saving/restoring the arguments with the > incorrect semantics. > > So this bug was really (a) specific to that particular futex restart > mechanism, and (b) only triggers for the (rather unusual) case where the > system call gets interrupted by a signal, but no signal handler actually > happens. In practice, ^Z is the most common case by far (other signals are > either ignored and don't even cause an interrupt event in the first place, > or they are "real" signals, and cause a signal handler to be invoked). > > Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 3:41 ` Linus Torvalds 2007-12-05 3:47 ` Randy Dunlap 2007-12-05 5:33 ` David Holmes - Sun Microsystems @ 2007-12-05 5:54 ` Thomas Gleixner 2007-12-05 10:31 ` Ingo Molnar 2 siblings, 1 reply; 15+ messages in thread From: Thomas Gleixner @ 2007-12-05 5:54 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, Ingo Molnar, LKML, David Holmes - Sun Microsystems, Andrew Morton, Stable Team On Tue, 4 Dec 2007, Linus Torvalds wrote: > > Patch looks fine to me. > > On Tue, 4 Dec 2007, Steven Rostedt wrote: > > > > Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64 > > in thread_info.h, when there's a #ifdef __KERNEL__ just below that. > > Not sure what that is there for. > > Hmm. I'd not expect user-mode headers to ever include > <linux/thread-info.h>, and if they do, they'd already get get totally > invalid namespace pollution ("struct restart_block" at a minimum) along > with stuff that simply isn't sensible in user-space at all, so I think > this part is fine. > > And I guess somebody will scream if it bites them ;) > > Anyway, my gut feel is that this is potentially a real problem, and we > should fix it asap (ie it should go into 2.6.24 even at this late stage in > the game), but it would be nice to know if the problem actually hit any > actual real program, and not just a test-setup. > > So here's a question for David Holmes: What caused you to actually notice > this behaviour? Can this actually be seen in real life usage? > > Anyway, at a minimum, here's an > > Acked-by: Linus Torvalds <torvalds@linux-foundation.org> > > and I suspect I should just apply it directly. Any comments from anybody > else? Doh, yes. I completely missed that stack dependency of the pointer when I looked at the patch back then. The solution looks solid and probably we should get rid of the unnamed union member and fixup the other places which use restart_block in a similar way. Just a minor nit. Can we please use "futex" instead of "fu" ? I'm just envisioning the next union member named "ba". Acked-by: Thomas Gleixner <tglx@linutronix.de> Please apply with the s/fu/futex/ change. This needs to go into stable .22/.23 as well. Thanks, tglx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2] fix for futex_wait signal stack corruption 2007-12-05 5:54 ` Thomas Gleixner @ 2007-12-05 10:31 ` Ingo Molnar 0 siblings, 0 replies; 15+ messages in thread From: Ingo Molnar @ 2007-12-05 10:31 UTC (permalink / raw) To: Thomas Gleixner Cc: Linus Torvalds, Steven Rostedt, LKML, David Holmes - Sun Microsystems, Andrew Morton, Stable Team * Thomas Gleixner <tglx@linutronix.de> wrote: > Just a minor nit. Can we please use "futex" instead of "fu" ? I'm just > envisioning the next union member named "ba". > > Acked-by: Thomas Gleixner <tglx@linutronix.de> > > Please apply with the s/fu/futex/ change. This needs to go into stable > .22/.23 as well. ok, i've tied it all up, collected the Acks and will push Steve's fix to Linus via the scheduler git tree - is that fine with everyone? I think there's enough time to get this tested - Linus's latest tree shows up in Fedora rawhide in 1-2 days and tons of apps use futexes. Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2007-12-05 10:31 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-12-04 20:57 [PATCH] fix for futex_wait signal stack corruption Steven Rostedt 2007-12-04 21:09 ` Linus Torvalds 2007-12-04 21:39 ` Steven Rostedt 2007-12-04 22:43 ` Linus Torvalds 2007-12-05 1:23 ` Steven Rostedt 2007-12-05 1:46 ` Linus Torvalds 2007-12-05 3:17 ` [PATCH -v2] " Steven Rostedt 2007-12-05 3:41 ` Linus Torvalds 2007-12-05 3:47 ` Randy Dunlap 2007-12-05 3:53 ` Steven Rostedt 2007-12-05 5:33 ` David Holmes - Sun Microsystems 2007-12-05 6:06 ` Linus Torvalds 2007-12-05 6:14 ` David Holmes - Sun Microsystems 2007-12-05 5:54 ` Thomas Gleixner 2007-12-05 10:31 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox