public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: + execve-must-clear-current-clear_child_tid.patch added to -mm tree
       [not found] <200907312142.n6VLgKfx021454@imap1.linux-foundation.org>
@ 2009-07-31 22:29 ` Oleg Nesterov
  2009-08-01  0:38   ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2009-07-31 22:29 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, eric.dumazet, drepper, jens, mingo, peterz,
	sonnyrao, stable, tglx, torvalds

> From: Eric Dumazet <eric.dumazet@gmail.com>
>
> diff -puN fs/compat.c~execve-must-clear-current-clear_child_tid fs/compat.c
> --- a/fs/compat.c~execve-must-clear-current-clear_child_tid
> +++ a/fs/compat.c
> @@ -1550,6 +1550,7 @@ int compat_do_execve(char * filename,
>  	mutex_unlock(&current->cred_guard_mutex);
>  	acct_update_integrals(current);
>  	free_bprm(bprm);
> +	current->clear_child_tid = NULL;
>  	if (displaced)
>  		put_files_struct(displaced);
>  	return retval;
> diff -puN fs/exec.c~execve-must-clear-current-clear_child_tid fs/exec.c
> --- a/fs/exec.c~execve-must-clear-current-clear_child_tid
> +++ a/fs/exec.c
> @@ -1343,6 +1343,7 @@ int do_execve(char * filename,
>  	mutex_unlock(&current->cred_guard_mutex);
>  	acct_update_integrals(current);
>  	free_bprm(bprm);
> +	current->clear_child_tid = NULL;
>  	if (displaced)
>  		put_files_struct(displaced);
>  	return retval;

Perhaps it is better to change mm_release() ? It has to play with
->clear_child_tid anyway.

Oleg.

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -568,18 +568,18 @@ void mm_release(struct task_struct *tsk,
 	 * the value intact in a core dump, and to save the unnecessary
 	 * trouble otherwise.  Userland only wants this done for a sys_exit.
 	 */
-	if (tsk->clear_child_tid
-	    && !(tsk->flags & PF_SIGNALED)
-	    && atomic_read(&mm->mm_users) > 1) {
-		u32 __user * tidptr = tsk->clear_child_tid;
+	if (tsk->clear_child_tid) {
+		if (!(tsk->flags & PF_SIGNALED) &&
+		    atomic_read(&mm->mm_users) > 1) {
+			/*
+			 * We don't check the error code - if userspace has
+			 * not set up a proper pointer then tough luck.
+			 */
+			put_user(0, tsk->clear_child_tid);
+			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
+					1, NULL, NULL, 0);
+		}
 		tsk->clear_child_tid = NULL;
-
-		/*
-		 * We don't check the error code - if userspace has
-		 * not set up a proper pointer then tough luck.
-		 */
-		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
 	}
 }
 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: + execve-must-clear-current-clear_child_tid.patch added to -mm tree
  2009-07-31 22:29 ` + execve-must-clear-current-clear_child_tid.patch added to -mm tree Oleg Nesterov
@ 2009-08-01  0:38   ` Linus Torvalds
  2009-08-01  0:51     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2009-08-01  0:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: akpm, linux-kernel, eric.dumazet, drepper, jens, mingo, peterz,
	sonnyrao, stable, tglx



On Sat, 1 Aug 2009, Oleg Nesterov wrote:
> 
> Perhaps it is better to change mm_release() ? It has to play with
> ->clear_child_tid anyway.

Ahh. I take back my previous Ack. Your patch is better. I'll ack that 
instead.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: + execve-must-clear-current-clear_child_tid.patch added to -mm tree
  2009-08-01  0:38   ` Linus Torvalds
@ 2009-08-01  0:51     ` Andrew Morton
  2009-08-01  1:54       ` [PATCH v2] execve: must clear current->clear_child_tid Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2009-08-01  0:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, linux-kernel, eric.dumazet, drepper, jens, mingo,
	peterz, sonnyrao, stable, tglx

On Fri, 31 Jul 2009 17:38:14 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Sat, 1 Aug 2009, Oleg Nesterov wrote:
> > 
> > Perhaps it is better to change mm_release() ? It has to play with
> > ->clear_child_tid anyway.
> 
> Ahh. I take back my previous Ack. Your patch is better. I'll ack that 
> instead.
> 

'k, thanks.  I shall compulsively watch my inbox awaiting the signed-off
and tested version ;)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] execve: must clear current->clear_child_tid
  2009-08-01  0:51     ` Andrew Morton
@ 2009-08-01  1:54       ` Oleg Nesterov
  2009-08-01  6:12         ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2009-08-01  1:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, linux-kernel, eric.dumazet, drepper, jens, mingo,
	peterz, sonnyrao, stable, tglx

On 07/31, Andrew Morton wrote:
>
> On Fri, 31 Jul 2009 17:38:14 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> >
> >
> > On Sat, 1 Aug 2009, Oleg Nesterov wrote:
> > >
> > > Perhaps it is better to change mm_release() ? It has to play with
> > > ->clear_child_tid anyway.
> >
> > Ahh. I take back my previous Ack. Your patch is better. I'll ack that
> > instead.
> >
>
> 'k, thanks.  I shall compulsively watch my inbox awaiting the signed-off
> and tested version ;)

I did some testing, but didn't try to check if this patches fixes the
origianal problem. It obviously should... Still I removed Tested-by tag.
But added Linus's ack, the patch is the same.

------------------------------------------------------------------------------
[PATCH v2] execve: must clear current->clear_child_tid

From: Eric Dumazet <eric.dumazet@gmail.com>

While looking at Jens Rosenboom bug report
(http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
a dying "ps" program, we found following problem.

clone() syscall has special support for TID of created threads.  This
support includes two features.

One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
TID value.

One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
thread dies.

The integer location is a user provided pointer, provided at clone()
time.

kernel keeps this pointer value into current->clear_child_tid.

At execve() time, we should make sure kernel doesnt keep this user
provided pointer, as full user memory is replaced by a new one.

As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
memory in forked processes.

Following sequence could happen:

1) bash (or any program) starts a new process, by a fork() call that
   glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
   ...) syscall

2) When new process starts, its current->clear_child_tid is set to a
   location that has a meaning only in bash (or initial program) context
   (&THREAD_SELF->tid)

3) This new process does the execve() syscall to start a new program. 
   current->clear_child_tid is left unchanged (a non NULL value)

4) If this new program creates some threads, and initial thread exits,
   kernel will attempt to clear the integer pointed by
   current->clear_child_tid from mm_release() :

        if (tsk->clear_child_tid
            && !(tsk->flags & PF_SIGNALED)
            && atomic_read(&mm->mm_users) > 1) {
                u32 __user * tidptr = tsk->clear_child_tid;
                tsk->clear_child_tid = NULL;

                /*
                 * We don't check the error code - if userspace has
                 * not set up a proper pointer then tough luck.
                 */
<< here >>      put_user(0, tidptr);
                sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

5) OR : if new program is not multi-threaded, but spied by /proc/pid
   users (ps command for example), mm_users > 1, and the exiting program
   could corrupt 4 bytes in a persistent memory area (shm or memory mapped
   file)

If current->clear_child_tid points to a writeable portion of memory of the
new program, kernel happily and silently corrupts 4 bytes of memory, with
unexpected effects.

Fix is straightforward and should not break any sane program.

Reported-by: Jens Rosenboom <jens@mcbone.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 kernel/fork.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

--- WAIT/kernel/fork.c~CLEARTID	2009-07-02 19:27:36.000000000 +0200
+++ WAIT/kernel/fork.c	2009-08-01 03:36:59.000000000 +0200
@@ -568,18 +568,18 @@ void mm_release(struct task_struct *tsk,
 	 * the value intact in a core dump, and to save the unnecessary
 	 * trouble otherwise.  Userland only wants this done for a sys_exit.
 	 */
-	if (tsk->clear_child_tid
-	    && !(tsk->flags & PF_SIGNALED)
-	    && atomic_read(&mm->mm_users) > 1) {
-		u32 __user * tidptr = tsk->clear_child_tid;
+	if (tsk->clear_child_tid) {
+		if (!(tsk->flags & PF_SIGNALED) &&
+		    atomic_read(&mm->mm_users) > 1) {
+			/*
+			 * We don't check the error code - if userspace has
+			 * not set up a proper pointer then tough luck.
+			 */
+			put_user(0, tsk->clear_child_tid);
+			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
+					1, NULL, NULL, 0);
+		}
 		tsk->clear_child_tid = NULL;
-
-		/*
-		 * We don't check the error code - if userspace has
-		 * not set up a proper pointer then tough luck.
-		 */
-		put_user(0, tidptr);
-		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
 	}
 }
 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] execve: must clear current->clear_child_tid
  2009-08-01  1:54       ` [PATCH v2] execve: must clear current->clear_child_tid Oleg Nesterov
@ 2009-08-01  6:12         ` Eric Dumazet
  2009-08-01  6:44           ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2009-08-01  6:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Linus Torvalds, linux-kernel, drepper, jens, mingo,
	peterz, sonnyrao, stable, tglx

Oleg Nesterov a écrit :
> On 07/31, Andrew Morton wrote:
>> On Fri, 31 Jul 2009 17:38:14 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>>
>>> On Sat, 1 Aug 2009, Oleg Nesterov wrote:
>>>> Perhaps it is better to change mm_release() ? It has to play with
>>>> ->clear_child_tid anyway.
>>> Ahh. I take back my previous Ack. Your patch is better. I'll ack that
>>> instead.
>>>
>> 'k, thanks.  I shall compulsively watch my inbox awaiting the signed-off
>> and tested version ;)
> 
> I did some testing, but didn't try to check if this patches fixes the
> origianal problem. It obviously should... Still I removed Tested-by tag.
> But added Linus's ack, the patch is the same.
> 
> ------------------------------------------------------------------------------
> [PATCH v2] execve: must clear current->clear_child_tid
> 
> From: Eric Dumazet <eric.dumazet@gmail.com>
> 
> While looking at Jens Rosenboom bug report
> (http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
> a dying "ps" program, we found following problem.
> 
> clone() syscall has special support for TID of created threads.  This
> support includes two features.
> 
> One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
> TID value.
> 
> One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
> thread dies.
> 
> The integer location is a user provided pointer, provided at clone()
> time.
> 
> kernel keeps this pointer value into current->clear_child_tid.
> 
> At execve() time, we should make sure kernel doesnt keep this user
> provided pointer, as full user memory is replaced by a new one.
> 
> As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
> CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
> memory in forked processes.
> 
> Following sequence could happen:
> 
> 1) bash (or any program) starts a new process, by a fork() call that
>    glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
>    ...) syscall
> 
> 2) When new process starts, its current->clear_child_tid is set to a
>    location that has a meaning only in bash (or initial program) context
>    (&THREAD_SELF->tid)
> 
> 3) This new process does the execve() syscall to start a new program. 
>    current->clear_child_tid is left unchanged (a non NULL value)
> 
> 4) If this new program creates some threads, and initial thread exits,
>    kernel will attempt to clear the integer pointed by
>    current->clear_child_tid from mm_release() :
> 
>         if (tsk->clear_child_tid
>             && !(tsk->flags & PF_SIGNALED)
>             && atomic_read(&mm->mm_users) > 1) {
>                 u32 __user * tidptr = tsk->clear_child_tid;
>                 tsk->clear_child_tid = NULL;
> 
>                 /*
>                  * We don't check the error code - if userspace has
>                  * not set up a proper pointer then tough luck.
>                  */
> << here >>      put_user(0, tidptr);
>                 sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
>         }
> 
> 5) OR : if new program is not multi-threaded, but spied by /proc/pid
>    users (ps command for example), mm_users > 1, and the exiting program
>    could corrupt 4 bytes in a persistent memory area (shm or memory mapped
>    file)
> 
> If current->clear_child_tid points to a writeable portion of memory of the
> new program, kernel happily and silently corrupts 4 bytes of memory, with
> unexpected effects.
> 
> Fix is straightforward and should not break any sane program.
> 
> Reported-by: Jens Rosenboom <jens@mcbone.net>
> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> 
>  kernel/fork.c |   22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> --- WAIT/kernel/fork.c~CLEARTID	2009-07-02 19:27:36.000000000 +0200
> +++ WAIT/kernel/fork.c	2009-08-01 03:36:59.000000000 +0200
> @@ -568,18 +568,18 @@ void mm_release(struct task_struct *tsk,
>  	 * the value intact in a core dump, and to save the unnecessary
>  	 * trouble otherwise.  Userland only wants this done for a sys_exit.
>  	 */
> -	if (tsk->clear_child_tid
> -	    && !(tsk->flags & PF_SIGNALED)
> -	    && atomic_read(&mm->mm_users) > 1) {
> -		u32 __user * tidptr = tsk->clear_child_tid;
> +	if (tsk->clear_child_tid) {
> +		if (!(tsk->flags & PF_SIGNALED) &&
> +		    atomic_read(&mm->mm_users) > 1) {
> +			/*
> +			 * We don't check the error code - if userspace has
> +			 * not set up a proper pointer then tough luck.
> +			 */
> +			put_user(0, tsk->clear_child_tid);
> +			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
> +					1, NULL, NULL, 0);
> +		}
>  		tsk->clear_child_tid = NULL;
> -
> -		/*
> -		 * We don't check the error code - if userspace has
> -		 * not set up a proper pointer then tough luck.
> -		 */
> -		put_user(0, tidptr);
> -		sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
>  	}
>  }
>  
> 

Thanks Oleg, you are right this seems cleaner.

I only wonder about core dumping, since mm_release() is also used by exiting tasks.

Isnt clear_child_tid used by gdb or other debugger ?
(The tid value is carefuly untouched in case of a core dump, but maybe
gdb needs the current->clear_child_tid for whatever reason, for example
to get TID address in user memory ?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] execve: must clear current->clear_child_tid
  2009-08-01  6:12         ` Eric Dumazet
@ 2009-08-01  6:44           ` Oleg Nesterov
  2009-08-01  7:52             ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2009-08-01  6:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Linus Torvalds, linux-kernel, drepper, jens, mingo,
	peterz, sonnyrao, stable, tglx

On 08/01, Eric Dumazet wrote:
>
> I only wonder about core dumping, since mm_release() is also used by exiting tasks.
>
> Isnt clear_child_tid used by gdb or other debugger ?

Afaics it is not...

At least, I can't see how gdb (or any other user-space app) can figure
out the value of ->clear_child_tid.



Not that this really matters, but please note also that it is possible
that the coredumping task has ->clear_child_tid == NULL anyway, even
without this change. The PF_SIGNALED check in mm_release() is not 100&
reliable.

Suppose a thread T sleeps in do_exit()->ptrace_event(PT_TRACE_EXIT) path.
Another thread starts a coredump and kills T via zap_process(). This wakes
up T, it calls exit_mm()->mm_release() without PF_SIGNALED.

Oleg.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] execve: must clear current->clear_child_tid
  2009-08-01  6:44           ` Oleg Nesterov
@ 2009-08-01  7:52             ` Eric Dumazet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2009-08-01  7:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Linus Torvalds, linux-kernel, drepper, jens, mingo,
	peterz, sonnyrao, stable, tglx

Oleg Nesterov a écrit :
> On 08/01, Eric Dumazet wrote:
>> I only wonder about core dumping, since mm_release() is also used by exiting tasks.
>>
>> Isnt clear_child_tid used by gdb or other debugger ?
> 
> Afaics it is not...
> 
> At least, I can't see how gdb (or any other user-space app) can figure
> out the value of ->clear_child_tid.
> 

Yep, this was from old Unixes time, when core file included the 'u' structure ...

Dont worry ;)


> 
> 
> Not that this really matters, but please note also that it is possible
> that the coredumping task has ->clear_child_tid == NULL anyway, even
> without this change. The PF_SIGNALED check in mm_release() is not 100&
> reliable.
> 
> Suppose a thread T sleeps in do_exit()->ptrace_event(PT_TRACE_EXIT) path.
> Another thread starts a coredump and kills T via zap_process(). This wakes
> up T, it calls exit_mm()->mm_release() without PF_SIGNALED.
> 
> Oleg.
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-08-01  7:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200907312142.n6VLgKfx021454@imap1.linux-foundation.org>
2009-07-31 22:29 ` + execve-must-clear-current-clear_child_tid.patch added to -mm tree Oleg Nesterov
2009-08-01  0:38   ` Linus Torvalds
2009-08-01  0:51     ` Andrew Morton
2009-08-01  1:54       ` [PATCH v2] execve: must clear current->clear_child_tid Oleg Nesterov
2009-08-01  6:12         ` Eric Dumazet
2009-08-01  6:44           ` Oleg Nesterov
2009-08-01  7:52             ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox