* Re: SMP races in proc with thread_struct
2001-05-04 12:34 ` Todd Inglett
@ 2001-05-04 12:46 ` Keith Owens
2001-05-04 13:11 ` Andreas Schwab
2001-05-04 14:21 ` Andreas Ferber
2001-05-04 16:04 ` Alexander Viro
2001-05-04 17:52 ` [PATCH][RFC] " Alexander Viro
2 siblings, 2 replies; 12+ messages in thread
From: Keith Owens @ 2001-05-04 12:46 UTC (permalink / raw)
To: Todd Inglett; +Cc: Alexander Viro, linux-kernel
On Fri, 04 May 2001 07:34:20 -0500,
Todd Inglett <tinglett@vnet.ibm.com> wrote:
>But this is where hell breaks loose. Every process has a valid parent
>-- unless it is dead and nobody cares. Process N has already exited and
>released from the tasklist while its parent was still alive. There was
>no reason to reparent it. It just got released. So N's task_struct has
>a dangling ptr to its parent. Nobody is holding the parent task_struct,
>either. When the parent died memory for its task_struct was released.
>This is ungood.
Wrap the reference to the parent task structure with exception table
recovery code, like copy_from_user(). If the dangling reference points
to valid memory the worst that will happen is that you read and report
gibberish for one output. If the reference causes an exception then
recover and treat it as NULL. For a read only case, the only important
thing is not to die, one occurrence of bad data is tolerable.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SMP races in proc with thread_struct
2001-05-04 12:46 ` Keith Owens
@ 2001-05-04 13:11 ` Andreas Schwab
2001-05-04 13:38 ` Brian Gerst
2001-05-04 23:27 ` Keith Owens
2001-05-04 14:21 ` Andreas Ferber
1 sibling, 2 replies; 12+ messages in thread
From: Andreas Schwab @ 2001-05-04 13:11 UTC (permalink / raw)
To: Keith Owens; +Cc: Todd Inglett, Alexander Viro, linux-kernel
Keith Owens <kaos@ocs.com.au> writes:
|> On Fri, 04 May 2001 07:34:20 -0500,
|> Todd Inglett <tinglett@vnet.ibm.com> wrote:
|> >But this is where hell breaks loose. Every process has a valid parent
|> >-- unless it is dead and nobody cares. Process N has already exited and
|> >released from the tasklist while its parent was still alive. There was
|> >no reason to reparent it. It just got released. So N's task_struct has
|> >a dangling ptr to its parent. Nobody is holding the parent task_struct,
|> >either. When the parent died memory for its task_struct was released.
|> >This is ungood.
|>
|> Wrap the reference to the parent task structure with exception table
|> recovery code, like copy_from_user().
Exception tables only protect accesses to user virtual memory. Kernel
memory references must always be valid in the first place.
Andreas.
--
Andreas Schwab "And now for something
SuSE Labs completely different."
Andreas.Schwab@suse.de
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SMP races in proc with thread_struct
2001-05-04 13:11 ` Andreas Schwab
@ 2001-05-04 13:38 ` Brian Gerst
2001-05-04 23:27 ` Keith Owens
1 sibling, 0 replies; 12+ messages in thread
From: Brian Gerst @ 2001-05-04 13:38 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Keith Owens, Todd Inglett, Alexander Viro, linux-kernel
Andreas Schwab wrote:
>
> Keith Owens <kaos@ocs.com.au> writes:
>
> |> On Fri, 04 May 2001 07:34:20 -0500,
> |> Todd Inglett <tinglett@vnet.ibm.com> wrote:
> |> >But this is where hell breaks loose. Every process has a valid parent
> |> >-- unless it is dead and nobody cares. Process N has already exited and
> |> >released from the tasklist while its parent was still alive. There was
> |> >no reason to reparent it. It just got released. So N's task_struct has
> |> >a dangling ptr to its parent. Nobody is holding the parent task_struct,
> |> >either. When the parent died memory for its task_struct was released.
> |> >This is ungood.
> |>
> |> Wrap the reference to the parent task structure with exception table
> |> recovery code, like copy_from_user().
>
> Exception tables only protect accesses to user virtual memory. Kernel
> memory references must always be valid in the first place.
>
> Andreas.
The virtual address being accessed is irrelevant. It's the address of
the faulting instruction that determines what the kernel will do if it
can't deal with a page fault. If the access was made from kernel mode
the exception handler (if there is one) always gets invoked, otherwise
it oopses.
--
Brian Gerst
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SMP races in proc with thread_struct
2001-05-04 13:11 ` Andreas Schwab
2001-05-04 13:38 ` Brian Gerst
@ 2001-05-04 23:27 ` Keith Owens
1 sibling, 0 replies; 12+ messages in thread
From: Keith Owens @ 2001-05-04 23:27 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Todd Inglett, Alexander Viro, linux-kernel
On 04 May 2001 15:11:37 +0200,
Andreas Schwab <schwab@suse.de> wrote:
>Keith Owens <kaos@ocs.com.au> writes:
>|> Wrap the reference to the parent task structure with exception table
>|> recovery code, like copy_from_user().
>
>Exception tables only protect accesses to user virtual memory. Kernel
>memory references must always be valid in the first place.
Wrong. Exception tables say that if the kernel gets an exception
between labels A and B then branch to fixup label C. See show_regs()
in arch/i386/kernel/process.c and wrmsr_eio() in arch/i386/kernel/msr.c
for examples which do not depend on user virtual memory.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SMP races in proc with thread_struct
2001-05-04 12:46 ` Keith Owens
2001-05-04 13:11 ` Andreas Schwab
@ 2001-05-04 14:21 ` Andreas Ferber
2001-05-04 15:18 ` Todd Inglett
1 sibling, 1 reply; 12+ messages in thread
From: Andreas Ferber @ 2001-05-04 14:21 UTC (permalink / raw)
To: Keith Owens; +Cc: Todd Inglett, Alexander Viro, linux-kernel
Hi,
On Fri, May 04, 2001 at 10:46:43PM +1000, Keith Owens wrote:
> For a read only case, the only important
> thing is not to die, one occurrence of bad data is tolerable.
Strong NACK. The pages where the bad data comes from may in some cases
already be reclaimed for other data, probably something security
relevant, which should never ever be given even read access by an
unauthorized user. Even if this event may be a very rare case, one
single occurrence of this is one to much.
Andreas
--
>Ich nehm nicht mal Zucker in den den Kaffee.
Dafür nehm ich nichtmal Kaffee.
(Peter Backof + Bernhard Schultz in detebe)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SMP races in proc with thread_struct
2001-05-04 14:21 ` Andreas Ferber
@ 2001-05-04 15:18 ` Todd Inglett
0 siblings, 0 replies; 12+ messages in thread
From: Todd Inglett @ 2001-05-04 15:18 UTC (permalink / raw)
To: Andreas Ferber; +Cc: Keith Owens, Alexander Viro, linux-kernel
Andreas Ferber wrote:
>
> On Fri, May 04, 2001 at 10:46:43PM +1000, Keith Owens wrote:
>
> > For a read only case, the only important
> > thing is not to die, one occurrence of bad data is tolerable.
>
> Strong NACK. The pages where the bad data comes from may in some cases
> already be reclaimed for other data, probably something security
> relevant, which should never ever be given even read access by an
> unauthorized user. Even if this event may be a very rare case, one
> single occurrence of this is one to much.
Agreed. Worse, it is not readonly. The /proc code task_lock's the task
struct, thus writing to it.
I'll post a patch shortly once I've tested it. Worse case only if the
task is exiting I sweep the tasklist looking for the parent to see if
the parent is still valid. I am not verifying if it is the actual
parent (it might be a new task allocated at the same spot). I could
just report 0 (or 1) for the parent for any process that is exiting, but
then you won't be able to see the ppid for zombies. Or is there another
state I can look for? What I really need is PF_EXITED :).
I am a little concerned also about mm, file, tty and sig fields. These
appear to be NULLed in do_exit(), but I haven't tracked down tty and sig
yet.
--
-todd
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: SMP races in proc with thread_struct
2001-05-04 12:34 ` Todd Inglett
2001-05-04 12:46 ` Keith Owens
@ 2001-05-04 16:04 ` Alexander Viro
2001-05-04 17:52 ` [PATCH][RFC] " Alexander Viro
2 siblings, 0 replies; 12+ messages in thread
From: Alexander Viro @ 2001-05-04 16:04 UTC (permalink / raw)
To: Todd Inglett; +Cc: linux-kernel
On Fri, 4 May 2001, Todd Inglett wrote:
> Ok, I've got this isolated. Here's the sequence of events:
>
> 1. Some process T (probably "top") opens /proc/N/stat.
> 2. While holding tasklist_lock the proc code does a get_task_struct()
> to add a ref count to the page.
> 3. Process N exits.
> 4. The parent of process N exits.
> 5. Process T reads from the open file. This calls proc_pid_stat()
> which dereferences N's task_struct. This is ok as Alexander points out
> because a reference is held.
> 6. Using N's task_struct process T attempt to dereference the *parent*
> task struct. It assumes this is ok because:
Where?
> A) it is holding tasklist_lock so N cannot be reparented in a race.
> B) every process *always* has a valid parent.
>
> But this is where hell breaks loose. Every process has a valid parent
> -- unless it is dead and nobody cares. Process N has already exited and
> released from the tasklist while its parent was still alive. There was
> no reason to reparent it. It just got released. So N's task_struct has
> a dangling ptr to its parent. Nobody is holding the parent task_struct,
> either. When the parent died memory for its task_struct was released.
> This is ungood.
If N is dead all accesses should return -ENOENT. No matter what
happens with its parent.
> My opinion here is that this is proc's problem. When we free a
> task_struct it could be "cleaned up" of dangling ptrs, but this is a
> hack to cover a bug in proc.
>
> This is not isolated to the parent task_struct, either. The task_struct
> mm is also dereferenced. It is pretty easy to validate a parent
> task_struct ptr (just hold tasklist_lock and run the list to check if it
> is still valid -- might not be the *right* task, but it will still be
> valid). However, how do you validate the mm is ok?
exit_mm() cleans task->mm. _Before_ the process dies. And it should do
that, for a lot of reasons. General principle: if you are doing
garbage-collection upon removal of the last reference - _remove_
that reference. Before you call destructor. Anything else is simply
asking for races.
Besides, all the exit_foo() can be done by a very alive kernel threads.
Suppose that exit_mm() didn't clean ->mm. Well, here comes losetup(8).
It binds loop device to a file and starts a thread for handling requests.
Said thread is created by kernel_thread(9). Which is a wrapper for clone(2).
So far, so good, but that thread gets a VM of parent. I.e. losteup.
That is _not_ good. For one thing, it means full-blown MMU switch whenever
we switch to loop_thread. And that will cost you. Since loop_thread has
no business using the userland part of MMU state it calls exit_mm(9) (it
calls daemonize(9). which, in turn, calls exit_mm(9)).
That picture is typical for kernel threads. knfsd, drivers' helper threads,
you name it. And unlike the relatively tame case of loop_thread (there
we have serialization between loop_thread and parent that will not
let parent to exit before loop_thread will do up(&lo->lo_sem) which is
after the daemonize() call) in general we can very well have parent
dead and gone by the time when child calls exit_mm().
See the problem? Child is very much alive, it's the sole owner of pointer
to mm_struct and it calls exit_mm(). Leaving the pointer around is _not_
a good idea.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH][RFC] Re: SMP races in proc with thread_struct
2001-05-04 12:34 ` Todd Inglett
2001-05-04 12:46 ` Keith Owens
2001-05-04 16:04 ` Alexander Viro
@ 2001-05-04 17:52 ` Alexander Viro
2 siblings, 0 replies; 12+ messages in thread
From: Alexander Viro @ 2001-05-04 17:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Todd Inglett, linux-kernel
Linus, could you consider the patch below? As it is, access to
/proc/<pid>/status of dead process with dead parent is possible and
leads to access to freed memory. Besides, cd /proc/<pid> means
that even after <pid> is gone, readdir() _and_ lookup on /proc/<pid> work.
Patch makes sure that ->p_pptr is NULL once the process is gone (fixes
readdir/lookup stuff) and adds obvious couple of checks in array.c.
Al
diff -urN S5-pre1/fs/proc/array.c S5-pre1-p_pptr/fs/proc/array.c
--- S5-pre1/fs/proc/array.c Sat Apr 28 02:12:56 2001
+++ S5-pre1-p_pptr/fs/proc/array.c Fri May 4 13:15:47 2001
@@ -157,7 +157,9 @@
"Uid:\t%d\t%d\t%d\t%d\n"
"Gid:\t%d\t%d\t%d\t%d\n",
get_task_state(p),
- p->pid, p->p_opptr->pid, p->p_pptr->pid != p->p_opptr->pid ? p->p_pptr->pid : 0,
+ p->pid, p->p_opptr->pid,
+ p->p_pptr && p->p_pptr->pid != p->p_opptr->pid
+ ? p->p_pptr->pid : 0,
p->uid, p->euid, p->suid, p->fsuid,
p->gid, p->egid, p->sgid, p->fsgid);
read_unlock(&tasklist_lock);
@@ -339,7 +341,7 @@
nice = task->nice;
read_lock(&tasklist_lock);
- ppid = task->p_opptr->pid;
+ ppid = task->p_pptr ? task->p_opptr->pid : 0;
read_unlock(&tasklist_lock);
res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %ld %ld %lu %lu %ld %lu %lu %lu %lu %lu \
diff -urN S5-pre1/kernel/exit.c S5-pre1-p_pptr/kernel/exit.c
--- S5-pre1/kernel/exit.c Fri Feb 16 22:52:15 2001
+++ S5-pre1-p_pptr/kernel/exit.c Fri May 4 13:18:33 2001
@@ -62,6 +62,9 @@
current->counter += p->counter;
if (current->counter >= MAX_COUNTER)
current->counter = MAX_COUNTER;
+ write_lock_irq(&tasklist_lock);
+ p->p_pptr = NULL;
+ write_unlock_irq(&tasklist_lock);
free_task_struct(p);
} else {
printk("task releasing itself\n");
^ permalink raw reply [flat|nested] 12+ messages in thread