public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sys_getppid oopses on debug kernel (v2)
@ 2006-08-08 15:50 Kirill Korotaev
  2006-08-08 16:09 ` Dave Hansen
  0 siblings, 1 reply; 6+ messages in thread
From: Kirill Korotaev @ 2006-08-08 15:50 UTC (permalink / raw)
  To: Andrew Morton, Linux Kernel Mailing List, alan, muli,
	B.Steinbrink, stable, Dave Hansen

sys_getppid() optimization can access a freed memory.
On kernels with DEBUG_SLAB turned ON, this results in Oops.
As Dave Hansen noted, this optimization is also unsafe
for memory hotplug.

So this patch always takes the lock to be safe.

Signed-Off-By: Kirill Korotaev <dev@openvz.org>


--- ./kernel/timer.c.ppiddbg	2006-07-14 19:11:06.000000000 +0400
+++ ./kernel/timer.c	2006-08-08 19:45:57.000000000 +0400
@@ -1342,28 +1342,11 @@ asmlinkage long sys_getpid(void)
 asmlinkage long sys_getppid(void)
 {
 	int pid;
-	struct task_struct *me = current;
-	struct task_struct *parent;
 
-	parent = me->group_leader->real_parent;
-	for (;;) {
-		pid = parent->tgid;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
-{
-		struct task_struct *old = parent;
+	read_lock(&tasklist_lock);
+	pid = current->group_leader->real_parent->tgid;
+	read_unlock(&tasklist_lock);
 
-		/*
-		 * Make sure we read the pid before re-reading the
-		 * parent pointer:
-		 */
-		smp_rmb();
-		parent = me->group_leader->real_parent;
-		if (old != parent)
-			continue;
-}
-#endif
-		break;
-	}
 	return pid;
 }
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel (v2)
  2006-08-08 15:50 [PATCH] sys_getppid oopses on debug kernel (v2) Kirill Korotaev
@ 2006-08-08 16:09 ` Dave Hansen
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Hansen @ 2006-08-08 16:09 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: Andrew Morton, Linux Kernel Mailing List, alan, muli,
	B.Steinbrink, stable

On Tue, 2006-08-08 at 19:50 +0400, Kirill Korotaev wrote:
> sys_getppid() optimization can access a freed memory.
> On kernels with DEBUG_SLAB turned ON, this results in Oops.
> As Dave Hansen noted, this optimization is also unsafe
> for memory hotplug. 

Yeah, I just tried coding up something to do a seqlock to note when
tasks are freed, but it doesn't work.  Unless somebody is really going
crazy with getppid() on a very large system, this should be just fine.

-- Dave



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel (v2)
  2006-08-09 14:38 Oleg Nesterov
@ 2006-08-09 12:08 ` Kirill Korotaev
  2006-08-09 16:54   ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Kirill Korotaev @ 2006-08-09 12:08 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Dave Hansen, linux-kernel

>>Although I'm not sure it's needed for this problem. A getppid() which does
>>
>>asmlinkage long sys_getppid(void)
>>{
>>	int pid;
>>
>>	read_lock(&tasklist_lock);
>>	pid = current->group_leader->real_parent->tgid;
>>	read_unlock(&tasklist_lock);
>>
>>	return pid;
>>}
>>
>>seems like a fine implementation to me ;)
> 
> 
> Why do we need to use ->group_leader? All threads should have the same
> ->real_parent.
I'm not sure this is true for old LinuxThreads...

> Why do we need tasklist_lock? I think rcu_read_lock() is enough.
> 
> In other words, do you see any problems with this code
> 
> 	smlinkage long sys_getppid(void)
> 	{
> 		int pid;
> 
> 		rcu_read_lock();
> 		pid = rcu_dereference(current->real_parent)->tgid;
> 		rcu_read_unlock();
> 
> 		return pid;
> 	}
> 
> ? Yes, we may read a stale value for ->real_parent, but the memory
> can't be freed while we are under rcu_read_lock(). And in this case
> the returned value is ok because the task could be reparented just
> after return anyway.
Your patch doesn't cure the problem.
rcu_read_lock just disables preemtion and rcu_dereference
introduces memory barrier. _None_ of this _prevents_
another CPU from freeing old real_parent in parallel with your dereference.

You can minimize the probability very much by making local_irq_disable()/enable()
around the code in question, but still it won't be a real fix (at least due to NMIs).

Kirill


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel (v2)
  2006-08-09 16:54   ` Oleg Nesterov
@ 2006-08-09 13:02     ` Kirill Korotaev
  0 siblings, 0 replies; 6+ messages in thread
From: Kirill Korotaev @ 2006-08-09 13:02 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Dave Hansen, linux-kernel

>>Your patch doesn't cure the problem.
>>rcu_read_lock just disables preemtion and rcu_dereference
>>introduces memory barrier. _None_ of this _prevents_
>>another CPU from freeing old real_parent in parallel with your dereference.
> 
> 
> How so? Note that release_task() doesn't call put_task_struct(), it does
> call_rcu(&p->rcu, delayed_put_task_struct) instead. When delayed_put_task_struct()
> is called, all CPUs must see the new value of ->real_parent (otherwise
> RCU is just broken). If CPU sees the old value of ->real_parent, rcu_read_lock()
> protects us from delayed_put_task_struct() on another CPU.
> 
> Ok, I think this is the same "classic" pattern as:
> 
> 	old = global_ptr;
> 	global_ptr = new;
> 	call_rcu(..free_old...);
> vs
> 	rcu_read_lock();
> 	use(global_ptr);
> 	rcu_read_unlock();
> 
> Do you agree?
Agree :)
Sorry, I didn't notice that task structs are freed now with RCU :)))
In this case your patch really helps the problem. Care to prepare it?

Thanks,
Kirill

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel (v2)
@ 2006-08-09 14:38 Oleg Nesterov
  2006-08-09 12:08 ` Kirill Korotaev
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-08-09 14:38 UTC (permalink / raw)
  To: Andrew Morton, Kirill Korotaev; +Cc: Dave Hansen, linux-kernel

Andrew Morton wrote:
>
> Although I'm not sure it's needed for this problem. A getppid() which does
>
> asmlinkage long sys_getppid(void)
> {
> 	int pid;
>
> 	read_lock(&tasklist_lock);
> 	pid = current->group_leader->real_parent->tgid;
> 	read_unlock(&tasklist_lock);
>
> 	return pid;
> }
>
> seems like a fine implementation to me ;)

Why do we need to use ->group_leader? All threads should have the same
->real_parent.

Why do we need tasklist_lock? I think rcu_read_lock() is enough.

In other words, do you see any problems with this code

	smlinkage long sys_getppid(void)
	{
		int pid;

		rcu_read_lock();
		pid = rcu_dereference(current->real_parent)->tgid;
		rcu_read_unlock();

		return pid;
	}

? Yes, we may read a stale value for ->real_parent, but the memory
can't be freed while we are under rcu_read_lock(). And in this case
the returned value is ok because the task could be reparented just
after return anyway.

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel (v2)
  2006-08-09 12:08 ` Kirill Korotaev
@ 2006-08-09 16:54   ` Oleg Nesterov
  2006-08-09 13:02     ` Kirill Korotaev
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2006-08-09 16:54 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: Andrew Morton, Dave Hansen, linux-kernel

On 08/09, Kirill Korotaev wrote:
>
> >Why do we need to use ->group_leader? All threads should have the same
> >->real_parent.
> I'm not sure this is true for old LinuxThreads...

Yes, from the user-space pov it may be not true, but ->group_leader->real_parent
should be equal ->real_parent anyway.

> >Why do we need tasklist_lock? I think rcu_read_lock() is enough.
> >
> >In other words, do you see any problems with this code
> >
> >	smlinkage long sys_getppid(void)
> >	{
> >		int pid;
> >
> >		rcu_read_lock();
> >		pid = rcu_dereference(current->real_parent)->tgid;
> >		rcu_read_unlock();
> >
> >		return pid;
> >	}
> >
> >? Yes, we may read a stale value for ->real_parent, but the memory
> >can't be freed while we are under rcu_read_lock(). And in this case
> >the returned value is ok because the task could be reparented just
> >after return anyway.
> Your patch doesn't cure the problem.
> rcu_read_lock just disables preemtion and rcu_dereference
> introduces memory barrier. _None_ of this _prevents_
> another CPU from freeing old real_parent in parallel with your dereference.

How so? Note that release_task() doesn't call put_task_struct(), it does
call_rcu(&p->rcu, delayed_put_task_struct) instead. When delayed_put_task_struct()
is called, all CPUs must see the new value of ->real_parent (otherwise
RCU is just broken). If CPU sees the old value of ->real_parent, rcu_read_lock()
protects us from delayed_put_task_struct() on another CPU.

Ok, I think this is the same "classic" pattern as:

	old = global_ptr;
	global_ptr = new;
	call_rcu(..free_old...);
vs
	rcu_read_lock();
	use(global_ptr);
	rcu_read_unlock();

Do you agree?

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-08-09 13:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-08 15:50 [PATCH] sys_getppid oopses on debug kernel (v2) Kirill Korotaev
2006-08-08 16:09 ` Dave Hansen
  -- strict thread matches above, loose matches on Subject: below --
2006-08-09 14:38 Oleg Nesterov
2006-08-09 12:08 ` Kirill Korotaev
2006-08-09 16:54   ` Oleg Nesterov
2006-08-09 13:02     ` Kirill Korotaev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox