public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sys_getppid oopses on debug kernel
@ 2006-08-08 10:22 Kirill Korotaev
  2006-08-08 15:26 ` Dave Hansen
  0 siblings, 1 reply; 10+ messages in thread
From: Kirill Korotaev @ 2006-08-08 10:22 UTC (permalink / raw)
  To: Andrew Morton, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 164 bytes --]

sys_getppid() optimization can access a freed memory.
On kernels with DEBUG_SLAB turned ON, this results in
Oops.

Signed-Off-By: Kirill Korotaev <dev@openvz.org>


[-- Attachment #2: diff-get-ppid-with-slab-debug --]
[-- Type: text/plain, Size: 717 bytes --]

--- ./kernel/timer.c.ppiddbg	2006-07-14 19:11:06.000000000 +0400
+++ ./kernel/timer.c	2006-08-08 14:19:24.000000000 +0400
@@ -1342,6 +1342,7 @@ asmlinkage long sys_getpid(void)
 asmlinkage long sys_getppid(void)
 {
 	int pid;
+#ifndef CONFIG_DEBUG_SLAB
 	struct task_struct *me = current;
 	struct task_struct *parent;
 
@@ -1364,6 +1365,16 @@ asmlinkage long sys_getppid(void)
 #endif
 		break;
 	}
+#else
+	/*
+	 * ->real_parent could be released before dereference and
+	 * we accessed freed kernel memory, which faults with debugging on.
+	 * Keep it simple and stupid.
+	 */
+	read_lock(&tasklist_lock);
+	pid = current->group_leader->real_parent->tgid;
+	read_unlock(&tasklist_lock);
+#endif
 	return pid;
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 10:22 [PATCH] sys_getppid oopses on debug kernel Kirill Korotaev
@ 2006-08-08 15:26 ` Dave Hansen
  2006-08-08 15:34   ` Björn Steinbrink
  2006-08-08 15:43   ` Kirill Korotaev
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Hansen @ 2006-08-08 15:26 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: Andrew Morton, Linux Kernel Mailing List

On Tue, 2006-08-08 at 14:22 +0400, Kirill Korotaev wrote:
> sys_getppid() optimization can access a freed memory.
> On kernels with DEBUG_SLAB turned ON, this results in
> Oops.
...
> +#else
> +	/*
> +	 * ->real_parent could be released before dereference and
> +	 * we accessed freed kernel memory, which faults with debugging on.
> +	 * Keep it simple and stupid.
> +	 */
> +	read_lock(&tasklist_lock);
> +	pid = current->group_leader->real_parent->tgid;
> +	read_unlock(&tasklist_lock);
> +#endif
>  	return pid;
>  }

Accessing freed memory is a bug, always, not just *only* when slab
debugging is on, right?  Doesn't this mean we could get junk, or that
the reader could potentially run off a bad pointer?

It seems that this patch only papers over the problem in the case when
it is observed, but doesn't really even fix the normal case.

Could we use a seqlock to determine when real_parent is in flux, and
re-read real_parent until we get a consistent one?  We could use in in
lieu of the existing for() loop.

-- Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 15:26 ` Dave Hansen
@ 2006-08-08 15:34   ` Björn Steinbrink
  2006-08-08 15:41     ` Dave Hansen
  2006-08-08 15:43   ` Kirill Korotaev
  1 sibling, 1 reply; 10+ messages in thread
From: Björn Steinbrink @ 2006-08-08 15:34 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Kirill Korotaev, Andrew Morton, Linux Kernel Mailing List

On 2006.08.08 08:26:57 -0700, Dave Hansen wrote:
> On Tue, 2006-08-08 at 14:22 +0400, Kirill Korotaev wrote:
> > sys_getppid() optimization can access a freed memory.
> > On kernels with DEBUG_SLAB turned ON, this results in
> > Oops.
> ...
> > +#else
> > +	/*
> > +	 * ->real_parent could be released before dereference and
> > +	 * we accessed freed kernel memory, which faults with debugging on.
> > +	 * Keep it simple and stupid.
> > +	 */
> > +	read_lock(&tasklist_lock);
> > +	pid = current->group_leader->real_parent->tgid;
> > +	read_unlock(&tasklist_lock);
> > +#endif
> >  	return pid;
> >  }
> 
> Accessing freed memory is a bug, always, not just *only* when slab
> debugging is on, right?  Doesn't this mean we could get junk, or that
> the reader could potentially run off a bad pointer?
> 
> It seems that this patch only papers over the problem in the case when
> it is observed, but doesn't really even fix the normal case.
> 
> Could we use a seqlock to determine when real_parent is in flux, and
> re-read real_parent until we get a consistent one?  We could use in in
> lieu of the existing for() loop.

See this thread: http://lkml.org/lkml/2006/8/8/215

Björn

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 15:34   ` Björn Steinbrink
@ 2006-08-08 15:41     ` Dave Hansen
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Hansen @ 2006-08-08 15:41 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Kirill Korotaev, Andrew Morton, Linux Kernel Mailing List

On Tue, 2006-08-08 at 17:34 +0200, Björn Steinbrink wrote:
> See this thread: http://lkml.org/lkml/2006/8/8/215

Bah.  I see it now.

I'll put this on the list of things that still have to be fixed before
we do any kind of memory removal.  This will definitely be a bug there.

-- Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 15:26 ` Dave Hansen
  2006-08-08 15:34   ` Björn Steinbrink
@ 2006-08-08 15:43   ` Kirill Korotaev
  2006-08-08 15:49     ` Dave Hansen
  2006-08-09  3:09     ` Andi Kleen
  1 sibling, 2 replies; 10+ messages in thread
From: Kirill Korotaev @ 2006-08-08 15:43 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Andrew Morton, Linux Kernel Mailing List

> Accessing freed memory is a bug, always, not just *only* when slab
> debugging is on, right?  Doesn't this mean we could get junk, or that
> the reader could potentially run off a bad pointer?
no, read the comment in sys_getppid.
It is a valid optimization. _safe_ and alowing to bypass taking the lock.
BUT! This optimization relies on the fact that kernel memory (DMA + normal zone)
is always mapped into virtual address space.
Which is invalid for debug kernels only.

> It seems that this patch only papers over the problem in the case when
> it is observed, but doesn't really even fix the normal case.
> 
> Could we use a seqlock to determine when real_parent is in flux, and
> re-read real_parent until we get a consistent one?  We could use in in
> lieu of the existing for() loop.

Kirill


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 15:43   ` Kirill Korotaev
@ 2006-08-08 15:49     ` Dave Hansen
  2006-08-08 15:54       ` Kirill Korotaev
  2006-08-08 15:58       ` Martin Schwidefsky
  2006-08-09  3:09     ` Andi Kleen
  1 sibling, 2 replies; 10+ messages in thread
From: Dave Hansen @ 2006-08-08 15:49 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: Andrew Morton, Linux Kernel Mailing List, Martin Schwidefsky,
	Heiko Carstens

On Tue, 2006-08-08 at 19:43 +0400, Kirill Korotaev wrote:
> > Accessing freed memory is a bug, always, not just *only* when slab
> > debugging is on, right?  Doesn't this mean we could get junk, or that
> > the reader could potentially run off a bad pointer?
> no, read the comment in sys_getppid.
> It is a valid optimization. _safe_ and alowing to bypass taking the lock.
> BUT! This optimization relies on the fact that kernel memory (DMA + normal zone)
> is always mapped into virtual address space.
> Which is invalid for debug kernels only.

Actually, it might also be invalid in hypervisor environments.  s390 and
Xen use ballooning drivers to tell the hypervisor which pages are not
currently in use by the OS so that they may be used in virtual machines
elsewhere.

I'm cc'ing the s390 guys.  Will the s390 kernel oops if it accesses a
page which was ballooned back to the hypervisor?

-- Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 15:49     ` Dave Hansen
@ 2006-08-08 15:54       ` Kirill Korotaev
  2006-08-08 15:58       ` Martin Schwidefsky
  1 sibling, 0 replies; 10+ messages in thread
From: Kirill Korotaev @ 2006-08-08 15:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Linux Kernel Mailing List, Martin Schwidefsky,
	Heiko Carstens

>>>Accessing freed memory is a bug, always, not just *only* when slab
>>>debugging is on, right?  Doesn't this mean we could get junk, or that
>>>the reader could potentially run off a bad pointer?
>>
>>no, read the comment in sys_getppid.
>>It is a valid optimization. _safe_ and alowing to bypass taking the lock.
>>BUT! This optimization relies on the fact that kernel memory (DMA + normal zone)
>>is always mapped into virtual address space.
>>Which is invalid for debug kernels only.
> 
> 
> Actually, it might also be invalid in hypervisor environments.  s390 and
> Xen use ballooning drivers to tell the hypervisor which pages are not
> currently in use by the OS so that they may be used in virtual machines
> elsewhere.
> 
> I'm cc'ing the s390 guys.  Will the s390 kernel oops if it accesses a
> page which was ballooned back to the hypervisor?

Yeah, a minute after my reply I got your idea and sent a new patch
which always takes the lock :)))

Thanks,
Kirill


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 15:49     ` Dave Hansen
  2006-08-08 15:54       ` Kirill Korotaev
@ 2006-08-08 15:58       ` Martin Schwidefsky
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Schwidefsky @ 2006-08-08 15:58 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill Korotaev, Andrew Morton, Linux Kernel Mailing List,
	Heiko Carstens

On Tue, 2006-08-08 at 08:49 -0700, Dave Hansen wrote:
> On Tue, 2006-08-08 at 19:43 +0400, Kirill Korotaev wrote:
> > > Accessing freed memory is a bug, always, not just *only* when slab
> > > debugging is on, right?  Doesn't this mean we could get junk, or that
> > > the reader could potentially run off a bad pointer?
> > no, read the comment in sys_getppid.
> > It is a valid optimization. _safe_ and alowing to bypass taking the lock.
> > BUT! This optimization relies on the fact that kernel memory (DMA + normal zone)
> > is always mapped into virtual address space.
> > Which is invalid for debug kernels only.
> 
> Actually, it might also be invalid in hypervisor environments.  s390 and
> Xen use ballooning drivers to tell the hypervisor which pages are not
> currently in use by the OS so that they may be used in virtual machines
> elsewhere.
> 
> I'm cc'ing the s390 guys.  Will the s390 kernel oops if it accesses a
> page which was ballooned back to the hypervisor?

Not with the ballooner, that just tells the hypervisor that it can
forget the current content. On the next access the hypervisor hands out
a zeroed page so the access will succeed. But with my guest page hinting
code the kernel will oops if a free page is accessed.

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-08 15:43   ` Kirill Korotaev
  2006-08-08 15:49     ` Dave Hansen
@ 2006-08-09  3:09     ` Andi Kleen
  2006-08-09  3:31       ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2006-08-09  3:09 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: Andrew Morton, Linux Kernel Mailing List, haveblue, linux-arch

Kirill Korotaev <dev@sw.ru> writes:

[adding linux-arch]

> > Accessing freed memory is a bug, always, not just *only* when slab
> > debugging is on, right?  Doesn't this mean we could get junk, or that
> > the reader could potentially run off a bad pointer?
> no, read the comment in sys_getppid.
> It is a valid optimization. _safe_ and alowing to bypass taking the lock.
> BUT! This optimization relies on the fact that kernel memory (DMA + normal zone)
> is always mapped into virtual address space.
> Which is invalid for debug kernels only.

In x86 arch code we would use __get_user for this (and we do in a couple 
of places). But it wouldn't be portable because sometimes _user is 
in a different address space.

Maybe it would be time to make a similar facility (read/write_kernel_safe() or similar)
with error return available to generic code? 

It should be easy to implement - iirc near all architectures already
use the exception handling frame work and it is a simple extension 
of that. x86 could just define it to __put/get_user

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] sys_getppid oopses on debug kernel
  2006-08-09  3:09     ` Andi Kleen
@ 2006-08-09  3:31       ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2006-08-09  3:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Kirill Korotaev, Linux Kernel Mailing List, haveblue, linux-arch

On 09 Aug 2006 05:09:11 +0200
Andi Kleen <ak@suse.de> wrote:

> Kirill Korotaev <dev@sw.ru> writes:
> 
> [adding linux-arch]
> 
> > > Accessing freed memory is a bug, always, not just *only* when slab
> > > debugging is on, right?  Doesn't this mean we could get junk, or that
> > > the reader could potentially run off a bad pointer?
> > no, read the comment in sys_getppid.
> > It is a valid optimization. _safe_ and alowing to bypass taking the lock.
> > BUT! This optimization relies on the fact that kernel memory (DMA + normal zone)
> > is always mapped into virtual address space.
> > Which is invalid for debug kernels only.
> 
> In x86 arch code we would use __get_user for this (and we do in a couple 
> of places). But it wouldn't be portable because sometimes _user is 
> in a different address space.
> 
> Maybe it would be time to make a similar facility (read/write_kernel_safe() or similar)
> with error return available to generic code? 
> 
> It should be easy to implement - iirc near all architectures already
> use the exception handling frame work and it is a simple extension 
> of that. x86 could just define it to __put/get_user
> 

I just did something like that:

Similar to ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc3/2.6.18-rc3-mm2/broken-out/add-probe_kernel_address.patch

Although I'm not sure it's needed for this problem. A getppid() which does

asmlinkage long sys_getppid(void)
{
	int pid;

	read_lock(&tasklist_lock);
	pid = current->group_leader->real_parent->tgid;
	read_unlock(&tasklist_lock);

	return pid;
}

seems like a fine implementation to me ;)

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-08-09  3:32 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-08 10:22 [PATCH] sys_getppid oopses on debug kernel Kirill Korotaev
2006-08-08 15:26 ` Dave Hansen
2006-08-08 15:34   ` Björn Steinbrink
2006-08-08 15:41     ` Dave Hansen
2006-08-08 15:43   ` Kirill Korotaev
2006-08-08 15:49     ` Dave Hansen
2006-08-08 15:54       ` Kirill Korotaev
2006-08-08 15:58       ` Martin Schwidefsky
2006-08-09  3:09     ` Andi Kleen
2006-08-09  3:31       ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox