public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Oops in aio_free_ring on 2.6.9
@ 2004-11-05 19:34 Darrick J. Wong
  2004-11-07  3:43 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2004-11-05 19:34 UTC (permalink / raw)
  To: linux-aio; +Cc: linux-kernel

Hi,

In pounding on various i386 machines with a random syscall generator, I
uncovered a situation in which the kernel oopses:

1. Use mmap() to map out as much of the process address
   space as possible.  This is about 2047M on i386.
2. Call io_setup with the first argument set to a
   large (~65000) value.

(These notes reference the mainline 2.6.9 source.)

What happens is that the number of pages required to service the
io_setup request is larger than the block of internally allocated page
pointers (fs/aio.c:126), so aio_setup_ring kmalloc's a blob of struct
page pointers, and initializes these pointers to NULL. (fs/aio.c:130)

Next, the aio_setup_ring function tries to mmap a bunch of pages and
fails, because in step 1 we used up all the address space. 
aio_setup_ring then calls aio_free_ring to tear all of this down.
(fs/aio.c:143)

aio_free_ring sees the block of struct page pointers and calls free_page
(fs/aio.c:88) on the pointers without checking that they're not NULL. 
Unfortunately, they _are_ NULL and *oops*!  My patch amends the function
to include a null pointer check.

I have a simple testcase that causes this oops; see
http://submarine.dyndns.org/~djwong/docs/iosetup_crash.c.  Run
"./iosetup_crash 100 100000 1000000" to reproduce the oops.

This flaw shows up on mainline 2.6.9, Debian 2.6.8 and SLES9 2.6.5 on
four different machines.  The patch is against 2.6.9 mainline and the
problem isn't fixed in 2.6.10-rc1.

Please send replies directly to me, as I'm not subscribed to
linux-kernel or linux-aio.

--Darrick

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>

diff -uprN linux-2.6.9-orig/fs/aio.c linux-2.6.9/fs/aio.c
--- linux-2.6.9-orig/fs/aio.c	2004-10-18 14:54:07.000000000 -0700
+++ linux-2.6.9/fs/aio.c	2004-11-03 08:47:51.000000000 -0800
@@ -85,7 +85,8 @@ static void aio_free_ring(struct kioctx 
 	long i;
 
 	for (i=0; i<info->nr_pages; i++)
-		put_page(info->ring_pages[i]);
+		if (info->ring_pages[i])
+			put_page(info->ring_pages[i]);
 
 	if (info->mmap_size) {
 		down_write(&ctx->mm->mmap_sem);


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Oops in aio_free_ring on 2.6.9
  2004-11-05 19:34 [PATCH] Oops in aio_free_ring on 2.6.9 Darrick J. Wong
@ 2004-11-07  3:43 ` Linus Torvalds
  2004-11-08  4:17   ` Suparna Bhattacharya
  2004-11-08 19:30   ` Darrick J. Wong
  0 siblings, 2 replies; 4+ messages in thread
From: Linus Torvalds @ 2004-11-07  3:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-aio, Andrew Morton, Kernel Mailing List



On Fri, 5 Nov 2004, Darrick J. Wong wrote:
> 
> Next, the aio_setup_ring function tries to mmap a bunch of pages and
> fails, because in step 1 we used up all the address space. 
> aio_setup_ring then calls aio_free_ring to tear all of this down.
> (fs/aio.c:143)
> 
> aio_free_ring sees the block of struct page pointers and calls free_page
> (fs/aio.c:88) on the pointers without checking that they're not NULL. 
> Unfortunately, they _are_ NULL and *oops*!  My patch amends the function
> to include a null pointer check.

I don't disagree with the bug, but I disagree with the fix. 

In my opinion, the problem is that "info->nr_pages" is _wrong_. It's wrong 
because it has been initialized to a bogus value. 

I'd much prefer this alternate appended patch. Can you verify that it also 
fixes the problem (we can drop the bogus info->nr_pages initialization, 
because the context - including the info part - has been cleared when it 
was allocated, so nr_pages should already have the _correct_ value of zero 
at this point).

		Linus

-----
===== fs/aio.c 1.60 vs edited =====
--- 1.60/fs/aio.c	2004-10-20 01:12:10 -07:00
+++ edited/fs/aio.c	2004-11-06 19:41:45 -08:00
@@ -118,8 +118,6 @@
 	if (nr_pages < 0)
 		return -EINVAL;
 
-	info->nr_pages = nr_pages;
-
 	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
 
 	info->nr = 0;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Oops in aio_free_ring on 2.6.9
  2004-11-07  3:43 ` Linus Torvalds
@ 2004-11-08  4:17   ` Suparna Bhattacharya
  2004-11-08 19:30   ` Darrick J. Wong
  1 sibling, 0 replies; 4+ messages in thread
From: Suparna Bhattacharya @ 2004-11-08  4:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Darrick J. Wong, linux-aio, Andrew Morton, Kernel Mailing List

On Sat, Nov 06, 2004 at 07:43:33PM -0800, Linus Torvalds wrote:
> 
> On Fri, 5 Nov 2004, Darrick J. Wong wrote:
> > 
> > Next, the aio_setup_ring function tries to mmap a bunch of pages and
> > fails, because in step 1 we used up all the address space. 
> > aio_setup_ring then calls aio_free_ring to tear all of this down.
> > (fs/aio.c:143)
> > 
> > aio_free_ring sees the block of struct page pointers and calls free_page
> > (fs/aio.c:88) on the pointers without checking that they're not NULL. 
> > Unfortunately, they _are_ NULL and *oops*!  My patch amends the function
> > to include a null pointer check.
> 
> I don't disagree with the bug, but I disagree with the fix. 
> 
> In my opinion, the problem is that "info->nr_pages" is _wrong_. It's wrong 
> because it has been initialized to a bogus value. 
> 
> I'd much prefer this alternate appended patch. Can you verify that it also 
> fixes the problem (we can drop the bogus info->nr_pages initialization, 
> because the context - including the info part - has been cleared when it 
> was allocated, so nr_pages should already have the _correct_ value of zero 
> at this point).

Since aio_free_ring uses comparison with info->internal_pages rather than
nr_pages to decide whether to kfree(info->ring_pages), I see your point.

Regards
Suparna

> 
> 		Linus
> 
> -----
> ===== fs/aio.c 1.60 vs edited =====
> --- 1.60/fs/aio.c	2004-10-20 01:12:10 -07:00
> +++ edited/fs/aio.c	2004-11-06 19:41:45 -08:00
> @@ -118,8 +118,6 @@
>  	if (nr_pages < 0)
>  		return -EINVAL;
>  
> -	info->nr_pages = nr_pages;
> -
>  	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
>  
>  	info->nr = 0;
> --
> To unsubscribe, send a message with 'unsubscribe linux-aio' in
> the body to majordomo@kvack.org.  For more info on Linux AIO,
> see: http://www.kvack.org/aio/
> Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Oops in aio_free_ring on 2.6.9
  2004-11-07  3:43 ` Linus Torvalds
  2004-11-08  4:17   ` Suparna Bhattacharya
@ 2004-11-08 19:30   ` Darrick J. Wong
  1 sibling, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2004-11-08 19:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-aio, Andrew Morton, Kernel Mailing List,
	Suparna Bhattacharya

On Sat, 2004-11-06 at 19:43, Linus Torvalds wrote:

> I don't disagree with the bug, but I disagree with the fix. 
> 
> In my opinion, the problem is that "info->nr_pages" is _wrong_. It's wrong 
> because it has been initialized to a bogus value. 
> 
> I'd much prefer this alternate appended patch. Can you verify that it also 
> fixes the problem (we can drop the bogus info->nr_pages initialization, 

You're right, that is a better fix.  The aio code is not my current area
of expertise, as I've demonstrated. :)

Your patch also fixes the problem.

Thanks,

--Darrick


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-11-08 19:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-05 19:34 [PATCH] Oops in aio_free_ring on 2.6.9 Darrick J. Wong
2004-11-07  3:43 ` Linus Torvalds
2004-11-08  4:17   ` Suparna Bhattacharya
2004-11-08 19:30   ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox