public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [patch] fix the 2nd buffer race properly
@ 2005-04-27 13:15 Nick Piggin
  2005-04-27 13:20 ` Nick Piggin
  2005-04-27 17:56 ` Andrew Morton
  0 siblings, 2 replies; 6+ messages in thread
From: Nick Piggin @ 2005-04-27 13:15 UTC (permalink / raw)
  To: Andrew Morton, Andrea Arcangeli, linux-kernel; +Cc: Chris Mason

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

OK, so I found the exact cause of the 2nd buffer problem.

Surprisingly, the first patch I sent was exactly what is
needed. Surprising because I didn't have a full handle on
the problem so perhaps I got a bit lucky.

The bug (the reason I asked you to drop the patch just now)
was that the code previously did a get_bh on all bh's in a
page, but I changed it to only put_bh the ones to be written.

The minor fix for that was to only get_bh the buffer heads to
be written.

Exact problem is described in the patch changelog. Anyone who
is feeling brave please review because I'm tired and have a
headache from too much kernel debugging ;)

Tested and seems to work. Doesn't seem to leak memory.

Nick

-- 
SUSE Labs, Novell Inc.

[-- Attachment #2: __block_write_full_page-bug.patch --]
[-- Type: text/plain, Size: 4304 bytes --]

When running
	fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
page size over loopback to an image file on a tmpfs filesystem, I would
very quickly hit
	BUG_ON(!buffer_async_write(bh));
in fs/buffer.c:end_buffer_async_write

It seems that more than one request would be submitted for a given bh
at a time.

What would happen is the following:
2 threads doing __mpage_writepages on the same page.
Thread 1 - lock the page first, and enter __block_write_full_page.
Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
Thread 1 - set page writeback, unlock page.
Thread 2 - lock page, wait on page writeback
Thread 1 - submit_bh on the first 2 buffers.
=> both requests complete, none of the page buffers are async_write,
   end_page_writeback is called.
Thread 2 - wakes up. enters __block_write_full_page.
Thread 2 - mark_buffer_async_write on (eg.) the last buffer
Thread 1 - finds the last buffer has async_write set, submit_bh on that.
Thread 2 - submit_bh on the last buffer.
=> oops.

So change __block_write_full_page to explicitly keep track of requests
rather than relying on testing all of them for buffer_async_write - because
by the time we submit the last buffer we have marked async_write, we no
longer own *any* of the buffers.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c	2005-04-27 22:43:05.000000000 +1000
+++ linux-2.6/fs/buffer.c	2005-04-27 22:45:03.000000000 +1000
@@ -1750,8 +1750,9 @@ static int __block_write_full_page(struc
 	int err;
 	sector_t block;
 	sector_t last_block;
-	struct buffer_head *bh, *head;
-	int nr_underway = 0;
+	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+	int idx = 0;
+	int nr_underway;
 
 	BUG_ON(!PageLocked(page));
 
@@ -1808,7 +1809,6 @@ static int __block_write_full_page(struc
 	} while (bh != head);
 
 	do {
-		get_bh(bh);
 		if (!buffer_mapped(bh))
 			continue;
 		/*
@@ -1826,6 +1826,8 @@ static int __block_write_full_page(struc
 		}
 		if (test_clear_buffer_dirty(bh)) {
 			mark_buffer_async_write(bh);
+			get_bh(bh);
+			arr[idx++] = bh;
 		} else {
 			unlock_buffer(bh);
 		}
@@ -1839,15 +1841,12 @@ static int __block_write_full_page(struc
 	set_page_writeback(page);
 	unlock_page(page);
 
-	do {
-		struct buffer_head *next = bh->b_this_page;
-		if (buffer_async_write(bh)) {
-			submit_bh(WRITE, bh);
-			nr_underway++;
-		}
+	for (nr_underway = 0; nr_underway < idx; nr_underway++) {
+		bh = arr[nr_underway];
+		BUG_ON(!buffer_async_write(bh));
+		submit_bh(WRITE, bh);
 		put_bh(bh);
-		bh = next;
-	} while (bh != head);
+	}
 
 	err = 0;
 done:
@@ -1886,10 +1885,11 @@ recover:
 	bh = head;
 	/* Recovery: lock and submit the mapped buffers */
 	do {
-		get_bh(bh);
 		if (buffer_mapped(bh) && buffer_dirty(bh)) {
 			lock_buffer(bh);
 			mark_buffer_async_write(bh);
+			get_bh(bh);
+			arr[idx++] = bh;
 		} else {
 			/*
 			 * The buffer may have been set dirty during
@@ -1902,16 +1902,13 @@ recover:
 	BUG_ON(PageWriteback(page));
 	set_page_writeback(page);
 	unlock_page(page);
-	do {
-		struct buffer_head *next = bh->b_this_page;
-		if (buffer_async_write(bh)) {
-			clear_buffer_dirty(bh);
-			submit_bh(WRITE, bh);
-			nr_underway++;
-		}
+	for (nr_underway = 0; nr_underway < idx; nr_underway++) {
+		bh = arr[nr_underway];
+		BUG_ON(!buffer_async_write(bh));
+		clear_buffer_dirty(bh);
+		submit_bh(WRITE, bh);
 		put_bh(bh);
-		bh = next;
-	} while (bh != head);
+	}
 	goto done;
 }
 
@@ -2741,6 +2738,7 @@ sector_t generic_block_bmap(struct addre
 static int end_bio_bh_io_sync(struct bio *bio, unsigned int bytes_done, int err)
 {
 	struct buffer_head *bh = bio->bi_private;
+	bh_end_io_t *end_fn;
 
 	if (bio->bi_size)
 		return 1;
@@ -2750,7 +2748,16 @@ static int end_bio_bh_io_sync(struct bio
 		set_bit(BH_Eopnotsupp, &bh->b_state);
 	}
 
-	bh->b_end_io(bh, test_bit(BIO_UPTODATE, &bio->bi_flags));
+	end_fn = bh->b_end_io;
+
+	/* 
+	 * These two lines are debugging only - make sure b_end_io
+	 * isn't run twice for the same io request.
+	 */
+	BUG_ON(!end_fn);
+	bh->b_end_io = NULL;
+	
+	end_fn(bh, test_bit(BIO_UPTODATE, &bio->bi_flags));
 	bio_put(bio);
 	return 0;
 }

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-04-28  1:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-27 13:15 [patch] fix the 2nd buffer race properly Nick Piggin
2005-04-27 13:20 ` Nick Piggin
2005-04-27 17:56 ` Andrew Morton
2005-04-28  0:08   ` Nick Piggin
2005-04-28  1:00     ` Andrew Morton
2005-04-28  1:29       ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox