public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Andrea Arcangeli <andrea@suse.de>,
	Chris Mason <mason@suse.com>, Linus Torvalds <torvalds@osdl.org>
Subject: [patch] alternative fix for VFS race (was Re: 2.6.12-rc3-mm2)
Date: Sun, 01 May 2005 13:30:32 +1000
Message-ID: <42744D58.7090408@yahoo.com.au>
In-Reply-To: <20050430164303.6538f47c.akpm@osdl.org>

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc3/2.6.12-rc3-mm2/
> 

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc3/2.6.12-rc3-mm2/broken-out/fix-race-in-block_write_full_page.patch

While this patch does fix the problem, I would like to propose the
following attached patch instead, which is a minimal fix for the
specific race identified.

I have the following concerns about extending lock_page coverage:
1) it doesn't appear to protect against any other races;
2) it doesn't seem to be how the rest of the kernel submits async writes;
3) it isn't how this path used to do locking; and
4) it can hold the page lock for a long time while a request slot and
   memory are allocated.

What's more, if there *is* a good reason to extend lock_page coverage,
then that should probably be submitted as a separate changeset on top
of this minimal patch, with its own rationale. That would help anyone
working on this code later understand why the locking is the way it is.


Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

[-- Attachment #2: __block_write_full_page-bug.patch --]
[-- Type: text/plain, Size: 2984 bytes --]

When running
	fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
page size over loopback to an image file on a tmpfs filesystem, I would
very quickly hit
	BUG_ON(!buffer_async_write(bh));
in fs/buffer.c:end_buffer_async_write

It appears that more than one write request was being submitted for a
given bh at the same time.

The following interleaving triggers it, with two threads running
__mpage_writepages on the same page:
Thread 1 - lock the page first, and enter __block_write_full_page.
Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
Thread 1 - set page writeback, unlock page.
Thread 2 - lock page, wait on page writeback
Thread 1 - submit_bh on the first 2 buffers.
=> both requests complete, none of the page buffers are async_write,
   end_page_writeback is called.
Thread 2 - wakes up. enters __block_write_full_page.
Thread 2 - mark_buffer_async_write on (eg.) the last buffer
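Thread 1 - still walking the buffer list, finds the last buffer has
           async_write set, submit_bh on that.
Thread 2 - submit_bh on the last buffer.
=> the last buffer is submitted twice; the second completion finds
   async_write already cleared and hits the BUG_ON.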

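So change __block_write_full_page to explicitly keep track of the last
bh we need to issue, and stop walking the buffer list as soon as that
request has been submitted: from then on, writeback on the page may
already have ended and another thread may be working on its buffers.
The bh reference is likewise taken only on buffers being written, and
dropped right after each submit_bh.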

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c	2005-04-27 22:43:05.000000000 +1000
+++ linux-2.6/fs/buffer.c	2005-05-01 12:44:08.000000000 +1000
@@ -1750,7 +1750,7 @@ static int __block_write_full_page(struc
 	int err;
 	sector_t block;
 	sector_t last_block;
-	struct buffer_head *bh, *head;
+	struct buffer_head *bh, *head, *last_bh = NULL;
 	int nr_underway = 0;
 
 	BUG_ON(!PageLocked(page));
@@ -1808,7 +1808,6 @@ static int __block_write_full_page(struc
 	} while (bh != head);
 
 	do {
-		get_bh(bh);
 		if (!buffer_mapped(bh))
 			continue;
 		/*
@@ -1826,6 +1825,8 @@ static int __block_write_full_page(struc
 		}
 		if (test_clear_buffer_dirty(bh)) {
 			mark_buffer_async_write(bh);
+			get_bh(bh);
+			last_bh = bh;
 		} else {
 			unlock_buffer(bh);
 		}
@@ -1844,10 +1845,13 @@ static int __block_write_full_page(struc
 		if (buffer_async_write(bh)) {
 			submit_bh(WRITE, bh);
 			nr_underway++;
+			put_bh(bh);
+			if (bh == last_bh)
+				break;
 		}
-		put_bh(bh);
 		bh = next;
 	} while (bh != head);
+	bh = head;
 
 	err = 0;
 done:
@@ -1886,10 +1890,11 @@ recover:
 	bh = head;
 	/* Recovery: lock and submit the mapped buffers */
 	do {
-		get_bh(bh);
 		if (buffer_mapped(bh) && buffer_dirty(bh)) {
 			lock_buffer(bh);
 			mark_buffer_async_write(bh);
+			get_bh(bh);
+			last_bh = bh;
 		} else {
 			/*
 			 * The buffer may have been set dirty during
@@ -1908,10 +1913,13 @@ recover:
 			clear_buffer_dirty(bh);
 			submit_bh(WRITE, bh);
 			nr_underway++;
+			put_bh(bh);
+			if (bh == last_bh)
+				break;
 		}
-		put_bh(bh);
 		bh = next;
 	} while (bh != head);
+	bh = head;
 	goto done;
 }
 