From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, Andrea Arcangeli <andrea@suse.de>,
Chris Mason <mason@suse.com>, Linus Torvalds <torvalds@osdl.org>
Subject: [patch] alternative fix for VFS race (was Re: 2.6.12-rc3-mm2)
Date: Sun, 01 May 2005 13:30:32 +1000
Message-ID: <42744D58.7090408@yahoo.com.au>
In-Reply-To: <20050430164303.6538f47c.akpm@osdl.org>
[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]
Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc3/2.6.12-rc3-mm2/
>
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc3/2.6.12-rc3-mm2/broken-out/fix-race-in-block_write_full_page.patch
While this patch does fix the problem, I would like to propose the
following attached patch instead, which is a minimal fix for the
specific race identified.
I have the following concerns about extending the lock_page coverage.
It 1) doesn't appear to protect against any other races; 2) doesn't seem
to be how the rest of the kernel submits async writes; 3) isn't how this
path used to do locking; and 4) can hold the page lock for a long time
while a request slot and memory are allocated.
What's more, if there *is* a good reason to extend lock_page coverage,
then that should probably be submitted as a separate changeset on top
of this minimal patch, with a separate rationale. That would help anyone
working on this code in future to see why the locking is the way it is.
Thanks,
Nick
--
SUSE Labs, Novell Inc.
[-- Attachment #2: __block_write_full_page-bug.patch --]
[-- Type: text/plain, Size: 2984 bytes --]
When running
fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
page size over loopback to an image file on a tmpfs filesystem, I would
very quickly hit
BUG_ON(!buffer_async_write(bh));
in fs/buffer.c:end_buffer_async_write
It seems that more than one request would be submitted for a given bh
at a time.
What would happen is the following:
2 threads doing __mpage_writepages on the same page.
Thread 1 - lock the page first, and enter __block_write_full_page.
Thread 1 - (e.g.) mark_buffer_async_write on the first 2 buffers.
Thread 1 - set page writeback, unlock page.
Thread 2 - lock page, wait on page writeback
Thread 1 - submit_bh on the first 2 buffers.
=> both requests complete, none of the page buffers are async_write,
end_page_writeback is called.
Thread 2 - wakes up. enters __block_write_full_page.
Thread 2 - mark_buffer_async_write on (e.g.) the last buffer
Thread 1 - finds the last buffer has async_write set, submit_bh on that.
Thread 2 - submit_bh on the last buffer.
=> oops.
So change __block_write_full_page to explicitly keep track of the last
bh we need to issue, so we don't touch anything after issuing the last
request.
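To make the interleaving above concrete, here is a rough single-threaded
userspace model of it. This is only a sketch, not kernel code: struct
model_bh, model_submit, submit_marked, race_hook and run_scenario are all
hypothetical names, and the "other thread" and the I/O completion are
simulated by a callback that runs right after a submit. With last == NULL
the walk behaves like the old loop and keeps looking at buffers after its
own last request; with last set it stops there, which is the idea of this
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUF 4  /* 4096 byte page / 1024 byte blocks */

/* Hypothetical stand-in for struct buffer_head; not the kernel type. */
struct model_bh {
	bool async_write;       /* models the async_write buffer flag */
	int submits;            /* number of times this buffer was submitted */
	struct model_bh *next;  /* circular per-page list */
};

/* Models what other contexts do right after one of our submits. */
typedef void (*interleave_fn)(struct model_bh *head,
			      struct model_bh *just_submitted);

static void model_submit(struct model_bh *bh)
{
	bh->submits++;
}

/*
 * Submit every buffer on the ring that is flagged async_write.
 * last == NULL: walk the whole ring (old behaviour).
 * last != NULL: stop right after submitting 'last' (patched behaviour).
 */
static int submit_marked(struct model_bh *head, struct model_bh *last,
			 interleave_fn hook)
{
	struct model_bh *bh = head;
	int nr_underway = 0;

	assert(head != NULL);
	do {
		struct model_bh *next = bh->next;

		if (bh->async_write) {
			model_submit(bh);
			nr_underway++;
			if (hook)
				hook(head, bh);
			if (bh == last)
				break;
		}
		bh = next;
	} while (bh != head);
	return nr_underway;
}

/*
 * After thread 1 submits its second buffer: both requests complete (the
 * flags are cleared), then thread 2 marks the last buffer for writing.
 */
static void race_hook(struct model_bh *head, struct model_bh *just_submitted)
{
	struct model_bh *second = head->next;

	if (just_submitted != second)
		return;
	head->async_write = false;
	second->async_write = false;
	second->next->next->async_write = true;  /* thread 2 marks buffer 3 */
}

/* Returns how many times the last buffer ends up being submitted. */
static int run_scenario(bool patched)
{
	struct model_bh ring[NBUF];

	for (int i = 0; i < NBUF; i++) {
		ring[i].async_write = false;
		ring[i].submits = 0;
		ring[i].next = &ring[(i + 1) % NBUF];
	}
	/* Thread 1 marks the first two buffers, then starts submitting. */
	ring[0].async_write = true;
	ring[1].async_write = true;
	submit_marked(&ring[0], patched ? &ring[1] : NULL, race_hook);
	/* Thread 2 submits the buffer it marked itself. */
	model_submit(&ring[3]);
	return ring[3].submits;
}
```

In this model, run_scenario(false) reports the last buffer submitted twice
(the double submission that trips the BUG_ON), while run_scenario(true)
reports it submitted once.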
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c 2005-04-27 22:43:05.000000000 +1000
+++ linux-2.6/fs/buffer.c 2005-05-01 12:44:08.000000000 +1000
@@ -1750,7 +1750,7 @@ static int __block_write_full_page(struc
int err;
sector_t block;
sector_t last_block;
- struct buffer_head *bh, *head;
+ struct buffer_head *bh, *head, *last_bh = NULL;
int nr_underway = 0;
BUG_ON(!PageLocked(page));
@@ -1808,7 +1808,6 @@ static int __block_write_full_page(struc
} while (bh != head);
do {
- get_bh(bh);
if (!buffer_mapped(bh))
continue;
/*
@@ -1826,6 +1825,8 @@ static int __block_write_full_page(struc
}
if (test_clear_buffer_dirty(bh)) {
mark_buffer_async_write(bh);
+ get_bh(bh);
+ last_bh = bh;
} else {
unlock_buffer(bh);
}
@@ -1844,10 +1845,13 @@ static int __block_write_full_page(struc
if (buffer_async_write(bh)) {
submit_bh(WRITE, bh);
nr_underway++;
+ put_bh(bh);
+ if (bh == last_bh)
+ break;
}
- put_bh(bh);
bh = next;
} while (bh != head);
+ bh = head;
err = 0;
done:
@@ -1886,10 +1890,11 @@ recover:
bh = head;
/* Recovery: lock and submit the mapped buffers */
do {
- get_bh(bh);
if (buffer_mapped(bh) && buffer_dirty(bh)) {
lock_buffer(bh);
mark_buffer_async_write(bh);
+ get_bh(bh);
+ last_bh = bh;
} else {
/*
* The buffer may have been set dirty during
@@ -1908,10 +1913,13 @@ recover:
clear_buffer_dirty(bh);
submit_bh(WRITE, bh);
nr_underway++;
+ put_bh(bh);
+ if (bh == last_bh)
+ break;
}
- put_bh(bh);
bh = next;
} while (bh != head);
+ bh = head;
goto done;
}