* [patch] fix the 2nd buffer race properly
@ 2005-04-27 13:15 Nick Piggin
2005-04-27 13:20 ` Nick Piggin
2005-04-27 17:56 ` Andrew Morton
0 siblings, 2 replies; 6+ messages in thread
From: Nick Piggin @ 2005-04-27 13:15 UTC
To: Andrew Morton, Andrea Arcangeli, linux-kernel; +Cc: Chris Mason
[-- Attachment #1: Type: text/plain, Size: 729 bytes --]
OK, so I found the exact cause of the 2nd buffer problem.
Surprisingly, the first patch I sent was exactly what is
needed. Surprising because I didn't have a full handle on
the problem so perhaps I got a bit lucky.
The bug (the reason I asked you to drop the patch just now)
was that the code previously did a get_bh on all bh's in a
page, but I changed it to only put_bh the ones to be written.
The minor fix for that was to only get_bh the buffer heads to
be written.
Exact problem is described in the patch changelog. Anyone who
is feeling brave please review because I'm tired and have a
headache from too much kernel debugging ;)
Tested and seems to work. Doesn't seem to leak memory.
Nick
--
SUSE Labs, Novell Inc.
[-- Attachment #2: __block_write_full_page-bug.patch --]
[-- Type: text/plain, Size: 4304 bytes --]
When running
fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
page size over loopback to an image file on a tmpfs filesystem, I would
very quickly hit
BUG_ON(!buffer_async_write(bh));
in fs/buffer.c:end_buffer_async_write
It seems that more than one request would be submitted for a given bh
at a time.
What would happen is the following:
2 threads doing __mpage_writepages on the same page.
Thread 1 - lock the page first, and enter __block_write_full_page.
Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
Thread 1 - set page writeback, unlock page.
Thread 2 - lock page, wait on page writeback
Thread 1 - submit_bh on the first 2 buffers.
=> both requests complete, none of the page buffers are async_write,
end_page_writeback is called.
Thread 2 - wakes up. enters __block_write_full_page.
Thread 2 - mark_buffer_async_write on (eg.) the last buffer
Thread 1 - finds the last buffer has async_write set, submit_bh on that.
Thread 2 - submit_bh on the last buffer.
=> oops.
So change __block_write_full_page to explicitly keep track of requests
rather than relying on testing all of them for buffer_async_write - because
by the time we submit the last buffer we have marked async_write, we no
longer own *any* of the buffers.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c 2005-04-27 22:43:05.000000000 +1000
+++ linux-2.6/fs/buffer.c 2005-04-27 22:45:03.000000000 +1000
@@ -1750,8 +1750,9 @@ static int __block_write_full_page(struc
 	int err;
 	sector_t block;
 	sector_t last_block;
-	struct buffer_head *bh, *head;
-	int nr_underway = 0;
+	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+	int idx = 0;
+	int nr_underway;
 	BUG_ON(!PageLocked(page));
@@ -1808,7 +1809,6 @@ static int __block_write_full_page(struc
 	} while (bh != head);
 	do {
-		get_bh(bh);
 		if (!buffer_mapped(bh))
 			continue;
 		/*
@@ -1826,6 +1826,8 @@ static int __block_write_full_page(struc
 		}
 		if (test_clear_buffer_dirty(bh)) {
 			mark_buffer_async_write(bh);
+			get_bh(bh);
+			arr[idx++] = bh;
 		} else {
 			unlock_buffer(bh);
 		}
@@ -1839,15 +1841,12 @@ static int __block_write_full_page(struc
 	set_page_writeback(page);
 	unlock_page(page);
-	do {
-		struct buffer_head *next = bh->b_this_page;
-		if (buffer_async_write(bh)) {
-			submit_bh(WRITE, bh);
-			nr_underway++;
-		}
+	for (nr_underway = 0; nr_underway < idx; nr_underway++) {
+		bh = arr[nr_underway];
+		BUG_ON(!buffer_async_write(bh));
+		submit_bh(WRITE, bh);
 		put_bh(bh);
-		bh = next;
-	} while (bh != head);
+	}
 	err = 0;
 done:
@@ -1886,10 +1885,11 @@ recover:
 	bh = head;
 	/* Recovery: lock and submit the mapped buffers */
 	do {
-		get_bh(bh);
 		if (buffer_mapped(bh) && buffer_dirty(bh)) {
 			lock_buffer(bh);
 			mark_buffer_async_write(bh);
+			get_bh(bh);
+			arr[idx++] = bh;
 		} else {
 			/*
 			 * The buffer may have been set dirty during
@@ -1902,16 +1902,13 @@ recover:
 	BUG_ON(PageWriteback(page));
 	set_page_writeback(page);
 	unlock_page(page);
-	do {
-		struct buffer_head *next = bh->b_this_page;
-		if (buffer_async_write(bh)) {
-			clear_buffer_dirty(bh);
-			submit_bh(WRITE, bh);
-			nr_underway++;
-		}
+	for (nr_underway = 0; nr_underway < idx; nr_underway++) {
+		bh = arr[nr_underway];
+		BUG_ON(!buffer_async_write(bh));
+		clear_buffer_dirty(bh);
+		submit_bh(WRITE, bh);
 		put_bh(bh);
-		bh = next;
-	} while (bh != head);
+	}
 	goto done;
 }
@@ -2741,6 +2738,7 @@ sector_t generic_block_bmap(struct addre
 static int end_bio_bh_io_sync(struct bio *bio, unsigned int bytes_done, int err)
 {
 	struct buffer_head *bh = bio->bi_private;
+	bh_end_io_t *end_fn;
 	if (bio->bi_size)
 		return 1;
@@ -2750,7 +2748,16 @@ static int end_bio_bh_io_sync(struct bio
 		set_bit(BH_Eopnotsupp, &bh->b_state);
 	}
-	bh->b_end_io(bh, test_bit(BIO_UPTODATE, &bio->bi_flags));
+	end_fn = bh->b_end_io;
+
+	/*
+	 * These two lines are debugging only - make sure b_end_io
+	 * isn't run twice for the same io request.
+	 */
+	BUG_ON(!end_fn);
+	bh->b_end_io = NULL;
+
+	end_fn(bh, test_bit(BIO_UPTODATE, &bio->bi_flags));
 	bio_put(bio);
 	return 0;
 }
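Editorial sketch of the race in the changelog above, as a single-threaded user-space replay rather than kernel code. It assumes a four-buffer page and models completion as simply clearing the flag. Everything named toy_* or replay_* is hypothetical and exists only for this illustration. The first function replays the old "walk the ring and trust async_write" scheme, the second the patched "submit only what I recorded in my array" scheme.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUF 4	/* e.g. a 4096-byte page with 1024-byte blocks */

/* Toy stand-in for struct buffer_head; not the kernel structure. */
struct toy_bh {
	bool async_write;	/* models the async_write flag */
	int submits;		/* how many times "I/O" was issued */
};

static void toy_submit(struct toy_bh *bh)
{
	bh->submits++;
}

static int worst_submit_count(const struct toy_bh *ring)
{
	int i, worst = 0;

	for (i = 0; i < NBUF; i++)
		if (ring[i].submits > worst)
			worst = ring[i].submits;
	return worst;
}

/* Old scheme: the submitter walks the ring and trusts the flag. */
int replay_flag_walk(void)
{
	struct toy_bh ring[NBUF] = { { false, 0 } };
	int i;

	/* Thread 1 marks the first two buffers, then unlocks the page. */
	ring[0].async_write = ring[1].async_write = true;

	/* Thread 1 submits both; they complete, which clears the flag. */
	for (i = 0; i < 2; i++) {
		if (ring[i].async_write) {
			toy_submit(&ring[i]);
			ring[i].async_write = false;
		}
	}

	/* Writeback has ended, so thread 2 gets in and marks the last buffer. */
	ring[NBUF - 1].async_write = true;

	/* Thread 1 resumes its walk and finds a flag it never set. */
	for (i = 2; i < NBUF; i++)
		if (ring[i].async_write)
			toy_submit(&ring[i]);

	/* Thread 2 submits the buffer it marked. */
	toy_submit(&ring[NBUF - 1]);

	return worst_submit_count(ring);
}

/* Patched scheme: each writer submits only what it recorded in its array. */
int replay_owned_array(void)
{
	struct toy_bh ring[NBUF] = { { false, 0 } };
	struct toy_bh *arr1[NBUF], *arr2[NBUF];
	int idx1 = 0, idx2 = 0, i;

	/* Thread 1 marks and records the first two buffers. */
	for (i = 0; i < 2; i++) {
		ring[i].async_write = true;
		arr1[idx1++] = &ring[i];
	}

	/* Thread 1 submits exactly those; they complete. */
	for (i = 0; i < idx1; i++) {
		toy_submit(arr1[i]);
		arr1[i]->async_write = false;
	}

	/* Thread 2 marks and records the last buffer. */
	ring[NBUF - 1].async_write = true;
	arr2[idx2++] = &ring[NBUF - 1];

	/* Thread 1's loop is bounded by idx1, so it never sees this buffer. */
	for (i = 0; i < idx2; i++)
		toy_submit(arr2[i]);

	return worst_submit_count(ring);
}
```

In this model replay_flag_walk() reports a buffer submitted twice, while replay_owned_array() reports at most one submission per buffer. The real interleaving depends on I/O completion timing, which the replay fixes by hand.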
* Re: [patch] fix the 2nd buffer race properly
2005-04-27 13:15 [patch] fix the 2nd buffer race properly Nick Piggin
@ 2005-04-27 13:20 ` Nick Piggin
2005-04-27 17:56 ` Andrew Morton
1 sibling, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2005-04-27 13:20 UTC
To: Nick Piggin; +Cc: Andrew Morton, Andrea Arcangeli, linux-kernel, Chris Mason
Nick Piggin wrote:
> The bug (the reason I asked you to drop the patch just now)
> was that the code previously did a get_bh on all bh's in a
> page, but I changed it to only put_bh the ones to be written.
>
> The minor fix for that was to only get_bh the buffer heads to
> be written.
>
Err, that wasn't very clear: my earlier patch to fix the problem
introduced the above bh leak, but was otherwise correct in that
it solved the underlying problem.
--
SUSE Labs, Novell Inc.
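Editorial sketch of the reference leak described in this message, assuming a toy per-buffer counter in place of the real buffer head refcount. The function leftover_refs() and its parameters are hypothetical. It models get_bh() as an increment and put_bh() as a decrement, and returns how many references are still held after one pass.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUF 4	/* buffers per page in this toy model */

/*
 * get_all: take a reference on every buffer (old code and the first patch).
 * put_all: drop a reference on every buffer (old code only).
 * to_write[i]: whether buffer i is selected for writing.
 * Returns the number of references left over, i.e. leaked.
 */
int leftover_refs(bool get_all, bool put_all, const bool to_write[NBUF])
{
	int refs[NBUF] = { 0 };
	int i, total = 0;

	for (i = 0; i < NBUF; i++)
		if (get_all || to_write[i])
			refs[i]++;	/* models get_bh() */

	for (i = 0; i < NBUF; i++)
		if (put_all || to_write[i])
			refs[i]--;	/* models put_bh() */

	for (i = 0; i < NBUF; i++)
		total += refs[i];
	return total;
}
```

With two of four buffers selected for writing, the old code (get all, put all) and the fixed patch (get written, put written) both balance to zero, while the first version of the patch (get all, put written) leaves two references behind.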
* Re: [patch] fix the 2nd buffer race properly
2005-04-27 13:15 [patch] fix the 2nd buffer race properly Nick Piggin
2005-04-27 13:20 ` Nick Piggin
@ 2005-04-27 17:56 ` Andrew Morton
2005-04-28 0:08 ` Nick Piggin
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2005-04-27 17:56 UTC
To: Nick Piggin; +Cc: andrea, linux-kernel, mason
Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> When running
> fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
> on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
> page size over loopback to an image file on a tmpfs filesystem, I would
> very quickly hit
> BUG_ON(!buffer_async_write(bh));
> in fs/buffer.c:end_buffer_async_write
>
> It seems that more than one request would be submitted for a given bh
> at a time.
>
> What would happen is the following:
> 2 threads doing __mpage_writepages on the same page.
> Thread 1 - lock the page first, and enter __block_write_full_page.
> Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
> Thread 1 - set page writeback, unlock page.
> Thread 2 - lock page, wait on page writeback
> Thread 1 - submit_bh on the first 2 buffers.
> => both requests complete, none of the page buffers are async_write,
> end_page_writeback is called.
> Thread 2 - wakes up. enters __block_write_full_page.
> Thread 2 - mark_buffer_async_write on (eg.) the last buffer
> Thread 1 - finds the last buffer has async_write set, submit_bh on that.
> Thread 2 - submit_bh on the last buffer.
> => oops.
ah-hah. Thanks.
There are two situations:
a) Thread 2 comes in and tries to write a buffer which thread 1 didn't write:
Yes, thread 1 will get confused and will try to write thread 2's buffer.
b) Thread 2 comes in and tries to write a buffer which thread 1 is
writing. (Say, the buffer was redirtied by
munmap->__set_page_dirty_buffers, which doesn't lock the page or the
buffers)
Thread 2 will fail the test_set_buffer_locked() and will redirty the page.
That's all a bit too complex. How's about this instead?
--- 25/fs/buffer.c~fix-race-in-block_write_full_page 2005-04-27 10:42:11.191956704 -0700
+++ 25-akpm/fs/buffer.c 2005-04-27 10:42:56.548061528 -0700
@@ -1837,7 +1837,6 @@ static int __block_write_full_page(struc
 	 */
 	BUG_ON(PageWriteback(page));
 	set_page_writeback(page);
-	unlock_page(page);
 	do {
 		struct buffer_head *next = bh->b_this_page;
@@ -1848,6 +1847,7 @@ static int __block_write_full_page(struc
 		put_bh(bh);
 		bh = next;
 	} while (bh != head);
+	unlock_page(page);
 	err = 0;
 done:
@@ -1901,7 +1901,6 @@ recover:
 	SetPageError(page);
 	BUG_ON(PageWriteback(page));
 	set_page_writeback(page);
-	unlock_page(page);
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
@@ -1912,6 +1911,7 @@ recover:
 		put_bh(bh);
 		bh = next;
 	} while (bh != head);
+	unlock_page(page);
 	goto done;
 }
_
Aside: can the redirty_page_for_writepage() ever happen any more? Can a
buffer against a locked page be locked by some other actor? I guess so -
kjournald in ordered mode might be trying to write the buffer as well,
perhaps...
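Editorial sketch of the alternative proposed in this message: keep the page locked until the submitting thread has finished walking its buffers. This is a single-threaded user-space replay under the same simplifications as the changelog scenario (four buffers, completion modelled as clearing the flag at submit time). All toy_* names and replay_unlock_after_submit() are hypothetical and stand in for the page lock and buffer ring only loosely.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUF 4	/* buffers per page in this toy model */

/* Toy stand-in for a page and its buffer ring; not kernel structures. */
struct toy_page {
	bool locked;
	bool async_write[NBUF];
	int submits[NBUF];
};

static bool toy_trylock(struct toy_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void toy_unlock(struct toy_page *p)
{
	p->locked = false;
}

/* Walk the whole ring and submit every flagged buffer. */
static void toy_walk_and_submit(struct toy_page *p)
{
	int i;

	for (i = 0; i < NBUF; i++) {
		if (p->async_write[i]) {
			p->submits[i]++;
			p->async_write[i] = false;	/* completion */
		}
	}
}

/* Returns the worst per-buffer submit count, or -1 if locking misbehaved. */
int replay_unlock_after_submit(void)
{
	struct toy_page pg = { false, { false }, { 0 } };
	int i, worst = 0;

	/* Thread 1 locks the page and marks the first two buffers. */
	if (!toy_trylock(&pg))
		return -1;
	pg.async_write[0] = pg.async_write[1] = true;

	/* Thread 2 tries to get in here; the page is still locked. */
	if (toy_trylock(&pg))
		return -1;

	/* Thread 1 finishes its walk, and only then unlocks. */
	toy_walk_and_submit(&pg);
	toy_unlock(&pg);

	/* Now thread 2 can lock, mark the last buffer, and submit it. */
	if (!toy_trylock(&pg))
		return -1;
	pg.async_write[NBUF - 1] = true;
	toy_walk_and_submit(&pg);
	toy_unlock(&pg);

	for (i = 0; i < NBUF; i++)
		if (pg.submits[i] > worst)
			worst = pg.submits[i];
	return worst;
}
```

In this model no buffer is submitted more than once, because thread 2 cannot mark anything while thread 1 is still walking the ring. The cost is that the page stays locked for the whole submission loop.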
* Re: [patch] fix the 2nd buffer race properly
2005-04-27 17:56 ` Andrew Morton
@ 2005-04-28 0:08 ` Nick Piggin
2005-04-28 1:00 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2005-04-28 0:08 UTC
To: Andrew Morton; +Cc: andrea, linux-kernel, mason
Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>>When running
>> fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
>> on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
>> page size over loopback to an image file on a tmpfs filesystem, I would
>> very quickly hit
>> BUG_ON(!buffer_async_write(bh));
>> in fs/buffer.c:end_buffer_async_write
>>
>> It seems that more than one request would be submitted for a given bh
>> at a time.
>>
>> What would happen is the following:
>> 2 threads doing __mpage_writepages on the same page.
>> Thread 1 - lock the page first, and enter __block_write_full_page.
>> Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
>> Thread 1 - set page writeback, unlock page.
>> Thread 2 - lock page, wait on page writeback
>> Thread 1 - submit_bh on the first 2 buffers.
>> => both requests complete, none of the page buffers are async_write,
>> end_page_writeback is called.
>> Thread 2 - wakes up. enters __block_write_full_page.
>> Thread 2 - mark_buffer_async_write on (eg.) the last buffer
>> Thread 1 - finds the last buffer has async_write set, submit_bh on that.
>> Thread 2 - submit_bh on the last buffer.
>> => oops.
>
>
> ah-hah. Thanks.
>
> There are two situations:
>
> a) Thread 2 comes in and tries to write a buffer which thread 1 didn't write:
>
> Yes, thread 1 will get confused and will try to write thread 2's buffer.
>
> b) Thread 2 comes in and tries to write a buffer which thread 1 is
> writing. (Say, the buffer was redirtied by
> munmap->__set_page_dirty_buffers, which doesn't lock the page or the
> buffers)
>
I don't think b) happens, because thread 1 has to have finished all
its writes before the page will end writeback and thread 2 can go
anywhere.
> Thread 2 will fail the test_set_buffer_locked() and will redirty the page.
>
> That's all a bit too complex. How's about this instead?
>
Well you really don't need to hold the page locked for that long.
block_read_full_page, nobh_prepare_write both use the same sort of
array of buffer heads logic - I think it makes sense not to touch
any buffers after submitting them all for IO...?
--
SUSE Labs, Novell Inc.
* Re: [patch] fix the 2nd buffer race properly
2005-04-28 0:08 ` Nick Piggin
@ 2005-04-28 1:00 ` Andrew Morton
2005-04-28 1:29 ` Nick Piggin
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2005-04-28 1:00 UTC
To: Nick Piggin; +Cc: andrea, linux-kernel, mason
Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > There are two situations:
> >
> > a) Thread 2 comes in and tries to write a buffer which thread 1 didn't write:
> >
> > Yes, thread 1 will get confused and will try to write thread 2's buffer.
> >
> > b) Thread 2 comes in and tries to write a buffer which thread 1 is
> > writing. (Say, the buffer was redirtied by
> > munmap->__set_page_dirty_buffers, which doesn't lock the page or the
> > buffers)
> >
>
> I don't think b) happens, because thread 1 has to have finished all
> its writes before the page will end writeback and thread 2 can go
> anywhere.
hm, spose so.
> > Thread 2 will fail the test_set_buffer_locked() and will redirty the page.
> >
> > That's all a bit too complex. How's about this instead?
> >
>
> Well you really don't need to hold the page locked for that long.
It's a rare case, so there's no performance issue here.
I do prefer the idea of simply keeping other threads of control out of the
page until this thread has finished playing with its buffers.
(The buffer-ring walk we have in there is racy against page reclaim, too.
If only the first buffer is dirty, we inspect the other buffers after
PageWriteback has potentially cleared.)
> block_read_full_page, nobh_prepare_write both use the same sort of
> array of buffer heads logic - I think it makes sense not to touch
> any buffers after submitting them all for IO...?
Well. Most code in there uses the ->b_this_page walk.
* Re: [patch] fix the 2nd buffer race properly
2005-04-28 1:00 ` Andrew Morton
@ 2005-04-28 1:29 ` Nick Piggin
0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2005-04-28 1:29 UTC
To: Andrew Morton; +Cc: andrea, linux-kernel, mason
Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>> > That's all a bit too complex. How's about this instead?
>> >
>>
>> Well you really don't need to hold the page locked for that long.
>
>
> It's a rare case, so there's no performance issue here.
>
Well it is for buffered writes, right? And in general it will be OK,
but if you run into queue/memory congestion when submitting the IO,
it will be locked for a lot longer than required.
> I do prefer the idea of simply keeping other threads of control out of the
> page until this thread has finished playing with its buffers.
>
That's exactly what my patch does too!
> (The buffer-ring walk we have in there is racy against page reclaim, too.
> If only the first buffer is dirty, we inspect the other buffers after
> PageWriteback has potentially cleared.)
>
Well we do have a reference on the buffers, so in this particular case
perhaps not. But we have no mutual exclusion on the page or buffers
so I agree it could be racy against a lot of things.
>
>> block_read_full_page, nobh_prepare_write both use the same sort of
>> array of buffer heads logic - I think it makes sense not to touch
>> any buffers after submitting them all for IO...?
>
>
> Well. Most code in there uses the ->b_this_page walk.
>
block_read_full_page does the walk in order to gather up the buffers.
They then get submitted for IO via the buffer head array walk.
I prefer my patch. I don't think it is particularly complex.
--
SUSE Labs, Novell Inc.