From: Janet Morgan <janetmor@us.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Daniel McNeil <daniel@osdl.org>,
pbadari@us.ibm.com, linux-aio@kvack.org,
linux-kernel@vger.kernel.org, suparna@in.ibm.com
Subject: Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
Date: Wed, 04 Feb 2004 18:54:54 -0800
Message-ID: <4021B07E.8030700@us.ibm.com>
In-Reply-To: <20040204180754.28801410.akpm@osdl.org>
Andrew Morton wrote:
>Daniel McNeil <daniel@osdl.org> wrote:
>
>
>>I have found (finally) the problem causing DIO reads racing with
>>buffered writes to see uninitialized data on ext3 file systems
>>(which is what I have been testing on).
>>
>>
>
>What kernel? If -mm, is this the only remaining buffered-vs-direct
>problem?
>
>
>
I think there was consensus on two other patches along the way:
http://marc.theaimsgroup.com/?l=linux-kernel&m=107286971311559&w=2
http://marc.theaimsgroup.com/?l=linux-aio&m=107291089712224&w=2
-Janet
>>The problem is caused by the changes to __block_write_full_page()
>>and a race with journaling:
>>
>>journal_commit_transaction() -> ll_rw_block() -> submit_bh()
>>
>>ll_rw_block() locks the buffer, clears buffer dirty and calls
>>submit_bh()
>>
>>A racing __block_write_full_page() (from ext3_ordered_writepage())
>>
>> would see that buffer_dirty() is not set because the i/o
>> is still in flight, so it would not do a submit_bh()
>>
>> It would SetPageWriteback() and unlock_page() and then
>> see that no i/o was submitted and call end_page_writeback()
>> (with the i/o still in flight).
>>
>>This would allow the DIO code to issue the DIO read while buffered
>>writes are still in flight. The i/o can be reordered by i/o scheduling
>>and the DIO read can complete BEFORE the writebacks complete. Thus the
>>DIO read sees the old uninitialized data.
>>
>>
>
>ow. How'd you work this out?
>
>
>
>>Here is a quick hack that fixes it, but I am not sure if this is the
>>proper long term fix.
>>
>>
>
>The problem is that ext3 and the VFS are using different paradigms. VFS is
>all page-based, but ext3 is all block-based. One or the other needs to do
>something nasty.
>
>One approach would be to change the JBD write_out_data_locked loop to use
>block_write_full_page(bh->b_page) instead of buffer_head operations. But
>that could get hairy with blocksize < PAGE_SIZE.
>
>Thanks for working this out. Let me ponder it for a bit.
>
>
>
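For readers following the quoted race description, the interleaving can be replayed in a small user-space model. This is only a sketch under stated assumptions: every toy_* name below is hypothetical, the structs are not the kernel's buffer_head or page, and the DIO side is reduced to the single assumption that a DIO read is held off only while the page is marked under writeback. It does not model the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these structs are NOT the kernel's buffer_head or page. */
struct toy_buffer {
    bool dirty;         /* stands in for buffer_dirty() */
    bool io_in_flight;  /* write submitted but not yet completed */
};

struct toy_page {
    bool writeback;     /* stands in for PageWriteback() */
    struct toy_buffer bh;
};

/* Journal commit path: clear dirty and submit the write for this buffer. */
static void toy_commit_submit(struct toy_buffer *bh)
{
    assert(bh->dirty);          /* the model only submits dirty buffers */
    bh->dirty = false;
    bh->io_in_flight = true;
}

/* Page writeout path: returns the number of buffers it submitted itself. */
static int toy_write_full_page(struct toy_page *page)
{
    int submitted = 0;

    page->writeback = true;             /* models SetPageWriteback() */
    if (page->bh.dirty) {
        page->bh.dirty = false;
        page->bh.io_in_flight = true;
        submitted++;
    }
    if (submitted == 0)
        page->writeback = false;        /* models end_page_writeback() */
    return submitted;
}

/* Assumption: a DIO read is held off only by the page writeback flag.
 * True means it can be issued while a buffered write is still in flight. */
static bool toy_dio_read_can_race(const struct toy_page *page)
{
    return !page->writeback && page->bh.io_in_flight;
}

/* Replays the interleaving; commit_first selects whether the journal
 * commit reaches the buffer before the page writeout does. */
static bool toy_replay(bool commit_first)
{
    struct toy_page page = { .writeback = false,
                             .bh = { .dirty = true, .io_in_flight = false } };

    if (commit_first)
        toy_commit_submit(&page.bh);
    toy_write_full_page(&page);
    return toy_dio_read_can_race(&page);
}
```

In this model, toy_replay(true) reports the window described above: the commit path has already cleared the dirty bit, so the page writeout submits nothing and drops the writeback flag while the commit's write is still outstanding. toy_replay(false) does not, because the page writeout submits the buffer itself and the page stays under writeback.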