* async commit & write barrier code
From: Ric Wheeler @ 2008-09-22 17:20 UTC
To: linux-ext4@vger.kernel.org
After today's call, I was poking around a bit to try and understand
better how the async commit code plays with the write barrier.
If I read the code correctly, journal_submit_commit_record disables the
barriers when async commit is enabled. If this is true, how can we
provide any promises of on-disk data integrity after an fsync()?
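A simplified model of the check as Ric reads it, as a sketch only: the flag names MODEL_BARRIER_ENABLED and MODEL_ASYNC_COMMIT are hypothetical stand-ins, not the real jbd2 definitions, and the real function does much more than this. The commit record is submitted as a barrier write only when barriers are enabled and async commit is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; not the real jbd2 names or values. */
#define MODEL_BARRIER_ENABLED (1u << 0) /* barrier mount option */
#define MODEL_ASYNC_COMMIT    (1u << 1) /* async commit feature */

/* Model of the decision: the commit block is tagged as an ordered
 * (barrier) write only when barriers are on AND async commit is off. */
bool commit_record_uses_barrier(unsigned int flags)
{
    return (flags & MODEL_BARRIER_ENABLED) && !(flags & MODEL_ASYNC_COMMIT);
}

/* Truth table of the model: enabling async commit drops the barrier
 * even though the barrier option is still set. */
void commit_barrier_model_check(void)
{
    assert(!commit_record_uses_barrier(0));
    assert(commit_record_uses_barrier(MODEL_BARRIER_ENABLED));
    assert(!commit_record_uses_barrier(MODEL_ASYNC_COMMIT));
    assert(!commit_record_uses_barrier(MODEL_BARRIER_ENABLED |
                                       MODEL_ASYNC_COMMIT));
}
```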
Regards,
Ric
* Re: async commit & write barrier code
From: Eric Sandeen @ 2008-09-23 20:41 UTC
To: Ric Wheeler; +Cc: linux-ext4@vger.kernel.org
Ric Wheeler wrote:
> After today's call, I was poking around a bit to try and understand
> better how the async commit code plays with the write barrier.
>
> journal_submit_commit_record seems to disable the barriers when async IO
> is enabled if I read the code correctly. If this is true, how can we
> provide any promises of on disk data integrity after an fsync()?
>
> Regards,
>
> Ric
I agree; with async commit, ext4/jbd2 is running with *no* barrier
writes in jbd code. (FWIW, on the fsync front, ext4's fsync calls
blkdev_issue_flush, so that part may actually be OK in the end.)
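For reference, the user-space side of the contract Ric is asking about is plain POSIX: after fsync() returns 0, the written data is expected to be on stable storage. A minimal sketch (the temporary path is arbitrary); whether the promise actually holds depends on the kernel path issuing a cache flush, as Eric notes for ext4.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write buf to a fresh temporary file and fsync() it.
 * Returns 0 once write(), fsync() and close() all succeeded, else -1. */
int write_tmp_and_fsync(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    char path[] = "/tmp/fsync-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int rc = 0;
    size_t off = 0;
    while (off < len) { /* handle short writes and EINTR */
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        off += (size_t)n;
    }
    /* Per Eric's note, ext4's fsync calls blkdev_issue_flush(), which
     * asks the device to empty its volatile write cache. */
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    unlink(path);
    return rc;
}
```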
But at a minimum, I think that for data=ordered, there is now *no*
guarantee that the associated file data actually hits disk before the
size updates, is there?
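To make the data=ordered concern concrete, here is a toy model (hypothetical names, not kernel code) of a device whose volatile cache may persist writes in any order. Without a flush between the file data and the commit carrying the new size, the device may persist only the commit before power fails, breaking the promise that a new size on disk implies the data is on disk too.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: two blocks, a volatile cache, and persistent media. */
enum block { DATA_BLOCK, COMMIT_WITH_SIZE, NBLOCKS };

struct toy_disk {
    bool cached[NBLOCKS]; /* written, but only in volatile cache */
    bool stable[NBLOCKS]; /* on persistent media */
};

static void toy_write(struct toy_disk *d, enum block b)
{
    assert((int)b < NBLOCKS);
    d->cached[b] = true;
}

/* A flush/barrier drains everything cached so far to media. */
static void toy_flush(struct toy_disk *d)
{
    for (int b = 0; b < NBLOCKS; b++)
        if (d->cached[b]) {
            d->stable[b] = true;
            d->cached[b] = false;
        }
}

/* Worst case allowed without ordering: the device persists only the
 * chosen block, then power is lost and the rest of the cache vanishes. */
static void toy_persist_one_then_crash(struct toy_disk *d, enum block b)
{
    if (d->cached[b])
        d->stable[b] = true;
    for (int i = 0; i < NBLOCKS; i++)
        d->cached[i] = false;
}

/* data=ordered promise: if the new size is on disk, the data is too. */
static bool ordered_invariant_holds(const struct toy_disk *d)
{
    return !d->stable[COMMIT_WITH_SIZE] || d->stable[DATA_BLOCK];
}

bool crash_keeps_ordering(bool use_barrier)
{
    struct toy_disk d = {0};
    toy_write(&d, DATA_BLOCK);
    if (use_barrier)
        toy_flush(&d);               /* data reaches media before commit */
    toy_write(&d, COMMIT_WITH_SIZE); /* commit carrying the size update */
    toy_persist_one_then_crash(&d, COMMIT_WITH_SIZE);
    return ordered_invariant_holds(&d);
}
```

Passing true models a barrier between the two writes; passing false models async commit as Eric describes it.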
-Eric
* Re: async commit & write barrier code
From: Theodore Tso @ 2008-09-23 22:35 UTC
To: Eric Sandeen; +Cc: Ric Wheeler, linux-ext4@vger.kernel.org
On Tue, Sep 23, 2008 at 03:41:02PM -0500, Eric Sandeen wrote:
>
> I agree; with async commit, ext4/jbd2 is running with *no* barrier
> writes in jbd code. (FWIW, on the fsync front, fsync calls
> blkdev_issue_flush in ext4 so that part may actually be ok in the end).
>
> But at a minimum, I think that for data=ordered, there is now *no*
> guarantee that the associated file data actually hits disk before the
> size updates, is there?
I think the theory behind this was that the journal checksums would
protect us against misordered writes. But yes, this means that we
would effectively have data=writeback, and not data=ordered. More
seriously, after I started using async commit on my root filesystem,
I was seeing filesystem corruption when the system crashed after
suspend/resume cycles; the corruption caused data loss and required
e2fsck to fix. I've commented the patch out of the series file for
now, until we can do some more testing of async commit.
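A toy illustration of the checksum theory, under stated assumptions: the structure is hypothetical, and a simple additive sum stands in for the real journal checksum. The commit record stores a checksum over the transaction's journal blocks, so recovery can reject a transaction whose commit record reached disk before one of its blocks did.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16
#define TXN_BLOCKS 3

/* Hypothetical on-disk view of one journal transaction. */
struct toy_txn {
    uint8_t blocks[TXN_BLOCKS][BLOCK_SIZE]; /* logged blocks as found */
    bool commit_present;                    /* commit record on media */
    uint32_t commit_csum;                   /* checksum in the commit */
};

/* Toy additive checksum; a stand-in for the real journal checksum. */
static uint32_t toy_csum(const struct toy_txn *t)
{
    uint32_t sum = 0;
    for (int i = 0; i < TXN_BLOCKS; i++)
        for (int j = 0; j < BLOCK_SIZE; j++)
            sum += t->blocks[i][j];
    return sum;
}

/* Commit record is on disk; if lost_block >= 0, that journal block
 * never reached media (simulating a misordered write). */
static void make_txn(struct toy_txn *t, int lost_block)
{
    for (int i = 0; i < TXN_BLOCKS; i++)
        memset(t->blocks[i], 0xA0 + i, BLOCK_SIZE);
    t->commit_csum = toy_csum(t); /* over the intended contents */
    t->commit_present = true;
    if (lost_block >= 0) {
        assert(lost_block < TXN_BLOCKS);
        memset(t->blocks[lost_block], 0, BLOCK_SIZE); /* stale block */
    }
}

/* Recovery replays only if the commit exists and the checksum matches. */
bool recovery_would_replay(int lost_block)
{
    struct toy_txn t;
    make_txn(&t, lost_block);
    return t.commit_present && toy_csum(&t) == t.commit_csum;
}
```

What the checksum cannot see: under data=ordered, file data is written in place rather than logged, so it lies outside the checksummed blocks. That is why the result is effectively data=writeback.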
- Ted
* Re: async commit & write barrier code
From: Ric Wheeler @ 2008-09-23 23:16 UTC
To: Theodore Tso
Cc: Eric Sandeen, Ric Wheeler, linux-ext4@vger.kernel.org,
Stephen C. Tweedie
Theodore Tso wrote:
> On Tue, Sep 23, 2008 at 03:41:02PM -0500, Eric Sandeen wrote:
>
>> I agree; with async commit, ext4/jbd2 is running with *no* barrier
>> writes in jbd code. (FWIW, on the fsync front, fsync calls
>> blkdev_issue_flush in ext4 so that part may actually be ok in the end).
>>
>> But at a minimum, I think that for data=ordered, there is now *no*
>> guarantee that the associated file data actually hits disk before the
>> size updates, is there?
>>
>
> I think the theory behind this was that the journal checksums would
> protect us against misordered writes. But yes, this means that we
> would effectively have data=writeback, and not data=ordered. More
> seriously, when I started using my root filesystem with async commit,
> when the system crashed after suspend/resumes, I was seeing filesystem
> corruptions which caused data loss and which required e2fsck to fix.
> I've commented the patch out of the series file for now, until we can
> do some more testing of async commit.
>
> - Ted
>
I think that is definitely the right thing to do at this point. In
addition to testing, we should try to be very clear on how async
interacts with barriers, data integrity, etc.
What worries me is how arbitrary the semantics can be, given that storage
devices (without a flush or similar operation) can totally reorder IO
requests. One specific worry is that medium to large IOs often bypass
the write cache entirely, with the write cache used only for small
writes. That means the small writes associated with the commit record
can sit in volatile write cache memory for a long time and vanish on
power loss (or suspend to disk!).
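A toy model of the cache behaviour Ric describes, with a hypothetical bypass threshold (real drive firmware policies vary): large writes go straight to media, small ones sit in volatile cache until flushed, and power loss empties the cache. In this model the small commit record survives only if a flush follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: writes below this size are absorbed by the
 * volatile cache; larger ones bypass it and go straight to media. */
#define CACHE_BYPASS_BYTES 65536u
#define MAX_WRITES 8

struct cached_io { size_t len; bool on_media; bool in_cache; };
struct toy_drive { struct cached_io w[MAX_WRITES]; int n; };

static int drive_submit(struct toy_drive *d, size_t len)
{
    assert(d->n < MAX_WRITES);
    struct cached_io *w = &d->w[d->n];
    w->len = len;
    w->on_media = len >= CACHE_BYPASS_BYTES;
    w->in_cache = !w->on_media;
    return d->n++;
}

/* Flush: everything in the volatile cache is written to media. */
static void drive_flush(struct toy_drive *d)
{
    for (int i = 0; i < d->n; i++)
        if (d->w[i].in_cache) {
            d->w[i].in_cache = false;
            d->w[i].on_media = true;
        }
}

/* Power loss (or suspend to disk): the volatile cache is gone. */
static void drive_power_loss(struct toy_drive *d)
{
    for (int i = 0; i < d->n; i++)
        d->w[i].in_cache = false;
}

/* Large journal write, then a small commit record; does the commit
 * record survive a power loss? */
bool commit_survives_power_loss(bool flush_after_commit)
{
    struct toy_drive d = {0};
    drive_submit(&d, 256u * 1024u);       /* journal blocks: bypass */
    int commit = drive_submit(&d, 4096u); /* commit record: cached */
    if (flush_after_commit)
        drive_flush(&d);
    drive_power_loss(&d);
    return d.w[commit].on_media;
}
```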
What are the basic assumptions (wish lists?) that we have for ordering
and persistence of the write sequence for our existing journal code?
Ric