Re: async commit & write barrier code

From: Ric Wheeler <rwheeler@redhat.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>,
	Ric Wheeler <rwheeler@redhat.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Stephen C. Tweedie" <sct@redhat.com>
Subject: Re: async commit & write barrier code
Date: Tue, 23 Sep 2008 19:16:45 -0400
Message-ID: <48D978DD.8080103@redhat.com>
In-Reply-To: <20080923223505.GA11346@mit.edu>

Theodore Tso wrote:
> On Tue, Sep 23, 2008 at 03:41:02PM -0500, Eric Sandeen wrote:
>   
>> I agree; with async commit, ext4/jbd2 is running with *no* barrier
>> writes in jbd code. (FWIW, on the fsync front, fsync calls
>> blkdev_issue_flush in ext4 so that part may actually be ok in the end).
>>
>> But at a minimum, I think that for data=ordered, there is now *no*
>> guarantee that the associated file data actually hits disk before the
>> size updates, is there?
>>     
>
> I think the theory behind this was that the journal checksums would
> protect us against misordered writes.  But yes, this means that we
> would effectively have data=writeback, and not data=ordered.  More
> seriously, when I started using my root filesystem with async commit,
> when the system crashed after suspend/resumes, I was seeing filesystem
> corruptions which caused data loss and which required e2fsck to fix.
> I've commented the patch out of the series file for now, until we can
> do some more testing of async commit.
>
> 							- Ted
>   

I think that is definitely the right thing to do at this point. In
addition to testing, we should try to be very clear on how async
commit interacts with barriers, data integrity, etc.

What worries me is how arbitrary the semantics can be, given that storage
devices (without flush or similar operations) can totally reorder IO
requests. One specific worry is that medium to large IOs can often bypass
the write cache entirely, with the device using the write cache only for
small writes. That means the small writes associated with the commit
record can sit in volatile write cache memory for a long time and
disappear on power loss (or suspend to disk!).
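
As a rough illustration of that last point, here is a toy model in Python
(not kernel code). Everything in it is an assumption made up for the
sketch: the ToyDisk class, the CACHE_BYPASS_BYTES threshold, and the block
names are hypothetical, and real devices choose their own caching policy.
It only shows that, without a flush, a small commit record can be lost
while the larger write that preceded it survives.

```python
# Toy model only: ToyDisk, CACHE_BYPASS_BYTES, and the block names are
# hypothetical and do not correspond to any kernel or device interface.
CACHE_BYPASS_BYTES = 64 * 1024  # assumed threshold; real devices vary


class ToyDisk:
    def __init__(self):
        self.media = {}  # blocks that survive power loss
        self.cache = {}  # volatile write cache, lost on power loss

    def write(self, name, payload):
        # Assumed policy: large writes go straight to media,
        # small writes are held in the volatile cache.
        if len(payload) >= CACHE_BYPASS_BYTES:
            self.media[name] = payload
        else:
            self.cache[name] = payload

    def flush(self):
        # Stand-in for a cache flush: everything cached reaches media.
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Cached blocks that were never flushed are gone.
        self.cache.clear()


def run(flush_after_commit):
    disk = ToyDisk()
    disk.write("journal_data", b"d" * (128 * 1024))  # large, bypasses cache
    disk.write("commit_record", b"c" * 512)          # small, stays in cache
    if flush_after_commit:
        disk.flush()
    disk.power_loss()
    return sorted(disk.media)  # names of blocks that persisted


print(run(flush_after_commit=False))  # ['journal_data']
print(run(flush_after_commit=True))   # ['commit_record', 'journal_data']
```

In this model the flush is the only thing that makes the commit record
durable; the order in which the writes were issued says nothing about
which of them reach the media.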

What are the basic assumptions (wish lists?) that we have for the ordering
and persistence of the write sequence in our existing journal code?

Ric
