linux-fsdevel.vger.kernel.org archive mirror
From: Phillip Susi <psusi@cfl.rr.com>
To: David Chinner <dgc@sgi.com>
Cc: david@lang.hm, Tejun Heo <htejun@gmail.com>,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	dm-devel@redhat.com, Jens Axboe <jens.axboe@oracle.com>,
	linux-fsdevel@vger.kernel.org,
	Andreas Dilger <adilger@clusterfs.com>
Subject: Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Date: Thu, 31 May 2007 14:24:40 -0400
Message-ID: <465F12E8.9080203@cfl.rr.com>
In-Reply-To: <20070531002011.GC85884050@sgi.com>

David Chinner wrote:
>> you are understanding barriers to be the same as synchronous writes. (and
>> therefore the data is on persistent media before the call returns)
> 
> No, I'm describing the high level behaviour that is expected by
> a filesystem. The reasons for this are below....

You say no, but then you go on to contradict yourself below.

> Ok, that's my understanding of how *device based barriers* can work,
> but there's more to it than that. As far as the filesystem is
> concerned the barrier write needs to *behave* exactly like a sync
> write because of the guarantees the filesystem has to provide
> userspace. Specifically - sync, sync writes and fsync.

There, you just ascribed the synchronous property to barrier requests.
This is false.  Barriers are about ordering; synchronous writes are
another thing entirely.  The filesystem is supposed to use barriers to
maintain ordering for journal data.  If you are trying to handle a
synchronous write request, that's another flag.

> This is the big problem, right? If we use barriers for commit
> writes, the filesystem can return to userspace after a sync write or
> fsync() and an *ordered barrier device implementation* may not have
> written the blocks to persistent media. If we then pull the plug on
> the box, we've just lost data that sync or fsync said was
> successfully on disk. That's BAD.

That's why for synchronous writes, you set the flag to mark the request
as synchronous, which has nothing at all to do with barriers.  You are
trying to use barriers to solve two different problems.  Use one flag to
indicate ordering, and another to indicate synchronicity.
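
A minimal sketch of the two-flag idea, in C.  The flag names and helper
functions here are hypothetical, invented for illustration; they are not
the kernel's actual BIO_RW_* bits or any existing block-layer API.  The
point is only that ordering and persistence are independent properties
that a filesystem can request separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request flags (assumption: not the real kernel values). */
enum req_flags {
    REQ_ORDERED = 1u << 0,  /* barrier: constrains ordering only */
    REQ_SYNC    = 1u << 1,  /* data must be on persistent media at completion */
};

/* May the device reorder this request relative to its neighbours? */
static bool may_reorder(unsigned flags)
{
    return !(flags & REQ_ORDERED);
}

/* May completion be signalled while the data is only in the write cache? */
static bool may_complete_from_cache(unsigned flags)
{
    return !(flags & REQ_SYNC);
}

/*
 * Flags a filesystem might pick for a journal commit block: always
 * ordered, and synchronous only when a caller such as fsync() needs the
 * persistence guarantee.
 */
static unsigned journal_commit_flags(bool caller_needs_persistence)
{
    unsigned flags = REQ_ORDERED;

    if (caller_needs_persistence)
        flags |= REQ_SYNC;
    return flags;
}
```

With this split, an ordinary journal commit would ask for
journal_commit_flags(false) and could complete from the write cache,
while an fsync()-driven commit would ask for journal_commit_flags(true).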

> Right now a barrier write on the last block of the fsync/sync write
> is sufficient to prevent that because of the FUA on the barrier
> block write. A purely ordered barrier implementation does not
> provide this guarantee.

This is a side effect of the implementation of the barrier, not part of 
the semantics of barriers, so you shouldn't rely on this behavior.  You 
don't have to use FUA to handle the barrier request, and if you don't, 
then the request can be completed while the data is still in the write 
cache.  You just have to make sure to flush it before any subsequent 
requests.
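
The behaviour described above can be sketched with a toy device model in
C.  Everything here (struct toy_dev, toy_write, toy_flush) is a
hypothetical simulation, not real driver code.  It assumes the earlier
writes are flushed ahead of the barrier block; the barrier write itself
is then completed from the cache without FUA, and the cache is flushed
before any later request is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 16

/* Toy device: a volatile write cache in front of persistent media. */
struct toy_dev {
    int cache[MAX_BLOCKS];  /* blocks completed but still only in cache */
    size_t ncache;
    int media[MAX_BLOCKS];  /* blocks on persistent media, in landing order */
    size_t nmedia;
    bool barrier_pending;   /* a barrier block is sitting in the cache */
};

/* Move everything in the cache onto persistent media. */
static void toy_flush(struct toy_dev *d)
{
    for (size_t i = 0; i < d->ncache; i++) {
        assert(d->nmedia < MAX_BLOCKS);
        d->media[d->nmedia++] = d->cache[i];
    }
    d->ncache = 0;
    d->barrier_pending = false;
}

/* Submit one write; it "completes" as soon as it is in the cache. */
static void toy_write(struct toy_dev *d, int block, bool barrier)
{
    /* A cached barrier must reach media before any later request. */
    if (d->barrier_pending)
        toy_flush(d);

    /* Earlier writes must reach media before the barrier block does. */
    if (barrier)
        toy_flush(d);

    assert(d->ncache < MAX_BLOCKS);
    d->cache[d->ncache++] = block;
    if (barrier)
        d->barrier_pending = true;
}
```

In this model, writing blocks 1 and 2 and then a barrier block 3 leaves
1 and 2 on media and 3 still in the cache when the barrier completes, so
ordering holds but persistence of block 3 is not yet guaranteed.  The
next write forces block 3 to media first.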

> IOWs, there are two parts to the problem:
> 
> 	1 - guaranteeing I/O ordering
> 	2 - guaranteeing blocks are on persistent storage.
> 
> Right now, a single barrier I/O is used to provide both of these
> guarantees. In most cases, all we really need to provide is 1); the
> need for 2) is a much rarer condition but still needs to be
> provided.

Yep... two problems... two flags.

> Yes, if we define a barrier to only guarantee 1), then yes this
> would be a big win (esp. for XFS). But that requires all filesystems
> to handle sync writes differently, and sync_blockdev() needs to
> call blkdev_issue_flush() as well....
> 
> So, what do we do here? Do we define a barrier I/O to only provide
> ordering, or do we define it to also provide persistent storage
> writeback? Whatever we decide, it needs to be documented....

We do the former, or we end up in the same boat as O_DIRECT, where you
have one flag that means several things, and no way to specify you only
need some of those and not the others.
