linux-fsdevel.vger.kernel.org archive mirror
From: david@lang.hm
To: David Chinner <dgc@sgi.com>
Cc: Tejun Heo <htejun@gmail.com>,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	dm-devel@redhat.com, Jens Axboe <jens.axboe@oracle.com>,
	linux-fsdevel@vger.kernel.org,
	Andreas Dilger <adilger@clusterfs.com>
Subject: Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Date: Wed, 30 May 2007 09:52:49 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0705300946220.8246@asgard.lang.hm>
In-Reply-To: <20070530061723.GY85884050@sgi.com>

On Wed, 30 May 2007, David Chinner wrote:

> On Tue, May 29, 2007 at 05:01:24PM -0700, david@lang.hm wrote:
>> On Wed, 30 May 2007, David Chinner wrote:
>>
>>> On Tue, May 29, 2007 at 04:03:43PM -0400, Phillip Susi wrote:
>>>> David Chinner wrote:
>>>>> The use of barriers in XFS assumes the commit write to be on stable
>>>>> storage before it returns.  One of the ordering guarantees that we
>>>>> need is that the transaction (commit write) is on disk before the
>>>>> metadata block containing the change in the transaction is written
>>>>> to disk and the current barrier behaviour gives us that.
>>>>
>>>> Barrier != synchronous write,
>>>
>>> Of course. FYI, XFS only issues barriers on *async* writes.
>>>
>>> But barrier semantics - as far as they've been described by everyone
>>> but you indicate that the barrier write is guaranteed to be on stable
>>> storage when it returns.
>>
>> this doesn't match what I have seen
>>
>> with barriers it's perfectly legal to have the following sequence of
>> events
>>
>> 1. app writes block 10 to OS
>> 2. app writes block 4 to OS
>> 3. app writes barrier to OS
>> 4. app writes block 5 to OS
>> 5. app writes block 20 to OS
>
> hmmmmm - applications can't issue barriers to the filesystem.
> However, if you consider the barrier to be an "fsync()" for example,
> then it's still the filesystem that is issuing the barrier and
> there's a block that needs to be written that is associated with
> that barrier (either an inode or a transaction commit) that needs to
> be on stable storage before the filesystem returns to userspace.
>
>> 6. OS writes block 4 to disk drive
>> 7. OS writes block 10 to disk drive
>> 8. OS writes barrier to disk drive
>> 9. OS writes block 5 to disk drive
>> 10. OS writes block 20 to disk drive
>
> Replace OS with filesystem, and combine 7+8 together - we don't have
> zero-length barriers and hence they are *always* associated with a
> write to a certain block on disk. i.e.:
>
> 1. FS writes block 4 to disk drive
> 2. FS writes block 10 to disk drive
> 3. FS writes *barrier* block X to disk drive
> 4. FS writes block 5 to disk drive
> 5. FS writes block 20 to disk drive
>
> The order that these are expected by the filesystem to hit stable
> storage are:
>
> 1. block 4 and 10 on stable storage in any order
> 2. barrier block X on stable storage
> 3. block 5 and 20 on stable storage in any order
>
> The point I'm trying to make is that in XFS,  block 5 and 20 cannot
> be allowed to hit the disk before the barrier block because they
> have strict order dependency on block X being stable before them,
> just like block X has strict order dependency that block 4 and 10
> must be stable before we start the barrier block write.
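Chinner's three-stage ordering can be stated as a check on the order in which blocks reach stable storage. What follows is a minimal Python sketch, not kernel code. `satisfies_barrier_order` is a hypothetical helper, barrier block X is represented by the string "X", and every block is assumed to appear exactly once in the stable order.

```python
from typing import Hashable, Sequence, Set


def satisfies_barrier_order(
    stable_order: Sequence[Hashable],
    before: Set[Hashable],
    barrier: Hashable,
    after: Set[Hashable],
) -> bool:
    """Return True if every block in `before` became stable before `barrier`,
    and `barrier` became stable before every block in `after`.

    Blocks within `before` (or within `after`) may appear in any order.
    Assumes each block appears exactly once in `stable_order`.
    """
    position = {block: i for i, block in enumerate(stable_order)}
    x = position[barrier]
    return all(position[b] < x for b in before) and all(
        position[a] > x for a in after
    )


# Blocks 4 and 10 must precede barrier block X; blocks 5 and 20 must follow it.
print(satisfies_barrier_order([10, 4, "X", 20, 5], {4, 10}, "X", {5, 20}))  # True
print(satisfies_barrier_order([10, 4, 5, "X", 20], {4, 10}, "X", {5, 20}))  # False
```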
>
>> 11. disk drive writes block 10 to platter
>> 12. disk drive writes block 4 to platter
>> 13. disk drive writes block 20 to platter
>> 14. disk drive writes block 5 to platter
>
>> if the disk drive doesn't support barriers then step #8 becomes 'issue
>> flush' and steps 11 and 12 take place before steps #9, 13, and 14
>
> No, you need a flush on either side of the block X write to maintain
> the same semantics as barrier writes currently have.
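Chinner's point that a single flush is not enough can be checked against a toy model. The Python sketch below makes three assumptions about the drive: it acknowledges a write once the block is in a volatile write cache, it may destage cached blocks in arbitrary order, and it completes a flush only after the cache is empty. `ToyDrive`, `run` and `ordered` are hypothetical names, not a real driver interface.

```python
import random
from typing import Hashable, List


class ToyDrive:
    """Toy drive model: a volatile write cache that destages in arbitrary order."""

    def __init__(self, seed: int) -> None:
        self.cache: List[Hashable] = []    # acknowledged, not yet stable
        self.platter: List[Hashable] = []  # order in which blocks became stable
        self.rng = random.Random(seed)

    def write(self, block: Hashable) -> None:
        # The write is acknowledged once the block is cached; the drive may
        # then destage any number of cached blocks, in any order.
        self.cache.append(block)
        for _ in range(self.rng.randint(0, len(self.cache))):
            idx = self.rng.randrange(len(self.cache))
            self.platter.append(self.cache.pop(idx))

    def flush(self) -> None:
        # A flush returns only when the whole cache is stable (in any order).
        self.rng.shuffle(self.cache)
        self.platter.extend(self.cache)
        self.cache.clear()


def run(seed: int, flush_after_x: bool) -> List[Hashable]:
    """Issue 4, 10, barrier block X, 5, 20 and return the stable order."""
    drive = ToyDrive(seed)
    drive.write(4)
    drive.write(10)
    drive.flush()        # flush before X: 4 and 10 become stable first
    drive.write("X")
    if flush_after_x:
        drive.flush()    # flush after X: X is stable before 5 and 20 are issued
    drive.write(5)
    drive.write(20)
    drive.flush()
    return drive.platter


def ordered(platter: List[Hashable]) -> bool:
    """True if 4 and 10 precede X, and X precedes 5 and 20."""
    x = platter.index("X")
    return all(platter.index(b) < x for b in (4, 10)) and all(
        platter.index(a) > x for a in (5, 20)
    )


both_sides = all(ordered(run(s, flush_after_x=True)) for s in range(200))
one_side = all(ordered(run(s, flush_after_x=False)) for s in range(200))
print(both_sides, one_side)
```

In this model, flushing on both sides of block X always produces an order that satisfies the three-stage constraint. Dropping the flush after X lets block 5 or 20 become stable before X while X is still sitting in the cache.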
>
> We have filesystems that require barriers to prevent reordering of
> writes in both directions and to ensure that the block associated
> with the barrier is on stable storage when I/O completion is
> signalled.  The existing barrier implementation (where it works)
> provides these requirements. We need barriers to retain these
> semantics, otherwise we'll still have to do special stuff in
> the filesystems to get the semantics that we need.

One of us is misunderstanding barriers here.

You are understanding barriers to be the same as synchronous writes (and
therefore the data is on persistent media before the call returns).

I am understanding barriers to only indicate ordering requirements. Things
before the barrier can be reordered freely, things after the barrier can
be reordered freely, but things cannot be reordered across the barrier.
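The ordering-only reading can be made concrete by enumerating the completion orders it permits. This is a Python sketch, not kernel code. `BARRIER`, `split_epochs` and `legal_orders` are hypothetical names. The stream mirrors steps 1 to 5 of the example above (10, 4, barrier, 5, 20), and the platter order 10, 4, 20, 5 from steps 11 to 14 comes out as one of the legal results.

```python
from itertools import permutations, product
from typing import Hashable, List, Sequence, Tuple

BARRIER = object()  # hypothetical marker separating ordering epochs


def split_epochs(stream: Sequence[Hashable]) -> List[List[Hashable]]:
    """Split a write stream into epochs at each BARRIER marker."""
    epochs: List[List[Hashable]] = [[]]
    for item in stream:
        if item is BARRIER:
            epochs.append([])
        else:
            epochs[-1].append(item)
    return epochs


def legal_orders(stream: Sequence[Hashable]) -> List[Tuple[Hashable, ...]]:
    """All completion orders allowed by ordering-only barrier semantics:
    any permutation inside an epoch, with epochs kept in submission order."""
    per_epoch = [list(permutations(epoch)) for epoch in split_epochs(stream)]
    # Concatenate one permutation from each epoch, in epoch order.
    return [sum(choice, ()) for choice in product(*per_epoch)]


stream = [10, 4, BARRIER, 5, 20]
orders = legal_orders(stream)
print(len(orders))                # 4
print((10, 4, 20, 5) in orders)   # True: reordered only within each epoch
print((4, 5, 10, 20) in orders)   # False: block 5 crossed the barrier
```

Nothing in this model makes the submitter wait for any block to become stable. The barrier only constrains which completion orders are legal.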

If I am understanding it correctly, the big win for barriers is that you
do NOT have to stop and wait until the data is on persistent media before
you can continue.

In the past, barriers have not been fully implemented in most cases, and
as a result they have been simulated by forcing a full flush of the
buffers to persistent media before any other writes are allowed. This has
made them _in practice_ operate the same way as synchronous writes
(matching your understanding), but the current thread is talking about
fixing the implementation to follow the official semantics for all
hardware that can actually support barriers (and fixing it at the OS
level).

David Lang

