linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Phillip Susi <psusi@cfl.rr.com>
To: Neil Brown <neilb@suse.de>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	dm-devel@redhat.com, linux-raid@vger.kernel.org,
	Jens Axboe <jens.axboe@oracle.com>, David Chinner <dgc@sgi.com>,
	Stefan Bader <Stefan.Bader@de.ibm.com>,
	Andreas Dilger <adilger@clusterfs.com>,
	Tejun Heo <htejun@gmail.com>
Subject: Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Date: Tue, 29 May 2007 15:59:33 -0400	[thread overview]
Message-ID: <465C8625.1010001@cfl.rr.com> (raw)
In-Reply-To: <18010.12472.209452.148229@notabene.brown>

Neil Brown wrote:
>  md/dm modules could keep count of requests as has been suggested
>  (though that would be a fairly big change for raid0 as it currently
>  doesn't know when a request completes - bi_endio goes directly to the
>  filesystem). 

Are you sure?  I believe that dm handles bi_endio because it waits for 
all in progress bio to complete before switching tables.

> 2/ Maybe barriers provide stronger semantics than are required.
> 
>  All write requests are synchronised around a barrier write.  This is
>  often more than is required and apparently can cause a measurable
>  slowdown.

I'm not quite sure I understand this correctly, but the purpose of a 
barrier request is to prevent the elevator from reordering requests 
around a barrier.  Previous requests must be completed before the 
barrier, and latter requests must be executed after.  That is a 
sufficiently strong guarantee for careful write or journal filesystems 
to ensure that a log block hits the disk before the actual transaction 
blocks, and then the log block is marked as complete only after the 
actual transaction.  This is a weaker guarantee than a flush, and allows 
for some reordering to improve performance.

>  Also the FUA for the actual commit write might not be needed.  It is
>  important for consistency that the preceding writes are in safe
>  storage before the commit write, but it is not so important that the
>  commit write is immediately safe on storage.  That isn't needed until
>  a 'sync' or 'fsync' or similar.

Right, the barrier doesn't need to be flushed right away, so the 
elevator could complete writes after the barrier if it wishes, then 
complete the ones before, and finally the barrier itself.  Not setting 
the FUA bit allows the disk to cache the barrier write so it can be 
completed sooner, but before the queue sends any more requests to the 
disk, it must be flushed to ensure that the barrier has hit the media 
before the new requests.

>  One possible alternative is:
>    - writes can overtake barriers, but barrier cannot overtake writes.
>    - flush before the barrier, not after.
> 
>  This is considerably weaker, and hence cheaper. But I think it is
>  enough for all filesystems (providing it is still an option to call
>  blkdev_issue_flush on 'fsync').

Again I am not sure I quite understand what you mean here, but only 
writes issued after the barrier can complete before the barrier.  Those 
issued before the barrier can not overtake it in the queue.

>  Another alternative would be to tag each bio was being in a
>  particular barrier-group.  Then bio's in different groups could
>  overtake each other in either direction, but a BARRIER request must
>  be totally ordered w.r.t. other requests in the barrier group.
>  This would require an extra bio field, and would give the filesystem
>  more appearance of control.  I'm not yet sure how much it would
>  really help...
>  It would allow us to set FUA on all bios with a non-zero
>  barrier-group.  That would mean we don't have to flush the entire
>  cache, just those blocks that are critical.... but I'm still not sure
>  it's a good idea.

This all seems unnecessary work.



  parent reply	other threads:[~2007-05-29 19:59 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-25  7:58 [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md Neil Brown
2007-05-25 11:15 ` David Chinner
2007-05-25 11:49   ` Jens Axboe
2007-05-25 14:49     ` Phillip Susi
2007-05-28 18:32       ` [dm-devel] " Jens Axboe
2007-05-25 13:52 ` Stefan Bader
2007-05-28  1:37   ` Neil Brown
2007-05-29  9:12     ` Stefan Bader
2007-05-25 15:11 ` Phillip Susi
2007-05-26  1:03 ` Andreas Dilger
2007-05-26 10:27 ` Tejun Heo
2007-05-28  1:30 ` Neil Brown
2007-05-28  2:45   ` David Chinner
2007-05-28  2:57     ` Neil Brown
2007-05-28  4:29       ` David Chinner
2007-05-31  0:46         ` Neil Brown
2007-05-31  0:57           ` Alasdair G Kergon
2007-05-31  1:07           ` Alasdair G Kergon
2007-05-31  1:11             ` David Chinner
2007-05-28  4:48     ` Timothy Shimmin
2007-05-29  6:45       ` Jeremy Higdon
2007-05-29 20:03     ` Phillip Susi
2007-05-29 23:48       ` David Chinner
2007-05-30  0:01         ` david
2007-05-30  6:17           ` David Chinner
2007-05-30  8:55             ` Stefan Bader
2007-05-30 16:52             ` david
2007-05-31  0:20               ` David Chinner
2007-05-31  6:26                 ` Jens Axboe
2007-05-31  7:03                   ` David Chinner
2007-05-31  7:06                     ` Jens Axboe
2007-05-31 13:30                       ` Bill Davidsen
2007-05-31 13:36                         ` Jens Axboe
2007-06-01 16:04                           ` Bill Davidsen
2007-06-02 14:51                             ` Jens Axboe
2007-06-02 19:55                               ` Bill Davidsen
2007-06-01  3:16                       ` Tejun Heo
2007-06-01  8:21                         ` Jens Axboe
2007-06-02  9:20                           ` Tejun Heo
2007-06-02 14:34                             ` Jens Axboe
2007-06-02 22:57                               ` Guy Watkins
2007-06-04  7:39                               ` Tejun Heo
2007-05-31 18:31                     ` Phillip Susi
2007-05-31 19:00                       ` Jens Axboe
2007-05-31 19:21                         ` david
2007-05-31 19:40                           ` Jens Axboe
2007-05-31 23:34                       ` David Chinner
2007-06-01  5:59                         ` Neil Brown
2007-06-01  6:11                           ` Jens Axboe
2007-06-01  7:53                           ` David Chinner
2007-06-01 23:56                           ` Bill Davidsen
2007-05-31 18:24                 ` Phillip Susi
2007-05-30 16:45         ` Phillip Susi
2007-05-30 20:27           ` [dm-devel] " Phillip Susi
2007-05-31  6:24             ` Jens Axboe
2007-05-31 18:37               ` [dm-devel] " Phillip Susi
2007-05-31 18:58                 ` Jens Axboe
2007-06-02  0:04                   ` Bill Davidsen
2007-05-28  9:29   ` Tejun Heo
2007-05-28  9:43   ` Alasdair G Kergon
2007-05-29  9:25     ` [dm-devel] " Stefan Bader
2007-05-29 22:05       ` Alasdair G Kergon
2007-05-30  9:12         ` [dm-devel] " Stefan Bader
2007-05-30 10:41           ` Alasdair G Kergon
2007-05-30 16:55           ` Phillip Susi
2007-05-31 11:14             ` [dm-devel] " Stefan Bader
2007-06-01  3:25               ` Tejun Heo
2007-06-01  5:55                 ` david
2007-06-01  7:16                   ` [dm-devel] " Tejun Heo
2007-06-01 17:07                     ` Valdis.Kletnieks
2007-06-01 18:09                       ` Tejun Heo
2007-07-10 18:39                     ` Ric Wheeler
2007-07-10 23:40                       ` Valdis.Kletnieks
2007-07-11  2:49                         ` Tejun Heo
2007-07-11 22:44                         ` Ric Wheeler
2007-07-12 17:34                           ` Valdis.Kletnieks
2007-07-12 19:43                             ` Ric Wheeler
2007-07-12 23:10                             ` Guy Watkins
2007-07-13 11:30                               ` Ric Wheeler
2007-07-11  2:51                       ` Tejun Heo
2007-05-29 19:59   ` Phillip Susi [this message]
2007-05-31  0:22     ` Neil Brown
2007-05-30  9:35   ` Jens Axboe
2007-07-05 12:28     ` Tejun Heo
2007-07-09 12:27       ` Jens Axboe
2007-07-18 10:56     ` [PATCH] block: cosmetic changes Tejun Heo
2007-07-18 10:59       ` [PATCH] block: factor out bio_check_eod() Tejun Heo
2007-07-18 11:06         ` Jens Axboe
2007-07-18 11:18           ` Tejun Heo
2007-07-18 11:31             ` Jens Axboe
2007-07-18 11:33               ` Tejun Heo
2007-07-18 11:34                 ` Jens Axboe
2007-07-18 11:41                   ` Tejun Heo
2007-07-18 11:45                     ` Jens Axboe
2007-07-18 11:49                       ` Jens Axboe
2007-07-18 12:34                         ` Tejun Heo
2007-07-18 12:31                       ` Jens Axboe
2007-05-28 11:17 ` [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md Nikita Danilov
2007-05-31  3:31   ` Neil Brown
2007-05-28 14:43 ` Bill Davidsen
2007-05-31  0:37   ` Neil Brown
2007-05-31 12:28     ` Bill Davidsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=465C8625.1010001@cfl.rr.com \
    --to=psusi@cfl.rr.com \
    --cc=Stefan.Bader@de.ibm.com \
    --cc=adilger@clusterfs.com \
    --cc=dgc@sgi.com \
    --cc=dm-devel@redhat.com \
    --cc=htejun@gmail.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).