linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: David Chinner <dgc@sgi.com>,
	david@lang.hm, Phillip Susi <psusi@cfl.rr.com>,
	Neil Brown <neilb@suse.de>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	dm-devel@redhat.com, linux-raid@vger.kernel.org,
	Stefan Bader <Stefan.Bader@de.ibm.com>,
	Andreas Dilger <adilger@clusterfs.com>,
	Tejun Heo <htejun@gmail.com>
Subject: Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Date: Thu, 31 May 2007 15:36:50 +0200
Message-ID: <20070531133649.GY32105@kernel.dk>
In-Reply-To: <465ECDDB.9030304@tmr.com>

On Thu, May 31 2007, Bill Davidsen wrote:
> Jens Axboe wrote:
> >On Thu, May 31 2007, David Chinner wrote:
> >  
> >>On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote:
> >>    
> >>>On Thu, May 31 2007, David Chinner wrote:
> >>>      
> >>>>IOWs, there are two parts to the problem:
> >>>>
> >>>>	1 - guaranteeing I/O ordering
> >>>>	2 - guaranteeing blocks are on persistent storage.
> >>>>
> >>>>Right now, a single barrier I/O is used to provide both of these
> >>>>guarantees. In most cases, all we really need to provide is 1); the
> >>>>need for 2) is a much rarer condition but still needs to be
> >>>>provided.
> >>>>
> >>>>        
> >>>>>if I am understanding it correctly, the big win for barriers is that
> >>>>>you do NOT have to stop and wait until the data is on persistent media
> >>>>>before you can continue.
> >>>>>          
> >>>>Yes, if we define a barrier to only guarantee 1), then yes this
> >>>>would be a big win (esp. for XFS). But that requires all filesystems
> >>>>to handle sync writes differently, and sync_blockdev() needs to
> >>>>call blkdev_issue_flush() as well....
> >>>>
> >>>>So, what do we do here? Do we define a barrier I/O to only provide
> >>>>ordering, or do we define it to also provide persistent storage
> >>>>writeback? Whatever we decide, it needs to be documented....
> >>>>        
> >>>The block layer already has a notion of the two types of barriers; with
> >>>a very small amount of tweaking we could expose that. There's absolutely
> >>>zero reason we can't easily support both types of barriers.
> >>>      
> >>That sounds like a good idea - we can leave the existing
> >>WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED
> >>behaviour that only guarantees ordering. The filesystem can then
> >>choose which to use where appropriate....
> >>    
> >
> >Precisely. The current definition of barriers is what Chris and I came
> >up with many years ago, when solving the problem for reiserfs
> >originally. It is by no means the only feasible approach.
> >
> >I'll add a WRITE_ORDERED command to the #barrier branch; it already
> >contains the empty-bio barrier support I posted yesterday (well, a
> >slightly modified and cleaned up version).
> >
> >  
> Wait. Do filesystems expect (depend on) anything but ordering now? Does 
> md? Having users of barriers as they currently behave suddenly getting 
> SYNC behavior where they expect ORDERED is likely to have a negative 
> effect on performance. Or do I misread what is actually guaranteed by 
> WRITE_BARRIER now, and a flush is currently happening in all cases?

See the text you quoted above; it's answered there. This is not a change:
it is how the Linux barrier write has always worked since I first
implemented it. What David and I are talking about is adding a more
relaxed version as well, one that implies only ordering.
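
A rough sketch of the difference, as a toy user-space model in Python
rather than kernel code. It assumes a disk with a volatile write cache
that keeps arrival order. ToyDisk and dispatch() are hypothetical names
made up for illustration, and modelling the full barrier as
"flush, write, flush" is an assumption of this sketch. WRITE_ORDERED is
the name proposed earlier in the thread.

```python
from dataclasses import dataclass, field

# Request kinds. WRITE_ORDERED is the relaxed variant proposed in the thread.
WRITE, WRITE_ORDERED, WRITE_BARRIER = "write", "ordered", "barrier"


@dataclass
class ToyDisk:
    """Hypothetical disk: a volatile write cache in front of persistent media."""
    cache: list = field(default_factory=list)  # lost on power failure
    media: list = field(default_factory=list)  # survives power failure

    def flush(self):
        # Push everything in the cache to media, keeping arrival order.
        self.media.extend(self.cache)
        self.cache.clear()


def dispatch(disk, requests, reorder=reversed):
    """Submit (kind, block) requests to disk; return the number of flushes.

    Plain writes may be reordered freely (simulated by `reorder`), but
    never across an ordered or barrier request.
    """
    flushes = 0
    segment = []  # plain writes seen since the last ordering point
    for kind, block in requests:
        if kind == WRITE:
            segment.append(block)
            continue
        # Ordering point: everything queued before it goes first.
        disk.cache.extend(reorder(segment))
        segment = []
        if kind == WRITE_BARRIER:
            # Assumed model of the full barrier: ordering plus persistence.
            disk.flush()
            flushes += 1
            disk.cache.append(block)
            disk.flush()
            flushes += 1
        else:
            # WRITE_ORDERED: ordering only, nothing is forced to media.
            disk.cache.append(block)
    disk.cache.extend(reorder(segment))
    return flushes


stream = [(WRITE, "a"), (WRITE, "b"), (None, "commit"), (WRITE, "c")]

relaxed = ToyDisk()
relaxed_flushes = dispatch(
    relaxed, [(k or WRITE_ORDERED, b) for k, b in stream])

strict = ToyDisk()
strict_flushes = dispatch(
    strict, [(k or WRITE_BARRIER, b) for k, b in stream])
```

In this model both variants keep "commit" after "a" and "b" and before
"c". Only the full barrier leaves "commit" and everything before it on
media when it completes, and it pays for that with two cache flushes
where the ordered request pays none.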

> And will this also be available to user space f/s, since I just proposed 
> a project which uses one? :-(

I see several uses for that, so I'd hope so.

> I think the goal is good, more choice is almost always better choice, I 
> just want to be sure there won't be big disk performance regressions.

We can't get more heavyweight than the current barrier; it's about as
conservative as you can get.

-- 
Jens Axboe

