linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Ted Ts'o <tytso@mit.edu>, Christoph Hellwig <hch@lst.de>,
	Tejun Heo <tj@kernel.org>, Jan Kara <jack@suse.cz>,
	jaxboe@fusionio.com, James.Bottomley@suse.de,
	linux-fsdevel@vger.kernel
Subject: Re: [RFC] relaxed barrier semantics
Date: Wed, 28 Jul 2010 22:43:34 -0400	[thread overview]
Message-ID: <20100729024334.GA21736@redhat.com> (raw)
In-Reply-To: <20100729014431.GD4506@thunk.org>

On Wed, Jul 28, 2010 at 09:44:31PM -0400, Ted Ts'o wrote:
> On Wed, Jul 28, 2010 at 11:28:59AM +0200, Christoph Hellwig wrote:
> > If we move all filesystems to non-draining barriers with pre- and post-
> > flushes that might actually be a relatively easy first step.  We don't
> > have the complications to deal with multiple types of barriers to
> > start with, and it'll fix the issue for devices without volatile write
> > caches completely.
> > 
> > I just need some help from the filesystem folks to determine if they
> > are safe with them.
> > 
> > I know for sure that ext3 and xfs are from looking through them.  And
> > I know reiserfs is if we make sure it doesn't hit the code path that
> > relies on it that is currently enabled by the barrier option.
> > 
> > I'll just need more feedback from ext4, gfs2, btrfs and nilfs folks.
> > That already ends our small list of barrier supporting filesystems, and
> > possibly ocfs2, too - although the barrier implementation there seems
> > incomplete as it doesn't seem to flush caches in fsync.
> 
> Define "are safe" --- what interface we planning on using for the
> non-draining barrier?  At least for ext3, when we write the commit
> record using set_buffer_ordered(bh), it assumes that this will do a
> flush of all previous writes and that the commit will hit the disk
> before any subsequent writes are sent to the disk.  So turning the
> write of a buffer head marked with set_buffered_ordered() into a FUA
> write would _not_ be safe for ext3.
> 

I guess we will require something like set_buffer_preflush_fua() kind of
operation so that we preflush the cache to make sure everything before
commit block is on platter and then do commit block write with FUA
to make sure commit block is on platter.

This is assuming that before issuing commit block request we have waited
for completion of rest of the journal data. This will make sure none of
that journal data is in request queue. Then if we issue commit with 
preflush and FUA, it should make sure all the journal blocks are on
disk and then commit block is on disk.

So as long as we wait in filesystem for completion of the requests commit
block is dependent on, before we issue commit request, we should not
require request queue drain and preflush and FUA write probably should
be fine.

> For ext4, if we don't use journal checksums, then we have the same
> requirements as ext3, and the same method of requesting it.  If we do
> use journal checksums, what ext4 needs is a way of assuring that no
> writes after the commit are reordered with respect to the disk platter
> before the commit record --- but any of the writes before that,
> including the commit, and be reordered because we rely on the checksum
> in the commit record to know at replay time whether the last commit is
> valid or not.  We do that right now by calling blkdev_issue_flush()
> with BLKDEF_IFL_WAIT after submitting the write of the commit block.

IIUC, blkdev_issue_flush() is just a hard barrier and will drain queue
and flush the cache. I guess what we need is only flush and not drain
after we have waited for completion of commit record as well as requests
issued before commit record. That should make sure any WRITE after 
commit record does not get reordered w.r.t previous commit. So we
probably need blkdev_issue_flush_only() which will just flush caches
and not drain request queue.

This is all based on my very primitive knowledge. Please ignore if it is
all rubbish.

Thanks
Vivek

  reply	other threads:[~2010-07-29  2:43 UTC|newest]

Thread overview: 146+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-27 16:56 [RFC] relaxed barrier semantics Christoph Hellwig
2010-07-27 17:54 ` Jan Kara
2010-07-27 18:35   ` Vivek Goyal
2010-07-27 18:42     ` James Bottomley
2010-07-27 18:51       ` Ric Wheeler
2010-07-27 19:43       ` Christoph Hellwig
2010-07-27 19:38     ` Christoph Hellwig
2010-07-28  8:08     ` Tejun Heo
2010-07-28  8:20       ` Tejun Heo
2010-07-28 13:55         ` Vladislav Bolkhovitin
2010-07-28 14:23           ` Tejun Heo
2010-07-28 14:37             ` James Bottomley
2010-07-28 14:44               ` Tejun Heo
2010-07-28 16:17                 ` Vladislav Bolkhovitin
2010-07-28 16:17               ` Vladislav Bolkhovitin
2010-07-28 16:16             ` Vladislav Bolkhovitin
2010-07-28  8:24       ` Christoph Hellwig
2010-07-28  8:40         ` Tejun Heo
2010-07-28  8:50           ` Christoph Hellwig
2010-07-28  8:58             ` Tejun Heo
2010-07-28  9:00               ` Christoph Hellwig
2010-07-28  9:11                 ` Hannes Reinecke
2010-07-28  9:16                   ` Christoph Hellwig
2010-07-28  9:24                     ` Tejun Heo
2010-07-28  9:38                       ` Christoph Hellwig
2010-07-28  9:28                   ` Steven Whitehouse
2010-07-28  9:35                     ` READ_META semantics, was " Christoph Hellwig
2010-07-28 13:52                       ` Jeff Moyer
2010-07-28  9:17                 ` Tejun Heo
2010-07-28  9:28                   ` Christoph Hellwig
2010-07-28  9:48                     ` Tejun Heo
2010-07-28 10:19                     ` Steven Whitehouse
2010-07-28 11:45                       ` Christoph Hellwig
2010-07-28 12:47                     ` Jan Kara
2010-07-28 23:00                       ` Christoph Hellwig
2010-07-29 10:45                         ` Jan Kara
2010-07-29 16:54                           ` Joel Becker
2010-07-29 17:02                             ` Christoph Hellwig
2010-07-29  1:44                     ` Ted Ts'o
2010-07-29  2:43                       ` Vivek Goyal [this message]
2010-07-29  8:42                         ` Christoph Hellwig
2010-07-29 20:02                           ` Vivek Goyal
2010-07-29 20:06                             ` Christoph Hellwig
2010-07-30  3:17                               ` Vivek Goyal
2010-07-30  7:07                                 ` Christoph Hellwig
2010-07-30  7:41                                   ` Vivek Goyal
2010-08-02 18:28                                   ` [RFC PATCH] Flush only barriers (Was: Re: [RFC] relaxed barrier semantics) Vivek Goyal
2010-08-03 13:03                                     ` Christoph Hellwig
2010-08-04 15:29                                       ` Vivek Goyal
2010-08-04 16:21                                         ` Christoph Hellwig
2010-07-29  8:31                       ` [RFC] relaxed barrier semantics Christoph Hellwig
2010-07-29 11:16                         ` Jan Kara
2010-07-29 13:00                         ` extfs reliability Vladislav Bolkhovitin
2010-07-29 13:08                           ` Christoph Hellwig
2010-07-29 14:12                             ` Vladislav Bolkhovitin
2010-07-29 14:34                               ` Jan Kara
2010-07-29 18:20                                 ` Vladislav Bolkhovitin
2010-07-29 18:49                                 ` Vladislav Bolkhovitin
2010-07-29 14:26                           ` Jan Kara
2010-07-29 18:20                             ` Vladislav Bolkhovitin
2010-07-29 18:58                           ` Ted Ts'o
2010-07-29 19:44                       ` [RFC] relaxed barrier semantics Ric Wheeler
2010-07-29 19:49                         ` Christoph Hellwig
2010-07-29 19:56                           ` Ric Wheeler
2010-07-29 19:59                             ` James Bottomley
2010-07-29 20:03                               ` Christoph Hellwig
2010-07-29 20:07                                 ` James Bottomley
2010-07-29 20:11                                   ` Christoph Hellwig
2010-07-30 12:45                                     ` Vladislav Bolkhovitin
2010-07-30 12:56                                       ` Christoph Hellwig
2010-08-04  1:58                                     ` Jamie Lokier
2010-07-30 12:46                                 ` Vladislav Bolkhovitin
2010-07-30 12:57                                   ` Christoph Hellwig
2010-07-30 13:09                                     ` Vladislav Bolkhovitin
2010-07-30 13:12                                       ` Christoph Hellwig
2010-07-30 17:40                                         ` Vladislav Bolkhovitin
2010-07-29 20:58                               ` Ric Wheeler
2010-07-29 22:30                             ` Andreas Dilger
2010-07-29 23:04                               ` Ted Ts'o
2010-07-29 23:08                                 ` Ric Wheeler
2010-07-29 23:28                                 ` James Bottomley
2010-07-29 23:37                                   ` James Bottomley
2010-07-30  0:19                                     ` Ted Ts'o
2010-07-30 12:56                                   ` Vladislav Bolkhovitin
2010-07-30  7:11                                 ` Christoph Hellwig
2010-07-30 12:56                                 ` Vladislav Bolkhovitin
2010-07-30 13:07                                   ` Tejun Heo
2010-07-30 13:22                                     ` Vladislav Bolkhovitin
2010-07-30 13:27                                       ` Vladislav Bolkhovitin
2010-07-30 13:09                                   ` Christoph Hellwig
2010-07-30 13:25                                     ` Vladislav Bolkhovitin
2010-07-30 13:34                                       ` Christoph Hellwig
2010-07-30 13:44                                         ` Vladislav Bolkhovitin
2010-07-30 14:20                                           ` Christoph Hellwig
2010-07-31  0:47                                             ` Jan Kara
2010-07-31  9:12                                               ` Christoph Hellwig
2010-08-02 13:14                                                 ` Jan Kara
2010-08-02 10:38                                               ` Vladislav Bolkhovitin
2010-08-02 12:48                                                 ` Christoph Hellwig
2010-08-02 19:03                                                   ` xfs rm performance Vladislav Bolkhovitin
2010-08-02 19:18                                                     ` Christoph Hellwig
2010-08-05 19:31                                                       ` Vladislav Bolkhovitin
2010-08-02 19:01                                             ` [RFC] relaxed barrier semantics Vladislav Bolkhovitin
2010-08-02 19:26                                               ` Christoph Hellwig
2010-07-31  0:35                         ` Jan Kara
2010-08-02 16:47                     ` Ryusuke Konishi
2010-08-02 17:39                     ` Chris Mason
2010-08-05 13:11                       ` Vladislav Bolkhovitin
2010-08-05 13:32                         ` Chris Mason
2010-08-05 14:52                           ` Hannes Reinecke
2010-08-05 15:17                             ` Chris Mason
2010-08-05 17:07                             ` Christoph Hellwig
2010-08-05 19:48                           ` Vladislav Bolkhovitin
     [not found]                           ` <4C5B1583.6070706@vlnb.net>
2010-08-05 19:50                             ` Christoph Hellwig
2010-08-05 20:05                               ` Vladislav Bolkhovitin
2010-08-06 14:56                                 ` Hannes Reinecke
2010-08-06 18:38                                   ` Vladislav Bolkhovitin
2010-08-06 23:38                                     ` Christoph Hellwig
2010-08-06 23:34                                   ` Christoph Hellwig
2010-08-05 17:09                         ` Christoph Hellwig
2010-08-05 19:32                           ` Vladislav Bolkhovitin
2010-08-05 19:40                             ` Christoph Hellwig
2010-07-28 13:56                   ` Vladislav Bolkhovitin
2010-07-28 14:42                 ` Vivek Goyal
2010-07-27 19:37   ` Christoph Hellwig
2010-08-03 18:49   ` [PATCH, RFC 1/2] relaxed cache flushes Christoph Hellwig
2010-08-03 18:51     ` [PATCH, RFC 2/2] dm: support REQ_FLUSH directly Christoph Hellwig
2010-08-04  4:57       ` Kiyoshi Ueda
2010-08-04  8:54         ` Christoph Hellwig
2010-08-05  2:16           ` Jun'ichi Nomura
2010-08-26 22:50             ` Mike Snitzer
2010-08-27  0:40               ` Mike Snitzer
2010-08-27  1:20                 ` Jamie Lokier
2010-08-27  1:43               ` Jun'ichi Nomura
2010-08-27  4:08                 ` Mike Snitzer
2010-08-27  5:52                   ` Jun'ichi Nomura
2010-08-27 14:13                     ` Mike Snitzer
2010-08-30  4:45                       ` Jun'ichi Nomura
2010-08-30  8:33                         ` Tejun Heo
2010-08-30 12:43                           ` Mike Snitzer
2010-08-30 12:45                             ` Tejun Heo
2010-08-06 16:04     ` [PATCH, RFC] relaxed barriers Tejun Heo
2010-08-06 23:34       ` Christoph Hellwig
2010-08-07 10:13       ` [PATCH REPOST " Tejun Heo
2010-08-08 14:31         ` Christoph Hellwig
2010-08-09 14:50           ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100729024334.GA21736@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=James.Bottomley@suse.de \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jaxboe@fusionio.com \
    --cc=linux-fsdevel@vger.kernel \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).