public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Eric Sandeen <sandeen@redhat.com>,
	Ric Wheeler <rwheeler@redhat.com>, Tejun Heo <tj@kernel.org>,
	Vivek Goyal <vgoyal@redhat.com>,
	axboe@kernel.dk, tytso@mit.edu, shli@kernel.org, neilb@suse.de,
	adilger.kernel@dilger.ca, jack@suse.cz, snitzer@redhat.com,
	linux-kernel@vger.kernel.org, kmannth@us.ibm.com, cmm@us.ibm.com,
	linux-ext4@vger.kernel.org, hch@lst.de, josef@redhat.com
Subject: Re: [PATCHSET] Refactor barrier=/nobarrier flags from fs to block layer
Date: Fri, 28 Jan 2011 22:16:29 +1100	[thread overview]
Message-ID: <20110128111629.GW21311@dastard> (raw)
In-Reply-To: <20110126172413.GG32261@tux1.beaverton.ibm.com>

On Wed, Jan 26, 2011 at 09:24:13AM -0800, Darrick J. Wong wrote:
> On Wed, Jan 26, 2011 at 10:41:35AM -0600, Eric Sandeen wrote:
> > On 1/26/11 5:49 AM, Ric Wheeler wrote:
> > > On 01/26/2011 02:12 AM, Darrick J. Wong wrote:
> > >> Hello,
> > >>
> > >>  From what I can tell, most of the filesystems that know how to issue commands
> > >> to flush the write cache also have some mechanism for the user to override
> > >> whether or not the filesystem actually issues those flushes.  Unfortunately,
> > >> the term "barrier" is obsolete having been changed into flushes in 2.6.36, and
> > >> many of the filesystems implement the mount options with slightly different
> > >> syntaxes (barrier=[0|1|none|flush], nobarrier, etc).
> > >>
> > >> This patchset adds to the block layer a sysfs knob that an administrator can
> > >> use to disable flushes, and removes the mount options from the filesystem code.
> > >> As a starting point, I'm removing the mount options and flush toggle from
> > >> jbd2/ext4.
> > >>
> > >> Anyway, I'm looking for some feedback about refactoring the barrier/flush
> > >> control knob into the block layer.  It sounds like we want a knob that picks
> > >> the safest option (issue flushes when supported) unless the administrator
> > >> decides that it is appropriate to do otherwise.  I suspect that there are good
> > >> arguments for not having a knob at all, and good arguments for a safe knob.
> > >> However, since I don't see the barrier options being removed en masse, I'm
> > >> assuming that we still want a knob somewhere.  Do we need the ignore_fua knob
> > >> too?  Is this the proper way to deprecate mount options out of filesystems?
> > >>
> > >> --D
> > > 
> > > Just to be clear, I strongly object to remove the mount options.
> > 
> > Agreed, we are just finally, barely starting to win the education battle here.
> > Removing or changing the option now will just set us back.  It should at
> > LEAST remain as a deprecated option, with the deprecation message pointing
> > to crystal-clear documentation.
> 
> Ok, how about a second proposal:
> 
> 1. Put the sysfs knob and the toggle code in the block layer, similar to patch
> #1, only make it a per-bdev toggle so each mount can have its own override
> parameters.

A sysfs knob just seems wrong for this. What do you do with
filesystems or block devices that span multiple block devices,
either via md, dm, mount options (XFS - separate data, log and
realtime devices) or other means (btrfs w/ multiple devices)? 

IMO, the only sane way to control this sort of behaviour is from the
top down (i.e. from the filesystem) and not from the bottom up (i.e.
from the lowest level of block devices) because the cache flushes
are only useful to the filesystem if they are consistently
implemented from the top of the storage stack to the bottom...

Also, if you allow block devices at the bottom of the stack to be
configured to ignore flushes dynamically, we need some method to
inform the upper layers that this has happened. At minimum the
filesystem needs to log the fact that their crash/power fail
consistency guarantees have changed - there's no way I'm going to
assume that users won't do something stupid if there's a knob to
tweak....

> 2. Add some sort of "nocacheflush" option to the VFS layer to adjust the knob.
> With this we gain a consistent mount option syntax across all the filesystems,
> though what it means for a networked fs is questionable.  I guess you could
> reject the mount option if there's no block device under the fs.  Also, any fs
> that someday grows an issue-flush feature won't have to add its own option
> parsing code.

We already have a relatively widely implemented mount option pair -
barrier/nobarrier is supported by ext3, ext4, btrfs, gfs2, xfs,
hfsplus and nilfs2 - so I'd suggest that this is the best paaaaaaah
to take for implementing a generic mount option...

> At umount time, do we undo whatever overrides we set up at mount time?  Seems
> sane to me, just wanted to run it by everyone.

Does it really matter? The next mount will set it to whatever is
necessary...

> 3. Change the per-fs option handling code to call the same code as the VFS'
> nocacheflush option.  Any fs that wants to deprecate its per-fs option handler
> can do so.  Or they can stay forever.
> 
> 4. Remove all the flush conditionals from the fs code in favor of letting the
> block layer handle it.
>
> Hopefully "nocacheflush" is a little more obvious.

What cache does "nocacheflush" refer to? The page, inode, dentry, or
buffer caches? Or some other per filesystem cache? Perhaps the MD
stripe cache? Maybe something else? There are many different caches
in a storage system even before we consider hardware, so I think
"nocacheflush" is much more ambiguous than barrier/nobarrier...

Just my 2c worth....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

      reply	other threads:[~2011-01-28 11:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-26  7:12 [PATCHSET] Refactor barrier=/nobarrier flags from fs to block layer Darrick J. Wong
2011-01-26  7:16 ` [PATCH 1/3] block: Create sysfs knobs to override FLUSH/FUA support flags Darrick J. Wong
2011-01-26  9:30   ` Tejun Heo
2011-01-26 17:00     ` Darrick J. Wong
2011-02-05 16:20   ` Greg KH
2011-01-26  7:18 ` [PATCH 2/3] jbd2: Remove barrier feature conditional flag (or: always issue flushes) Darrick J. Wong
2011-01-26  7:23 ` [PATCH 3/3] ext4: Deprecate barrier= and nobarrier mount options Darrick J. Wong
2011-01-26  9:36   ` Tejun Heo
2011-01-26 10:47     ` Jan Kara
2011-01-26 10:51       ` Tejun Heo
2011-01-26 12:16       ` Ric Wheeler
2011-01-26 12:21         ` Tejun Heo
2011-01-26 13:29       ` torn5
2011-01-26 11:47 ` [PATCHSET] Refactor barrier=/nobarrier flags from fs to block layer Ric Wheeler
2011-01-26 11:49 ` Ric Wheeler
2011-01-26 16:41   ` Eric Sandeen
2011-01-26 17:24     ` Darrick J. Wong
2011-01-28 11:16       ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110128111629.GW21311@dastard \
    --to=david@fromorbit.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=axboe@kernel.dk \
    --cc=cmm@us.ibm.com \
    --cc=djwong@us.ibm.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=josef@redhat.com \
    --cc=kmannth@us.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=rwheeler@redhat.com \
    --cc=sandeen@redhat.com \
    --cc=shli@kernel.org \
    --cc=snitzer@redhat.com \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox