From: Dave Chinner <david@fromorbit.com>
To: L A Walsh <xfs@tlinx.org>
Cc: Eric Sandeen <sandeen@sandeen.net>, linux-xfs@vger.kernel.org
Subject: Re: default mount options
Date: Thu, 1 Dec 2016 09:18:37 +1100
Message-ID: <20161130221837.GH31101@dastard>
In-Reply-To: <583F30C8.1000206@tlinx.org>
On Wed, Nov 30, 2016 at 12:04:24PM -0800, L A Walsh wrote:
>
>
> Eric Sandeen wrote:
> >
> >> But those systems also, sometimes, change runtime
> >>behavior based on the UPS or battery state -- using write-back on
> >>a full-healthy battery, or write-through when it wouldn't be safe.
> >>
> >> In that case, it seems nobarrier would be a better choice
> >>for those volumes -- letting the controller decide.
> >
> >No. Because then xfs will /never/ send barriers requests, even
> >if the battery dies. So I think you have that backwards.
Let's just get something straight first - there is no "barrier"
operation that is sent to the storage, and Linux does not have
"barriers" anymore. What we now do is strictly order our IO at the
filesystem level and issue cache flush requests to ensure all IO
prior to the cache flush request is on stable storage. We also make
use of FUA writes, which guarantee that a specific write hits stable
storage before the filesystem is told that it is complete (FUA is
emulated with post-IO cache flush requests on devices that don't
support FUA).
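As a rough userspace analogy only (not how XFS or the block layer
implement it), the ordering described above can be sketched as: write
the transaction body, flush, then write the commit record so that it is
stable before the write returns. Here fdatasync() stands in for the
cache flush request and an O_DSYNC file descriptor stands in for a FUA
write. commit_transaction() and write_all() are hypothetical names made
up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes at offset off, continuing after short writes. */
static int write_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0)
            return -1;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical journal commit, userspace analogy only.
 * 1. Write the transaction body through the page cache.
 * 2. fdatasync(): everything written so far is on stable storage
 *    (the analogue of a cache flush request).
 * 3. Write the commit record via an O_DSYNC descriptor, so the write
 *    does not complete until the record is stable (the analogue of a
 *    FUA write).
 * Returns 0 on success, -1 on error.
 */
int commit_transaction(const char *path,
                       const void *body, size_t body_len,
                       const void *rec, size_t rec_len)
{
    int ret = -1;
    int fd, sync_fd;

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    sync_fd = open(path, O_WRONLY | O_DSYNC);
    if (sync_fd < 0) {
        close(fd);
        return -1;
    }

    if (write_all(fd, body, body_len, 0) == 0 &&
        fdatasync(fd) == 0 &&
        write_all(sync_fd, rec, rec_len, (off_t)body_len) == 0)
        ret = 0;

    close(sync_fd);
    close(fd);
    return ret;
}
```

Nothing in this sequence waits for unrelated IO to drain. The only
ordering requirement is that the body is stable before the commit
record is written.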
This is why "barriers" no longer have a performance cost - we don't
need to empty the IO pipeline to guarantee integrity anymore. And it
should be clear why hardware that has non-volatile caches doesn't care
whether "barriers" are enabled or not, because all writes are FUA and
cache flushes are no-ops.
IOWs, "barriers" are an outdated concept and we only still have it
hanging around because we were stupid enough to name a mount option
after an implementation, rather than the feature it provided.
> ---
> If the battery dies, then the controller shifts
> to write-through and no longer uses its write cache. This is
> documented and observed behavior.
For /some/ RAID controllers in /some/ modes. For example, the
megaraid driver has been ignoring cache flushes for over 9
years because in RAID mode it doesn't need them. However, in JBOD
mode, that same controller requires cache flushes to be sent because
it turns off sane cache management behaviour in JBOD mode:
commit 1e793f6fc0db920400574211c48f9157a37e3945
Author: Kashyap Desai <kashyap.desai@broadcom.com>
Date: Fri Oct 21 06:33:32 2016 -0700
scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
Commit 02b01e010afe ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.
This is a clear example of why "barriers" should always be on and
cache flushes always passed through to the storage - because we just
don't know WTF the storage is actually doing with its caches.
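For anyone who wants to see what the kernel currently believes about a
device's cache, here is a minimal sketch that reads the block layer's
queue/write_cache sysfs attribute, which reports "write back" or
"write through". The helper names cache_mode_from_file() and
device_write_cache() are made up for this example. Note that this is
only the kernel's view: as the megaraid case shows, it says nothing
about what a controller really does with a flush once it receives one.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a file holding the contents of queue/write_cache.
 * Returns 1 for "write back", 0 for "write through",
 * -1 if the file is unreadable or holds anything else.
 */
int cache_mode_from_file(const char *path)
{
    char line[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(line, sizeof line, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */

    if (strcmp(line, "write back") == 0)
        return 1;
    if (strcmp(line, "write through") == 0)
        return 0;
    return -1;
}

/* Hypothetical convenience wrapper: dev is a name such as "sda". */
int device_write_cache(const char *dev)
{
    char path[256];
    int n = snprintf(path, sizeof path,
                     "/sys/block/%s/queue/write_cache", dev);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return cache_mode_from_file(path);
}
```

A result of 0 (write through) here is not a reason to mount with
nobarrier. It only means the kernel has been told there is no volatile
cache to flush.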
Quite frankly, I think it's time we marked the "barrier/nobarrier"
mount options as deprecated and simply always issue the required
cache flushes.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com