From: Eric Sandeen <sandeen@sandeen.net>
To: Steve Bergman <sbergman27@gmail.com>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Is this expected RAID10 performance?
Date: Sun, 09 Jun 2013 19:02:20 -0500
Message-ID: <51B5178C.5050900@sandeen.net>
In-Reply-To: <CAO9HMNGwdLG7ubNP1gC7To9pNstuwZPDAeV03HDkUjh5MWB0Dg@mail.gmail.com>

On 6/9/13 6:34 PM, Steve Bergman wrote:
> Hi Eric,
> 
> Yes, I understand what you are saying about the interaction between
> ordered data mode and DA in ext4. It's the combination of the 2
> options that makes the difference. Merely having a switch to turn off
> DA on XFS would not get me what I need for my data volumes. But thank
> you for making that explicit.
> 
> I intentionally disable DA on my ext4 data volumes specifically to get
> ext3-like behavior which results in a night and day difference in
> resiliency during... difficult times... for many of my customers, in
> my repeated experiences. I could just use ext3. But why give up
> extents, multiblock allocation, CRC protection of the journal, etc?
> (BTW, that's my vote *not* to remove the nodelalloc option of ext4 as
> I noticed you and Ted discussing last April. ;-)

I don't recommend nodelalloc just because I don't know that it's thoroughly
tested.  Anything that's not the default needs explicit and careful
test coverage to be sure that regressions etc. aren't popping up.

(One of ext4's weaknesses, IMHO, is its infinite matrix of options,
with wildly different behaviors.  It's more a filesystem multiplexer
than a filesystem itself.  ;)  Add enough knobs and there's no way you
can get coverage of all combinations.)

> So on a set of Cobol C/ISAM files which never get fsync'd or
> fdatasync'd, (because that concept does not exist in Cobol) would you
> expect there to be any difference in the resiliency of ext4 and xfs
> with both filesystems at completely default settings?

So back to the main point of this thread.

You probably need to define what _you_ mean by resiliency.  I have a hunch
that you have different, and IMHO unfounded, expectations.

I'm using a definition of resiliency for this conversation like this:

For properly configured, non-dodgy storage,

1) Is metadata journaled such that the filesystem metadata is consistent
   after a crash or power loss, and fsck finds no errors?

and

2) Is data persistent on disk after either a periodic flush, or a data
   integrity syscall?

The answer to both had better be yes on ext3, ext4, xfs, or any other
journaling filesystem worth its salt.  If the answer is no, it's a broken
design or a bug.

And the answer for ext3, ext4, and xfs, barring the inevitable bugs that
come up from time to time on all filesystems, is yes, 1) and 2) are
satisfied.

Anything else you want in terms of data persistence (data from my careless
applications will be safe no matter what) is just wishful thinking.

> Or would it be
> about the same. I'm *very* interested in this topic, as I'd like the
> best speed and more filesystem options, but need the resiliency even
> more for many of my servers. Do I have an option with XFS to improve
> behavior on/after an unclean shutdown? If so, I'd sincerely like to
> know.

What you seem to want is a vanishingly small window for risk of data
loss for unsynced, buffered IO.

ext3 gave you about 5 seconds thanks to default jbd behavior and
data=ordered behavior.  ext4 & xfs are more on the order of
30s.

But this all boils down to:

Did you (or your app) fsync your data?  If not, you cannot guarantee
that it'll be there if you crash or lose power.  The window for risk
of loss depends on many things, but without data integrity syscalls,
there is a risk of data loss.  See also http://lwn.net/Articles/457667/
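
To make "fsync your data" concrete, here is a minimal sketch of what a
data integrity syscall looks like from an application, assuming Linux
and Python's standard os module.  The helper name durable_write is
hypothetical, and fsyncing the parent directory for a newly created
file is an addition of mine that the discussion above does not spell
out.

```python
import os


def durable_write(path, data):
    """Write data to path; return only after fsync() has succeeded.

    Hypothetical helper for illustration, assuming Linux semantics.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Data integrity syscall: ask the kernel to push this file's
        # dirty pages (and the metadata needed to reach them) to disk.
        os.fsync(fd)
    finally:
        os.close(fd)

    # For a newly created file, also fsync the containing directory so
    # that the directory entry itself is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Until os.fsync() has returned without error, the data may exist only
in the page cache, which is exactly the window described above.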

You said to Ric:

> I find that in practice, simply leaving the data volumes in
> data=ordered mode and turning off DA results in -osync-like
> data integrity.

It quite simply does not.  Write a new file, punch power 1-2s after
the write completes, reboot and see what you've got.  You're racing
against jbd2 waking up and getting work done, but most of the time,
you'll have data loss.

If you want a smaller window of opportunity for data loss, there
are plenty of tunables at the fs & vm level to push data towards
disk more often, at the expense of performance.
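
As a rough illustration of the vm-level side, the sketch below reads
two writeback sysctls, vm.dirty_expire_centisecs and
vm.dirty_writeback_centisecs, and reports them in seconds.  Picking
these two knobs is my assumption about which tunables are meant; the
function names are hypothetical, and the sketch only reads the values,
it does not change them.

```python
from pathlib import Path

# Writeback sysctls under /proc/sys/vm, expressed in centiseconds.
VM_KNOBS = ("dirty_expire_centisecs", "dirty_writeback_centisecs")


def centisecs_to_seconds(value):
    """Convert a centisecond sysctl value (str or int) to seconds."""
    return int(value) / 100.0


def read_writeback_windows(sysctl_dir="/proc/sys/vm"):
    """Return {knob: seconds}, or None for knobs that cannot be read."""
    windows = {}
    for knob in VM_KNOBS:
        try:
            raw = (Path(sysctl_dir) / knob).read_text()
        except OSError:
            # Not Linux, or /proc is unavailable in this environment.
            windows[knob] = None
            continue
        windows[knob] = centisecs_to_seconds(raw.strip())
    return windows


if __name__ == "__main__":
    for knob, seconds in read_writeback_windows().items():
        print(knob, seconds)
```

Lowering these values narrows the window but does not close it, and it
costs throughput.  On the fs side, ext4's commit= mount option plays a
similar role for the journal commit interval.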

Without data integrity syscalls, you're always exposed to a greater
or lesser degree.

(It'd probably be better to take this up on the filesystem lists,
since we've gotten awfully off-topic for linux-raid.  But I feel
like this is a rehash of the O_PONIES thread from long ago...)

-Eric
