public inbox for linux-xfs@vger.kernel.org
From: Ric Wheeler <rwheeler@redhat.com>
To: Steve Bergman <sbergman27@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: Questions about XFS
Date: Tue, 11 Jun 2013 13:19:53 -0400
Message-ID: <51B75C39.3030306@redhat.com>
In-Reply-To: <CAO9HMNGjdikgX+_434aGVJ2NAJ0hxDNLo+Vsa46GH3psXr4sKQ@mail.gmail.com>

On 06/11/2013 12:12 PM, Steve Bergman wrote:
> In #5 I was specifically talking about ext4. After the 2009 brouhaha
> over zero-length files in ext4 with delayed allocation turned on, Ted
> merged some patches into vanilla kernel 2.6.30 which mitigated the
> problem by recognizing certain common idioms and automatically
> forcing an fsync. I'd heard that the XFS team modeled a set of XFS
> patches on them.
>
> Regarding #4, I have 12 years experience with my workloads on ext3 and
> 3 yrs on ext4 and know what I have observed. As a practical matter,
> there are large differences between filesystem behaviors which aren't
> up for debate since I know my workloads' behavior in the real world
> far better than anyone else possibly could. (In fact, I'm not sure how
> anyone else could presume to know how my workloads and filesystems
> interact.) But if I understand correctly, ext4 at default settings
> journals metadata and commits it every 5s, while flushing data every
> 30s. Ext3 journals metadata, and commits it every 5 seconds, while
> effectively flushing data, *immediately before the metadata*, every 5
> seconds. So the window in which data and metadata are not in sync is
> vanishingly small. Are you saying that with XFS there is no periodic
> flushing mechanism at all? And that unless there's an
> fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
> sit in the page cache forever?

I think that you are still missing the bigger point.

Periodic fsync() - done magically under the covers by the file system - does not 
provide any useful data integrity for any serious application.

Let's take a simple example - a database app that does, say, 30 transactions/sec.

In your example, a crash can cost you just shy of 5 seconds of
"committed" data - close to 150 transactions!  That can be a *really* serious
amount of data and translate into large financial loss.

In a second example, let's say you are copying data to disk (say a movie) at a
rate of 50 MB/second.  When the power cut hits at just the wrong time, you will
have lost a large chunk of the data that had been "written" to disk (up to
roughly 250 MB).
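
To make the arithmetic behind both examples explicit, here is a small sketch.
The helper name worst_case_loss is made up for illustration, and the 5 second
interval is simply the ext3 commit interval quoted above; the bound assumes a
steady write rate and a crash just before the next periodic flush.

```python
# Worst-case exposure when durability depends only on a periodic flush.
# Assumption: steady write rate, crash lands just before the next flush.

def worst_case_loss(rate_per_sec: float, flush_interval_sec: float) -> float:
    """Upper bound on work acknowledged but not yet flushed at crash time."""
    return rate_per_sec * flush_interval_sec


FLUSH_INTERVAL_SEC = 5  # commit interval from the quoted ext3 description

txn_lost = worst_case_loss(30, FLUSH_INTERVAL_SEC)  # database example
mb_lost = worst_case_loss(50, FLUSH_INTERVAL_SEC)   # movie copy example

print(f"up to {txn_lost:.0f} transactions, up to {mb_lost:.0f} MB")
# prints: up to 150 transactions, up to 250 MB
```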

You won't get any serious file system or storage person to go out on a limb on 
this kind of "it mostly kind of works" type of scenario. It just does not cut it 
in the enterprise world.
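
For contrast, here is a rough sketch of what an application can do for itself
when it rewrites a small file such as a config file. The function name
durable_replace is hypothetical, and this is only one common pattern (write a
temporary file, fsync it, rename it over the old one, fsync the directory),
not a description of what any particular program does. It assumes a POSIX
system and a storage stack that honours cache flushes.

```python
import contextlib
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` so that a crash leaves either the old or the new contents."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays on one file system.
    # Note: mkstemp creates it with mode 0600; adjust if the target needs more.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # drain Python's userspace buffer
            os.fsync(f.fileno())  # ask the kernel to push the data to stable storage
        os.replace(tmp, path)     # atomic rename over the old file
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    # fsync the directory so the rename itself survives a crash.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The point of the sketch is that the application, not a file system timer,
decides when the data has to be on stable storage.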

Hope this is helpful :)

Ric

>
> One thing is puzzling me. Everyone is telling me that I must ensure
> that fsync/fdatasync is used, even in environments where the concept
> doesn't exist. So I've gone to find good examples of how it is used.
> Since RHEL6 has been shipping with ext4 as the default for over 2.5
> years, I figured it would be a great place to find examples. However,
> I've been unable to find examples of fsync or fdatasync being used,
> when using "strace -o file.out -f" on various system programs which
> one would very much expect to use it. We talked about some Python
> config utilities the other day. But now I've moved on to C and C++
> code. e.g. "cupsd" copy/truncate/writes the config file
> "/etc/cups/printers.conf" quite frequently, all day long. But there is
> no sign whatsoever of any fsync or fdatasync when I grep the strace
> output file for those strings case insensitively. (And indeed, a
> complex printers.conf file turned up zero-length on one of my RHEL6.4
> boxes last week.)
>
> So I figured that when rpm installs a new vmlinuz, builds a new
> initramfs and puts it into place, and modifies grub.conf, that surely
> proper sync'ing must be done in this particularly critical case. But
> while I do see rpm fsync/fdatasync'ing its own database files, it never
> seems to fsync/fdatasync the critical system files it just installed
> and/or modified. Surely, after over 2 1/2 years of Red Hat shipping
> RHEL6 to customers, I must be mistaken in some way. Could you point me
> to an example in RHEL6.4 where I can see clearly how fsync is being
> properly used? In the meantime, I'll keep looking.
>
>
> Thanks,
> Steve
>
>
>
> On Tue, Jun 11, 2013 at 8:59 AM, Ric Wheeler <rwheeler@redhat.com> wrote:
>> On 06/11/2013 05:56 AM, Steve Bergman wrote:
>>> 4. From the time I write() a bit of data, what's the maximum time before
>>> the data is actually committed to disk?
>>>
>>> 5. Ext4 provides some automatic fsync'ing to avoid the zero-length file
>>> issue for some common cases via the auto_da_alloc feature added in kernel
>>> 2.6.30. Does XFS have similar behavior?
>>
>> I think that here you are talking more about ext3 than ext4.
>>
>> The answer to both of these - even for ext4 or ext3 - is that unless your
>> application and storage are all properly configured, you are effectively at
>> risk indefinitely. Chris Mason did a study years ago where he was able to
>> demonstrate that dirty data could get pinned in a disk cache effectively
>> indefinitely.  Only an fsync() would push that out.
>>
>> Applications need to use the data integrity hooks in order to have a
>> reliable promise that application data is crash safe.  Jeff Moyer wrote up a
>> really nice overview of this for lwn which you can find here:
>>
>> http://lwn.net/Articles/457667
>>
>> That said, if you have applications that do not do any of this, you can roll
>> the dice and use a file system like ext3 that will periodically push data
>> out of the page cache for you.
>>
>> Note that without the barrier mount option, that is not sufficient to push
>> data to platter, just moves it down the line to the next potentially
>> volatile cache :)  Even then, 4 out of every 5 seconds, your application
>> will be certain to lose data if the box crashes while it is writing data.
>> Lots of applications don't actually use the file system much (or write
>> much), so ext3's sync behaviour helped mask poorly written applications
>> pretty effectively for quite a while.
>>
>> There really is no short cut to doing the job right - your applications need
>> to use the correct calls and we all need to configure the file and storage
>> stack correctly.
>>
>> Thanks!
>>
>> Ric
>>
>> _______________________________________________
>> xfs mailing list
>> xfs@oss.sgi.com
>> http://oss.sgi.com/mailman/listinfo/xfs