public inbox for linux-kernel@vger.kernel.org
From: Jamie Lokier <jamie@shareable.org>
To: Theodore Tso <tytso@mit.edu>, Eric Sandeen <sandeen@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/4] (RESEND) ext3[34] barrier changes
Date: Tue, 20 May 2008 16:13:06 +0100
Message-ID: <20080520151306.GF16676@shareable.org>
In-Reply-To: <20080519002838.GB8335@mit.edu>

Theodore Tso wrote:
> On Fri, May 16, 2008 at 11:03:15PM +0100, Jamie Lokier wrote:
> > The MacOS X folks decided that speed is most important for fsync().
> > fsync() does not guarantee commit to platter.  *But* they added an
> > fcntl() for applications to request a commit to platter, which SQLite
> > at least uses.  I don't know if MacOS X uses barriers for filesystem
> > operations.
> 
> Out of curiosity, exactly *what* semantics did MacOS X give fsync(),
> then?  Did it simply start the process of staging writes to disk, but
> not wait for the writes to hit the platter before returning?  That's
> basically the equivalent of ext3's barrier=0.

I haven't read the code and don't use MacOS myself.

From its fcntl() man page:

    Note that while fsync() will flush all data from the host to the
    drive (i.e. the "permanent storage device"), the drive itself may
    not physically write the data to the platters for quite some time
    and it may be written in an out-of-order sequence.

    Specifically, if the drive loses power or the OS crashes, the
    application may find that only some or none of their data was
    written. The disk drive may also re-order the data so that later
    writes may be present while earlier writes are not.

    This is not a theoretical edge case. This scenario is easily
    reproduced with real world workloads and drive power failures.

    For applications that require tighter guarantees about the
    integrity of their data, MacOS X provides the F_FULLFSYNC
    fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered
    data to permanent storage.  Applications such as databases that
    require a strict ordering of writes should use F_FULLFSYNC to
    ensure their data is written in the order they expect. Please see
    fcntl(2) for more detail.

Some notable things:

   1. Para 2 says "if the drive loses power __or the OS crashes__".
      Does this mean some drives will abandon cached writes when reset
      despite retaining power?

   2. Para 3 to be re-read by the skeptical.

   3. Para 4 perpetuates the confused idea that write ordering is what
      it's all about, for things like databases.  In fact, sometimes
      ordering barriers are all that's needed and flush is unnecessary
      performance baggage.  But sometimes an fsync() which only
      guarantees ordering is insufficient.  An "ideal"
      database-friendly block layer would offer both.
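For illustration, here is a minimal sketch of how an application might ask for the stronger guarantee the man page describes. It uses the F_FULLFSYNC fcntl where the platform defines it and falls back to plain fsync() otherwise. The helper name full_fsync() is hypothetical. Falling back when fcntl() fails is an assumption (some file systems may not support F_FULLFSYNC), a common pattern rather than documented behaviour. On systems without F_FULLFSYNC, such as Linux, this reduces to fsync(). Whether that reaches the platter then depends on the filesystem and its barrier settings, which is what this thread is about.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a file's data as durably as the platform
 * allows. On Darwin, F_FULLFSYNC asks the drive to flush its own write
 * cache to permanent storage. Where F_FULLFSYNC is not defined, or the
 * fcntl fails (assumed: unsupported by the file system), fall back to
 * fsync(), which only guarantees the data has been handed to the drive.
 * Returns 0 on success, -1 with errno set on failure.
 */
int full_fsync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC, 0) == 0)
        return 0;
    /* F_FULLFSYNC failed: fall through to the weaker fsync(). */
#endif
    return fsync(fd);
}
```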

I doubt that common Unix mail transports use F_FULLFSYNC instead of
fsync() on Darwin before reporting a message as safely received, but
they probably should.  I recall SQLite does use it (unless I'm
confusing it with some other database).
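As a rough sketch of what "before reporting a mail received safely" could mean in code: a delivery routine would write the message to a temporary file, flush it, rename it into place, and flush the directory so the rename is durable too. Only after all of that would it acknowledge the message. Everything here is an assumption for illustration. The name deliver_message(), the ".tmp." naming scheme, and the directory-sync step are modelled on common maildir-style practice and are not taken from any particular mail transport.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Flush via F_FULLFSYNC where available, otherwise fsync(). */
static int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC, 0) == 0)
        return 0;
#endif
    return fsync(fd);
}

/*
 * Hypothetical delivery routine: store `len` bytes of `msg` as dir/name
 * and return 0 only after the data and the directory entry have been
 * flushed as far as the platform allows. Returns -1 on any failure, in
 * which case the caller must not acknowledge the message.
 */
int deliver_message(const char *dir, const char *name,
                    const char *msg, size_t len)
{
    char tmp[4096], final[4096];
    int r1 = snprintf(tmp, sizeof tmp, "%s/.tmp.%s", dir, name);
    int r2 = snprintf(final, sizeof final, "%s/%s", dir, name);
    if (r1 < 0 || r1 >= (int)sizeof tmp || r2 < 0 || r2 >= (int)sizeof final)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Short writes are treated as errors to keep the sketch brief. */
    ssize_t n = write(fd, msg, len);
    if (n < 0 || (size_t)n != len || durable_sync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, final) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Make the rename itself durable by flushing the directory. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    if (durable_sync(dfd) != 0) {
        close(dfd);
        return -1;
    }
    return close(dfd);
}
```

The ordering matters: acknowledging the sender before the final directory flush would reintroduce exactly the window the man page warns about.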

-- Jamie
