From: Jamie Lokier <jamie@shareable.org>
To: Jeff Garzik <jeff@garzik.org>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Chris Wedgwood <cw@f00f.org>
Subject: Re: Proposal for "proper" durable fsync() and fdatasync()
Date: Tue, 26 Feb 2008 07:55:26 +0000
Message-ID: <20080226075526.GF30238@shareable.org>
In-Reply-To: <47C3C33F.1070908@garzik.org>
Jeff Garzik wrote:
> Jamie Lokier wrote:
> >By durable, I mean that fsync() should actually commit writes to
> >physical stable storage,
>
> Yes, it should.
Glad we agree :-)
> >I was surprised that fsync() doesn't do this already. There was a lot
> >of effort put into block I/O write barriers during 2.5, so that
> >journalling filesystems can force correct write ordering, using disk
> >flush cache commands.
> >
> >After all that effort, I was very surprised to notice that Linux 2.6.x
> >doesn't use that capability to ensure fsync() flushes the disk cache
> >onto stable storage.
>
> It's surprising you are surprised, given that this [lame] fsync behavior
> has remained consistently lame throughout Linux's history.
I was surprised because of the effort put into IDE write barriers to
get it right for in-kernel filesystems, and the messages in 2004
telling concerned users that fsync would use barriers in 2.6, which it
does sometimes but not always.
> [snip huge long proposal]
>
> Rather than invent new APIs, we should fix the existing ones to _really_
> flush data to physical media.
>
> Linux should default to SAFE data storage, and permit users to retain
> the older unsafe behavior via an option. It's completely ridiculous
> that we default to an unsafe fsync.
Well, I agree with you. Which is why the "new API" I suggested, being
really just an extension of an existing one, allows fsync() to be SAFE
if that's what people want.
To be fair, fsync() is rather overkill for some apps.
sync_file_range() is obviously the right place for fine tuning "less
safe" variations.
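For illustration only (this is not code from the proposal, and the helper
names commit_range_weak() and commit_file_strong() are hypothetical), the
two calls as they exist today can be contrasted in a short C sketch. It
assumes Linux with glibc. sync_file_range() only pushes the given byte
range of dirty pages to the device and does not write file metadata;
fsync() covers the whole file. As discussed in this thread, neither call
can be relied on to flush the drive's own write cache in every
configuration.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() in glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write one record, then ask the kernel to write out
 * only that byte range and wait for it. Weaker and cheaper than fsync():
 * no metadata is written, and no disk cache flush is promised. */
int commit_range_weak(int fd, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL && len > 0);
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    return sync_file_range(fd, off, (off_t)len,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Hypothetical helper: write one record, then fsync() the whole file.
 * This is the call the thread argues should be durable by default. */
int commit_file_strong(int fd, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL && len > 0);
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    return fsync(fd);
}
```

An application that needs full safety would call commit_file_strong();
one that can tolerate a "less safe" commit could call commit_range_weak()
and accept the weaker guarantee.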
> And [anticipating a common response from others] it is completely
> irrelevant that POSIX fsync(2) permits Linux's current behavior. The
> current behavior is unsafe.
>
> Safety before performance -- ESPECIALLY when it comes to storing user data.
Especially now that people work a lot in guest VMs, where the IDE
barrier stuff doesn't work if the host fdatasync() doesn't work.
Since it happened with Mac OS X, I wouldn't be surprised if changing
fsync() alone turned out to be unpopular. Heck, you already get people
asking "how to turn off fsync in PostgreSQL"... (Haven't those people
heard of transactions...?)
But with changes to sync_file_range() [or whatever... I don't care] to
support databases' finely tuned commit needs, and then adoption of
that by database vendors, perhaps nobody will mind fsync() becoming
safe then.
Nobody seems bothered by its performance for other things.
-- Jamie