From: Jamie Lokier <jamie@shareable.org>
To: Bryan Henderson <hbryan@us.ibm.com>
Cc: linux-fsdevel@vger.kernel.org, Nick Piggin <npiggin@suse.de>
Subject: Re: [rfc] fsync_range?
Date: Wed, 21 Jan 2009 21:08:55 +0000
Message-ID: <20090121210855.GC16133@shareable.org>
In-Reply-To: <OF7733FB07.76FC26D6-ON88257545.006AA6A8-88257545.006C6246@us.ibm.com>

Bryan Henderson wrote:
> >- although that will cause unnecessary I/O barriers, one per
> >fsync_range().
>
> What do I/O barriers have to do with it? An I/O barrier says, "don't
> harden later writes before these have hardened," whereas fsync_range()
> says, "harden these writes now." Does Linux these days send an I/O
> barrier to the block subsystem and/or device as part of fsync()?

For better or worse, I/O barriers and I/O flushes are the same thing
in the Linux block layer. I've argued for treating them distinctly,
because there are different I/O scheduling opportunities around each
of them, but there wasn't much interest.

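To make the two meanings concrete, here is a toy Python model of a disk with a volatile write cache. It is a sketch for illustration only, not how the Linux block layer works: the `ToyDisk` class and its method names are made up, and it models nothing beyond the two definitions quoted above.

```python
class ToyDisk:
    """Toy model of a disk with a volatile write cache (illustrative only)."""

    def __init__(self):
        self.epochs = [[]]   # pending writes, split into epochs by barriers
        self.durable = []    # writes that have reached stable media

    def write(self, block):
        # A write lands in the cache; nothing is durable yet.
        self.epochs[-1].append(block)

    def barrier(self):
        # Ordering only: later writes may not harden before earlier ones.
        if self.epochs[-1]:
            self.epochs.append([])

    def flush(self):
        # Durability now: everything pending is hardened before returning.
        for epoch in self.epochs:
            self.durable.extend(epoch)
        self.epochs = [[]]

    def may_harden_next(self):
        # Blocks the device is free to pick next, in any order it likes.
        for epoch in self.epochs:
            if epoch:
                return set(epoch)
        return set()


disk = ToyDisk()
disk.write("a")
disk.write("b")
disk.barrier()
disk.write("c")
print(disk.may_harden_next(), disk.durable)  # "c" must wait; nothing is durable
disk.flush()
print(disk.durable)                          # now everything is durable
```

In this model a barrier only restricts what the device may pick next and hardens nothing, while a flush leaves no choice at all. That gap is where the different scheduling opportunities would come from.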
> Or are we talking about the command to the device to harden all earlier
> writes (now) against a device power loss? Does fsync() do that?

Ultimately that's what we're talking about, yes. Imho fsync() should
do that, because a userspace database/filesystem should have access to
the same integrity guarantees as an in-kernel filesystem. Linux
fsync() doesn't always send the command - it was a bit unpredictable
last time I looked.

There are other opinions. MacOSX fsync() doesn't - because it has an
fcntl() which is a stronger version of fsync() documented for that
case. They preferred reduced integrity of fsync() to keep benchmarks
on par with other OSes which don't send the command.

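A portable program that wants the stronger guarantee typically has to try the platform-specific call first and fall back to fsync(). The following Python sketch shows that pattern under some assumptions: the helper name `durable_sync` is hypothetical, the stronger fcntl() is taken to be F_FULLFSYNC (which Python's `fcntl` module only defines on platforms that have it, so it is looked up with `getattr`), and the code is Unix-only. Whether plain fsync() reaches the device cache on the fallback path depends on the OS and filesystem, as discussed above.

```python
import fcntl
import os


def durable_sync(fd):
    """Ask for the strongest flush available; return the name of the call used."""
    full = getattr(fcntl, "F_FULLFSYNC", None)  # absent where the OS lacks it
    if full is not None:
        try:
            fcntl.fcntl(fd, full)   # asks the drive to flush its own cache too
            return "F_FULLFSYNC"
        except OSError:
            pass                    # not supported on this file; fall back
    os.fsync(fd)                    # may or may not reach the device cache
    return "fsync"
```

Returning the name of the call that succeeded lets the caller log which level of guarantee it actually got, instead of assuming the stronger one.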
Interestingly, Windows _does_ have the option to send the command to
the device, controlled by userspace. If you set the Windows
equivalents of O_DSYNC and O_DIRECT at the same time, then calls to
the Windows equivalent of fdatasync() cause an I/O barrier command to
be sent to the disk if necessary. The Windows documentation even
explains the difference between OS caching and device caching, and when
each one occurs. Wow - it looks like Windows (later versions)
has had the edge in doing the right thing here for quite some time...

http://www.microsoft.com/sql/alwayson/storage-requirements.mspx
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx

> Either way, I can see that multiple fsync_range()s in a row would be a
> little worse than just one, but it's a pretty bad problem anyway, so I don't
> know if you could tell the difference.

A little? It's the difference between letting the disk schedule 100
scattered writes itself, and forcing the disk to write them in the
order you sent them from userspace, aside from doubling the rate
of device commands...

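The two patterns being compared look roughly like this from userspace. This is a minimal Python sketch, with plain os.fsync() standing in for the proposed fsync_range() (which has no Python binding). The function names are hypothetical and os.pwrite() is Unix-only.

```python
import os


def write_then_sync_each(fd, blocks):
    """One flush per write: every block must be stable before the next is sent."""
    syncs = 0
    for offset, data in blocks:
        os.pwrite(fd, data, offset)
        os.fsync(fd)            # a flush after every single write
        syncs += 1
    return syncs


def write_all_then_sync_once(fd, blocks):
    """Queue every write first, then flush once; lower layers pick the order."""
    for offset, data in blocks:
        os.pwrite(fd, data, offset)
    os.fsync(fd)                # a single flush covers all the writes
    return 1
```

With 100 scattered blocks, the first function issues 100 flushes and pins the write order to the order of the calls. The second issues one flush and leaves the ordering to the kernel and the disk.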
-- Jamie