From: Jamie Lokier <jamie@shareable.org>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] ide.c make write cacheing controllable by guest
Date: Tue, 26 Feb 2008 07:32:57 +0000
Message-ID: <20080226073257.GC30238@shareable.org>
In-Reply-To: <20080226011639.GA20401@puku.stupidest.org>
[To qemu-devel and Chris, I have started a thread on linux-kernel on
this topic. I've copied the first few paragraphs here so you can see
what it's about, since it's a response to a post here. But it's
largely off topic for Qemu and on topic for linux-kernel, so I didn't
cross-post, lest linux-kernel replies come here.]
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Proposal for "proper" durable fsync() and fdatasync()
Message-ID: <20080226072649.GB30238@shareable.org>
Date: Tue, 26 Feb 2008 07:26:49 +0000
Dear kernel,
This is a proposal to add "proper" durable fsync() and fdatasync() to Linux.
First the problem, then a proposed solution "with benefits", so to speak.
[...]
By durable, I mean that fsync() should actually commit writes to
physical stable storage, not just to the disk write cache when that is
enabled. Databases and guest VMs need this, or an equivalent
feature, if they are not to suffer occasional corruption after power
failure and perhaps some crashes.
The alternative is to disable the disk write cache. But that hasn't
been modern practice or the recommendation since I/O write barriers
were implemented, because barriers are much faster.
I was surprised that fsync() doesn't do this already. A lot of effort
went into block I/O write barriers during 2.5 development, so that
journalling filesystems can force correct write ordering using disk
flush cache commands.
After all that effort, I was very surprised to notice that Linux 2.6.x
doesn't use that capability to ensure fsync() flushes the disk cache
onto stable storage.
I noticed this while following up discussions on the Qemu mailing list
about guest VMs and how their IDE flush cache command should translate
to fsync() to avoid data loss. (For guest VMs, fsync() isn't
necessary if the host machine stays up, and on a Linux host it isn't
enough if the host machine loses power or the hard disk crashes in
some other way.)
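A minimal sketch of the translation being discussed, in Python for brevity (os.pwrite and os.fdatasync wrap the POSIX calls of the same names): an emulated disk maps the guest's IDE FLUSH CACHE command onto fdatasync() of the host image file. The class EmulatedDisk, its method names and the 512-byte sector size are hypothetical illustrations, not Qemu's actual code. Per the point above, on a Linux host this may still not survive host power loss, because the call may not flush the host drive's own write cache.

```python
import os
import tempfile

SECTOR_SIZE = 512  # bytes per emulated sector (assumed for illustration)


class EmulatedDisk:
    """Hypothetical emulated IDE disk backed by a host image file."""

    def __init__(self, path, sectors):
        # Open (or create) the backing image and size it to the disk.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, sectors * SECTOR_SIZE)

    def write_sectors(self, lba, data):
        # A guest write only reaches the host page cache here; it is not
        # durable until the guest asks for a flush.
        os.pwrite(self.fd, data, lba * SECTOR_SIZE)

    def flush_cache(self):
        # Guest issued IDE FLUSH CACHE: ask the host kernel to write the
        # image's dirty data out. Whether the host drive's own write cache
        # is flushed too depends on the host kernel, which is the point at
        # issue. fdatasync() is missing on some platforms; fall back to fsync().
        getattr(os, "fdatasync", os.fsync)(self.fd)

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    image = os.path.join(tempfile.mkdtemp(), "guest.img")
    disk = EmulatedDisk(image, sectors=8)
    disk.write_sectors(2, b"x" * SECTOR_SIZE)
    disk.flush_cache()
    disk.close()
```

Mapping the flush onto fdatasync() rather than fsync() avoids forcing out file metadata (such as timestamps) that the guest never asked to persist.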
Then I noticed it again when I was designing a database engine with
filesystem characteristics. I thought "how do I ensure ordered
journal writes; can I use fdatasync()?" and was surprised to find the
answer is no: I have to use hacks like calling hdparm, and the authors
of major SQL databases seem to sweep the problem under the carpet.
(Interestingly, in the Linux 2.4 patches for write barriers, fsync()
seems to be fine, if a bit slow.)
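To make the ordered-journal-writes question concrete, here is a minimal write-ahead sketch, again in Python for brevity (os.write, os.pwrite and os.fdatasync wrap the POSIX calls of the same names). The journal record layout and the names durable_sync and ordered_update are hypothetical. The journal record has to reach stable storage before the in-place data write it describes, so a sync sits between the two writes. As argued above, on a 2.6 kernel with the disk write cache enabled that sync may stop at the drive's cache, so the ordering is not actually guaranteed across a power failure.

```python
import os
import tempfile


def durable_sync(fd):
    # fdatasync() skips metadata not needed to read the data back; fall back
    # to fsync() on platforms that lack it. On the Linux 2.6 kernels discussed
    # here, neither call is guaranteed to flush the drive's write cache.
    getattr(os, "fdatasync", os.fsync)(fd)


def ordered_update(journal_fd, data_fd, offset, payload):
    """Hypothetical write-ahead update: journal record first, then data."""
    # 1. Append an intent record: 8-byte offset, 4-byte length, payload.
    record = offset.to_bytes(8, "little") + len(payload).to_bytes(4, "little") + payload
    os.write(journal_fd, record)
    # 2. The record must be on stable storage before the data write begins;
    #    this sync is the ordering point in question.
    durable_sync(journal_fd)
    # 3. Apply the update in place.
    os.pwrite(data_fd, payload, offset)
    # 4. Make the data durable so the journal record can later be retired.
    durable_sync(data_fd)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    journal = os.open(os.path.join(workdir, "journal"),
                      os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    data = os.open(os.path.join(workdir, "data"), os.O_RDWR | os.O_CREAT, 0o600)
    ordered_update(journal, data, 4, b"abc")
    os.close(journal)
    os.close(data)
```

If the system loses power after step 2 but before step 4, recovery can replay the journal record; that only works if step 2 really reached the platter.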
It isn't the first time this topic has come up:
http://groups.google.com.br/group/linux.kernel/browse_thread/thread/d343e51655b4ac7c/7ee9bca80977c2d1?#7ee9bca80977c2d1
("True fsync() in Linux (on IDE)")
In that thread, it was implied that this would be fixed in 2.6. So I bet
some people are under the illusion that it's fixed in 2.6...
For a while, I've been meaning to bring it up on linux-kernel...
[More on linux-kernel].
Thanks,
-- Jamie