From: Ric Wheeler <rwheeler@redhat.com>
To: Howard Chu <hyc@symas.com>
Cc: General Discussion of SQLite Database <sqlite-users@sqlite.org>,
David Lang <david@lang.hm>, Vladislav Bolkhovitin <vst@vlnb.net>,
"Theodore Ts'o" <tytso@mit.edu>, Richard Hipp <drh@hwaci.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: [sqlite] light weight write barriers
Date: Fri, 16 Nov 2012 10:31:52 -0500
Message-ID: <50A65C68.6080001@redhat.com>
In-Reply-To: <50A65681.8000204@symas.com>
On 11/16/2012 10:06 AM, Howard Chu wrote:
> David Lang wrote:
>> barriers keep getting mentioned because they are an easy concept to understand.
>> "do this set of stuff before doing any of this other set of stuff, but I don't
>> care when any of this gets done" and they fit well with the requirements of the
>> users.
>>
>> Users readily accept that if the system crashes, they will lose the most recent
>> stuff that they did,
>
> *some* users may accept that. *None* should.
>
>> but they get annoyed when things get corrupted to the point
>> that they lose the entire file.
>>
>> This includes things like modifying one option and a crash resulting in the
>> config file being blank. Yes, you can do the "write to temp file, sync file,
>> sync directory, rename file" dance, but the fact that to do so the user must sit
>> and wait for the syncs to take place can be a problem. It would be far better to
>> be able to say "write to temp file, and after it's on disk, rename the file" and
>> not have the user wait. The user doesn't really care if the changes hit disk
>> immediately, or several seconds (or even 10s of seconds) later, as long as there
>> is not any possibility of the rename hitting disk before the file contents.
>>
>> The fact that this could be implemented in multiple ways in the existing
>> hardware does not mean that there need to be multiple ways exposed to userspace,
>> it just means that the cost of doing the operation will vary depending on the
>> hardware that you have. This also means that if new hardware introduces a new
>> way of implementing this, that improvement can be passed on to the users without
>> needing application changes.
>
> There are a couple industry failures here:
>
> 1) the drive manufacturers sell drives that lie, and consumers accept it
> because they don't know better. We programmers, who know better, have failed
> to raise a stink and demand that this be fixed.
> A) Drives should not lose data on power failure. If a drive accepts a write
> request and says "OK, done" then that data should get written to stable
> storage, period. Whether it requires capacitors or some other onboard power
> supply, or whatever, they should just do it. Keep in mind that today, most of
> the difference between enterprise drives and consumer desktop drives is just a
> firmware change; the hardware is already identical. Nobody should accept a
> product that doesn't offer this guarantee. It's inexcusable.
> B) it should go without saying - drives should reliably report back to the
> host, when something goes wrong. E.g., if a write request has been accepted,
> cached, and reported complete, but then during the actual write an ECC failure
> is detected in the cacheline, the drive needs to tell the host "oh by the way,
> block XXX didn't actually make it to disk like I told you it did 10ms ago."
>
> If the entire software industry were to simply state "your shit stinks and
> we're not going to take it any more" the hard drive industry would have no
> choice but to fix it. And in most cases it would be a zero-cost fix for them.
>
> Once you have drives that are actually trustworthy, actually reliable (which
> doesn't mean they never fail, it only means they tell the truth about
> successes or failures), most of these other issues disappear. Most of the need
> for barriers disappears.
>
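The "write to temp file, sync file, sync directory, rename file" dance quoted above can be sketched in C as follows. This is a minimal sketch that assumes a POSIX system such as Linux. The helper name `replace_file` and the `.name.tmp` naming scheme are hypothetical. The directory is normally fsync()ed *after* the rename(), because that call is what makes the new directory entry durable. The two fsync() calls are where the user sits and waits, which is the cost the quoted text objects to.

```c
#define _XOPEN_SOURCE 700   /* fsync(), O_DIRECTORY */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace dir/name with `contents` so that a crash
 * leaves either the old file or the new one, never a blank file.
 * The ".name.tmp" temporary name is an assumption for illustration.
 */
int replace_file(const char *dir, const char *name, const char *contents)
{
    char tmp[4096], dst[4096];
    if (snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name) >= (int)sizeof tmp ||
        snprintf(dst, sizeof dst, "%s/%s", dir, name) >= (int)sizeof dst)
        return -1;

    /* 1. Write the new contents to a temporary file in the same directory. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* 2. fsync() the data; the caller blocks here until it is on disk. */
    if (write_all(fd, contents, strlen(contents)) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    /* 3. rename() atomically swaps the new file in for the old one. */
    if (rename(tmp, dst) < 0) {
        unlink(tmp);
        return -1;
    }
    /* 4. fsync() the directory so the rename itself survives a crash. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

A caller would run, for example, `replace_file("/etc/myapp", "config", "opt=1\n")`; the paths here are hypothetical. An ordering-only barrier of the kind proposed in this thread would turn step 2 into "do not let step 3 reach disk before step 1", with no blocking. POSIX offers no such call, so this sketch has to wait.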
I think that you are arguing a fairly silly point.
If you want that behaviour, you have had it for more than a decade - simply
disable the write cache on your drive and you are done.
If you - as a user - want to run faster and use applications that are coded to
handle data integrity properly (fsync, fdatasync, etc.), leave the write cache
enabled and use file system barriers.
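For an application that handles data integrity properly, the core pattern is to write the data, call fdatasync(), and treat the record as committed only if both calls succeed. The sketch below is minimal and assumes POSIX. `append_durable` is a hypothetical helper, and the caller is assumed to have opened the log with O_APPEND.

```c
#define _XOPEN_SOURCE 700   /* fdatasync() */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one record to an already-open log and return
 * only once it is durable. Returns 0 on success, -1 with errno set.
 */
int append_durable(int fd, const char *rec, size_t len)
{
    /* Write the whole record, retrying on short writes and EINTR. */
    while (len > 0) {
        ssize_t n = write(fd, rec, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        rec += n;
        len -= (size_t)n;
    }
    /*
     * fdatasync() flushes the data plus the metadata needed to read it back
     * (such as the new file size), but skips unrelated metadata like mtime.
     * A write error reported by the device surfaces here, typically as EIO,
     * so the caller must not report the record as committed if this fails.
     */
    if (fdatasync(fd) < 0)
        return -1;
    return 0;
}
```

With the write cache enabled and file system barriers on, this fdatasync() is typically the point where the file system asks the drive to flush its volatile cache. With the write cache disabled, each write is already on stable media once the drive acknowledges it, at a cost in throughput.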
Everyone has to trade off cost versus something else, and this is a very, very
long-standing trade-off that drive manufacturers have made.
The more money you pay for your storage, the less likely this is to be an issue
(high-end SSDs, enterprise-class arrays, etc. don't have volatile write caches
and most SAS drives perform reasonably well with the write cache disabled).
Regards,
Ric