linux-fsdevel.vger.kernel.org archive mirror
From: Ric Wheeler <rwheeler@redhat.com>
To: Howard Chu <hyc@symas.com>
Cc: General Discussion of SQLite Database <sqlite-users@sqlite.org>,
	David Lang <david@lang.hm>, Vladislav Bolkhovitin <vst@vlnb.net>,
	"Theodore Ts'o" <tytso@mit.edu>, Richard Hipp <drh@hwaci.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [sqlite] light weight write barriers
Date: Fri, 16 Nov 2012 13:03:02 -0500
Message-ID: <50A67FD6.1030108@redhat.com>
In-Reply-To: <50A661D0.4030200@symas.com>

On 11/16/2012 10:54 AM, Howard Chu wrote:
> Ric Wheeler wrote:
>> On 11/16/2012 10:06 AM, Howard Chu wrote:
>>> David Lang wrote:
>>>> barriers keep getting mentioned because they are an easy concept to understand.
>>>> "do this set of stuff before doing any of this other set of stuff, but I don't
>>>> care when any of this gets done" and they fit well with the requirements of
>>>> the users.
>>>>
>>>> Users readily accept that if the system crashes, they will lose the most
>>>> recent stuff that they did,
>>>
>>> *some* users may accept that. *None* should.
>>>
>>>> but they get annoyed when things get corrupted to the point
>>>> that they lose the entire file.
>>>>
>>>> this includes things like modifying one option and a crash resulting in the
>>>> config file being blank. Yes, you can do the "write to temp file, sync file,
>>>> sync directory, rename file" dance, but the fact that to do so the user must
>>>> sit and wait for the syncs to take place can be a problem. It would be far
>>>> better to be able to say "write to temp file, and after it's on disk, rename
>>>> the file" and not have the user wait. The user doesn't really care if the
>>>> changes hit disk immediately, or several seconds (or even 10s of seconds)
>>>> later, as long as there is not any possibility of the rename hitting disk
>>>> before the file contents.
>>>>
>>>> The fact that this could be implemented in multiple ways in the existing
>>>> hardware does not mean that there need to be multiple ways exposed to
>>>> userspace, it just means that the cost of doing the operation will vary
>>>> depending on the hardware that you have. This also means that if new hardware
>>>> introduces a new way of implementing this, that improvement can be passed on
>>>> to the users without needing application changes.
>>>
>>> There are a couple industry failures here:
>>>
>>> 1) the drive manufacturers sell drives that lie, and consumers accept it
>>> because they don't know better. We programmers, who know better, have failed
>>> to raise a stink and demand that this be fixed.
>>>    A) Drives should not lose data on power failure. If a drive accepts a write
>>> request and says "OK, done" then that data should get written to stable
>>> storage, period. Whether it requires capacitors or some other onboard power
>>> supply, or whatever, they should just do it. Keep in mind that today, most of
>>> the difference between enterprise drives and consumer desktop drives is just a
>>> firmware change; the hardware is already identical. Nobody should accept a
>>> product that doesn't offer this guarantee. It's inexcusable.
>>>    B) it should go without saying - drives should reliably report back to the
>>> host, when something goes wrong. E.g., if a write request has been accepted,
>>> cached, and reported complete, but then during the actual write an ECC failure
>>> is detected in the cacheline, the drive needs to tell the host "oh by the way,
>>> block XXX didn't actually make it to disk like I told you it did 10ms ago."
>>>
>>> If the entire software industry were to simply state "your shit stinks and
>>> we're not going to take it any more" the hard drive industry would have no
>>> choice but to fix it. And in most cases it would be a zero-cost fix for them.
>>>
>>> Once you have drives that are actually trustworthy, actually reliable (which
>>> doesn't mean they never fail, it only means they tell the truth about
>>> successes or failures), most of these other issues disappear. Most of the need
>>> for barriers disappears.
>>>
>>
>> I think that you are arguing a fairly silly point.
>
> Seems to me that you're arguing that we should accept inferior technology. 
> Who's really being silly?

No, just suggesting that you either pay for the expensive stuff or learn how to
use cost-effective, high-capacity storage like the rest of the world.

I don't disagree that having non-volatile write caches would be nice, but
everyone has learned how to deal with volatile write caches at the low end of
the market.

>
>> If you want that behaviour, you have had it for more than a decade - simply
>> disable the write cache on your drive and you are done.
>
> You seem to believe it's nonsensical for someone to want both fast and 
> reliable writes, or that it's unreasonable for a storage device to offer the 
> same, cheaply. And yet it is clearly trivial to provide all of the above.

I look forward to seeing your products in the market.

Until you have more than "I want" and "I think" on your storage system design 
resume, I suggest you spend the money to get the parts with non-volatile write 
caches or fix your code.

Ric


>> If you - as a user - want to run faster and use applications that are coded to
>> handle data integrity properly (fsync, fdatasync, etc), leave the write cache
>> enabled and use file system barriers.
>
> Applications aren't supposed to need to worry about such details, that's why 
> we have operating systems.
>
> Drives should tell the truth. In event of an error detected after the fact, 
> the drive should report the error back to the host. There's nothing 
> nonsensical there.
>
> When a drive's cache is enabled, the host should maintain a queue of written 
> pages, of a length equal to the size of the drive's cache. If a drive says 
> "hey, block XXX failed" the OS can reissue the write from its own queue. No 
> muss, no fuss, no performance bottlenecks. This is what Real Computers did 
> before the age of VAX Unix.
>
>> Everyone has to trade off cost versus something else and this is a very, very
>> long standing trade off that drive manufacturers have made.
>
> With the cost of storage falling as rapidly as it has in recent years, this is 
> a stupid tradeoff.
>
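
For readers following the thread, the "write to temp file, sync file, sync
directory, rename file" dance that David Lang describes in the quoted text can
be sketched roughly as below. This is only a minimal Python illustration for a
POSIX system, not code from anyone in the thread: the name atomic_replace is
hypothetical, and the directory sync is placed after the rename (rather than
before it, as the quoted wording has it) on the assumption that the goal is to
make the rename itself durable. It also shows the cost being discussed, since
the caller blocks in both fsync calls.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes): temp file, fsync, rename, fsync dir.

    Hypothetical helper for illustration; assumes a POSIX file system.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename never
    # crosses a file system boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Block until the kernel reports the file contents flushed.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, leave the old file untouched and drop the temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry so the rename itself is recorded.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as atomic_replace("/path/to/config", b"option=1\n"). After
a crash the file should hold either the old or the new contents, provided the
drive honours the flush, which is exactly the point under dispute in this thread.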

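Howard Chu's proposal in the quoted text, that the host keep a queue of written
pages as large as the drive's cache and reissue a write when the drive reports a
late failure, can be sketched in the same spirit. The class and method names
below are hypothetical, and the sketch takes on trust the proposal's premise
that anything older than one cache's worth of writes has already reached the
media. It models only the bookkeeping, not any real drive or kernel interface.

```python
from collections import OrderedDict


class ReplayQueue:
    """Host-side record of recent writes, bounded by the drive's cache size.

    Hypothetical illustration of the quoted proposal, not an existing API.
    """

    def __init__(self, cache_bytes):
        self.cache_bytes = cache_bytes
        self.used = 0
        self.pending = OrderedDict()  # block number -> data, oldest first

    def record_write(self, block, data):
        # A rewrite of the same block supersedes the older copy.
        if block in self.pending:
            self.used -= len(self.pending.pop(block))
        self.pending[block] = data
        self.used += len(data)
        # Drop the oldest entries once more than one cache's worth is held;
        # under the proposal's premise those writes have left the drive cache.
        while self.used > self.cache_bytes:
            _, old = self.pending.popitem(last=False)
            self.used -= len(old)

    def data_for_reissue(self, block):
        """Return the data to rewrite after a late failure report, or None."""
        return self.pending.get(block)
```

When the drive reports "block XXX failed", the host would look the block up with
data_for_reissue and issue the write again; a None result means the copy has
already aged out of the queue.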
