From: Ric Wheeler <ric@emc.com>
To: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Grzegorz Kulewski <kangur@polcom.net>, linux-kernel@vger.kernel.org
Subject: Re: New filesystem for Linux
Date: Fri, 03 Nov 2006 08:05:54 -0500
Message-ID: <454B3EB2.3010600@emc.com>
In-Reply-To: <Pine.LNX.4.64.0611030015150.3266@artax.karlin.mff.cuni.cz>
Mikulas Patocka wrote:
>> Hi,
>>
>> On Thu, 2 Nov 2006, Mikulas Patocka wrote:
>>
>>> As my PhD thesis, I am designing and writing a filesystem, and it's
>>> now in a state that it can be released. You can download it from
>>> http://artax.karlin.mff.cuni.cz/~mikulas/spadfs/
>>
>>
>> "Disk that can atomically write one sector (512 bytes) so that the
>> sector contains either old or new content in case of crash."
>>
>> Well, maybe I am completely wrong but as far as I understand no disk
>> currently will provide such requirement. Disks can have (after halted
>> write):
>> - old data,
>> - new data,
>> - nothing (unreadable sector - result of not full write and disk
>> internal checksum failure for that sector, happens especially often
>> if you have frequent power outages).
>>
>> And possibly some broken drives may also return you something that
>> they think is good data but really is not (shouldn't happen since
>> both disks and cables should be protected by checksums, but hey...
>> you can never be absolutely sure especially on very big storages).
>>
>> So... isn't this making your filesystem a little flawed in design?
>
>
> There was discussion about it here some time ago, and I think the
> result was that the IDE bus is reset prior to capacitors discharge and
> total loss of power and disk has enough time to finish a sector ---
> but if you have crap power supply (doesn't signal power loss), crap
> motherboard (doesn't reset bus) or crap disk (doesn't respond to
> reset), it can fail.

There are two very different classes of storage devices to consider
here. If you use a high-end array (such as EMC Clariion/Symm, IBM
Shark, Hitachi, NetApp Block, etc.), then once the target device
acknowledges the write transaction you have a hard promise that the
data will persist across a power outage.

If you are using a commodity disk, then you really have to worry about
how the drive's write cache will handle your IO. These disks will ack
the write once they have stored the write request in their volatile
memory, which can be lost on a power outage.

That is a reasonable setting for most end users (high performance, few
power outages and some risk of data loss), but when data integrity is a
hard requirement, people typically run with the write cache disabled.

The "write barrier" support in reiserfs, ext3 and xfs provides
something in the middle: good performance, with cache flushes injected
on transaction commits or application-level fsync() calls.
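
For the application-level case, a minimal sketch in Python of a write
followed by fsync(). The helper name durable_write and the file name in
the usage part are made up for illustration. On Linux, os.fsync() calls
fsync(2); whether that also flushes the drive's volatile write cache
depends on the filesystem's barrier support being enabled, as described
above.

```python
import os
import tempfile


def durable_write(path: str, payload: bytes) -> None:
    """Write payload to path, then ask the kernel to push it to stable storage.

    Hypothetical helper for illustration only. It assumes a POSIX system
    where a directory can be opened read-only and fsync()ed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write() may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Returning from write() only means the data reached the page cache.
        # fsync() blocks until the kernel reports the file data as written out.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also fsync the containing directory so the new directory entry is
    # written out, not just the file contents.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "record.bin")  # made-up file name
        durable_write(target, b"x" * 512)         # one 512-byte sector of data
        print(os.path.getsize(target))
```

With the drive's write cache enabled and no barrier support, the data
can still be sitting in the drive's volatile memory when fsync()
returns, which is exactly the exposure discussed here.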

I would not depend on the IDE bus reset or draining capacitors to safely
destage data - in fact, I know that it routinely fails in our tests that
compare write barriers on and off across power outages.

Modern S-ATA/ATA drives can hold 16MB or more of data in their write
cache, and that is a lot of data to destage in those last few ms ;-)
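
To put a rough number on that, here is a back-of-the-envelope sketch in
Python. The 16MB cache size is the figure above; the 50 MB/s sustained
write rate is purely an assumption for illustration, not a measured
value, and it is a best case that ignores seeks between scattered
blocks.

```python
def destage_time_ms(cache_bytes: int, write_rate_bytes_per_s: float) -> float:
    """Time in milliseconds to write out a full cache at a constant rate."""
    return cache_bytes / write_rate_bytes_per_s * 1000.0


CACHE_BYTES = 16 * 1024 * 1024      # 16MB write cache (figure from the text)
ASSUMED_RATE = 50 * 1000 * 1000     # ASSUMPTION: 50 MB/s sequential write rate

if __name__ == "__main__":
    # Roughly 335 ms under these assumptions.
    print(f"{destage_time_ms(CACHE_BYTES, ASSUMED_RATE):.1f} ms")
```

Even under that optimistic assumption, a full cache needs hundreds of
milliseconds to drain, against a budget of a few ms.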
>
> BTW. reiserfs and xfs depend on this feature too. ext3 is the only one
> that doesn't.
>
> Mikulas
>