public inbox for linux-kernel@vger.kernel.org
From: Theodore Tso <tytso@mit.edu>
To: Pavel Machek <pavel@suse.cz>
Cc: mikulas@artax.karlin.mff.cuni.cz, clock@atrey.karlin.mff.cuni.cz,
	kernel list <linux-kernel@vger.kernel.org>,
	aviro@redhat.com
Subject: Re: writing file to disk: not as easy as it looks
Date: Tue, 2 Dec 2008 11:37:20 -0500
Message-ID: <20081202163720.GB18162@mit.edu>
In-Reply-To: <20081202152618.GA1646@ucw.cz>

On Tue, Dec 02, 2008 at 04:26:18PM +0100, Pavel Machek wrote:
> > I can understand why you might want to fsync the containing directory
> > to make sure the directory entry got written to disk --- but if you're
> > that paranoid, many modern filesystems use some kind of tree
> > structure
> 
> If I'm trying to write foo/bar/baz/file, and file/baz inodes/dentries
> are written to disk, but foo is not, file still will not be found
> under full name - and recovering it from lost&found is hard to do
> automatically.

Only if you've freshly created the foo/bar/baz directories...  If you
have, then yes, you'll need to sync each one.  Normally the paranoid
programs do this after each mkdir call, though.
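
A minimal sketch of that paranoid pattern in C, assuming a POSIX system where a directory can be opened read-only and passed to fsync() (Linux allows this). The names mkdir_p_durable() and fsync_dir() are hypothetical helpers, not an existing API, and the 4096-byte path buffer is an assumed limit. After each mkdir() that actually creates a directory, the sketch fsync()s the parent directory. That way the new entry is on stable storage before the next path component is created:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: fsync a directory so its entries reach stable storage. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc;
}

/* Hypothetical "mkdir -p" that fsyncs the parent after every directory it
 * creates. Components that already exist (EEXIST) are skipped. */
int mkdir_p_durable(const char *path)
{
    char buf[4096];                 /* assumed maximum path length */
    size_t len = strlen(path);

    if (len == 0) {
        errno = ENOENT;
        return -1;
    }
    if (len >= sizeof(buf)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* Stop at every '/' (and at the end), temporarily cutting the path there. */
    for (size_t i = 1; i <= len; i++) {
        if (buf[i] != '/' && buf[i] != '\0')
            continue;
        char sep = buf[i];
        buf[i] = '\0';

        if (mkdir(buf, 0755) == 0) {
            /* A new entry now exists in the parent: make that entry durable. */
            char *slash = strrchr(buf, '/');
            int rc;
            if (slash == NULL) {
                rc = fsync_dir(".");
            } else if (slash == buf) {
                rc = fsync_dir("/");
            } else {
                *slash = '\0';
                rc = fsync_dir(buf);
                *slash = '/';
            }
            if (rc != 0)
                return -1;
        } else if (errno != EEXIST) {
            return -1;
        }
        buf[i] = sep;
    }
    return 0;
}
```

Only directories freshly created by this call trigger an fsync, which matches the point above: pre-existing parents are assumed to be durable already. The sketch does not check that an existing component is really a directory.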

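For ext3/ext4, because of the entangled commit factor (fsync() forces
a commit of the whole running journal transaction, which also carries
the pending directory updates), fsync()'ing the file is sufficient,
but that's not something you can properly count upon.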
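
A portable sketch that does not rely on that ext3/ext4 behaviour, assuming POSIX openat() and a directory that already exists and is durable. write_file_durable() is a hypothetical name. It writes the data, fsync()s the file, closes it, and then fsync()s the containing directory so the new name is also on stable storage:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create/truncate dir/name, write data, and make both
 * the file contents and its directory entry durable. Returns 0 or -1. */
int write_file_durable(const char *dir, const char *name,
                       const void *data, size_t len)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(dfd);
        return -1;
    }

    int ok = 1;
    const char *p = data;
    while (ok && len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno != EINTR)
                ok = 0;             /* real error: stop writing */
            continue;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Step 1: push the file's data and inode to stable storage. */
    if (ok && fsync(fd) != 0)
        ok = 0;
    if (close(fd) != 0)
        ok = 0;
    /* Step 2: push the directory entry that names the file. */
    if (ok && fsync(dfd) != 0)
        ok = 0;
    close(dfd);
    return ok ? 0 : -1;
}
```

On ext3/ext4 the second fsync is probably redundant. On filesystems without the entangled commit, it is what makes the name survive a crash.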

> If the disk loses data after acknowledging the write, all hope is lost.
> Otherwise I expect the filesystem to preserve data I successfully synced.
> 
>      (In the b-tree split failed case I'd expect transaction commit to
>      fail because new data could not be written; at that point
>      disk+journal should still contain all the data needed for
>      recovery of synced/old files, right?)

Not necessarily.  For filesystems that do logical journalling (e.g.,
xfs, jfs, et al.), the only thing written in the journal is the
logical change (e.g., "new dir entry 'file_that_causes_the_node_split'").

The transaction commits *first*, and then the filesystem tries to
update the on-disk structures with the change, and it's only then that
the write fails.  Data can very easily get lost.

Even for ext3/ext4, which do physical journalling, the journal still
commits first, and only later is the change written out to its final
location on disk.  If the disk fails some of those writes, it's
possible to lose data, especially if the two blocks involved in the
node split are far apart and the write to the existing old btree
block fails.

> > What exactly are your requirements here, and what are you trying to
> > do?  What are you worried about?  Most MTAs are quite happy
> > settling
> 
> I'm trying to put my main filesystem on an SD card. hp2133 has only 4GB
> internal flash, so I got 32GB SDHC. Unfortunately, SD card on hp is
> very easy to eject by mistake.

So what you really want is some way of constantly flushing data to the
disk, probably after every single mkdir and every single close
operation.  Of course, that has the tradeoff that your flash card will
get a lot of extra wear.  I hate to say this, but have you considered
something like tape or velcro to secure the SD card?
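
One way to get "flush after every close" in application code is to route closes through a wrapper that fsync()s first. The sketch below assumes POSIX, and close_durable() and append_record() are hypothetical names. Linux also offers the `sync` and `dirsync` mount options, which make all writes, or just directory updates, synchronous filesystem-wide. Either approach costs the extra flash wear mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical close() replacement: fsync before closing and report
 * failure from either call. */
int close_durable(int fd)
{
    int rc = fsync(fd);
    int fsync_errno = errno;
    if (close(fd) != 0)
        return -1;              /* close() error takes precedence */
    if (rc != 0) {
        errno = fsync_errno;    /* report why the flush failed */
        return -1;
    }
    return 0;
}

/* Example caller: append one record and flush it when closing. */
int append_record(const char *path, const char *rec, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, rec, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        rec += n;
        len -= (size_t)n;
    }
    return close_durable(fd);
}
```

This only covers file contents. If append_record() creates the file, the containing directory still needs its own fsync, as discussed earlier in this message.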

						- Ted

Thread overview: 25+ messages
2008-12-02  9:40 writing file to disk: not as easy as it looks Pavel Machek
2008-12-02 14:04 ` Theodore Tso
2008-12-02 15:26   ` Pavel Machek
2008-12-02 16:37     ` Theodore Tso [this message]
2008-12-02 17:22       ` Chris Friesen
2008-12-02 20:55         ` Theodore Tso
2008-12-02 22:44           ` Pavel Machek
2008-12-02 22:50             ` Pavel Machek
2008-12-03  5:07             ` Theodore Tso
2008-12-03  8:46               ` Pavel Machek
2008-12-03 15:50                 ` Mikulas Patocka
2008-12-03 15:54                   ` Alan Cox
2008-12-03 17:37                     ` Mikulas Patocka
2008-12-03 17:52                       ` Alan Cox
2008-12-03 18:16                       ` Pavel Machek
2008-12-03 18:33                         ` Mikulas Patocka
2008-12-03 16:42                 ` Theodore Tso
2008-12-03 17:43                   ` Mikulas Patocka
2008-12-03 18:26                     ` Pavel Machek
2008-12-03 15:34               ` Mikulas Patocka
2008-12-15 10:24               ` [patch] " Pavel Machek
2008-12-15 11:03           ` Pavel Machek
2008-12-15 20:08             ` Folkert van Heusden
2008-12-02 19:10       ` Folkert van Heusden
2008-12-02 23:01 ` Mikulas Patocka
