linux-btrfs.vger.kernel.org archive mirror
From: Thomas Bellman <bellman@nsc.liu.se>
To: linux-btrfs@vger.kernel.org
Subject: Re: Rename+crash behaviour of btrfs - nearly ext3!
Date: Tue, 18 May 2010 16:47:48 +0200
Message-ID: <4BF2A894.60204@nsc.liu.se>
In-Reply-To: <0339xpl4pp.fsf@msgid.viggen.net>

On 05/18/10 15:28, Oystein Viggen wrote:

> * [Chris Mason]
>
>> I'm more than open to discussion on this one, but I don't see how:
>>
>> rm -f foo2
>> dd if=/dev/zero of=foo bs=1M count=1000
>> mv foo foo2
>>
>> Should be expected to write 1GB of data.
>
> IIRC, the answer you're looking for is "it did with ext3 in the default
> data=ordered mode".  Combine that with the ext3 data=ordered fsync()
> escalation where (again IIRC) fsync() tended to force a full sync() of
> the file system, and it's not that difficult to see why someone would
> program with the expectation above.
>
> Anyway, there's still a question of if a new file system should emulate
> the quirks of the old file system (read: be bug compatible), or if you
> can just expect to be popular enough that userspace adapts to the new
> order and lets you do The Right Thing instead.

So what *is* the right thing?  What kind of API should userspace have?
If the obvious thing for an application programmer to do is wrong, and
the right thing requires going through more hoops, that will ensure
that the majority of applications will be buggy.  We should strive
to make it easy to get things right.

It's easy for the kernel, and the filesystem, to just ask the userspace
programmers to jump through the hoops, and declare those programs that
don't to be broken.

On the other hand, if you go *too* far in absolving applications of
responsibility for making things safe, you would end up making all
filesystem operations synchronous, and that obviously hurts performance
in big ways.  So we need some kind of compromise, and I don't really
have the answer to where that compromise should end up.  It's just
that I feel that often only the kernel programmers' view is represented
here.


The pattern of writing to a file and then changing its name *without*
overwriting an existing file, is quite common when you write files to
a spool directory, and have another program that picks up files from
that directory and processes them.  You do:

     fd = open("foo4711.tmp", O_CREAT|O_EXCL|O_RDWR, 0666);
     write(fd, "data", strlen("data"));
     close(fd);
     link("foo4711.tmp", "foo4711");
     unlink("foo4711.tmp");

(And note that careful programs don't use rename() here, because that
would risk clobbering a file some other process has written, and instead
use link()+unlink().  And I really wish a "safe_rename()" syscall that
didn't clobber existing files existed.)

The programs I personally have written that did this, also had an fsync()
there, because I received data from another system and didn't want to ACK
until I knew it was safely on disk at my end.  But I am a fairly careful
programmer.
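
As a rough sketch of that more careful variant, assuming a POSIX system:
the helper name spool_publish() and its three path parameters are made
up for illustration, and the final fsync() of the directory is an extra
step not mentioned above, added on the assumption that the new name
should be durable as well as the data.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: publish `len` bytes from `data` as `final_path`
 * without ever clobbering an existing file.  `tmp_path` and `final_path`
 * are assumed to live in the directory `dir_path`.
 * Returns 0 on success, -1 on error with errno set. */
int spool_publish(const char *dir_path, const char *tmp_path,
                  const char *final_path, const void *data, size_t len)
{
    const char *p = data;
    int saved, dfd;

    /* O_EXCL: fail rather than reuse a leftover temporary file. */
    int fd = open(tmp_path, O_CREAT | O_EXCL | O_WRONLY, 0666);
    if (fd < 0)
        return -1;

    while (len > 0) {               /* write() may be partial */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Force the data to disk before the final name becomes visible. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* link() fails with EEXIST if final_path already exists, where
     * rename() would silently replace it. */
    if (link(tmp_path, final_path) < 0)
        goto fail;
    unlink(tmp_path);   /* a failure here only leaves the temp name behind */

    /* Assumption: also sync the directory so the new entry is durable. */
    dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) < 0) {
        saved = errno;
        close(dfd);
        errno = saved;
        return -1;
    }
    return close(dfd);

fail:
    saved = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp_path);   /* do not leave a half-written temp file around */
    errno = saved;
    return -1;
}
```

A receiver would call something like spool_publish("spool",
"spool/foo4711.tmp", "spool/foo4711", buf, n) and send its ACK only
after it returns 0.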


Note that in my previous life I was a userspace programmer, and in my
current life I'm a sysadmin.  I'm speaking as an interested user of
Btrfs, not as a kernel programmer.


	/Thomas Bellman
