From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Bellman
Subject: Re: Rename+crash behaviour of btrfs - nearly ext3!
Date: Tue, 18 May 2010 16:47:48 +0200
Message-ID: <4BF2A894.60204@nsc.liu.se>
References: <4BF18525.8080904@gmail.com> <20100517193652.GC8635@think> <4BF1DBCD.7060208@gmail.com> <20100518003032.GK8635@think> <20100518005926.GM8635@think> <4BF28225.2000908@gmail.com> <20100518131304.GX8635@think> <0339xpl4pp.fsf@msgid.viggen.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
To: linux-btrfs@vger.kernel.org
Return-path:
In-Reply-To: <0339xpl4pp.fsf@msgid.viggen.net>
List-ID:

On 05/18/10 15:28, Oystein Viggen wrote:

> * [Chris Mason]
>
>> I'm more than open to discussion on this one, but I don't see how:
>>
>> rm -f foo2
>> dd if=/dev/zero of=foo bs=1M count=1000
>> mv foo foo2
>>
>> Should be expected to write 1GB of data.
>
> IIRC, the answer you're looking for is "it did with ext3 in the default
> data=ordered mode". Combine that with the ext3 data=ordered fsync()
> escalation where (again IIRC) fsync() tended to force a full sync() of
> the file system, and it's not that difficult to see why someone would
> program with the expectation above.
>
> Anyway, there's still a question of if a new file system should emulate
> the quirks of the old file system (read: be bug compatible), or if you
> can just expect to be popular enough that userspace adapts to the new
> order and lets you do The Right Thing instead.

So what *is* the right thing? What kind of API should userspace have?

If the obvious thing for an application programmer to do is wrong, and
the right thing requires going through more hoops, that will ensure that
the majority of applications will be buggy. We should strive to make it
easy to get things right. It's easy for the kernel, and the filesystem,
to just ask the userspace programmers to jump through the hoops, and
declare those programs that don't to be broken.
On the other hand, if you go *too* far in absolving applications of
responsibility for making things safe, you would end up making all
filesystem operations synchronous, and that obviously hurts performance
in big ways.

So we need some kind of compromise, and I don't really know where that
compromise should end up. It's just that I feel that often only the
kernel programmers' view is represented here.

The pattern of writing to a file and then changing its name *without*
overwriting an existing file is quite common when you write files to a
spool directory, and have another program that picks up files from that
directory and processes them. You do something like:

    fd = open("foo4711.tmp", O_CREAT|O_EXCL|O_RDWR, 0666);
    write(fd, "data", strlen("data"));
    close(fd);
    link("foo4711.tmp", "foo4711");   /* fails if foo4711 already exists */
    unlink("foo4711.tmp");

(And note that careful programs don't use rename() here, because that
would risk clobbering a file some other process has written, and instead
use link()+unlink(). And I really wish a "safe_rename()" syscall that
didn't clobber existing files existed.)

The programs I personally have written that did this also had an fsync()
there, because I received data from another system and didn't want to
ACK until I knew it was safely on disk at my end. But I am a fairly
careful programmer.

Note that in my previous life I was a userspace programmer, and in my
current life I'm a sysadmin. I'm speaking as an interested user of
Btrfs, not as a kernel programmer.

/Thomas Bellman
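
For illustration only, here is a fuller sketch of the spool-directory pattern described in the message, assuming a POSIX system. It adds the fsync() the message mentions before the file is published under its final name, and it checks every return value. The helper names spool_put() and write_all() and the 0666 creation mode are assumptions made for this sketch, not part of the original message.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Remove the temporary file but report the error that caused the failure. */
static int fail_and_cleanup(const char *tmp_path)
{
    int saved = errno;
    unlink(tmp_path);
    errno = saved;
    return -1;
}

/*
 * spool_put() is a hypothetical helper name used only in this sketch.
 * It creates tmp_path, forces the data to disk, then publishes the file
 * as final_path without clobbering an existing file.
 * Returns 0 on success, -1 on failure with errno set.
 */
int spool_put(const char *tmp_path, const char *final_path,
              const char *data, size_t len)
{
    /* O_EXCL: refuse to reuse a temporary file left by someone else. */
    int fd = open(tmp_path, O_CREAT | O_EXCL | O_WRONLY, 0666);
    if (fd < 0)
        return -1;

    /* Flush the data before the final name becomes visible. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return fail_and_cleanup(tmp_path);
    }
    if (close(fd) < 0)
        return fail_and_cleanup(tmp_path);

    /* link() fails with EEXIST if final_path exists; rename() would replace it. */
    if (link(tmp_path, final_path) < 0)
        return fail_and_cleanup(tmp_path);

    return unlink(tmp_path);
}
```

Called as spool_put("foo4711.tmp", "foo4711", "data", 4), this returns -1 with errno set to EEXIST if foo4711 already exists, instead of replacing it.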