* Atomic file data replace API
@ 2010-12-27 11:51 Olaf van der Spek
  2010-12-27 13:20 ` Amir Goldstein
  2010-12-28  2:59 ` Ted Ts'o
  0 siblings, 2 replies; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-27 11:51 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4

Hi,

Since non-durable appears to be controversial, let's consider the case
without that aspect.

Since the introduction of ext4, some apps/users have had issues with
file corruption after a system crash. It's not a bug in the FS AFAIK
and it's not exclusive to ext4.
Writing a temp file, fsync, rename is often proposed.
But how does one preserve meta-data, including file owner?

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2010-12-27 11:51 Atomic file data replace API Olaf van der Spek
@ 2010-12-27 13:20 ` Amir Goldstein
  2010-12-27 15:53   ` Olaf van der Spek
  2010-12-28  2:59 ` Ted Ts'o
  1 sibling, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2010-12-27 13:20 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-fsdevel, linux-ext4

On Mon, Dec 27, 2010 at 1:51 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
> Hi,
>
> Since non-durable appears to be controversial, let's consider the case
> without that aspect.
>
> Since the introduction of ext4, some apps/users have had issues with
> file corruption after a system crash. It's not a bug in the FS AFAIK
> and it's not exclusive to ext4.
> Writing a temp file, fsync, rename is often proposed.
> But how does one preserve meta-data, including file owner?
>

So as I wrote you on the previous thread, in Ext4 you can probably
accomplish that already by using the Ext4-specific EXT4_IOC_MOVE_EXT
ioctl, which is used by e4defrag to atomically switch the fragmented
copy of the data with a de-fragmented copy of the data.

It is a more granular version of the exchangedata() BSD API mentioned
in the previous thread:
http://www.manpagez.com/man/2/exchangedata/

So the atomic update is: write(tempfd); fdatasync(tempfd);
exchangedata(tempfd, fd)

If you choose to pursue your campaign for "Atomic file data replace
API", I recommend that you:
1. change the slogan to the more catchy "Implementing exchangedata()
API" (you already have a man page for that)
2. convince VFS people to support the new generic system call /
optional FS operation exchangedata()
3. if you can, post the relevant patches, so people can review and test them

Implementation of an exchangedata() operation in Ext4 should be trivial
using the ext4_move_extents() function. I didn't check, but I bet that
XFS has that functionality as well.

Good luck,
Amir.


* Re: Atomic file data replace API
  2010-12-27 13:20 ` Amir Goldstein
@ 2010-12-27 15:53   ` Olaf van der Spek
  2010-12-27 17:20     ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-27 15:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, linux-ext4

On Mon, Dec 27, 2010 at 2:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> So as I wrote you on the previous thread, in Ext4 you can probably

FS-specific code should of course be avoided in normal apps.

> It is a more granular version of the exchangedata() BSD API mentioned
> in the previous thread:
> http://www.manpagez.com/man/2/exchangedata/
>
> So the atomic update is: write(tempfd); fdatasync(tempfd);
> exchangedata(tempfd, fd)

Except exchangedata is not (widely) implemented?
Don't you agree it's undesirable to lose meta-data?

Olaf


* Re: Atomic file data replace API
  2010-12-27 15:53   ` Olaf van der Spek
@ 2010-12-27 17:20     ` Amir Goldstein
  2010-12-27 18:34       ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2010-12-27 17:20 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-fsdevel, linux-ext4

On Mon, Dec 27, 2010 at 5:53 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
> On Mon, Dec 27, 2010 at 2:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> So as I wrote you on the previous thread, in Ext4 you can probably
>
> FS-specific code should of course be avoided in normal apps.
>
>> It is a more granular version of the exchangedata() BSD API mentioned
>> in the previous thread:
>> http://www.manpagez.com/man/2/exchangedata/
>>
>> So the atomic update is: write(tempfd); fdatasync(tempfd);
>> exchangedata(tempfd, fd)
>
> Except exchangedata is not (widely) implemented?

Not in Linux anyway.

> Don't you agree it's undesirable to lose meta-data?

Yes, I agree. You can have my vote for "it's nice to have this",
but the fact that we did without it for so long must mean something...

Anyway, you need to convince someone to implement it
(unless you do it yourself), some developers to review it
and the maintainers to accept it, so unless you come up with 'a real
world problem', the busy FS developers will not be bothered to accept
'the fix'. Accepting new APIs carries a huge price in testing and
maintaining them every release, so don't take the resistance personally.

Now let's say that you decide to focus on the problem of:
'safe editor save to a file which is not owned by you but writable by you'.
You may want to look for a specific editor which has 'safe save' functionality
(maybe LibreOffice?) and query the developers if they would like the new feature
and if they would support your proposal.

That is the way kernel development works - and for good reasons.

Amir.


* Re: Atomic file data replace API
  2010-12-27 17:20     ` Amir Goldstein
@ 2010-12-27 18:34       ` Olaf van der Spek
  0 siblings, 0 replies; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-27 18:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, linux-ext4

On Mon, Dec 27, 2010 at 6:20 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> Don't you agree it's undesirable to lose meta-data?
>
> Yes, I agree. You can have my vote for "it's nice to have this",
> but the fact that we did without it for so long must mean something...

I'm not sure it means something positive.

> Anyway, you need to convince someone to implement it
> (unless you do it yourself), some developers to review it
> and the maintainers to accept it, so unless you come up with 'a real
> world problem', the busy FS developers will not be bothered to accept
> 'the fix'. Accepting new APIs carries a huge price in testing and
> maintaining them every release, so don't take the resistance personally.
>
> Now let's say that you decide to focus on the problem of:
> 'safe editor save to a file which is not owned by you but writable by you'.
> You may want to look for a specific editor which has 'safe save' functionality
> (maybe LibreOffice?) and query the developers if they would like the new feature
> and if they would support your proposal.
>
> That is the way kernel development works - and for good reasons.

I agree in general you need a good use case. But AFAIK FS devs are
aware of many apps not doing it the right way. So I expected them to
have a FAQ entry that shows what this right way is.
Ted says a huge performance hit is involved, but nobody has been able
to tell why yet.

There's also the problem of not having permission to create a temp file.

Olaf


* Re: Atomic file data replace API
  2010-12-27 11:51 Atomic file data replace API Olaf van der Spek
  2010-12-27 13:20 ` Amir Goldstein
@ 2010-12-28  2:59 ` Ted Ts'o
  2010-12-28 17:27   ` Olaf van der Spek
  1 sibling, 1 reply; 47+ messages in thread
From: Ted Ts'o @ 2010-12-28  2:59 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-fsdevel, linux-ext4

On Mon, Dec 27, 2010 at 12:51:54PM +0100, Olaf van der Spek wrote:
> Since the introduction of ext4, some apps/users have had issues with
> file corruption after a system crash. It's not a bug in the FS AFAIK
> and it's not exclusive to ext4.
> Writing a temp file, fsync, rename is often proposed.
> But how does one preserve meta-data, including file owner?

What's the use case where preserving file ownership matters?

       	       	    	  	     	  - Ted


* Re: Atomic file data replace API
  2010-12-28  2:59 ` Ted Ts'o
@ 2010-12-28 17:27   ` Olaf van der Spek
  2010-12-28 19:06     ` Ric Wheeler
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-28 17:27 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-fsdevel, linux-ext4

On Tue, Dec 28, 2010 at 3:59 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Dec 27, 2010 at 12:51:54PM +0100, Olaf van der Spek wrote:
>> Since the introduction of ext4, some apps/users have had issues with
>> file corruption after a system crash. It's not a bug in the FS AFAIK
>> and it's not exclusive to ext4.
>> Writing a temp file, fsync, rename is often proposed.
>> But how does one preserve meta-data, including file owner?
>
> What's the use case where preserving file ownership matters?

Why is it you ignore most of the question and only challenge a tiny bit?
I can't think of a problem case right now, but I sure can't guarantee
always resetting file owner is never a problem.

Olaf


* Re: Atomic file data replace API
  2010-12-28 17:27   ` Olaf van der Spek
@ 2010-12-28 19:06     ` Ric Wheeler
  2010-12-28 22:25       ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Ric Wheeler @ 2010-12-28 19:06 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: Ted Ts'o, linux-fsdevel, linux-ext4

On 12/28/2010 12:27 PM, Olaf van der Spek wrote:
> On Tue, Dec 28, 2010 at 3:59 AM, Ted Ts'o<tytso@mit.edu>  wrote:
>> On Mon, Dec 27, 2010 at 12:51:54PM +0100, Olaf van der Spek wrote:
>>> Since the introduction of ext4, some apps/users have had issues with
>>> file corruption after a system crash. It's not a bug in the FS AFAIK
>>> and it's not exclusive to ext4.
>>> Writing a temp file, fsync, rename is often proposed.
>>> But how does one preserve meta-data, including file owner?
>> What's the use case where preserving file ownership matters?
> Why is it you ignore most of the question and only challenge a tiny bit?
> I can't think of a problem case right now, but I sure can't guarantee
> always resetting file owner is never a problem.
>
> Olaf


I really think that you have missed the point of this list.

This list is for either developers (those who have downloaded the free code and 
work on it) or others who want to move things forward concretely.

Perfectly fine to contribute ideas, but if you are not a coder or do not have 
the time or inclination to work on things yourself, you have to be *really* 
convincing.

We continually get bombarded with ideas, wish list items, etc so we are not 
lacking in work to do.

If you cannot explain the use case, you will not get any buy in...

Regards,

Ric



* Re: Atomic file data replace API
  2010-12-28 19:06     ` Ric Wheeler
@ 2010-12-28 22:25       ` Olaf van der Spek
  2010-12-28 22:36         ` Ric Wheeler
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-28 22:25 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Ted Ts'o, linux-fsdevel, linux-ext4

On Tue, Dec 28, 2010 at 8:06 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
> I really think that you have missed the point of this list.
>
> This list is for either developers (those who have downloaded the free code
> and work on it) or others who want to move things forward concretely.

Maybe.

> Perfectly fine to contribute ideas, but if you are not a coder or do not
> have the time or inclination to work on things yourself, you have to be
> *really* convincing.
>
> We continually get bombarded with ideas, wish list items, etc so we are not
> lacking in work to do.

I understand.

> If you cannot explain the use case, you will not get any buy in...

I assumed that preserving file owner would be a normal feature and
would not require additional explanation.
One use case would be updating a file in a safe way when you have
write access to that file but not to anything else.
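
A rough way to tell whether the temp file approach is even available in
that situation is to look at the directory rather than the file. The
sketch below is only an illustration in Python; the function name
can_use_temp_file_rename is made up for this example, and os.access()
gives an advisory answer that can race with permission changes.

```python
import os


def can_use_temp_file_rename(path):
    """Return True if a temp file could be created next to path and renamed over it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Creating and renaming directory entries needs write and search
    # permission on the directory; write access to the file is not enough.
    return os.access(directory, os.W_OK | os.X_OK)
```

If this returns False, what remains is overwriting the file in place,
which is not atomic.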

Also, according to Ted, a lot of app devs get saving a file in a safe
way wrong. So I'm asking what the recommended way to do it is. Is that
strange?

Olaf


* Re: Atomic file data replace API
  2010-12-28 22:25       ` Olaf van der Spek
@ 2010-12-28 22:36         ` Ric Wheeler
  2010-12-28 22:58           ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Ric Wheeler @ 2010-12-28 22:36 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: Ted Ts'o, linux-fsdevel, linux-ext4

On 12/28/2010 05:25 PM, Olaf van der Spek wrote:
> On Tue, Dec 28, 2010 at 8:06 PM, Ric Wheeler<rwheeler@redhat.com>  wrote:
>> I really think that you have missed the point of this list.
>>
>> This list is for either developers (those who have downloaded the free code
>> and work on it) or others who want to move things forward concretely.
> Maybe.
>
>> Perfectly fine to contribute ideas, but if you are not a coder or do not
>> have the time or inclination to work on things yourself, you have to be
>> *really* convincing.
>>
>> We continually get bombarded with ideas, wish list items, etc so we are not
>> lacking in work to do.
> I understand.
>
>> If you cannot explain the use case, you will not get any buy in...
> I assumed that preserving file owner would be a normal feature and
> would not require additional explanation.
> One use case would be updating a file in a safe way when you have
> write access to that file but not to anything else.
>
> Also, according to Ted, a lot of app devs get saving a file in a safe
> way wrong. So I'm asking what the recommended way to do it is. Is that
> strange?
>
> Olaf

I think that various developers have answered this for you several times.

As a suggestion, if you are not a kernel developer, show us specifically a bit 
of application code that demonstrates something that you want to have work 
differently.

Test it with power failure (buy an external e-sata or USB disk and pull power 
while running your app).

Ric



* Re: Atomic file data replace API
  2010-12-28 22:36         ` Ric Wheeler
@ 2010-12-28 22:58           ` Olaf van der Spek
  2010-12-29  9:20             ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-28 22:58 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Ted Ts'o, linux-fsdevel, linux-ext4

On Tue, Dec 28, 2010 at 11:36 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
> I think that various developers have answered this for you several times.

Not really, unfortunately. Haven't seen a single link to code that
shows how to do it properly.
Temp file, fsync, rename is often mentioned but that skips the
preserving meta-data part and this part, which you also skipped:
One use case would be updating a file in a safe way when you have
write access to that file but not to anything else.

> As a suggestion, if you are not a kernel developer, show us specifically a
> bit of application code that demonstrates something that you want to have
> work differently.

I will.

> Test it with power failure (buy an external e-sata or USB disk and pull
> power while running your app).

The current code?
I think I'll use a VM instead of an external disk. ;)

Olaf


* Re: Atomic file data replace API
  2010-12-28 22:58           ` Olaf van der Spek
@ 2010-12-29  9:20             ` Amir Goldstein
  2010-12-29 12:42               ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2010-12-29  9:20 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: Ric Wheeler, Ted Ts'o, linux-fsdevel, linux-ext4

On Wed, Dec 29, 2010 at 12:58 AM, Olaf van der Spek
<olafvdspek@gmail.com> wrote:
> On Tue, Dec 28, 2010 at 11:36 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
>> I think that various developers have answered this for you several times.
>
> Not really, unfortunately. Haven't seen a single link to code that
> shows how to do it properly.
> Temp file, fsync, rename is often mentioned but that skips the
> preserving meta-data part and this part, which you also skipped:
> One use case would be updating a file in a safe way when you have
> write access to that file but not to anything else.
>

I think it is safe to say that the *only* option you have now is "temp
file, fsync, rename".
There is no "generic atomic file data replace API in Linux", though it
is available via private ioctls for XFS and EXT4.
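
As a minimal sketch of what "temp file, fsync, rename" could look like
from an application, here is one possible version in Python. It assumes
the caller may create files in the target's directory. The function name
replace_file_contents and the ".tmp-" prefix are invented for this
example. Permission bits are copied; the owner is copied only on a
best-effort basis, because changing it normally needs privilege.

```python
import contextlib
import os
import tempfile


def replace_file_contents(path, data):
    """Replace the contents of path with data (bytes): temp file, fsync, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    old = os.stat(path)  # metadata of the file that is about to be replaced

    # The temp file has to be in the same directory (same filesystem) for
    # rename() to be atomic. mkstemp() creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        try:
            view = memoryview(data)
            while view:  # os.write() may write less than requested
                view = view[os.write(fd, view):]
            # Changing the owner needs privilege, so this is best effort.
            with contextlib.suppress(PermissionError):
                os.fchown(fd, old.st_uid, old.st_gid)
            # Set the mode after fchown(), which may clear setuid/setgid bits.
            os.fchmod(fd, old.st_mode & 0o7777)
            os.fsync(fd)  # the data must be on disk before the rename
        finally:
            os.close(fd)
        os.rename(tmp_path, path)  # atomically replaces the old name
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise

    # Sync the directory so that the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This sketch does not copy extended attributes or ACLs, it breaks hard
links to the old inode, and it cannot work without write permission on
the directory.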

You have started a bit of a storm with your previous thread, which
doesn't help you much in moving forward in the current thread (the
previous thread is still more popular).
I suggest that you humbly swallow your need to know WHY it is hard to
implement a non-durable atomic API and focus your attention on the very
achievable data replace API.

IMHO, implementing an atomic swap_inodes_data operation shouldn't be
difficult in most file systems (only the implementation is simple;
testing and maintaining it is not to be taken lightly).
Something along the lines of:
1. acquire inodes write/truncate locks
2. start transaction
3. check/update quota limits
4. swap inodes i_data content
5. invalidate (or swap?) inodes page caches
6. mark inodes dirty
7. end transaction & release locks
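
To make the shape of these steps concrete, here is a purely illustrative
user-space toy in Python. It is not kernel code: ToyInode and its fields
are hypothetical, a threading.Lock stands in for the inode locks, and the
transaction, quota and page cache steps are not modelled at all. It only
shows that the data changes hands while owner and inode number stay put.

```python
import threading


class ToyInode:
    """Hypothetical stand-in for an inode: identity, owner and data."""

    def __init__(self, number, owner, data):
        self.number = number
        self.owner = owner
        self.data = data
        self.dirty = False
        self.lock = threading.Lock()


def swap_inodes_data(a, b):
    """Swap the data of two toy inodes; number and owner are left untouched."""
    if a is b:
        return
    # Step 1: take both locks in a fixed order (by inode number, assumed
    # unique) so that two concurrent swaps cannot deadlock.
    first, second = sorted((a, b), key=lambda inode: inode.number)
    with first.lock, second.lock:
        # Steps 2, 3, 5 and 7 (transaction, quota, page cache) are omitted.
        a.data, b.data = b.data, a.data  # step 4: swap the data
        a.dirty = b.dirty = True         # step 6: mark both inodes dirty
```

For example, after swapping a root-owned inode holding the old contents
with a temp inode holding the new contents, the root-owned inode carries
the new contents and is still owned by root.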

The real challenge would be to get everyone to agree on a common API
and carve it in stone to the kernel's ABI (is it just swap_inodes_data?
maybe also swap_inode_data_ranges? what about some options?)

Also, as wacky and (some say) faulty as the UNIX permissions model is,
current systems have grown old with it, and even 'improving' the behavior
of some applications may wake up sleeping monsters, so it will not
be done until enough people have pointed out security or usability
issues, which could not be solved otherwise.

In other words, until you find an *application* that wants to allow another
user to modify the content of a file and preserve its metadata and ownership,
and unless that application cannot find a better way to achieve what it wanted
to do in the first place, or unless that application already has a large
install base which suffers from *a problem*, you will not have proven
*the need*.

Maybe preserving privileged extended attributes is *a need*.
I wouldn't know myself.

Amir.


* Re: Atomic file data replace API
  2010-12-29  9:20             ` Amir Goldstein
@ 2010-12-29 12:42               ` Olaf van der Spek
  2010-12-29 15:30                 ` Christian Stroetmann
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-29 12:42 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Ric Wheeler, Ted Ts'o, linux-fsdevel, linux-ext4

On Wed, Dec 29, 2010 at 10:20 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Dec 29, 2010 at 12:58 AM, Olaf van der Spek
> <olafvdspek@gmail.com> wrote:
>> On Tue, Dec 28, 2010 at 11:36 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
>>> I think that various developers have answered this for you several times.
>>
>> Not really, unfortunately. Haven't seen a single link to code that
>> shows how to do it properly.
>> Temp file, fsync, rename is often mentioned but that skips the
>> preserving meta-data part and this part, which you also skipped:
>> One use case would be updating a file in a safe way when you have
>> write access to that file but not to anything else.
>>
>
> I think it is safe to say that the *only* option you have now is "temp
> file, fsync, rename".

I'm really looking for a concrete code snippet/function that does this.
For example, file permissions should definitely be preserved.

> There is no "generic atomic file data replace API in Linux", though it
> is available via
> private ioctl for XFS and EXT4.
>
> You have started a bit of a storm with your previous thread, which
> doesn't help you
> much in moving forward in the current thread (previous thread is still
> more popular).
> I suggest that you humbly swallow your need to know WHY it is hard to implement
> non-durable atomic API and focus your attention on the very achievable
> data replace API.
>
> IMHO, implementing atomic swap_inodes_data operation shouldn't be difficult
> in most file systems (only implementation is simple, but testing and
> maintaining
> is not to be taken lightly).
> Something along the lines of:
> 1. acquire inodes write/truncate locks
> 2. start transaction
> 3. check/update quota limits
> 4. swap inodes i_data content
> 5. invalidate (or swap?) inodes page caches
> 6. mark inodes dirty
> 7. end transaction & release locks
>
> The real challenge would be to get everyone to agree on a common API
> and carve it in stone to the kernel's ABI (is it just swap_inodes_data?
> maybe also swap_inode_data_ranges? what about some options?)

Swapping data is an improvement but still not ideal. The API is also
more complex than O_ATOMIC.

> Also, as wacky and (some say) faulty as the UNIX permissions model is,
> current systems have grown old with it, and even 'improving' the behavior
> of some applications, may wake up sleeping monsters, so it will not
> be done until enough people have pointed out security or usability
> issues, which could not be solved otherwise.

Each app makes its own decision about what API to use. Supporting
atomic stuff doesn't change the behaviour of existing apps.

> In other words, until you find an *application* that wants to allow another
> user to modify the content of a file and preserve its metadata and ownership,
> and unless that application cannot find a better way to achieve what it wanted
> to do in the first place, or unless that application already has a large
> install base which suffers from *a problem*, you will not have proven
> *the need*.

Maybe I should ask devs of some large apps for their take on this issue.

> Maybe preserving privileged extended attributes is *a need*.
> I wouldn't know myself.

Olaf


* Re: Atomic file data replace API
  2010-12-29 12:42               ` Olaf van der Spek
@ 2010-12-29 15:30                 ` Christian Stroetmann
  2010-12-29 15:35                   ` Olaf van der Spek
  2010-12-29 17:15                   ` Greg Freemyer
  0 siblings, 2 replies; 47+ messages in thread
From: Christian Stroetmann @ 2010-12-29 15:30 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-fsdevel, linux-ext4, Ric Wheeler, Amir Goldstein

On the 29.12.2010 13:42, Olaf van der Spek wrote:
> On Wed, Dec 29, 2010 at 10:20 AM, Amir Goldstein<amir73il@gmail.com>  wrote:
>> On Wed, Dec 29, 2010 at 12:58 AM, Olaf van der Spek
>> <olafvdspek@gmail.com>  wrote:
>>> On Tue, Dec 28, 2010 at 11:36 PM, Ric Wheeler<rwheeler@redhat.com>  wrote:
>>>> I think that various developers have answered this for you several times.
>>> Not really, unfortunately. Haven't seen a single link to code that
>>> shows how to do it properly.

No, not this way. You were and still are being asked to deliver the code.
Don't pervert the thread of the discussion.

>>> Temp file, fsync, rename is often mentioned but that skips the
>>> preserving meta-data part and this part, which you also skipped:
>>> One use case would be updating a file in a safe way when you have
>>> write access to that file but not to anything else.
>>>
>> I think it is safe to say that the *only* option you have now is "temp
>> file, fsync, rename".
> I'm really looking for a concrete code snippet/function that does this.
> For example, file permissions should definitely be preserved.
>
>> There is no "generic atomic file data replace API in Linux", though it
>> is available via
>> private ioctl for XFS and EXT4.
>>
>> You have started a bit of a storm with your previous thread, which
>> doesn't help you
>> much in moving forward in the current thread (previous thread is still
>> more popular).
>> I suggest that you humbly swallow your need to know WHY it is hard to implement
>> non-durable atomic API and focus your attention on the very achievable
>> data replace API.
>>
>> IMHO, implementing atomic swap_inodes_data operation shouldn't be difficult
>> in most file systems (only implementation is simple, but testing and
>> maintaining
>> is not to be taken lightly).
>> Something along the lines of:
>> 1. acquire inodes write/truncate locks
>> 2. start transaction
>> 3. check/update quota limits
>> 4. swap inodes i_data content
>> 5. invalidate (or swap?) inodes page caches
>> 6. mark inodes dirty
>> 7. end transaction & release locks
>>
>> The real challenge would be to get everyone to agree on a common API
>> and carve it in stone to the kernel's ABI (is it just swap_inodes_data?
>> maybe also swap_inode_data_ranges? what about some options?)
> Swapping data is an improvement but still not ideal. The API is also
> more complex than O_ATOMIC.
>
>> Also, as wacky and (some say) faulty as the UNIX permissions model is,
>> current systems have grown old with it, and even 'improving' the behavior
>> of some applications, may wake up sleeping monsters, so it will not
>> be done until enough people have pointed out security or usability
>> issues, which could not be solved otherwise.
> Each app makes its own decision about what API to use. Supporting
> atomic stuff doesn't change the behaviour of existing apps.

Wrong, we are talking here in the first place about general atomic FS
operations. And to guarantee atomicity you have to change general FS
functions in such a way that in the end all other applications are
affected, or otherwise you have to implement your own (larger part of an) FS.
At this point there is no discussion anymore without code from you,
because this subject has also been discussed to the maximum in information
processing/informatics/computer science.

>> In other words, until you find an *application* that wants to allow another
>> user to modify the content of a file and preserve its metadata and ownership,
>> and unless that application cannot find a better way to achieve what it wanted
>> to do in the first place, or unless that application already has a large
>> install base which suffers from *a problem*, you will not have proven
>> *the need*.
> Maybe I should ask devs of some large apps for their take on this issue.

Nonsense, because they are already using:
a) the functions available from an FS,
b) the functions available from a DBMS, or
c) a proprietary special solution based on the available functions of the
OS and additional functionality that they develop and maintain themselves
for their comparable use cases, and have done so for decades due to the
cost vs. benefit ratio.

>> Maybe preserving privileged extended attributes is *a need*.
>> I wouldn't know myself.
> Olaf

Christian Stroetmann


* Re: Atomic file data replace API
  2010-12-29 15:30                 ` Christian Stroetmann
@ 2010-12-29 15:35                   ` Olaf van der Spek
  2010-12-29 16:30                     ` Christian Stroetmann
  2010-12-29 17:15                   ` Greg Freemyer
  1 sibling, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-29 15:35 UTC (permalink / raw)
  To: Christian Stroetmann
  Cc: linux-fsdevel, linux-ext4, Ric Wheeler, Amir Goldstein

On Wed, Dec 29, 2010 at 4:30 PM, Christian Stroetmann
<stroetmann@ontolinux.com> wrote:
> On the 29.12.2010 13:42, Olaf van der Spek wrote:
>>>> Not really, unfortunately. Haven't seen a single link to code that
>>>> shows how to do it properly.
>
> No, not this way. You were and still are being asked to deliver the code.
> Don't pervert the thread of the discussion.

I'm talking about the code for temp file, fsync, rename. Not about
O_ATOMIC code.

>> Each app makes its own decision about what API to use. Supporting
>> atomic stuff doesn't change the behaviour of existing apps.
>
> Wrong, we are talking here in the first place about general atomic FS
> operations. And to guarantee atomicity you have to change general FS
> functions in such a way that in the end all other applications are affected,

Why's that?

> or otherwise you have to implement your own (larger part of an) FS.
> At this point there is no discussion anymore without code from you, because
> this subject has also been discussed to the maximum in information
> processing/informatics/computer science.

This subject? Exactly what subject?

>> Maybe I should ask devs of some large apps for their take on this issue.
>
> Nonsense, because they are already using:
> a) the functions available from an FS,

Of course. Does that mean the situation can't be improved for them?

> b) the functions available from a DBMS, or
> c) a proprietary special solution based on the available functions of the OS
> and additional functionality that they develop and maintain themselves
> for their comparable use cases, and have done so for decades due to the
> cost vs. benefit ratio.

Olaf


* Re: Atomic file data replace API
  2010-12-29 15:35                   ` Olaf van der Spek
@ 2010-12-29 16:30                     ` Christian Stroetmann
  2010-12-29 17:12                       ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Christian Stroetmann @ 2010-12-29 16:30 UTC (permalink / raw)
  To: Olaf van der Spek
  Cc: linux-fsdevel, linux-ext4, Ted Ts'o, Ric Wheeler,
	Amir Goldstein

On the 29.12.2010 16:35, Olaf van der Spek wrote:
> On Wed, Dec 29, 2010 at 4:30 PM, Christian Stroetmann
> <stroetmann@ontolinux.com>  wrote:
>> On the 29.12.2010 13:42, Olaf van der Spek wrote:
>>>>> Not really, unfortunately. Haven't seen a single link to code that
>>>>> shows how to do it properly.
>> No, not this way. You were and still are being asked to deliver the code.
>> Don't pervert the thread of the discussion.
> I'm talking about the code for temp file, fsync, rename. Not about
> O_ATOMIC code.

Maybe you have not understood the hints: it no longer matters what you are
talking about unless you present code.

>>> Each app makes its own decision about what API to use. Supporting
>>> atomic stuff doesn't change the behaviour of existing apps.
>> Wrong, we are talking here in the first place about general atomic FS
>> operations. And to guarantee atomicity you have to change general FS
>> functions in such a way that in the end all other applications are affected,
> Why's that?

Read the paragraph as a whole.

>> or otherwise you have to implement your own (larger part of an) FS.
>> At this point there is no discussion anymore without code from you, because
>> this subject has also been discussed to the maximum in information
>> processing/informatics/computer science.
> This subject? Exactly what subject?

Read the beginning of the paragraph.

>>> Maybe I should ask devs of some large apps for their take on this issue.
>> Nonsense, because they are already using:
>> a) the functions available from an FS,
> Of course. Does that mean the situation can't be improved for them?

Do you have any code that improves the situation to discuss here?

>> b) the functions available from a DBMS, or
>> c) a proprietary special solution based on the available functions of the OS
>> and additional functionality that they develop and maintain themselves
>> for their comparable use cases, and have done so for decades due to the
>> cost vs. benefit ratio.
> Olaf

Christian Stroetmann


* Re: Atomic file data replace API
  2010-12-29 16:30                     ` Christian Stroetmann
@ 2010-12-29 17:12                       ` Olaf van der Spek
  0 siblings, 0 replies; 47+ messages in thread
From: Olaf van der Spek @ 2010-12-29 17:12 UTC (permalink / raw)
  To: Christian Stroetmann
  Cc: linux-fsdevel, linux-ext4, Ted Ts'o, Ric Wheeler,
	Amir Goldstein

On Wed, Dec 29, 2010 at 5:30 PM, Christian Stroetmann
<stroetmann@ontolinux.com> wrote:
>> I'm talking about the code for temp file, fsync, rename. Not about
>> O_ATOMIC code.
>
> Maybe you have not understood the hints: It doesn't matter anymore
> what you are talking about unless you present code.

What code?

>>>> Each app makes it's own decision about what API to use. Supporting
>>>> atomic stuff doesn't change the behaviour of existing apps.
>>>
>>> Wrong, we are talking here in the first place about general atomic FS
>>> operations. And to guarantee atomicity you have to change general FS
>>> functions in such a way that in the end all other applications are
>>> affected,
>>
>> Why's that?
>
> read the paragraph as a whole

I have. Still wondering why.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2010-12-29 15:30                 ` Christian Stroetmann
  2010-12-29 15:35                   ` Olaf van der Spek
@ 2010-12-29 17:15                   ` Greg Freemyer
  2010-12-29 19:30                     ` Christian Stroetmann
  1 sibling, 1 reply; 47+ messages in thread
From: Greg Freemyer @ 2010-12-29 17:15 UTC (permalink / raw)
  To: Christian Stroetmann
  Cc: Olaf van der Spek, linux-fsdevel, linux-ext4, Ric Wheeler,
	Amir Goldstein

On Wed, Dec 29, 2010 at 10:30 AM, Christian Stroetmann
<stroetmann@ontolinux.com> wrote:
> On the 29.12.2010 13:42, Olaf van der Spek wrote:
>>
>> On Wed, Dec 29, 2010 at 10:20 AM, Amir Goldstein<amir73il@gmail.com>
>>  wrote:
>>>
>>> On Wed, Dec 29, 2010 at 12:58 AM, Olaf van der Spek
>>> <olafvdspek@gmail.com>  wrote:
>>>>
>>>> On Tue, Dec 28, 2010 at 11:36 PM, Ric Wheeler<rwheeler@redhat.com>
>>>>  wrote:
>>>>>
>>>>> I think that various developers have answered this for you several
>>>>> times.
>>>>
>>>> Not really, unfortunately. Haven't seen a single link to code that
>>>> shows how to do it properly.
>
> No, not this way. You were and still are asked for delivering the code.
> Don't pervert the threat of the discussion.
>
>>>> Temp file, fsync, rename is often mentioned but that skips the
>>>> preserving meta-data part and this part, which you also skipped:
>>>> One use case would be updating a file in a safe way when you have
>>>> write access to that file but not to anything else.
>>>>
>>> I think it is safe to say that the *only* option you have now is "temp
>>> file, fsync, rename".
>>
>> I'm really looking for a concrete code snippet/function that does this.
>> For example, file permissions should definitely be preserved.
>>
>>> There is no "generic atomic file data replace API in Linux", though it
>>> is available via
>>> private ioctl for XFS and EXT4.
>>>
>>> You have started a bit of a storm with your previous thread, which
>>> doesn't help you
>>> much in moving forward in the current thread (previous thread is still
>>> more popular).
>>> I suggest that you humbly swallow you need to know WHY is it hard to
>>> implement
>>> non-durable atomic API and focus your attention on the very achievable
>>> data replace API.
>>>
>>> IMHO, implementing atomic swap_inodes_data operation shouldn't be
>>> difficult
>>> in most file systems (only implementation is simple, but testing and
>>> maintaining
>>> is not to be taken lightly).
>>> Something along the lines of:
>>> 1. aquire inodes write/truncate locks
>>> 2. start transaction
>>> 3. check/update quota limits
>>> 4. swap inodes i_data content
>>> 5. invalidate (or swap?) inodes page caches
>>> 6. mark inodes dirty
>>> 7. end transaction&  release locks
>>>
>>> The real challenge would be to get everyone to agree on a common API
>>> and carve it in stone to the kernel's ABI (is it just swap_inodes_data?
>>> maybe also swap_inode_data_ranges? what about some options?)
>>
>> Swapping data is an improvement but still not ideal. The API is also
>> more complex than O_ATOMIC.
>>
>>> Also, as wacky and (some say) faulty the UNIX permissions models is,
>>> current systems have grown old with it, and even 'improving' the behavior
>>> of some applications, may wake up sleeping monsters, so it will not
>>> be done until enough people have pointed out security or usability
>>> issues, which could not be solved otherwise.
>>
>> Each app makes it's own decision about what API to use. Supporting
>> atomic stuff doesn't change the behaviour of existing apps.
>
> Wrong, we are talking here in the first place about general atomic FS
> operations. And to guarantee atomicity you have to change general FS
> functions in such a way that in the end all other applications are affected,
> or otherwise you have to implement an own (larger part of an) FS.
> At this point there is no discussion anymore without code from you, because
> this subject is as well discussed to the maximum in information
> processing/informatics/computer science.
>
>>> In other words, until you find an *application* that wants to allow other
>>> user to modify the content of a file and preserve it's metadata and
>>> ownership.
>>> And unless that application cannot find a better way to achieve what it
>>> wanted
>>> to do in the first place, or unless that application already has a
>>> large install base
>>> which suffers from *a problem*, you will not have proven *the need*.
>>
>> Maybe I should ask devs of some large apps on their take of this issue.
>
> Nonsense, because they are already using:
> a) the functions available by an FS,
> b) the functions available by a DBMS, or
> c) a propritary special solution based on the available functions of the OS
> and additional functionality that they develope and maintain themselves
> for their comparable use cases since decades due to the cost vs. benefit
> ratio.

<sarcasm>
Olaf, clearly if you want to find issues / use cases for your new API
you should not talk to developers of complex tools.  They have it all
figured out.

It's only you that doesn't know how to code up a userspace solution to
the problem.
<\sarcasm>

Surely productivity suites like openoffice have to address the issue.
How satisfied they are I don't know.  And despite Neil's argument that
only one user should be able to write to a given doc, that is just not
how normal office suites work today.

Also, I believe KDE and its myriad config files have issues with
major config file corruption due to unexpected shutdowns during the
config file update process, so they certainly don't have it figured
out.

Why don't they use the temp file, fsync, rename process?

Those are the 2 user-space suites I would go investigate first.  I'm
sure there are many others.

Also, I believe Windows offers an API like you're proposing.  How does
Samba support it?

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2010-12-29 17:15                   ` Greg Freemyer
@ 2010-12-29 19:30                     ` Christian Stroetmann
  0 siblings, 0 replies; 47+ messages in thread
From: Christian Stroetmann @ 2010-12-29 19:30 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: linux-fsdevel, linux-ext4, Olaf van der Spek, Ric Wheeler,
	Amir Goldstein, Neil Brown

On the 29.12.2010 18:15, Greg Freemyer wrote:
> On Wed, Dec 29, 2010 at 10:30 AM, Christian Stroetmann
> <stroetmann@ontolinux.com>  wrote:
>> On the 29.12.2010 13:42, Olaf van der Spek wrote:
>>> On Wed, Dec 29, 2010 at 10:20 AM, Amir Goldstein<amir73il@gmail.com>
>>>   wrote:
>>>> On Wed, Dec 29, 2010 at 12:58 AM, Olaf van der Spek
>>>> <olafvdspek@gmail.com>    wrote:
>>>>> On Tue, Dec 28, 2010 at 11:36 PM, Ric Wheeler<rwheeler@redhat.com>
>>>>>   wrote:
>>>>>> I think that various developers have answered this for you several
>>>>>> times.
>>>>> Not really, unfortunately. Haven't seen a single link to code that
>>>>> shows how to do it properly.
>> No, not this way. You were and still are asked for delivering the code.
>> Don't pervert the threat of the discussion.
>>
>>>>> Temp file, fsync, rename is often mentioned but that skips the
>>>>> preserving meta-data part and this part, which you also skipped:
>>>>> One use case would be updating a file in a safe way when you have
>>>>> write access to that file but not to anything else.
>>>>>
>>>> I think it is safe to say that the *only* option you have now is "temp
>>>> file, fsync, rename".
>>> I'm really looking for a concrete code snippet/function that does this.
>>> For example, file permissions should definitely be preserved.
>>>
>>>> There is no "generic atomic file data replace API in Linux", though it
>>>> is available via
>>>> private ioctl for XFS and EXT4.
>>>>
>>>> You have started a bit of a storm with your previous thread, which
>>>> doesn't help you
>>>> much in moving forward in the current thread (previous thread is still
>>>> more popular).
>>>> I suggest that you humbly swallow you need to know WHY is it hard to
>>>> implement
>>>> non-durable atomic API and focus your attention on the very achievable
>>>> data replace API.
>>>>
>>>> IMHO, implementing atomic swap_inodes_data operation shouldn't be
>>>> difficult
>>>> in most file systems (only implementation is simple, but testing and
>>>> maintaining
>>>> is not to be taken lightly).
>>>> Something along the lines of:
>>>> 1. aquire inodes write/truncate locks
>>>> 2. start transaction
>>>> 3. check/update quota limits
>>>> 4. swap inodes i_data content
>>>> 5. invalidate (or swap?) inodes page caches
>>>> 6. mark inodes dirty
>>>> 7. end transaction&    release locks
>>>>
>>>> The real challenge would be to get everyone to agree on a common API
>>>> and carve it in stone to the kernel's ABI (is it just swap_inodes_data?
>>>> maybe also swap_inode_data_ranges? what about some options?)
>>> Swapping data is an improvement but still not ideal. The API is also
>>> more complex than O_ATOMIC.
>>>
>>>> Also, as wacky and (some say) faulty the UNIX permissions models is,
>>>> current systems have grown old with it, and even 'improving' the behavior
>>>> of some applications, may wake up sleeping monsters, so it will not
>>>> be done until enough people have pointed out security or usability
>>>> issues, which could not be solved otherwise.
>>> Each app makes it's own decision about what API to use. Supporting
>>> atomic stuff doesn't change the behaviour of existing apps.
>> Wrong, we are talking here in the first place about general atomic FS
>> operations. And to guarantee atomicity you have to change general FS
>> functions in such a way that in the end all other applications are affected,
>> or otherwise you have to implement an own (larger part of an) FS.
>> At this point there is no discussion anymore without code from you, because
>> this subject is as well discussed to the maximum in information
>> processing/informatics/computer science.
>>
>>>> In other words, until you find an *application* that wants to allow other
>>>> user to modify the content of a file and preserve it's metadata and
>>>> ownership.
>>>> And unless that application cannot find a better way to achieve what it
>>>> wanted
>>>> to do in the first place, or unless that application already has a
>>>> large install base
>>>> which suffers from *a problem*, you will not have proven *the need*.
>>> Maybe I should ask devs of some large apps on their take of this issue.
>> Nonsense, because they are already using:
>> a) the functions available by an FS,
>> b) the functions available by a DBMS, or
>> c) a propritary special solution based on the available functions of the OS
>> and additional functionality that they develope and maintain themselves
>> for their comparable use cases since decades due to the cost vs. benefit
>> ratio.
> <sarcasm>
> Olaf, clearly if you want to find issues / use cases for your new API
> you should not talk to developers of complex tools.  They have it all
> figured out.
>
> It's only you that doesn't know how to code up a userspace solution to
> the problem.
> <\sarcasm>

<no_sarcasm>
This is not the place for sarcasm.
</no_sarcasm>

> Surely productivity suites like openoffice have to address the issue.
> How satisfied they are I don't know.  And despite Neil's argument that
> only one user should be able to write to a given doc, that is just not
> how normal office suites work today.

I think that Neil didn't mean it in this way or context.

> Also, I believe KDE and its myriad of config files has issues with
> major config file corruption due to unexpected shutdowns during the
> config file update process, so they certainly don't have it figured
> out.
>
> Why don't they use the temp file, fsync, rename process?

<no_sarcasm>
Because they figured it out?!
</no_sarcasm>

> Those are the 2 user-space suites I would go investigate first.  I'm
> sure there are many others.
>
> Also, I believe Windows offers an API like your proposing.  How does
> Samba support it?
>
> Greg
>

<no sarcasm>
Furthermore, in conjunction with the given 2 user-space suites it was said:
"I don't know" and
"I believe".
</no sarcasm>

==>  leaving the thread
Please don't TO and CC anymore.
E-mails related to this thread will be sorted by name and then deleted unread on behalf of the receiver.


Christian Stroetmann


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Atomic file data replace API
@ 2011-01-06 20:01 Olaf van der Spek
  2011-01-07 13:55 ` Mike Fleetwood
  2011-01-07 14:58 ` Chris Mason
  0 siblings, 2 replies; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-06 20:01 UTC (permalink / raw)
  To: linux-btrfs

Hi,

Does btrfs support atomic file data replaces? Basically, the atomic
variant of this:
// old state
open(O_TRUNC)
write() // 0+ times
close()
// new state
-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-06 20:01 Olaf van der Spek
@ 2011-01-07 13:55 ` Mike Fleetwood
  2011-01-07 14:01   ` Olaf van der Spek
  2011-01-07 14:58 ` Chris Mason
  1 sibling, 1 reply; 47+ messages in thread
From: Mike Fleetwood @ 2011-01-07 13:55 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-btrfs

On 6 January 2011 20:01, Olaf van der Spek <olafvdspek@gmail.com> wrote:
> Hi,
>
> Does btrfs support atomic file data replaces?

Hi Olaf,

Yes btrfs does support atomic replace, since kernel 2.6.30 circa June 2009. [1]

Special handling was added to ext3, ext4, btrfs (and probably other
Linux FSs) for your replace-via-truncate and the alternative
replace-via-rename application patterns.  Try reading "Delayed
allocation and the zero-length file problem" article and comments by
Ted Ts'o for further discussion. [2]
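
For readers unfamiliar with the two application patterns named above, here is
a rough sketch of each in Python. The function names and the ".new" suffix are
made up for illustration. Neither variant calls fsync, so this only shows the
shape of the patterns and is not a crash-safe recipe.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    # os.write may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def replace_via_truncate(path: str, data: bytes) -> None:
    # Truncate the existing file in place, then write the new contents.
    # Between the truncate and the last write the file is empty or partial.
    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)


def replace_via_rename(path: str, data: bytes) -> None:
    # Write the new contents under a temporary name, then rename it over
    # the target. The temp-name convention here is hypothetical.
    tmp = path + ".new"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)
    os.rename(tmp, path)
```

The rename variant creates a new inode, which is why ownership and other
meta-data of the original file are not carried over automatically.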

Mike
-- 
[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a3f23d515a2ebf0c750db80579ca57b28cbce6d
[2] http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 13:55 ` Mike Fleetwood
@ 2011-01-07 14:01   ` Olaf van der Spek
  2011-01-07 14:10     ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-07 14:01 UTC (permalink / raw)
  To: Mike Fleetwood; +Cc: linux-btrfs

On Fri, Jan 7, 2011 at 2:55 PM, Mike Fleetwood
<mike.fleetwood@googlemail.com> wrote:
> On 6 January 2011 20:01, Olaf van der Spek <olafvdspek@gmail.com> wrote:
>> Hi,
>>
>> Does btrfs support atomic file data replaces?
>
> Hi Olaf,
>
> Yes btrfs does support atomic replace, since kernel 2.6.30 circa June 2009. [1]
>
> Special handling was added to ext3, ext4, btrfs (and probably other
> Linux FSs) for your replace-via-truncate and the alternative
> replace-via-rename application patterns.  Try reading "Delayed
> allocation and the zero-length file problem" article and comments by
> Ted Ts'o for further discussion. [2]

According to Ted, via-truncate and via-rename are unsafe. Only fsync,
rename is safe.
Disadvantage of rename is resetting file owner (if non-root), having
issues with meta-data and other stuff.
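
To make the "temp file, fsync, rename" sequence and its meta-data problem
concrete, here is a minimal sketch in Python, assuming a Linux system. The
function name and the temp-file naming are made up for illustration. It copies
the permission bits and tries to copy owner and group, which only works with
sufficient privilege; ACLs, xattrs and hard links are not handled, and write
access to the containing directory is required.

```python
import os
import stat


def replace_file_contents(path: str, data: bytes) -> None:
    """Replace the data of an existing file via temp file, fsync, rename."""
    old = os.stat(path)
    dirname = os.path.dirname(os.path.abspath(path))
    # Hypothetical temp-name convention; must be in the same directory
    # (same filesystem) for rename to be atomic.
    tmp = os.path.join(dirname, "." + os.path.basename(path) + ".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:
            view = view[os.write(fd, view):]
        try:
            # Fails for an unprivileged user if the old owner differs.
            os.fchown(fd, old.st_uid, old.st_gid)
        except PermissionError:
            pass  # the new file keeps the caller's uid/gid
        os.fchmod(fd, stat.S_IMODE(old.st_mode))
        os.fsync(fd)  # new data must be on disk before the rename
    except BaseException:
        os.close(fd)
        os.unlink(tmp)
        raise
    os.close(fd)
    os.rename(tmp, path)
    # fsync the directory so the rename itself reaches the disk.
    dirfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

The silently ignored fchown failure is exactly the "resetting file owner (if
non-root)" disadvantage: the caller ends up owning the replaced file.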

My proposal was for an open flag, O_ATOMIC, to be introduced to tell
the FS the whole file update should be done atomically.
Ted says this is too hard in ext4, so I was wondering if this would be
possible in btrfs.

Olaf
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 14:01   ` Olaf van der Spek
@ 2011-01-07 14:10     ` Olaf van der Spek
  0 siblings, 0 replies; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-07 14:10 UTC (permalink / raw)
  To: Mike Fleetwood; +Cc: linux-btrfs

On Fri, Jan 7, 2011 at 3:01 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
> According to Ted, via-truncate and via-rename are unsafe. Only fsync,
> rename is safe.
> Disadvantage of rename is resetting file owner (if non-root), having
> issues with meta-data and other stuff.
>
> My proposal was for an open flag, O_ATOMIC, to be introduced to tell
> the FS the whole file update should be done atomically.
> Ted says this is too hard in ext4, so I was wondering if this would be
> possible in btrfs.

http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/#comment-2082
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/#comment-2089
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/#comment-2090

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-06 20:01 Olaf van der Spek
  2011-01-07 13:55 ` Mike Fleetwood
@ 2011-01-07 14:58 ` Chris Mason
  2011-01-07 15:01   ` Olaf van der Spek
  2011-01-08  1:11   ` Phillip Susi
  1 sibling, 2 replies; 47+ messages in thread
From: Chris Mason @ 2011-01-07 14:58 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-btrfs

Excerpts from Olaf van der Spek's message of 2011-01-06 15:01:15 -0500:
> Hi,
> 
> Does btrfs support atomic file data replaces? Basically, the atomic
> variant of this:
> // old stage
> open(O_TRUNC)
> write() // 0+ times
> close()
> // new state

Yes and no.  We have a best effort mechanism where we try to guess that
since you've done this truncate and the write that you want the writes
to show up quickly.  But it's a guess.

The problem is the write() // 0+ times.  The kernel has no idea what
new result you want the file to contain because the application isn't
telling us.

What btrfs can do (but we haven't yet implemented) is make sure that the
results of a single write call are on disk atomically, even if they are
replacing existing bytes in the file.

Because we cow and because we don't update metadata pointers until the
IO is complete, we can wait until all the IO for a given write call is
on disk before we update any of the metadata.

This isn't hard, it's on my TODO list.
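
A toy model of the idea, in Python, may help. This is not btrfs code; the
class and its fields are invented for illustration and treat the whole file
as one block. New data is written to a fresh location first, and only then is
the single pointer to the live data switched over.

```python
class CowFile:
    """Toy in-memory model of copy-on-write with a late pointer update."""

    def __init__(self, data: bytes):
        self.blocks = {0: data}  # block id -> contents, standing in for disk
        self.root = 0            # "metadata pointer" to the live block
        self._next_id = 1

    def read(self) -> bytes:
        return self.blocks[self.root]

    def write(self, offset: int, data: bytes,
              crash_before_commit: bool = False) -> None:
        old = self.blocks[self.root]
        new = old[:offset] + data + old[offset + len(data):]
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = new  # step 1: data lands in a new block
        if crash_before_commit:
            return                 # pointer untouched: old contents stay live
        old_id = self.root
        self.root = new_id         # step 2: one pointer update publishes it
        del self.blocks[old_id]    # old block can now be freed
```

A reader sees either the old bytes or the complete new bytes, never a mix,
because nothing is overwritten in place. This covers one write call only,
which matches the limitation described above.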

-chris

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 14:58 ` Chris Mason
@ 2011-01-07 15:01   ` Olaf van der Spek
  2011-01-07 15:05     ` Chris Mason
  2011-01-08  1:11   ` Phillip Susi
  1 sibling, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-07 15:01 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Fri, Jan 7, 2011 at 3:58 PM, Chris Mason <chris.mason@oracle.com> wrote:
> Excerpts from Olaf van der Spek's message of 2011-01-06 15:01:15 -0500:
>> Hi,
>>
>> Does btrfs support atomic file data replaces? Basically, the atomic
>> variant of this:
>> // old stage
>> open(O_TRUNC)
>> write() // 0+ times
>> close()
>> // new state
>
> Yes and no.  We have a best effort mechanism where we try to guess that
> since you've done this truncate and the write that you want the writes
> to show up quickly.  But its a guess.
>
> The problem is the write() // 0+ times.  The kernel has no idea what
> new result you want the file to contain because the application isn't
> telling us.

Isn't it safe for the kernel to wait until the first write or close
before writing anything to disk?

> What btrfs can do (but we haven't yet implemented) is make sure that the
> results of a single write file are on disk atomically, even if they are
> replacing existing bytes in the file.
>
> Because we cow and because we don't update metadata pointers until the
> IO is complete, we can wait until all the IO for a given write call is
> on disk before we update any of the metadata.
>
> This isn't hard, it's on my TODO list.

What about a new flag: O_ATOMIC that'd take the guesswork out of the kernel?

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 15:01   ` Olaf van der Spek
@ 2011-01-07 15:05     ` Chris Mason
  2011-01-07 15:08       ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Mason @ 2011-01-07 15:05 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-btrfs

Excerpts from Olaf van der Spek's message of 2011-01-07 10:01:59 -0500:
> On Fri, Jan 7, 2011 at 3:58 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > Excerpts from Olaf van der Spek's message of 2011-01-06 15:01:15 -0500:
> >> Hi,
> >>
> >> Does btrfs support atomic file data replaces? Basically, the atomic
> >> variant of this:
> >> // old stage
> >> open(O_TRUNC)
> >> write() // 0+ times
> >> close()
> >> // new state
> >
> > Yes and no.  We have a best effort mechanism where we try to guess that
> > since you've done this truncate and the write that you want the writes
> > to show up quickly.  But its a guess.
> >
> > The problem is the write() // 0+ times.  The kernel has no idea what
> > new result you want the file to contain because the application isn't
> > telling us.
>
> Isn't it safe for the kernel to wait until the first write or close
> before writing anything to disk?

I'm afraid not.  Picture an application that opens a thousand files and
writes 1MB to each of them, and then doesn't close any.  If we waited
until close, you'd have 1GB of memory pinned or staged somehow.

>
> > What btrfs can do (but we haven't yet implemented) is make sure that the
> > results of a single write file are on disk atomically, even if they are
> > replacing existing bytes in the file.
> >
> > Because we cow and because we don't update metadata pointers until the
> > IO is complete, we can wait until all the IO for a given write call is
> > on disk before we update any of the metadata.
> >
> > This isn't hard, it's on my TODO list.
>
> What about a new flag: O_ATOMIC that'd take the guesswork out of the kernel?

We can't guess beyond a single write call.  Otherwise we get into
the problem above where an application can force the kernel to wait
forever.  I'm not against O_ATOMIC to enable the new btrfs
functionality, but it will still be limited to one write.

-chris

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 15:05     ` Chris Mason
@ 2011-01-07 15:08       ` Olaf van der Spek
  2011-01-07 15:13         ` Chris Mason
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-07 15:08 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Fri, Jan 7, 2011 at 4:05 PM, Chris Mason <chris.mason@oracle.com> wrote:
>> > The problem is the write() // 0+ times.  The kernel has no idea what
>> > new result you want the file to contain because the application isn't
>> > telling us.
>>
>> Isn't it safe for the kernel to wait until the first write or close
>> before writing anything to disk?
>
> I'm afraid not.  Picture an application that opens a thousand files and
> writes 1MB to each of them, and then didn't close any.  If we waited
> until close, you'd have 1GB of memory pinned or staged somehow.

That's not what I asked. ;)
I asked to wait until the first write (or close). That way, you don't
get unintentional empty files.
One step further, you don't have to keep the data in memory, you're
free to write them to disk. You just wouldn't update the meta-data
(yet).

>> > This isn't hard, it's on my TODO list.
>>
>> What about a new flag: O_ATOMIC that'd take the guesswork out of the kernel?
>
> We can't guess beyond a single write call.  Otherwise we get into
> the problem above where an application can force the kernel to wait
> forever.  I'm not against O_ATOMIC to enable the new btrfs
> functionality, but it will still be limited to one write.
>
> -chris
>

-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 15:08       ` Olaf van der Spek
@ 2011-01-07 15:13         ` Chris Mason
  2011-01-07 15:17           ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Mason @ 2011-01-07 15:13 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-btrfs

Excerpts from Olaf van der Spek's message of 2011-01-07 10:08:24 -0500:
> On Fri, Jan 7, 2011 at 4:05 PM, Chris Mason <chris.mason@oracle.com> wrote:
> >> > The problem is the write() // 0+ times.  The kernel has no idea what
> >> > new result you want the file to contain because the application isn't
> >> > telling us.
> >>
> >> Isn't it safe for the kernel to wait until the first write or close
> >> before writing anything to disk?
> >
> > I'm afraid not.  Picture an application that opens a thousand files and
> > writes 1MB to each of them, and then didn't close any.  If we waited
> > until close, you'd have 1GB of memory pinned or staged somehow.
>
> That's not what I asked. ;)
> I asked to wait until the first write (or close). That way, you don't
> get unintentional empty files.
> One step further, you don't have to keep the data in memory, you're
> free to write them to disk. You just wouldn't update the meta-data
> (yet).

Sorry ;) Picture an application that truncates 1024 files without closing any
of them.  Basically any operation that includes the kernel waiting for
applications because they promise to do something soon is a denial of
service attack, or a really easy way to run out of memory on the box.

-chris

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 15:13         ` Chris Mason
@ 2011-01-07 15:17           ` Olaf van der Spek
  2011-01-07 16:12             ` Chris Mason
  2011-01-07 16:32             ` Massimo Maggi
  0 siblings, 2 replies; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-07 15:17 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason <chris.mason@oracle.com> wrote:
>> That's not what I asked. ;)
>> I asked to wait until the first write (or close). That way, you don't
>> get unintentional empty files.
>> One step further, you don't have to keep the data in memory, you're
>> free to write them to disk. You just wouldn't update the meta-data
>> (yet).
>
> Sorry ;) Picture an application that truncates 1024 files without closing any
> of them.  Basically any operation that includes the kernel waiting for
> applications because they promise to do something soon is a denial of
> service attack, or a really easy way to run out of memory on the box.

I'm not sure why you would run out of memory in that case.

O_ATOMIC would replace the rename workaround (write temp file, rename),
with advantages such as a much simpler API, no issues with resetting
meta-data, no issues with temp files and maybe better performance.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 15:17           ` Olaf van der Spek
@ 2011-01-07 16:12             ` Chris Mason
  2011-01-07 16:19               ` Olaf van der Spek
  2011-01-07 16:26               ` Hubert Kario
  2011-01-07 16:32             ` Massimo Maggi
  1 sibling, 2 replies; 47+ messages in thread
From: Chris Mason @ 2011-01-07 16:12 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-btrfs

Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500:
> On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason <chris.mason@oracle.com> wrote:
> >> That's not what I asked. ;)
> >> I asked to wait until the first write (or close). That way, you don't
> >> get unintentional empty files.
> >> One step further, you don't have to keep the data in memory, you're
> >> free to write them to disk. You just wouldn't update the meta-data
> >> (yet).
> >
> > Sorry ;) Picture an application that truncates 1024 files without closing any
> > of them.  Basically any operation that includes the kernel waiting for
> > applications because they promise to do something soon is a denial of
> > service attack, or a really easy way to run out of memory on the box.
>
> I'm not sure why you would run out of memory in that case.

Well, let's make sure I've got a good handle on the proposed interface:

1) fd = open(some_file, O_ATOMIC)
2) ftruncate(fd, 0)
3) write(fd, new data)

The semantics are that we promise not to let the truncate hit the disk
until the application does the write.

We have a few choices on how we do this:

1) Leave the disk untouched, but keep something in memory that says this
inode is really truncated

2) Record on disk that we've done our atomic truncate but it is still
pending.  We'd need some way to remove or invalidate this record after a
crash.

3) Go ahead and do the operation but don't allow the transaction to
commit until the write is done.

option #1: keep something in memory.  Well, any time we have a
requirement to pin something in memory until userland decides to do a
write, we risk oom.

option #2: disk format change.  Actually somewhat complex because if we
haven't crashed, we need to be able to read the inode in again without
invalidating the record but if we do crash, we have to invalidate the
record.  Not impossible, but not trivial.

option #3: Pin the whole transaction.  Depending on the FS this may be
impossible.  Certain operations require us to commit the transaction to
reclaim space, and we cannot allow userland to put that on hold without
deadlocking.

What most people don't realize about the crash safe filesystems is they
don't have fine grained transactions.  There is one single transaction
for all the operations done.  This is mostly because it is less complex
and much faster, but it also makes any 'pin the whole transaction' type
system unusable.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 16:12             ` Chris Mason
@ 2011-01-07 16:19               ` Olaf van der Spek
  2011-01-07 16:26               ` Hubert Kario
  1 sibling, 0 replies; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-07 16:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Fri, Jan 7, 2011 at 5:12 PM, Chris Mason <chris.mason@oracle.com> wrote:
>> I'm not sure why you would run out of memory in that case.
>
> Well, lets make sure I've got a good handle on the proposed interface:
>
> 1) fd = open(some_file, O_ATOMIC)

No, O_TRUNC should be used in open. Maybe it works with a separate truncate too.

> 2) truncate(fd, 0)
> 3) write(fd, new data)
>
> The semantics are that we promise not to let the truncate hit the disk
> until the application does the write.
>
> We have a few choices on how we do this:
>
> 1) Leave the disk untouched, but keep something in memory that says this
> inode is really truncated
>
> 2) Record on disk that we've done our atomic truncate but it is still
> pending.  We'd need some way to remove or invalidate this record after a
> crash.
>
> 3) Go ahead and do the operation but don't allow the transaction to
> commit until the write is done.
>
> option #1: keep something in memory.  Well, any time we have a
> requirement to pin something in memory until userland decides to do a
> write, we risk oom.

Since the file is open, you have to keep something in memory anyway,
right? Adding a bit (or bool) does not make a difference IMO.
Isn't this comparable to opening a temp file?

> option #2: disk format change.  Actually somewhat complex because if we
> haven't crashed, we need to be able to read the inode in again without
> invalidating the record but if we do crash, we have to invalidate the
> record.  Not impossible, but not trivial.
>
> option #3: Pin the whole transaction.  Depending on the FS this may be
> impossible.  Certain operations require us to commit the transaction to
> reclaim space, and we cannot allow userland to put that on hold without
> deadlocking.

#1 is the only one that makes sense.

> What most people don't realize about the crash safe filesystems is they
> don't have fine grained transactions.  There is one single transaction
> for all the operations done.  This is mostly because it is less complex
> and much faster, but it also makes any 'pin the whole transaction' type
> system unusable.

AFAIK the cost is mostly more complex code / runtime. The cost is not
disk performance.

-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 16:12             ` Chris Mason
  2011-01-07 16:19               ` Olaf van der Spek
@ 2011-01-07 16:26               ` Hubert Kario
  2011-01-07 19:29                 ` Chris Mason
  1 sibling, 1 reply; 47+ messages in thread
From: Hubert Kario @ 2011-01-07 16:26 UTC (permalink / raw)
  To: Chris Mason; +Cc: Olaf van der Spek, linux-btrfs

On Friday, January 07, 2011 17:12:11 Chris Mason wrote:
> Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500:
> > On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > >> That's not what I asked. ;)
> > >> I asked to wait until the first write (or close). That way, you don't
> > >> get unintentional empty files.
> > >> One step further, you don't have to keep the data in memory, you're
> > >> free to write them to disk. You just wouldn't update the meta-data
> > >> (yet).
> > >
> > > Sorry ;) Picture an application that truncates 1024 files without
> > > closing any of them.  Basically any operation that includes the kernel
> > > waiting for applications because they promise to do something soon is
> > > a denial of service attack, or a really easy way to run out of memory
> > > on the box.
> >
> > I'm not sure why you would run out of memory in that case.
>
> Well, lets make sure I've got a good handle on the proposed interface:
>
> 1) fd = open(some_file, O_ATOMIC)
> 2) truncate(fd, 0)
> 3) write(fd, new data)
>
> The semantics are that we promise not to let the truncate hit the disk
> until the application does the write.
>
> We have a few choices on how we do this:
>
> 1) Leave the disk untouched, but keep something in memory that says this
> inode is really truncated
>
> 2) Record on disk that we've done our atomic truncate but it is still
> pending.  We'd need some way to remove or invalidate this record after a
> crash.
>
> 3) Go ahead and do the operation but don't allow the transaction to
> commit until the write is done.
>
> option #1: keep something in memory.  Well, any time we have a
> requirement to pin something in memory until userland decides to do a
> write, we risk oom.

Userland already has a file descriptor allocated (which can fail anyway
because of OOM). I see no problem in increasing the size of kernel memory
usage by 4 bytes (if not less) just to note that the application wants to see
the file as truncated (1 bit) and the next write has to be atomic (2nd bit?).

-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 15:17           ` Olaf van der Spek
  2011-01-07 16:12             ` Chris Mason
@ 2011-01-07 16:32             ` Massimo Maggi
  2011-01-07 16:34               ` Olaf van der Spek
  1 sibling, 1 reply; 47+ messages in thread
From: Massimo Maggi @ 2011-01-07 16:32 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: linux-btrfs

Are you suggesting the following?
1) open() with O_TRUNC|O_ATOMIC returns an fd to a temporary file.
2) The application writes to that fd, with one or more system calls, over a
short or a long time, at its will.
3) At close() (or even at fsync()), atomically swap the "data pointer" of the
"real file" with that of the "temp file", then delete the temp file, all
transparently to userland (something similar to e4defrag).
Is this summary correct?
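
For comparison, the three steps above can be approximated in userspace today with the temp file + fsync + rename pattern discussed in this thread. This is only a sketch of that workaround, not of any O_ATOMIC implementation: `atomic_replace` is a hypothetical helper name, and it assumes a POSIX system where rename() within one directory is atomic.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def atomic_replace(path):
    """Hypothetical helper: yield a temp file, swap it over `path` on clean exit."""
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: the "fd to a temporary file". It is created in the same
    # directory so that the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            yield tmp                      # step 2: the caller writes at will
            tmp.flush()
            os.fsync(tmp.fileno())         # new data on disk before the swap
            try:
                st = os.stat(path)
            except FileNotFoundError:
                st = None                  # the new-file case: nothing to copy
            if st is not None:
                # mkstemp creates the file with mode 0600, so the old mode
                # and owner have to be copied over by hand.
                os.fchmod(tmp.fileno(), st.st_mode & 0o7777)
                try:
                    os.fchown(tmp.fileno(), st.st_uid, st.st_gid)
                except PermissionError:
                    pass                   # an unprivileged caller may not keep the owner
        os.replace(tmp_path, path)         # step 3: the atomic swap
    except BaseException:
        # Any failure (including one raised by the caller) discards the temp file.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

Usage would look like `with atomic_replace("foo.txt") as f: f.write(b"new data")`. Note what the sketch does not do: extended attributes and ACLs are not copied, the owner is kept only when the caller has permission, and other hard links to the old file keep pointing at the old data, which are the kinds of meta-data issues raised against the rename workaround.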

Massimo Maggi

On 07/01/2011 16:17, Olaf van der Spek wrote:
> On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason <chris.mason@oracle.com> wrote:
>>> That's not what I asked. ;)
>>> I asked to wait until the first write (or close). That way, you don't
>>> get unintentional empty files.
>>> One step further, you don't have to keep the data in memory, you're
>>> free to write them to disk. You just wouldn't update the meta-data
>>> (yet).
>> Sorry ;) Picture an application that truncates 1024 files without closing any
>> of them.  Basically any operation that includes the kernel waiting for
>> applications because they promise to do something soon is a denial of
>> service attack, or a really easy way to run out of memory on the box.
> I'm not sure why you would run out of memory in that case.
>
> O_ATOMIC would be the solution for the rename workaround: write temp
> file, rename
> With advantages like a way simpler API, no issues with resetting
> meta-data, no issues with temp file and maybe better performance.
>
> Olaf


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 16:32             ` Massimo Maggi
@ 2011-01-07 16:34               ` Olaf van der Spek
  2011-01-07 19:29                 ` Thomas Bellman
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-07 16:34 UTC (permalink / raw)
  To: Massimo Maggi; +Cc: linux-btrfs

On Fri, Jan 7, 2011 at 5:32 PM, Massimo Maggi <massimo@mmmm.it> wrote:
> Are you suggesting to do:
> 1)fopen with O_TRUNC, O_ATOMIC: returns fd to a temporary file
> 2)application writes to that fd, with one or more system calls, in a
> short time or in long time, at his will.
> 3)at fclose (or even at fsync ) atomically swap "data pointer" of "real
> file" with "temp file", then delete temp.In a transparent mode to
> userland.  (something similar to e4defrag).
> Is this sum up correct?

Almost. Swap should probably not be done at fsync time.
Other open references (for example running executables) should be swapped too.

The new-file case has to be handled too.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 16:34               ` Olaf van der Spek
@ 2011-01-07 19:29                 ` Thomas Bellman
  2011-01-08 14:36                   ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Bellman @ 2011-01-07 19:29 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: Massimo Maggi, linux-btrfs

Olaf van der Spek wrote:

> On Fri, Jan 7, 2011 at 5:32 PM, Massimo Maggi <massimo@mmmm.it> wrote:
>> Are you suggesting to do:
>> 1)fopen with O_TRUNC, O_ATOMIC: returns fd to a temporary file
>> 2)application writes to that fd, with one or more system calls, in a
>> short time or in long time, at his will.
>> 3)at fclose (or even at fsync ) atomically swap "data pointer" of "real
>> file" with "temp file", then delete temp.In a transparent mode to
>> userland.  (something similar to e4defrag).
>> Is this sum up correct?
> 
> Almost. Swap should probably not be done at fsync time.
> Other open references (for example running executables) should be swapped too.

What is the visibility of the changes for other processes supposed
to be in the meantime?  I.e., if things happen in this order:

1. Process A does fda = open("foo.txt", O_TRUNC|O_ATOMIC)
2. Process B does fdb = open("foo.txt", O_RDONLY)
3. B does read(fdb, buf, 4096)
4. A does write(fda, "NEW DATA\n", 9)
5. Process C comes in and does fdc = open("foo.txt", O_RDONLY)
6. C does read(fdc, buf, 4096)
7. A calls close(fda)

Does B see an empty file, or does it see the old contents of
the file?  Does C see "NEW DATA\n", or does it see the old
contents of the file, or perhaps an empty file?


	/Bellman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 16:26               ` Hubert Kario
@ 2011-01-07 19:29                 ` Chris Mason
  2011-01-08 14:40                   ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Mason @ 2011-01-07 19:29 UTC (permalink / raw)
  To: Hubert Kario; +Cc: Olaf van der Spek, linux-btrfs

Excerpts from Hubert Kario's message of 2011-01-07 11:26:02 -0500:
> On Friday, January 07, 2011 17:12:11 Chris Mason wrote:
> > Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500:
> > > On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason <chris.mason@oracle.com> 
> wrote:
> > > >> That's not what I asked. ;)
> > > >> I asked to wait until the first write (or close). That way, you don't
> > > >> get unintentional empty files.
> > > >> One step further, you don't have to keep the data in memory, you're
> > > >> free to write them to disk. You just wouldn't update the meta-data
> > > >> (yet).
> > > > 
> > > > Sorry ;) Picture an application that truncates 1024 files without
> > > > closing any of them.  Basically any operation that includes the kernel
> > > > waiting for applications because they promise to do something soon is
> > > > a denial of service attack, or a really easy way to run out of memory
> > > > on the box.
> > > 
> > > I'm not sure why you would run out of memory in that case.
> > 
> > Well, lets make sure I've got a good handle on the proposed interface:
> > 
> > 1) fd = open(some_file, O_ATOMIC)
> > 2) truncate(fd, 0)
> > 3) write(fd, new data)
> > 
> > The semantics are that we promise not to let the truncate hit the disk
> > until the application does the write.
> > 
> > We have a few choices on how we do this:
> > 
> > 1) Leave the disk untouched, but keep something in memory that says this
> > inode is really truncated
> > 
> > 2) Record on disk that we've done our atomic truncate but it is still
> > pending.  We'd need some way to remove or invalidate this record after a
> > crash.
> > 
> > 3) Go ahead and do the operation but don't allow the transaction to
> > commit until the write is done.
> > 
> > option #1: keep something in memory.  Well, any time we have a
> > requirement to pin something in memory until userland decides to do a
> > write, we risk oom.
> 
> Userland has already a file descriptor allocated (which can fail anyway 
> because of OOM), I see no problem in increasing the size of kernel memory 
> usage by 4 bytes (if not less) just to note that the application wants to see 
> the file as truncated (1 bit) and the next write has to be atomic (2nd bit?).
> 

The exact amount of tracking is going to vary.  The reason why is that
actually doing the truncate is an O(size of the file) operation and so
you can't just flip a switch when the write or the close comes in.  You
have to run through all the metadata of the file and do something
temporary with each part that is only completed when the file IO is
actually done.

Honestly, there are many different ways to solve this in the application.
Requiring high speed atomic replacement of individual file contents is a
recipe for frustration.

-chris

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 14:58 ` Chris Mason
  2011-01-07 15:01   ` Olaf van der Spek
@ 2011-01-08  1:11   ` Phillip Susi
  1 sibling, 0 replies; 47+ messages in thread
From: Phillip Susi @ 2011-01-08  1:11 UTC (permalink / raw)
  To: Chris Mason; +Cc: Olaf van der Spek, linux-btrfs

On 01/07/2011 09:58 AM, Chris Mason wrote:
> Yes and no.  We have a best effort mechanism where we try to guess that
> since you've done this truncate and the write that you want the writes
> to show up quickly.  But its a guess.

It is a pretty good guess, and one that the NT kernel has been making 
for 15 years or so.  I've been following this issue for some time and I 
still don't understand why Ted is so hostile to this and can't make it 
work right on ext4.  When you get a rename() you just need to check if 
there are outstanding journal transactions and/or dirty cache pages, and 
hang the rename() transaction on the end of those.  That way if the 
system crashes after the new file has fully hit the disk, the old file 
is gone and you only have the new one, but if it crashes before, you 
still have the old one in place.

Both the writes and the rename can be delayed in the cache to an 
arbitrary point in the future; what matters is that their order is 
preserved.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 19:29                 ` Thomas Bellman
@ 2011-01-08 14:36                   ` Olaf van der Spek
  2011-01-08 21:43                     ` Thomas Bellman
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-08 14:36 UTC (permalink / raw)
  To: Thomas Bellman; +Cc: Massimo Maggi, linux-btrfs

On Fri, Jan 7, 2011 at 8:29 PM, Thomas Bellman <bellman@nsc.liu.se> wrote:
> What is the visibility of the changes for other processes supposed
> to be in the meantime?  I.e., if things happen in this order:

Should be atomic too, at close time.

> 1. Process A does fda = open("foo.txt", O_TRUNC|O_ATOMIC)
> 2. Process B does fdb = open("foo.txt", O_RDONLY)
> 3. B does read(fdb, buf, 4096)
> 4. A does write(fda, "NEW DATA\n", 9)
> 5. Process C comes in and does fdc = open("foo.txt", O_RDONLY)
> 6. C does read(fdc, buf, 4096)
> 7. A calls close(fda)
>
> Does B see an empty file, or does it see the old contents of
> the file?

Old file, otherwise A wouldn't be atomic.

> Does C see "NEW DATA\n", or does it see the old
> contents of the file, or perhaps an empty file?

Old file again, as the 'transaction' isn't finished until close.
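
The seven-step scenario can be replayed against the existing rename workaround to see which of these answers it already gives. The sketch below is an illustration only: it stands in for O_ATOMIC with a temp file plus rename, the function name `visibility_demo` and the temp file name are made up, and it assumes a POSIX system.

```python
import os
import tempfile


def visibility_demo():
    """Replay steps 1-7 with temp file + rename standing in for O_ATOMIC."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "foo.txt")
    with open(path, "wb") as f:
        f.write(b"OLD DATA\n")

    # 1. A opens its private copy (the stand-in for O_TRUNC|O_ATOMIC).
    tmp = os.path.join(d, "foo.txt.tmp")
    fda = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    # 2-3. B opens the real file and reads it.
    fdb = os.open(path, os.O_RDONLY)
    seen_b = os.read(fdb, 4096)
    # 4. A writes its new data.
    os.write(fda, b"NEW DATA\n")
    # 5-6. C opens the real file and reads it.
    fdc = os.open(path, os.O_RDONLY)
    seen_c = os.read(fdc, 4096)
    # 7. A "closes": sync the data, then publish it with a rename.
    os.fsync(fda)
    os.close(fda)
    os.replace(tmp, path)

    # A reader that opens the file after step 7 sees the new contents.
    with open(path, "rb") as f:
        seen_after = f.read()
    # B's descriptor still refers to the old inode, even after the rename.
    seen_b_again = os.pread(fdb, 4096, 0)
    os.close(fdb)
    os.close(fdc)
    return seen_b, seen_c, seen_after, seen_b_again


print(visibility_demo())
# (b'OLD DATA\n', b'OLD DATA\n', b'NEW DATA\n', b'OLD DATA\n')
```

B and C both see the old file, matching the answers above. The last value shows where the workaround differs from the proposal: descriptors that were already open keep reading the old inode after the swap, whereas the suggestion earlier in the thread was that other open references should be swapped too.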

-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-07 19:29                 ` Chris Mason
@ 2011-01-08 14:40                   ` Olaf van der Spek
  2011-01-26 18:30                     ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-08 14:40 UTC (permalink / raw)
  To: Chris Mason; +Cc: Hubert Kario, linux-btrfs

On Fri, Jan 7, 2011 at 8:29 PM, Chris Mason <chris.mason@oracle.com> wrote:
> The exact amount of tracking is going to vary.  The reason why is that
> actually doing the truncate is an O(size of the file) operation and so
> you can't just flip a switch when the write or the close comes in.  You
> have to run through all the metadata of the file and do something
> temporary with each part that is only completed when the file IO is
> actually done.

That's true. Maybe the proper way, via O_ATOMIC, is better.

> Honestly, there many different ways to solve this in the application.
> Requiring high speed atomic replacement of individual file contents is a
> recipe for frustration.

Did you see Massimo's message? That'd be the ideal way from an app
point of view.
Not solving this properly in the FS moves the problem to userspace
where it's even harder to solve and is not as performant.

Replacing file data is a common operation that IMO the FS should
support in a safe way.
-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-08 14:36                   ` Olaf van der Spek
@ 2011-01-08 21:43                     ` Thomas Bellman
  2011-01-09 15:16                       ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Bellman @ 2011-01-08 21:43 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: Massimo Maggi, linux-btrfs

Olaf van der Spek wrote:

> On Fri, Jan 7, 2011 at 8:29 PM, Thomas Bellman <bellman@nsc.liu.se> wrote:
>> What is the visibility of the changes for other processes supposed
>> to be in the meantime?  I.e., if things happen in this order:
> 
> Should be atomic too, at close time.
> 
>> 1. Process A does fda = open("foo.txt", O_TRUNC|O_ATOMIC)
>> 2. Process B does fdb = open("foo.txt", O_RDONLY)
>> 3. B does read(fdb, buf, 4096)
>> 4. A does write(fda, "NEW DATA\n", 9)
>> 5. Process C comes in and does fdc = open("foo.txt", O_RDONLY)
>> 6. C does read(fdc, buf, 4096)
>> 7. A calls close(fda)
>>
>> Does B see an empty file, or does it see the old contents of
>> the file?
> 
> Old file, otherwise A wouldn't be atomic.
> 
>> Does C see "NEW DATA\n", or does it see the old
>> contents of the file, or perhaps an empty file?
> 
> Old file again, as the 'transaction' isn't finished until close.

So, basically database transactions with an isolation level of
"committed read", for file operations.  That's something I have
wanted for a long time, especially if I also get a rollback()
operation, but have never heard of any Unix that implemented it.

A separate commit() operation would be better than conflating it
with close().  And as I said, we want a rollback() as well.  And
a process that terminates without committing the transaction that
it is performing, should have the transaction automatically rolled
back.

I only have a very shallow knowledge about the internals of the
Linux kernel in regards to filesystems, but I suspect that this
could be implemented almost entirely within the VFS, and not need
to touch the actual filesystems, as long as you are satisfied
with a limited amount of transaction space (what fits in RAM +
swap).
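
To make the commit()/rollback() idea concrete, here is a rough userspace sketch, not a proposal for the kernel interface. `TxFile` and all of its methods are hypothetical names. It keeps uncommitted data in RAM (the "limited amount of transaction space" mentioned above) and publishes each commit with a temp file plus rename, so it assumes a POSIX system and replaces the whole file on every commit.

```python
import os
import tempfile


class TxFile:
    """Hypothetical single-file transaction: whole-file replace on commit()."""

    def __init__(self, path):
        self.path = path
        self._committed = b""          # contents as of the last commit()
        self._pending = bytearray()    # uncommitted view, held in RAM only

    def write(self, data):
        self._pending += data

    def rollback(self):
        # Drop everything written since the last commit().
        self._pending = bytearray(self._committed)

    def commit(self):
        directory = os.path.dirname(os.path.abspath(self.path))
        # Note: mkstemp uses mode 0600; meta-data is not preserved here.
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tx-")
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(self._pending)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, self.path)   # readers see old or new, never a mix
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
        self._committed = bytes(self._pending)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving without commit() rolls back automatically.
        self.rollback()
        return False
```

With this shape, `with TxFile("foo.txt") as tx: tx.write(b"x")` leaves foo.txt untouched unless `tx.commit()` is called, and further `write()` calls after a commit start a new transaction on the same handle.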

I'm looking forward to your implementation. :-)  Even though I
suspect that it would be a rather large undertaking to implement...


	/Bellman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-08 21:43                     ` Thomas Bellman
@ 2011-01-09 15:16                       ` Olaf van der Spek
  2011-01-09 18:56                         ` Thomas Bellman
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-09 15:16 UTC (permalink / raw)
  To: Thomas Bellman; +Cc: Massimo Maggi, linux-btrfs

On Sat, Jan 8, 2011 at 10:43 PM, Thomas Bellman <bellman@nsc.liu.se> wrote:
> So, basically database transactions with an isolation level of
> "committed read", for file operations.  That's something I have
> wanted for a long time, especially if I also get a rollback()
> operation, but have never heard of any Unix that implemented it.

True, that's why this feature request is here.
Note that it's (ATM) only about  single file data replace.

> A separate commit() operation would be better than conflating it
> with close().  And as I said, we want a rollback() as well.  And
> a process that terminates without committing the transaction that
> it is performing, should have the transaction automatically rolled
> back.

What could you do between commit and close?

> I only have a very shallow knowledge about the internals of the
> Linux kernel in regards to filesystems, but I suspect that this
> could be implemented almost entirely within the VFS, and not need
> to touch the actual filesystems, as long as you are satisfied
> with a limited amount of transaction space (what fits in RAM +
> swap).
>
> I'm looking forward to your implementation. :-)  Even though I
> suspect that it would be a rather large undertaking to implement...

I have no plans to work on an implementation.

-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-09 15:16                       ` Olaf van der Spek
@ 2011-01-09 18:56                         ` Thomas Bellman
  2011-01-09 19:06                           ` Olaf van der Spek
  2011-01-09 20:13                           ` Phillip Susi
  0 siblings, 2 replies; 47+ messages in thread
From: Thomas Bellman @ 2011-01-09 18:56 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: Massimo Maggi, linux-btrfs

Olaf van der Spek wrote:

> On Sat, Jan 8, 2011 at 10:43 PM, Thomas Bellman <bellman@nsc.liu.se> wrote:
>> So, basically database transactions with an isolation level of
>> "committed read", for file operations.  That's something I have
>> wanted for a long time, especially if I also get a rollback()
>> operation, but have never heard of any Unix that implemented it.
> 
> True, that's why this feature request is here.
> Note that it's (ATM) only about  single file data replace.

That particular problem was solved with the introduction of the
rename(2) system call in 4.2BSD a bit more than a quarter of a
century ago.  There is no need to introduce another, less flexible,
API for doing the same thing.

>> A separate commit() operation would be better than conflating it
>> with close().  And as I said, we want a rollback() as well.  And
>> a process that terminates without committing the transaction that
>> it is performing, should have the transaction automatically rolled
>> back.
> 
> What could you do between commit and close?

More write() operations, of course.  Just like you can continue
with more transactions after a COMMIT WORK call without having
to close and re-open the database in SQL.


	/Bellman

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-09 18:56                         ` Thomas Bellman
@ 2011-01-09 19:06                           ` Olaf van der Spek
  2011-01-09 20:13                           ` Phillip Susi
  1 sibling, 0 replies; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-09 19:06 UTC (permalink / raw)
  To: Thomas Bellman; +Cc: Massimo Maggi, linux-btrfs

On Sun, Jan 9, 2011 at 7:56 PM, Thomas Bellman <bellman@nsc.liu.se> wrote:
>> True, that's why this feature request is here.
>> Note that it's (ATM) only about  single file data replace.
>
> That particular problem was solved with the introduction of the
> rename(2) system call in 4.2BSD a bit more than a quarter of a
> century ago.  There is no need to introduce another, less flexible,
> API for doing the same thing.

You might want to read about the problems with that workaround.

>> What could you do between commit and close?
>
> More write() operations, of course.  Just like you can continue
> with more transactions after a COMMIT WORK call without having
> to close and re-open the database in SQL.

The transaction is defined as beginning with open and ending with close.
-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-09 18:56                         ` Thomas Bellman
  2011-01-09 19:06                           ` Olaf van der Spek
@ 2011-01-09 20:13                           ` Phillip Susi
  1 sibling, 0 replies; 47+ messages in thread
From: Phillip Susi @ 2011-01-09 20:13 UTC (permalink / raw)
  To: Thomas Bellman; +Cc: Olaf van der Spek, Massimo Maggi, linux-btrfs

On 01/09/2011 01:56 PM, Thomas Bellman wrote:
> That particular problem was solved with the introduction of the
> rename(2) system call in 4.2BSD a bit more than a quarter of a
> century ago. There is no need to introduce another, less flexible,
> API for doing the same thing.

I'm curious if there are any BSD specifications that state that rename()
has this behavior.  Ted Ts'o has been claiming that POSIX does not
require this behavior in the face of a crash and that as a result, an
application that relies on such behavior is broken, and needs to fsync()
before rename().  This, of course, makes replacing numerous files much
slower, glacially so on btrfs.  There has been a great deal of
discussion on the dpkg mailing lists about it since plenty of people are
upset that dpkg runs much slower these days than it used to, because it
now calls fsync() before rename() in order to avoid breakage on ext4.

You can read more, including the rationale of why POSIX does not require 
this behavior at http://lwn.net/Articles/323607/.

I still say that preserving the order of the writes and rename is the 
only sane thing to do, whether POSIX requires it or not.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-08 14:40                   ` Olaf van der Spek
@ 2011-01-26 18:30                     ` Olaf van der Spek
  2011-01-26 19:30                       ` Chris Mason
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-26 18:30 UTC (permalink / raw)
  To: Chris Mason; +Cc: Hubert Kario, linux-btrfs

On Sat, Jan 8, 2011 at 3:40 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
> On Fri, Jan 7, 2011 at 8:29 PM, Chris Mason <chris.mason@oracle.com> wrote:
>> The exact amount of tracking is going to vary.  The reason why is that
>> actually doing the truncate is an O(size of the file) operation and so
>> you can't just flip a switch when the write or the close comes in.  You
>> have to run through all the metadata of the file and do something
>> temporary with each part that is only completed when the file IO is
>> actually done.
>
> That's true. Maybe the proper way, via O_ATOMIC, is better.
>
>> Honestly, there many different ways to solve this in the application.
>> Requiring high speed atomic replacement of individual file contents is a
>> recipe for frustration.
>
> Did you see message of Massimo? That'd be the ideal way from an app
> point of view.
> Not solving this properly in the FS moves the problem to userspace
> where it's even harder to solve and is not as performant.
>
> Replacing file data is a common operation that IMO the FS should
> support in a safe way.

Chris?


-- 
Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Atomic file data replace API
  2011-01-26 18:30                     ` Olaf van der Spek
@ 2011-01-26 19:30                       ` Chris Mason
  2011-01-26 21:56                         ` Olaf van der Spek
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Mason @ 2011-01-26 19:30 UTC (permalink / raw)
  To: Olaf van der Spek; +Cc: Hubert Kario, linux-btrfs

Excerpts from Olaf van der Spek's message of 2011-01-26 13:30:08 -0500:
> On Sat, Jan 8, 2011 at 3:40 PM, Olaf van der Spek <olafvdspek@gmail.com> wrote:
> > On Fri, Jan 7, 2011 at 8:29 PM, Chris Mason <chris.mason@oracle.com> wrote:
> >> The exact amount of tracking is going to vary.  The reason why is that
> >> actually doing the truncate is an O(size of the file) operation and so
> >> you can't just flip a switch when the write or the close comes in.  You
> >> have to run through all the metadata of the file and do something
> >> temporary with each part that is only completed when the file IO is
> >> actually done.
> >
> > That's true. Maybe the proper way, via O_ATOMIC, is better.
> >
> >> Honestly, there many different ways to solve this in the application.
> >> Requiring high speed atomic replacement of individual file contents is a
> >> recipe for frustration.
> >
> > Did you see message of Massimo? That'd be the ideal way from an app
> > point of view.
> > Not solving this properly in the FS moves the problem to userspace
> > where it's even harder to solve and is not as performant.
> >
> > Replacing file data is a common operation that IMO the FS should
> > support in a safe way.
>
> Chris?
>

My answer hasn't really changed ;)  Replacing file data is a common
operation, but it is still surprisingly complex.  Again, the truncate is
O(size of the file) and it is actually impossible to do this atomically
in most filesystems.

You don't notice this because xfs/ext34/btrfs (and many others) have
code that makes sure a truncate is restarted if you crash.  So, it
appears to be atomic even though we're really just restarting the
operation.  In order to have a truncate + replacement of data operation,
we'd have to do a disk format change that includes both the truncate and
the new data.

It would look a lot like echo data > file.new ; truncate file ; mv
file.new file, but recorded in the FS metadata.

I don't have this in the btrfs roadmap.  It would be nice but most
people use databases for things that require atomic operations.  I
think what ext4 and btrfs do today fall into the category of best
effort and least surprise, and I think it is as good as we can get
without huge performance penalties for normal use.

Now, if you want to talk about atomic replacement of file data without
changing the file size, that's much easier.  At least it's easier for
those of us with cows in our pockets.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Atomic file data replace API
  2011-01-26 19:30                       ` Chris Mason
@ 2011-01-26 21:56                         ` Olaf van der Spek
  0 siblings, 0 replies; 47+ messages in thread
From: Olaf van der Spek @ 2011-01-26 21:56 UTC (permalink / raw)
  To: Chris Mason; +Cc: Hubert Kario, linux-btrfs

On Wed, Jan 26, 2011 at 8:30 PM, Chris Mason <chris.mason@oracle.com> wrote:
> My answer hasn't really changed ;)  Replacing file data is a common
> operation, but it is still surprisingly complex.  Again, the truncate is
> O(size of the file) and it is actually impossible to do this atomically
> in most filesystems.

Unfortunately life isn't trivial. ;)
Given that it's common, it doesn't make sense to have code duplication
in lots of apps to implement the temp file rename pattern.
If it's too complex to implement in the FS (ATM), would it be possible
to implement it in a higher layer?
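
For reference, a minimal sketch of the temp file rename pattern as an
application might implement it today, in Python. The helper name
atomic_replace is hypothetical and not part of any library. It assumes a
POSIX system and that the temp file can be created in the same directory
as the target. Permission bits are copied from the old file; ownership is
copied only when the caller is allowed to do so, which is exactly the
meta-data problem raised at the start of this thread.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` (bytes) using
    temp file + fsync + rename. Hypothetical helper, POSIX only."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            st = None  # nothing to preserve, the target is new
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            if st is not None:
                os.fchmod(f.fileno(), st.st_mode & 0o7777)
                try:
                    os.fchown(f.fileno(), st.st_uid, st.st_gid)
                except PermissionError:
                    pass  # unprivileged caller: owner cannot be preserved
            os.fsync(f.fileno())  # new data is on disk before the rename
        os.replace(tmp, path)  # atomically switch the name to the new file
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry so the rename itself survives a crash.
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Usage would be atomic_replace("config.ini", b"new contents"). Note what
this sketch does not cover: hard links to the old file, ACLs and extended
attributes are not carried over, and write access to the directory is
required.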

> You don't notice this because xfs/ext34/btrfs (and many others) have
> code that makes sure a truncate is restarted if you crash.  So, it
> appears to be atomic even though we're really just restarting the
> operation.  In order to have a truncate + replacement of data operation,
> we'd have to do a disk format change that includes both the truncate and
> the new data.

I'm not sure why the disk format would have to change.
Conceptually, just like the temp file case, you'd write the new data
to newly allocated blocks.
After (and I guess that's the complex part) they're safely on disk,
you update the meta data, in an atomic way.
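
To make the idea concrete, here is a toy in-memory model in Python. It is
only an illustration of the concept described above, not how btrfs, ext4
or any real filesystem works; the class ToyFS and all of its names are
made up for this sketch. New data goes to newly allocated blocks, then a
single assignment switches the file's block list, and only afterwards are
the old blocks released.

```python
class ToyFS:
    """Toy model: files are tuples of block numbers, blocks hold bytes."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes
        self.inodes = {}     # file name -> tuple of block numbers
        self.next_block = 0

    def _alloc(self, chunk):
        # Allocate a fresh block; existing blocks are never overwritten.
        n = self.next_block
        self.next_block += 1
        self.blocks[n] = chunk
        return n

    def replace(self, name, data, block_size=4):
        # Step 1: write the new data to newly allocated blocks.
        new = tuple(self._alloc(data[i:i + block_size])
                    for i in range(0, len(data), block_size))
        old = self.inodes.get(name, ())
        # Step 2: one metadata update makes the new contents visible.
        self.inodes[name] = new
        # Step 3: release the old blocks, one step per old block.
        for n in old:
            del self.blocks[n]

    def read(self, name):
        return b"".join(self.blocks[n] for n in self.inodes[name])
```

In this model a reader sees either the old block list or the new one,
never a mix. Step 3 is still proportional to the size of the old file,
which is the truncate cost discussed in the quoted text; the model says
nothing about how a real filesystem would make that step crash safe.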

> It would look a lot like echo data > file.new ; truncate file ; mv
> file.new file, but recorded in the FS metadata.
>
> I don't have this in the btrfs roadmap.  It would be nice but most
> people use databases for things that require atomic operations.  I

Executables and files shouldn't be in a DB.

Olaf



Thread overview: 47+ messages
2010-12-27 11:51 Atomic file data replace API Olaf van der Spek
2010-12-27 13:20 ` Amir Goldstein
2010-12-27 15:53   ` Olaf van der Spek
2010-12-27 17:20     ` Amir Goldstein
2010-12-27 18:34       ` Olaf van der Spek
2010-12-28  2:59 ` Ted Ts'o
2010-12-28 17:27   ` Olaf van der Spek
2010-12-28 19:06     ` Ric Wheeler
2010-12-28 22:25       ` Olaf van der Spek
2010-12-28 22:36         ` Ric Wheeler
2010-12-28 22:58           ` Olaf van der Spek
2010-12-29  9:20             ` Amir Goldstein
2010-12-29 12:42               ` Olaf van der Spek
2010-12-29 15:30                 ` Christian Stroetmann
2010-12-29 15:35                   ` Olaf van der Spek
2010-12-29 16:30                     ` Christian Stroetmann
2010-12-29 17:12                       ` Olaf van der Spek
2010-12-29 17:15                   ` Greg Freemyer
2010-12-29 19:30                     ` Christian Stroetmann
  -- strict thread matches above, loose matches on Subject: below --
2011-01-06 20:01 Olaf van der Spek
2011-01-07 13:55 ` Mike Fleetwood
2011-01-07 14:01   ` Olaf van der Spek
2011-01-07 14:10     ` Olaf van der Spek
2011-01-07 14:58 ` Chris Mason
2011-01-07 15:01   ` Olaf van der Spek
2011-01-07 15:05     ` Chris Mason
2011-01-07 15:08       ` Olaf van der Spek
2011-01-07 15:13         ` Chris Mason
2011-01-07 15:17           ` Olaf van der Spek
2011-01-07 16:12             ` Chris Mason
2011-01-07 16:19               ` Olaf van der Spek
2011-01-07 16:26               ` Hubert Kario
2011-01-07 19:29                 ` Chris Mason
2011-01-08 14:40                   ` Olaf van der Spek
2011-01-26 18:30                     ` Olaf van der Spek
2011-01-26 19:30                       ` Chris Mason
2011-01-26 21:56                         ` Olaf van der Spek
2011-01-07 16:32             ` Massimo Maggi
2011-01-07 16:34               ` Olaf van der Spek
2011-01-07 19:29                 ` Thomas Bellman
2011-01-08 14:36                   ` Olaf van der Spek
2011-01-08 21:43                     ` Thomas Bellman
2011-01-09 15:16                       ` Olaf van der Spek
2011-01-09 18:56                         ` Thomas Bellman
2011-01-09 19:06                           ` Olaf van der Spek
2011-01-09 20:13                           ` Phillip Susi
2011-01-08  1:11   ` Phillip Susi
