linux-ext4.vger.kernel.org archive mirror
* Zero length files - an alternative approach?
@ 2009-03-29 10:43 Graham Murray
  2009-03-29 11:22 ` Måns Rullgård
  2009-03-29 16:49 ` Avi Kivity
  0 siblings, 2 replies; 10+ messages in thread
From: Graham Murray @ 2009-03-29 10:43 UTC (permalink / raw)
  To: linux-ext4, linux-kernel

Just a thought on the ongoing discussion of dataloss with ext4 vs ext3.

Taking the common scenario:
Read oldfile
create newfile
write newfile data
close newfile
rename newfile to oldfile

When using this scenario, the application writer wants to ensure that
either the old or the new content is present. With delayed allocation,
this can lead to zero length files. Most of the suggestions on how to
address this have involved either syncing the data before the rename or
making the rename sync the data.
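
For reference, the "sync the data before the rename" variant of the
sequence above might look roughly like the sketch below. It is written in
Python only because its os functions map one-to-one onto the system calls
being discussed. The function name replace_with_fsync and the ".new"
suffix are made up for the illustration and do not come from any
application mentioned in this thread.

```python
import os


def replace_with_fsync(path: str, data: bytes) -> None:
    """Replace `path` with `data`, forcing the data out before the rename."""
    tmp = path + ".new"  # hypothetical temporary name in the same directory
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the new file's data to stable storage
        os.fsync(fd)
    finally:
        os.close(fd)
    # Only now make the new content visible under the old name
    os.rename(tmp, path)
```

The cost being debated is the os.fsync() call, which blocks until the
data has reached the disk.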

Instead of 'bringing forward' the allocation and flushing of the data,
would it be possible to delay the rename until after the blocks for
newfile have been allocated and the data buffers flushed? This would
keep the performance benefits of delayed allocation etc. and also
satisfy application developers' apparent dislike of using fsync(). It
would give better performance than syncing the data at rename time
(either using fsync() or automatically) and satisfy the requirement
that either the old or the new content is present.

I am not a filesystem developer, so I do not know how feasible this
would be.


* Re: Zero length files - an alternative approach?
  2009-03-29 10:43 Zero length files - an alternative approach? Graham Murray
@ 2009-03-29 11:22 ` Måns Rullgård
  2009-03-29 12:02   ` Andreas T.Auer
  2009-03-30 12:41   ` Chris Mason
  2009-03-29 16:49 ` Avi Kivity
  1 sibling, 2 replies; 10+ messages in thread
From: Måns Rullgård @ 2009-03-29 11:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ext4

Graham Murray <graham@gmurray.org.uk> writes:

> Just a thought on the ongoing discussion of dataloss with ext4 vs ext3.
>
> Taking the common scenario:
> Read oldfile
> create newfile file
> write newfile data
> close newfile
> rename newfile to oldfile
>
> When using this scenario, the application writer wants to ensure that
> either the old or new content are present. With delayed allocation, this
> can lead to zero length files. Most of the suggestions on how to address
> this have involved syncing the data either before the rename or making
> the rename sync the data.
>
> What about, instead of 'bringing forward' the allocation and flushing of
> the data, would it be possible to instead delay the rename until after
> the blocks for newfile have been allocated and the data buffers flushed?
> This would keep the performance benefits of delayed allocation etc and
> also satisfy the applications developers' apparent dislike of using
> fsync(). It would give better performance that syncing the data at
> rename time (either using fsync() or automatically) and satisfy the
> requirements that either the old or new content is present.

Consider this scenario:

1. Create/write/close newfile
2. Rename newfile to oldfile
3. Open/read oldfile.  This must return the new contents.
4. System crash and reboot before delayed allocation/flush complete
5. Open/read oldfile.  Old contents now returned.

It is not obvious, to me at least, that this rollback is without
problems of its own.
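
The five steps can be played through in a toy model. This is only a
sketch of the proposed semantics, not of ext4: the class ToyFS and its
two dictionaries are invented for the illustration. "cache" stands for
what running programs see and "disk" for what survives a crash.

```python
class ToyFS:
    """Toy model of a filesystem that defers both data and rename."""

    def __init__(self, durable):
        self.disk = dict(durable)   # state that survives a crash
        self.cache = dict(durable)  # state visible to running programs

    def write(self, name, data):
        self.cache[name] = data     # delayed allocation: memory only

    def rename(self, src, dst):
        # Visible at once, but not durable until flush()
        self.cache[dst] = self.cache.pop(src)

    def read(self, name):
        return self.cache[name]

    def flush(self):
        # Data and the deferred rename reach the disk together
        self.disk = dict(self.cache)

    def crash(self):
        # Everything that was not flushed is lost
        self.cache = dict(self.disk)


fs = ToyFS({"oldfile": "old contents"})
fs.write("newfile", "new contents")      # step 1
fs.rename("newfile", "oldfile")          # step 2
seen_before_crash = fs.read("oldfile")   # step 3: new contents
fs.crash()                               # step 4: no flush happened
seen_after_crash = fs.read("oldfile")    # step 5: old contents
```

In this model a reader sees the new contents at step 3 and the old
contents again at step 5, which is the rollback in question.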

-- 
Måns Rullgård
mans@mansr.com


* Re: Zero length files - an alternative approach?
  2009-03-29 11:22 ` Måns Rullgård
@ 2009-03-29 12:02   ` Andreas T.Auer
  2009-03-29 12:10     ` Måns Rullgård
  2009-03-30 12:41   ` Chris Mason
  1 sibling, 1 reply; 10+ messages in thread
From: Andreas T.Auer @ 2009-03-29 12:02 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-kernel, linux-ext4



On 29.03.2009 13:22 Måns Rullgård wrote:
> Consider this scenario:
>
> 1. Create/write/close newfile
> 2. Rename newfile to oldfile
> 3. Open/read oldfile.  This must return the new contents.
> 4. System crash and reboot before delayed allocation/flush complete
> 5. Open/read oldfile.  Old contents now returned.
>
> This rollback isn't obviously, to me at least, without problems of its
> own.
>
>   
Having the old data in 5) is far better than having no data in 5).


* Re: Zero length files - an alternative approach?
  2009-03-29 12:02   ` Andreas T.Auer
@ 2009-03-29 12:10     ` Måns Rullgård
  2009-03-29 13:49       ` Pavel Machek
  0 siblings, 1 reply; 10+ messages in thread
From: Måns Rullgård @ 2009-03-29 12:10 UTC (permalink / raw)
  To: Andreas T.Auer; +Cc: linux-kernel, linux-ext4

"Andreas T.Auer" <andreas.t.auer_lkml_73537@ursus.ath.cx> writes:

> On 29.03.2009 13:22 Måns Rullgård wrote:
>> Consider this scenario:
>>
>> 1. Create/write/close newfile
>> 2. Rename newfile to oldfile
>> 3. Open/read oldfile.  This must return the new contents.
>> 4. System crash and reboot before delayed allocation/flush complete
>> 5. Open/read oldfile.  Old contents now returned.
>>
>> This rollback isn't obviously, to me at least, without problems of its
>> own.
>>   
> Having the old data in 5) is far better than having no data in 5).

Of course having old data is better than no data.  However, fsync()
and similar approaches make a rollback to old data, after new data has
been visible, impossible or at least far less likely than it would be
with the suggested approach.  I'm not saying it's necessarily a
problem, only that it is a difference that should be taken into account.

-- 
Måns Rullgård
mans@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Zero length files - an alternative approach?
  2009-03-29 12:10     ` Måns Rullgård
@ 2009-03-29 13:49       ` Pavel Machek
  2009-03-29 20:16         ` David Newall
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Machek @ 2009-03-29 13:49 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: Andreas T.Auer, linux-kernel, linux-ext4

On Sun 2009-03-29 13:10:23, Måns Rullgård wrote:
> "Andreas T.Auer" <andreas.t.auer_lkml_73537@ursus.ath.cx> writes:
> 
> > On 29.03.2009 13:22 Måns Rullgård wrote:
> >> Consider this scenario:
> >>
> >> 1. Create/write/close newfile
> >> 2. Rename newfile to oldfile
> >> 3. Open/read oldfile.  This must return the new contents.
> >> 4. System crash and reboot before delayed allocation/flush complete
> >> 5. Open/read oldfile.  Old contents now returned.
> >>
> >> This rollback isn't obviously, to me at least, without problems of its
> >> own.
> >>   
> > Having the old data in 5) is far better than having no data in 5).
> 
> Of course having old data is better than no data.  However, fsync()
> and similar approaches make a rollback to old data after new data has
> been visible impossible or far less likely than the suggested one.

Untrue. Unless you fsync after rename, you can get olddata.

fsync() is easy. But some people _want_ to have either newdata _or_
olddata, but don't care which one, and would prefer to avoid
fsync. That's where replace() should help...
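
One common reading of "fsync after rename" is to fsync() the directory
that contains the renamed entry, so that the rename itself is on disk.
A minimal sketch follows, assuming a Linux-like system where a directory
can be opened read-only and passed to fsync(). The function name
rename_and_sync_dir is made up for the example.

```python
import os


def rename_and_sync_dir(src: str, dst: str) -> None:
    """Rename src to dst, then fsync the directory holding dst."""
    os.rename(src, dst)
    parent = os.path.dirname(os.path.abspath(dst))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        # Flush the directory so the new name is durable
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without that second fsync(), a crash shortly after the rename can still
leave the old name-to-file mapping on disk.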
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Zero length files - an alternative approach?
  2009-03-29 10:43 Zero length files - an alternative approach? Graham Murray
  2009-03-29 11:22 ` Måns Rullgård
@ 2009-03-29 16:49 ` Avi Kivity
  1 sibling, 0 replies; 10+ messages in thread
From: Avi Kivity @ 2009-03-29 16:49 UTC (permalink / raw)
  To: Graham Murray; +Cc: linux-ext4, linux-kernel

Graham Murray wrote:
> Just a thought on the ongoing discussion of dataloss with ext4 vs ext3.
>
> Taking the common scenario:
> Read oldfile
> create newfile file
> write newfile data
> close newfile
> rename newfile to oldfile
>
> When using this scenario, the application writer wants to ensure that
> either the old or new content are present. With delayed allocation, this
> can lead to zero length files. Most of the suggestions on how to address
> this have involved syncing the data either before the rename or making
> the rename sync the data.
>
> What about, instead of 'bringing forward' the allocation and flushing of
> the data, would it be possible to instead delay the rename until after
> the blocks for newfile have been allocated and the data buffers flushed?
> This would keep the performance benefits of delayed allocation etc and
> also satisfy the applications developers' apparent dislike of using
> fsync(). It would give better performance that syncing the data at
> rename time (either using fsync() or automatically) and satisfy the
> requirements that either the old or new content is present.
>
> I am not a filesystem developer, so do not know how feasible this would
> be. 
>   

This has been suggested, I believe.  In filesystem terms, it means 
inserting a barrier before the rename operation, meaning that any write 
operations needed to carry out the rename must not take place until all 
write operations from the previous calls have completed.
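
To make the ordering rule concrete, here is a small model of it. It is
an illustration only and assumes a device that may reorder writes freely
between barriers; BARRIER and completion_order are names invented for
this sketch and are not kernel interfaces.

```python
BARRIER = object()  # marker: nothing may be reordered across it


def completion_order(ops, reorder=reversed):
    """Return one order in which `ops` may reach the disk.

    Writes between two barriers may complete in any order (modelled
    here by `reorder`), but every write issued before a barrier
    completes before any write issued after it.
    """
    done, batch = [], []
    for op in ops:
        if op is BARRIER:
            done.extend(reorder(batch))
            batch = []
        else:
            batch.append(op)
    done.extend(reorder(batch))
    return done


ops = [
    "write newfile block 0",
    "write newfile block 1",
    BARRIER,
    "rename newfile -> oldfile",
]
order = completion_order(ops)
```

With the barrier in place the rename always completes last, however the
data writes before it are shuffled. Without it, the same reordering
could put the rename on disk before the data.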


-- 
error compiling committee.c: too many arguments to function



* Re: Zero length files - an alternative approach?
  2009-03-29 13:49       ` Pavel Machek
@ 2009-03-29 20:16         ` David Newall
  0 siblings, 0 replies; 10+ messages in thread
From: David Newall @ 2009-03-29 20:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ext4

Pavel Machek wrote:
> fsync() is easy. But some people _want_ to have either newdata _or_
> olddata, but don't care which one, and would prefer to avoid
> fsync. That's where replace() should help...

Most people, I wager, care more about their code being portable than
they do about leaping through a Linux-specific hoop.  They're not going
to use replace; not ever; that's what link/unlink is for.

If you think it's reasonable to modify every instance in applications
where a sudden crash would cause data loss, why not make a mount-time
flag that does all of that in FS; and for the other 99% of users, it
doesn't, but runs faster?
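
For readers unfamiliar with the link/unlink idiom, one possible reading
of it is sketched below: keep a second hard link to the old contents
before the rename, and drop it only later. This is an interpretation,
not necessarily what was meant above, and the function name
replace_keeping_backup and the backup argument are hypothetical.

```python
import os


def replace_keeping_backup(newfile: str, oldfile: str, backup: str) -> None:
    """Replace oldfile with newfile, keeping the old contents at backup."""
    if os.path.exists(backup):
        os.unlink(backup)       # drop a stale backup from an earlier run
    os.link(oldfile, backup)    # second name for the old contents
    os.rename(newfile, oldfile) # new contents take over the old name
    # The caller unlinks `backup` once the new contents are known good
```

After a crash, the old contents remain reachable under the backup name
if that link had reached the disk before the crash.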


* Re: Zero length files - an alternative approach?
  2009-03-29 11:22 ` Måns Rullgård
  2009-03-29 12:02   ` Andreas T.Auer
@ 2009-03-30 12:41   ` Chris Mason
  2009-03-30 14:06     ` Theodore Tso
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Mason @ 2009-03-30 12:41 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-kernel, linux-ext4

On Sun, 2009-03-29 at 12:22 +0100, Måns Rullgård wrote:
> Graham Murray <graham@gmurray.org.uk> writes:
> 
> > Just a thought on the ongoing discussion of dataloss with ext4 vs ext3.
> >
> > Taking the common scenario:
> > Read oldfile
> > create newfile file
> > write newfile data
> > close newfile
> > rename newfile to oldfile
> >
> > When using this scenario, the application writer wants to ensure that
> > either the old or new content are present. With delayed allocation, this
> > can lead to zero length files. Most of the suggestions on how to address
> > this have involved syncing the data either before the rename or making
> > the rename sync the data.
> >
> > What about, instead of 'bringing forward' the allocation and flushing of
> > the data, would it be possible to instead delay the rename until after
> > the blocks for newfile have been allocated and the data buffers flushed?
> > This would keep the performance benefits of delayed allocation etc and
> > also satisfy the applications developers' apparent dislike of using
> > fsync(). It would give better performance that syncing the data at
> > rename time (either using fsync() or automatically) and satisfy the
> > requirements that either the old or new content is present.
> 
> Consider this scenario:
> 
> 1. Create/write/close newfile
> 2. Rename newfile to oldfile

2a. create oldfile again
2b. fsync oldfile

> 3. Open/read oldfile.  This must return the new contents.
> 4. System crash and reboot before delayed allocation/flush complete
> 5. Open/read oldfile.  Old contents now returned.
> 

What happens to the new generation of oldfile?  We could insert
dependency tracking so that we know the fsync of oldfile is supposed to
also fsync the rename'd new file.  But then picture a loop of operations
doing renames and creating files in the place of the old one...that
dependency tracking gets ugly in a hurry.

Databases know how to do all of this, but filesystems don't implement
most of the database transactional features.

-chris




* Re: Zero length files - an alternative approach?
  2009-03-30 12:41   ` Chris Mason
@ 2009-03-30 14:06     ` Theodore Tso
  0 siblings, 0 replies; 10+ messages in thread
From: Theodore Tso @ 2009-03-30 14:06 UTC (permalink / raw)
  To: Chris Mason; +Cc: Måns Rullgård, linux-kernel, linux-ext4

On Mon, Mar 30, 2009 at 08:41:26AM -0400, Chris Mason wrote:
> > 
> > Consider this scenario:
> > 
> > 1. Create/write/close newfile
> > 2. Rename newfile to oldfile
> 
> 2a. create oldfile again
> 2b. fsync oldfile
> 
> > 3. Open/read oldfile.  This must return the new contents.
> > 4. System crash and reboot before delayed allocation/flush complete
> > 5. Open/read oldfile.  Old contents now returned.
> > 
> 
> What happens to the new generation of oldfile?  We could insert
> dependency tracking so that we know the fsync of oldfile is supposed to
> also fsync the rename'd new file.  But then picture a loop of operations
> doing renames and creating files in the place of the old one...that
> dependency tracking gets ugly in a hurry.

If there are any calls to link(2) to create hard links to oldfile or
newfile intermingled in this sequence, life also gets very
entertaining.

> Databases know how to do all of this, but filesystems don't implement
> most of the database transactional features.

Yep, we'd have to implement a rollback log to get this right, which
would also impact performance.  My guess is that just aggressively
forcing out the data write before the rename() is going to cost less
in performance, and is certainly much easier to implement.

   		       		      	     - Ted


* Re: Zero length files - an alternative approach?
       [not found]   ` <clp6o-91-17@gated-at.bofh.it>
@ 2009-03-30 21:10     ` Bodo Eggert
  0 siblings, 0 replies; 10+ messages in thread
From: Bodo Eggert @ 2009-03-30 21:10 UTC (permalink / raw)
  To: Chris Mason, Måns Rullgård, linux-kernel, linux-ext4

Chris Mason <chris.mason@oracle.com> wrote:
> On Sun, 2009-03-29 at 12:22 +0100, Måns Rullgård wrote:

>> Consider this scenario:
>> 
>> 1. Create/write/close newfile
>> 2. Rename newfile to oldfile
> 
> 2a. create oldfile again
> 2b. fsync oldfile
> 
>> 3. Open/read oldfile.  This must return the new contents.
>> 4. System crash and reboot before delayed allocation/flush complete
>> 5. Open/read oldfile.  Old contents now returned.
>> 
> 
> What happens to the new generation of oldfile?  We could insert
> dependency tracking so that we know the fsync of oldfile is supposed to
> also fsync the rename'd new file.

If rename() is BEFORE create(oldfile) and if create(oldfile) is committed,
oldfile will be the newly created file. If the sync() is interrupted by the
crash, any intermediate state may appear. If the system crashes before
create(), either the old oldfile or newfile should be visible.


end of thread, other threads:[~2009-03-30 21:10 UTC | newest]

Thread overview: 10+ messages
2009-03-29 10:43 Zero length files - an alternative approach? Graham Murray
2009-03-29 11:22 ` Måns Rullgård
2009-03-29 12:02   ` Andreas T.Auer
2009-03-29 12:10     ` Måns Rullgård
2009-03-29 13:49       ` Pavel Machek
2009-03-29 20:16         ` David Newall
2009-03-30 12:41   ` Chris Mason
2009-03-30 14:06     ` Theodore Tso
2009-03-29 16:49 ` Avi Kivity
     [not found] <cl0KI-3zZ-3@gated-at.bofh.it>
     [not found] ` <cl1oA-4El-9@gated-at.bofh.it>
     [not found]   ` <clp6o-91-17@gated-at.bofh.it>
2009-03-30 21:10     ` Bodo Eggert
