precise characterization of ext3 atomicity

public inbox for linux-kernel@vger.kernel.org

* precise characterization of ext3 atomicity
@ 2003-09-04 14:20 Hans Reiser
  2003-09-04 15:55 ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Reiser @ 2003-09-04 14:20 UTC (permalink / raw)
  To: ReiserFS, Linux Kernel Mailing List

Is it correct to say of ext3 that it guarantees and only guarantees 
atomicity of writes that do not cross page boundaries?

I am trying to define the difference between "Atomic Reiser4" and ext3, 
as it seems to be a frequently asked question, and I am thinking of 
saying something like:

    Reiser4 allows you to define a set of up to A separate arbitrary 
filesystem operations (where A by default is not allowed to exceed 64) 
that are to be committed to disk atomically.  Every individual 
filesystem operation is atomic without the need to specify it.
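
As a rough illustration of what "a set of up to A operations committed
atomically" means, here is a toy in-memory model in Python.  It is only a
sketch under stated assumptions: AtomicBatch, MAX_OPS and the dict of file
contents are hypothetical names invented for this example, not the Reiser4
interface, and nothing here says anything about on-disk behaviour.

```python
import copy

MAX_OPS = 64  # assumed default for the limit "A" mentioned above


class AtomicBatch:
    """Toy all-or-nothing batch over an in-memory dict of file contents."""

    def __init__(self, files, max_ops=MAX_OPS):
        self.files = files
        self.max_ops = max_ops
        self.ops = []

    def add(self, op):
        """Queue a callable that mutates a dict of files."""
        if len(self.ops) >= self.max_ops:
            raise ValueError("too many operations in one batch")
        self.ops.append(op)

    def commit(self):
        """Apply every queued operation, or none if any of them fails."""
        staged = copy.deepcopy(self.files)
        for op in self.ops:
            op(staged)  # an exception here leaves self.files untouched
        self.files.clear()
        self.files.update(staged)
        self.ops = []
```

If any queued operation raises, commit() propagates the exception and the
original dict is left exactly as it was, which is the all-or-nothing
property being described.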

    By contrast, ext3 only guarantees the atomicity of a single write 
that does not span a page boundary, and it guarantees that its internal 
metadata will not be corrupted even if your application's data is 
corrupted after the crash.

-- 
Hans




* Re: precise characterization of ext3 atomicity
  2003-09-04 14:20 precise characterization of ext3 atomicity Hans Reiser
@ 2003-09-04 15:55 ` Andrew Morton
  2003-09-04 15:59   ` Hans Reiser
  2003-09-04 20:16   ` Daniel Phillips
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2003-09-04 15:55 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list, linux-kernel

Hans Reiser <reiser@namesys.com> wrote:
>
> Is it correct to say of ext3 that it guarantees and only guarantees 
> atomicity of writes that do not cross page boundaries?

Yes.

>     By contrast, ext3 only guarantees the atomicity of a single write 
> that does not span a page boundary, and it guarantees that its internal 
> metadata will not be corrupted even if your application's data is 
> corrupted after the crash.

Not sure that I understand this.  In data=writeback mode, metadata
integrity is preserved but data writes may be lost.  In data=journal and
data=ordered modes the data and the metadata which refers to it are always
in sync on-disk.



* Re: precise characterization of ext3 atomicity
  2003-09-04 15:55 ` Andrew Morton
@ 2003-09-04 15:59   ` Hans Reiser
  2003-09-04 16:12     ` Andrew Morton
  2003-09-04 20:16   ` Daniel Phillips
  1 sibling, 1 reply; 22+ messages in thread
From: Hans Reiser @ 2003-09-04 15:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: reiserfs-list, linux-kernel

Andrew Morton wrote:

>Hans Reiser <reiser@namesys.com> wrote:
>
>>Is it correct to say of ext3 that it guarantees and only guarantees 
>>atomicity of writes that do not cross page boundaries?
>
>Yes.
>
>>    By contrast, ext3 only guarantees the atomicity of a single write 
>>that does not span a page boundary, and it guarantees that its internal 
>>metadata will not be corrupted even if your application's data is 
>>corrupted after the crash.
>
>Not sure that I understand this.  In data=writeback mode, metadata
>integrity is preserved but data writes may be lost.  In data=journal and
>data=ordered modes the data and the metadata which refers to it are always
>in sync on-disk.
>
Perhaps the following is correct?

    By contrast, ext3 in data=journal and data=ordered modes only
guarantees the atomicity of a single write that does not span a page
boundary, and it guarantees that its internal metadata will not be
corrupted even if your application's data is corrupted after the crash
(due to the application spreading what should be committed atomically
across more than one block).





-- 
Hans




* Re: precise characterization of ext3 atomicity
  2003-09-04 15:59   ` Hans Reiser
@ 2003-09-04 16:12     ` Andrew Morton
  2003-09-04 16:25       ` Hans Reiser
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2003-09-04 16:12 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list, linux-kernel

Hans Reiser <reiser@namesys.com> wrote:
>
>  Perhaps the following is correct?
> 
>      By contrast, ext3 in data=journal and data=ordered modes only
>  guarantees the atomicity of a single write that does not span a page
>  boundary, and it guarantees that its internal metadata will not be
>  corrupted even if your application's data is corrupted after the crash
>  (due to the application spreading what should be committed atomically
>  across more than one block).

Correct != comprehensible ;)

"In all journalling modes ext3 guarantees metadata consistency after a
 crash.  In its data=journal and data=ordered modes ext3 also guarantees that
 user data is consistent with metadata after a crash.

 However ext3 does not provide user data atomicity guarantees beyond the
 scope of a single filesystem disk block (usually 4 kilobytes).  If a
 single write() spans two disk blocks it is possible that a crash partway
 through the write will result in only one of those blocks appearing in the
 file after recovery"
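
To make the block-spanning condition concrete, here is a small Python
sketch that works out which filesystem blocks a write of a given offset and
length touches.  The function names and the 4096-byte default are
assumptions made for illustration; they are not taken from ext3.

```python
def blocks_touched(offset, length, block_size=4096):
    """Return the range of block indices covered by a write.

    offset and length are in bytes; block_size is an assumed
    filesystem block size (4 KiB is only the usual default).
    """
    if offset < 0 or length <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0; length and block_size must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def fits_in_one_block(offset, length, block_size=4096):
    """True if the write stays inside a single block."""
    return len(blocks_touched(offset, length, block_size)) == 1
```

For example, a 200-byte write at offset 4000 covers bytes 4000..4199 and so
touches blocks 0 and 1; by the description above it would not be covered by
the single-block guarantee.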


* Re: precise characterization of ext3 atomicity
  2003-09-04 16:12     ` Andrew Morton
@ 2003-09-04 16:25       ` Hans Reiser
  2003-09-04 18:15         ` Mike Fedyk
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Reiser @ 2003-09-04 16:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: reiserfs-list, linux-kernel

Andrew Morton wrote:

>Hans Reiser <reiser@namesys.com> wrote:
>
>> Perhaps the following is correct?
>>
>>     By contrast, ext3 in data=journal and data=ordered modes only
>> guarantees the atomicity of a single write that does not span a page
>> boundary, and it guarantees that its internal metadata will not be
>> corrupted even if your application's data is corrupted after the crash
>> (due to the application spreading what should be committed atomically
>> across more than one block).
>
>Correct != comprehensible ;)
>
Touché!

Let's try this then:

Ext3 guarantees that its metadata will be committed sufficiently 
atomically that after a crash it will be consistent with itself.

In data=journal and data=ordered modes ext3 also guarantees that the
metadata will be committed atomically with the data they point to.
However ext3 does not provide user data atomicity guarantees beyond the
scope of a single filesystem disk block (usually 4 kilobytes).  If a
single write() spans two disk blocks it is possible that a crash partway
through the write will result in only one of those blocks appearing in
the file after recovery.

-- 
Hans


* Re: precise characterization of ext3 atomicity
  2003-09-04 16:25       ` Hans Reiser
@ 2003-09-04 18:15         ` Mike Fedyk
  2003-09-04 16:05           ` Antonio Vargas
  2003-09-04 18:37           ` Hans Reiser
  0 siblings, 2 replies; 22+ messages in thread
From: Mike Fedyk @ 2003-09-04 18:15 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Andrew Morton, reiserfs-list, linux-kernel

On Thu, Sep 04, 2003 at 08:25:18PM +0400, Hans Reiser wrote:
> In data=journal and data=ordered modes ext3 also guarantees that the 
> metadata will be committed atomically with the data they point to.  However 
> ext3 does not provide user data atomicity guarantees beyond the scope of a 
> single filesystem disk block (usually 4 kilobytes).  If a single write() 
> spans two disk blocks it is possible that a crash partway through the write 
> will result in only one of those blocks appearing in the file after 
> recovery.

And how does reiser4 do this without changing the userspace apps?

Most files are written with several write() calls, so even if each call is
atomic, your entire file will not be there.

Also, ext3 could claim the same atomicity if it only updated meta-data on
write() call boundaries, instead of block boundaries.


* Re: precise characterization of ext3 atomicity
  2003-09-04 18:15         ` Mike Fedyk
@ 2003-09-04 16:05           ` Antonio Vargas
  2003-09-04 18:37           ` Hans Reiser
  1 sibling, 0 replies; 22+ messages in thread
From: Antonio Vargas @ 2003-09-04 16:05 UTC (permalink / raw)
  To: Hans Reiser, Andrew Morton, reiserfs-list, linux-kernel

On Thu, Sep 04, 2003 at 11:15:40AM -0700, Mike Fedyk wrote:
> On Thu, Sep 04, 2003 at 08:25:18PM +0400, Hans Reiser wrote:
> > In data=journal and data=ordered modes ext3 also guarantees that the 
> > metadata will be committed atomically with the data they point to.  However 
> > ext3 does not provide user data atomicity guarantees beyond the scope of a 
> > single filesystem disk block (usually 4 kilobytes).  If a single write() 
> > spans two disk blocks it is possible that a crash partway through the write 
> > will result in only one of those blocks appearing in the file after 
> > recovery.
> 
> And how does reiser4 do this without changing the userspace apps?

It won't.


[ snip ] 
> Most files are written with several write() calls, so even if each call is
> atomic, your entire file will not be there.
> 
> Also, ext3 could claim the same atomicity if it only updated meta-data on
> write() call boundaries, instead of block boundaries.

There will be a new API to support userspace-controlled
multifile transactions.

At first stab, multifile transactions will be used internally to
implement extended attributes.

Now, another question is.. will the transaction API support commit() and
rollback()? *grin*

(wonder about coding a simple transactional database with
 shell scripts ;)

-- 
winden/network

1. Given a program, it always has at least one bug.
2. Given several lines of code, they can always be shortened to fewer lines.
3. By induction, all programs can be reduced
   to one line that doesn't work.


* Re: precise characterization of ext3 atomicity
  2003-09-04 18:15         ` Mike Fedyk
  2003-09-04 16:05           ` Antonio Vargas
@ 2003-09-04 18:37           ` Hans Reiser
  2003-09-04 19:12             ` Mike Fedyk
  2003-09-04 19:28             ` Andreas Dilger
  1 sibling, 2 replies; 22+ messages in thread
From: Hans Reiser @ 2003-09-04 18:37 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Andrew Morton, reiserfs-list, linux-kernel

Mike Fedyk wrote:

>On Thu, Sep 04, 2003 at 08:25:18PM +0400, Hans Reiser wrote:
>
>>In data=journal and data=ordered modes ext3 also guarantees that the 
>>metadata will be committed atomically with the data they point to.  However 
>>ext3 does not provide user data atomicity guarantees beyond the scope of a 
>>single filesystem disk block (usually 4 kilobytes).  If a single write() 
>>spans two disk blocks it is possible that a crash partway through the write 
>>will result in only one of those blocks appearing in the file after 
>>recovery.
>
>And how does reiser4 do this without changing the userspace apps?
>
We don't.  We just make the hovercraft, we don't force you to go over 
the water.....

>
>Most files are written with several write() calls, so even if each call is
>atomic, your entire file will not be there.
>
>Also, ext3 could claim the same atomicity if it only updated meta-data on
>write() call boundaries, instead of block boundaries.
>


-- 
Hans




* Re: precise characterization of ext3 atomicity
  2003-09-04 18:37           ` Hans Reiser
@ 2003-09-04 19:12             ` Mike Fedyk
  2003-09-04 21:03               ` Hans Reiser
  2003-09-04 19:28             ` Andreas Dilger
  1 sibling, 1 reply; 22+ messages in thread
From: Mike Fedyk @ 2003-09-04 19:12 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Andrew Morton, reiserfs-list, linux-kernel

On Thu, Sep 04, 2003 at 10:37:10PM +0400, Hans Reiser wrote:
> Mike Fedyk wrote:
> 
> >On Thu, Sep 04, 2003 at 08:25:18PM +0400, Hans Reiser wrote:
> > 
> >
> >>In data=journal and data=ordered modes ext3 also guarantees that the 
> >>metadata will be committed atomically with the data they point to.  
> >>However ext3 does not provide user data atomicity guarantees beyond the 
> >>scope of a single filesystem disk block (usually 4 kilobytes).  If a 
> >>single write() spans two disk blocks it is possible that a crash partway 
> >>through the write will result in only one of those blocks appearing in 
> >>the file after recovery.
> >>   
> >>
> >
> >And how does reiser4 do this without changing the userspace apps?
> >
> We don't.  We just make the hovercraft, we don't force you to go over 
> the water.....

So by default with no user space modifications, reiser4 will be atomic for
each write() call, and ext3 will if it aligns within a single page.

Is that correct?

Then you can go on to specify that you can have larger transactions if you
make some changes to the userspace apps.


* Re: precise characterization of ext3 atomicity
  2003-09-04 19:12             ` Mike Fedyk
@ 2003-09-04 21:03               ` Hans Reiser
  0 siblings, 0 replies; 22+ messages in thread
From: Hans Reiser @ 2003-09-04 21:03 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Andrew Morton, reiserfs-list, linux-kernel

Mike Fedyk wrote:

>On Thu, Sep 04, 2003 at 10:37:10PM +0400, Hans Reiser wrote:
>
>>Mike Fedyk wrote:
>>
>>>On Thu, Sep 04, 2003 at 08:25:18PM +0400, Hans Reiser wrote:
>>>
>>>>In data=journal and data=ordered modes ext3 also guarantees that the 
>>>>metadata will be committed atomically with the data they point to.  
>>>>However ext3 does not provide user data atomicity guarantees beyond the 
>>>>scope of a single filesystem disk block (usually 4 kilobytes).  If a 
>>>>single write() spans two disk blocks it is possible that a crash partway 
>>>>through the write will result in only one of those blocks appearing in 
>>>>the file after recovery.
>>>
>>>And how does reiser4 do this without changing the userspace apps?
>>
>>We don't.  We just make the hovercraft, we don't force you to go over 
>>the water.....
>
>So by default with no user space modifications, reiser4 will be atomic for
>each write() call, and ext3 will if it aligns within a single page.
>
>Is that correct?
>
Yes.

>
>Then you can go on to specify that you can have larger transactions if you
>make some changes to the userspace apps.
>
or you are a programmer who writes code....;-)  It's not that hard to 
write code....;-)

-- 
Hans




* Re: precise characterization of ext3 atomicity
  2003-09-04 18:37           ` Hans Reiser
  2003-09-04 19:12             ` Mike Fedyk
@ 2003-09-04 19:28             ` Andreas Dilger
  2003-09-04 21:32               ` Hans Reiser
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Dilger @ 2003-09-04 19:28 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Mike Fedyk, Andrew Morton, reiserfs-list, linux-kernel

On Sep 04, 2003  22:37 +0400, Hans Reiser wrote:
> Mike Fedyk wrote:
> >And how does reiser4 do this [export atomic ops to userspace]
> >without changing the userspace apps?
>
> We don't.  We just make the hovercraft, we don't force you to go over 
> the water.....

It is possible to do the same with ext3, namely exporting journal_start()
and journal_stop() (or some interface to them) to userspace so the application
can start a transaction for multiple operations.  We had discussed this in
the past, but decided not to do so because user applications can screw up in
so many ways, and if an application uses these interfaces it is possible to
deadlock the entire filesystem if the application isn't well behaved.

If the app doesn't eventually say "end the transaction", the filesystem might
wait indefinitely.  You could start adding more plumbing like "if the file
is closed (maybe because the process crashed), cancel the transaction",
and "if the process doesn't complete the transaction in time, cancel the
transaction", etc.  How do you guarantee in advance that the application
will be able to complete all of the operations it needs (i.e. if it runs
out of space in the filesystem or something)?
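
The "more plumbing" described above (cancel if the handle is closed without
an explicit end, cancel if the deadline passes) can be sketched as a toy
Python object.  This is purely illustrative: Transaction,
TransactionNotOpen and the injectable clock are hypothetical names for this
example, not an ext3 or JBD interface, and the model is in-memory only.

```python
import time


class TransactionNotOpen(Exception):
    """Raised when a transaction is used after it was cancelled or committed."""


class Transaction:
    """Toy transaction handle that cancels itself on timeout or on close."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + timeout_s
        self.ops = []
        self.state = "open"

    def _check_open(self):
        if self.state == "open" and self._clock() > self._deadline:
            self.cancel()  # "did not complete in time": cancel
        if self.state != "open":
            raise TransactionNotOpen(self.state)

    def add(self, op):
        self._check_open()
        self.ops.append(op)

    def cancel(self):
        self.ops.clear()
        self.state = "cancelled"

    def commit(self):
        self._check_open()
        self.state = "committed"
        return list(self.ops)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the block without commit() models "the file was closed
        # (maybe because the process crashed)": cancel the transaction.
        if self.state == "open":
            self.cancel()
        return False
```

Even in this toy form the caller has to cope with a transaction that was
cancelled underneath it, which is the usability concern raised above.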

I suppose at worst, the application doesn't get its multi-op atomicity
guarantee, but I'm guessing that apps which use this interface depend on
it working properly or they wouldn't be using it.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


* Re: precise characterization of ext3 atomicity
  2003-09-04 19:28             ` Andreas Dilger
@ 2003-09-04 21:32               ` Hans Reiser
  2003-09-04 22:03                 ` Andreas Dilger
  2003-09-09 13:09                 ` Pavel Machek
  0 siblings, 2 replies; 22+ messages in thread
From: Hans Reiser @ 2003-09-04 21:32 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Mike Fedyk, Andrew Morton, reiserfs-list, linux-kernel

Andreas Dilger wrote:

>On Sep 04, 2003  22:37 +0400, Hans Reiser wrote:
>
>>Mike Fedyk wrote:
>>
>>>And how does reiser4 do this [export atomic ops to userspace]
>>>without changing the userspace apps?
>>
>>We don't.  We just make the hovercraft, we don't force you to go over 
>>the water.....
>
>It is possible to do the same with ext3, namely exporting journal_start()
>and journal_stop() (or some interface to them) to userspace so the application
>can start a transaction for multiple operations.  We had discussed this in
>the past, but decided not to do so because user applications can screw up in
>so many ways, and if an application uses these interfaces it is possible to
>deadlock the entire filesystem if the application isn't well behaved.
>
Yup.  That's why we confine it to a (finite #defined number) set of 
operations within one sys_reiser4 call.  At some point we will allow 
trusted user space processes to span multiple system calls (mail server 
appliances, database appliances, etc., might find this useful).  You 
might consider supporting sys_reiser4 at some point.

>
>If the app doesn't eventually say "end the transaction", the filesystem might
>wait indefinitely.  You could start adding more plumbing like "if the file
>is closed (maybe because the process crashed), cancel the transaction",
>and "if the process doesn't complete the transaction in time, cancel the
>transaction", etc.  How do you guarantee in advance that the application
>will be able to complete all of the operations it needs (i.e. if it runs
>out of space in the filesystem or something)?
>
We will export our space reservation infrastructure code also.  We are 
still thinking about the right API for that.  At first I had some idea 
that we should calculate for the user how much space would be needed, 
but I am getting lazier as we get closer to actually doing it.  I am now
thinking we can add the estimate_sizeof() functions (helpful, but complex
in some cases) later, and for now just let the user grab space.  If they
exceed it we return an error, and if they both exceed it and run out of
disk space we return a nastier error that tells them to go cope with
their mistake.  I think it will be something that takes a 64 bit int
that is the number of blocks to grab, compares it to their quota if any,
and causes the sys_reiser4 to do nothing and error nicely if it can't
get it.
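
A minimal Python sketch of the "grab space up front" idea follows, under
loud assumptions: SpaceReservation and the three exception names are
invented for illustration and are not the Reiser4 reservation code or the
eventual API, which the text above says is still undecided.

```python
class QuotaExceededError(Exception):
    """The requested reservation does not fit in the quota."""


class ReservationExceededError(Exception):
    """More blocks were used than were grabbed (the milder error)."""


class OutOfSpaceError(Exception):
    """The reservation was exceeded and the disk is full (the nastier error)."""


class SpaceReservation:
    """Toy model: grab a number of blocks first, then consume them."""

    def __init__(self, free_blocks, quota_blocks=None):
        self.free_blocks = free_blocks
        self.quota_blocks = quota_blocks
        self.reserved = 0
        self.used = 0

    def grab(self, nblocks):
        """Reserve nblocks, or change nothing and raise if that is impossible."""
        if not 0 <= nblocks < 2**64:  # the text describes a 64 bit block count
            raise ValueError("block count must fit in 64 bits")
        if self.quota_blocks is not None and self.reserved + nblocks > self.quota_blocks:
            raise QuotaExceededError(nblocks)
        if nblocks > self.free_blocks:
            raise OutOfSpaceError(nblocks)
        self.free_blocks -= nblocks
        self.reserved += nblocks

    def use(self, nblocks):
        """Consume reserved blocks; exceeding the reservation is an error."""
        if self.used + nblocks <= self.reserved:
            self.used += nblocks
            return
        overshoot = self.used + nblocks - self.reserved
        if overshoot > self.free_blocks:
            raise OutOfSpaceError(overshoot)
        raise ReservationExceededError(overshoot)
```

A failed grab() leaves the counters untouched, mirroring the "do nothing
and error nicely" behaviour described above.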

When coding, sometimes you have to be careful not to let the complex
needs of the 20% prevent you from getting to market something that the
80% need and would be happy with.  (Learned this one from Dave Hitz of
NetApp.)

There are a lot of applications that have very simple needs with regard to 
atomicity, like write 3 things to 3 files as one atom.  If we can 
address that, then later people can do their PhDs on the needs of the 
complex 20%.... and hopefully send some nice patches.....

>
>I suppose at worst, the application doesn't get its multi-op atomicity
>guarantee, but I'm guessing that apps which use this interface depend on
>it working properly or they wouldn't be using it.
>
>Cheers, Andreas
>--
>Andreas Dilger
>http://sourceforge.net/projects/ext2resize/
>http://www-mddsp.enel.ucalgary.ca/People/adilger/
>


-- 
Hans




* Re: precise characterization of ext3 atomicity
  2003-09-04 21:32               ` Hans Reiser
@ 2003-09-04 22:03                 ` Andreas Dilger
  2003-09-05 13:47                   ` Chris Mason
  2003-09-09 13:09                 ` Pavel Machek
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Dilger @ 2003-09-04 22:03 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Mike Fedyk, Andrew Morton, reiserfs-list, linux-kernel

On Sep 05, 2003  01:32 +0400, Hans Reiser wrote:
> Andreas Dilger wrote:
> >It is possible to do the same with ext3, namely exporting journal_start()
> >and journal_stop() (or some interface to them) to userspace so the application
> >can start a transaction for multiple operations.  We had discussed this in
> >the past, but decided not to do so because user applications can screw up in
> >so many ways, and if an application uses these interfaces it is possible to
> >deadlock the entire filesystem if the application isn't well behaved.
>
> That's why we confine it to a (finite #defined number) set of 
> operations within one sys_reiser4 call.  At some point we will allow 
> trusted user space processes to span multiple system calls (mail server 
> appliances, database appliances, etc., might find this useful).  You 
> might consider supporting sys_reiser4 at some point.

Ah, OK.  If you are confining the atom to a single syscall, then this is
a much easier problem to solve, assuming sys_reiser4() has a sufficiently
rich interface to express what people want to do.  It avoids all of the
potential problems that could arise if you want to keep a transaction
open over multiple syscalls.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



* Re: precise characterization of ext3 atomicity
  2003-09-04 22:03                 ` Andreas Dilger
@ 2003-09-05 13:47                   ` Chris Mason
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Mason @ 2003-09-05 13:47 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Hans Reiser, Mike Fedyk, Andrew Morton, reiserfs-list,
	linux-kernel

On Thu, 2003-09-04 at 18:03, Andreas Dilger wrote:
> On Sep 05, 2003  01:32 +0400, Hans Reiser wrote:
> > Andreas Dilger wrote:
> > >It is possible to do the same with ext3, namely exporting journal_start()
> > >and journal_stop() (or some interface to them) to userspace so the application
> > >can start a transaction for multiple operations.  We had discussed this in
> > >the past, but decided not to do so because user applications can screw up in
> > >so many ways, and if an application uses these interfaces it is possible to
> > >deadlock the entire filesystem if the application isn't well behaved.
> >
> > That's why we confine it to a (finite #defined number) set of 
> > operations within one sys_reiser4 call.  At some point we will allow 
> > trusted user space processes to span multiple system calls (mail server 
> > appliances, database appliances, etc., might find this useful).  You 
> > might consider supporting sys_reiser4 at some point.

Please rename sys_reiser4 if you want it to be a generic use syscall ;-)

-chris




* Re: precise characterization of ext3 atomicity
  2003-09-04 21:32               ` Hans Reiser
  2003-09-04 22:03                 ` Andreas Dilger
@ 2003-09-09 13:09                 ` Pavel Machek
  2003-09-09 19:21                   ` Gábor Lénárt
  1 sibling, 1 reply; 22+ messages in thread
From: Pavel Machek @ 2003-09-09 13:09 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Andreas Dilger, Mike Fedyk, Andrew Morton, reiserfs-list,
	linux-kernel

Hi!

> Yup.  That's why we confine it to a (finite #defined number) set of 
> operations within one sys_reiser4 call.  At some point we will allow 
> trusted user space processes to span multiple system calls (mail 
> server appliances, database appliances, etc., might find this 
> useful).  You might consider supporting sys_reiser4 at some point.


Well, if you want that API to be widely usable, you should invent
better name than sys_reiser4 :-).
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...



* Re: precise characterization of ext3 atomicity
  2003-09-09 13:09                 ` Pavel Machek
@ 2003-09-09 19:21                   ` Gábor Lénárt
  2003-09-09 19:43                     ` Mike Fedyk
  0 siblings, 1 reply; 22+ messages in thread
From: Gábor Lénárt @ 2003-09-09 19:21 UTC (permalink / raw)
  To: linux-kernel

On Tue, Sep 09, 2003 at 03:09:02PM +0200, Pavel Machek wrote:
> > Yup.  That's why we confine it to a (finite #defined number) set of 
> > operations within one sys_reiser4 call.  At some point we will allow 
> > trusted user space processes to span multiple system calls (mail 
> > server appliances, database appliances, etc., might find this 
> > useful).  You might consider supporting sys_reiser4 at some point.
> 
> 
> Well, if you want that API to be widely usable, you should invent
> better name than sys_reiser4 :-).

Like ActiveFSControll or such? ;-)

- Gábor (larta'H)


* Re: precise characterization of ext3 atomicity
  2003-09-09 19:21                   ` Gábor Lénárt
@ 2003-09-09 19:43                     ` Mike Fedyk
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Fedyk @ 2003-09-09 19:43 UTC (permalink / raw)
  To: Gábor Lénárt; +Cc: linux-kernel

On Tue, Sep 09, 2003 at 09:21:07PM +0200, Gábor Lénárt wrote:
> On Tue, Sep 09, 2003 at 03:09:02PM +0200, Pavel Machek wrote:
> > > Yup.  That's why we confine it to a (finite #defined number) set of 
> > > operations within one sys_reiser4 call.  At some point we will allow 
> > > trusted user space processes to span multiple system calls (mail 
> > > server appliances, database appliances, etc., might find this 
> > > useful).  You might consider supporting sys_reiser4 at some point.
> > 
> > 
> > Well, if you want that API to be widely usable, you should invent
> > better name than sys_reiser4 :-).
> 
> Like ActiveFSControll or such? ;-)

How about sys_group_journal_ops?


* Re: precise characterization of ext3 atomicity
  2003-09-04 15:55 ` Andrew Morton
  2003-09-04 15:59   ` Hans Reiser
@ 2003-09-04 20:16   ` Daniel Phillips
  2003-09-04 20:10     ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Phillips @ 2003-09-04 20:16 UTC (permalink / raw)
  To: Andrew Morton, Hans Reiser; +Cc: reiserfs-list, linux-kernel

On Thursday 04 September 2003 17:55, Andrew Morton wrote:
> Hans Reiser <reiser@namesys.com> wrote:
> > Is it correct to say of ext3 that it guarantees and only guarantees
> > atomicity of writes that do not cross page boundaries?
>
> Yes.

Is that just happenstance, or does POSIX or similar mandate it?

Regards,

Daniel



* Re: precise characterization of ext3 atomicity
  2003-09-04 20:16   ` Daniel Phillips
@ 2003-09-04 20:10     ` Andrew Morton
  2003-09-04 21:08       ` Daniel Phillips
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2003-09-04 20:10 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: reiser, reiserfs-list, linux-kernel

Daniel Phillips <phillips@arcor.de> wrote:
>
> On Thursday 04 September 2003 17:55, Andrew Morton wrote:
> > Hans Reiser <reiser@namesys.com> wrote:
> > > Is it correct to say of ext3 that it guarantees and only guarantees
> > > atomicity of writes that do not cross page boundaries?
> >
> > Yes.
> 
> Is that just happenstance, or does Posix or similar mandate it?

Happenstance.

It's semi-trivial to do this in ext3.  You'd open the file with O_ATOMIC
and a write() would either be completely atomic or would return -EFOO
without having written anything.

The thing which prevents this is the ranking order between journal_start()
and lock_page().

It's not trivial but also not too hard to change things so that
journal_start() can rank outside lock_page() - this would also offer some
CPU savings.

Can't say that I'm terribly motivated about the feature though.


* Re: precise characterization of ext3 atomicity
  2003-09-04 20:10     ` Andrew Morton
@ 2003-09-04 21:08       ` Daniel Phillips
  2003-09-04 21:39         ` Hans Reiser
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Phillips @ 2003-09-04 21:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: reiser, reiserfs-list, linux-kernel

On Thursday 04 September 2003 22:10, Andrew Morton wrote:
> Daniel Phillips <phillips@arcor.de> wrote:
> > On Thursday 04 September 2003 17:55, Andrew Morton wrote:
> > > Hans Reiser <reiser@namesys.com> wrote:
> > > > Is it correct to say of ext3 that it guarantees and only guarantees
> > > > atomicity of writes that do not cross page boundaries?
> > >
> > > Yes.
> >
> > Is that just happenstance, or does Posix or similar mandate it?
>
> Happenstance.
>
> It's semi-trivial to do this in ext3.  You'd open the file with O_ATOMIC
> and a write() would either be completely atomic or would return -EFOO
> without having written anything.
>
> The thing which prevents this is the ranking order between journal_start()
> and lock_page().
>
> It's not trivial but also not too hard to change things so that
> journal_start() can rank outside lock_page() - this would also offer some
> CPU savings.
>
> Can't say that I'm terribly motivated about the feature though.

I'm relieved, I have always thought that some higher level synchronization is 
required for simultaneous writes.  So Hans might as well tell his fans that 
Ext3 makes no official guarantee, and neither does Linux.

Regards,

Daniel



* Re: precise characterization of ext3 atomicity
  2003-09-04 21:08       ` Daniel Phillips
@ 2003-09-04 21:39         ` Hans Reiser
  2003-09-04 21:59           ` Daniel Phillips
  0 siblings, 1 reply; 22+ messages in thread
From: Hans Reiser @ 2003-09-04 21:39 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Andrew Morton, reiserfs-list, linux-kernel

Daniel Phillips wrote:

>I have always thought that some higher level synchronization is 
>required for simultaneous writes.  So Hans might as well tell his fans that 
>Ext3 makes no official guarantee, and neither does Linux.
>
>  
>
Not sure what you mean.

-- 
Hans




* Re: precise characterization of ext3 atomicity
  2003-09-04 21:39         ` Hans Reiser
@ 2003-09-04 21:59           ` Daniel Phillips
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Phillips @ 2003-09-04 21:59 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Andrew Morton, reiserfs-list, linux-kernel

On Thursday 04 September 2003 23:39, Hans Reiser wrote:
> Daniel Phillips wrote:
> >I have always thought that some higher level synchronization is
> >required for simultaneous writes.  So Hans might as well tell his fans
> >that Ext3 makes no official guarantee, and neither does Linux.
>
> Not sure what you mean.

Nothing bad.  More power to you for adding a transaction interface to Reiser4, 
and blazing that trail.  It's totally missing as a generic api at the moment, 
and needs a push.

Regards,

Daniel



Thread overview: 22+ messages
2003-09-04 14:20 precise characterization of ext3 atomicity Hans Reiser
2003-09-04 15:55 ` Andrew Morton
2003-09-04 15:59   ` Hans Reiser
2003-09-04 16:12     ` Andrew Morton
2003-09-04 16:25       ` Hans Reiser
2003-09-04 18:15         ` Mike Fedyk
2003-09-04 16:05           ` Antonio Vargas
2003-09-04 18:37           ` Hans Reiser
2003-09-04 19:12             ` Mike Fedyk
2003-09-04 21:03               ` Hans Reiser
2003-09-04 19:28             ` Andreas Dilger
2003-09-04 21:32               ` Hans Reiser
2003-09-04 22:03                 ` Andreas Dilger
2003-09-05 13:47                   ` Chris Mason
2003-09-09 13:09                 ` Pavel Machek
2003-09-09 19:21                   ` Gábor Lénárt
2003-09-09 19:43                     ` Mike Fedyk
2003-09-04 20:16   ` Daniel Phillips
2003-09-04 20:10     ` Andrew Morton
2003-09-04 21:08       ` Daniel Phillips
2003-09-04 21:39         ` Hans Reiser
2003-09-04 21:59           ` Daniel Phillips
