public inbox for linux-kernel@vger.kernel.org
* Re: mode data=journal in ext3. Is it safe to use?
       [not found] <40FB8221D224C44393B0549DDB7A5CE83E31B1@tor.lokal.lan>
@ 2004-06-15 18:09 ` Petter Larsen
  2004-06-15 18:20   ` Eugene Crosser
                     ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Petter Larsen @ 2004-06-15 18:09 UTC (permalink / raw)
  To: ext3; +Cc: ext3, Nicolas.Kowalski, linux-kernel

Hello

I try again.

Can any of you confirm whether data=journal mode in ext3 is
safe to use in Linux kernel 2.6.x?

We need a filesystem with very strong consistency and integrity, so
it would be desirable to journal both data and metadata.

But if this mode can corrupt the filesystem, as both Phil White and
Nicolas Kowalski have experienced, it may be more advisable to use
data=ordered mode instead.

Data integrity is much more important for us than speed.

What do you people out there say?

I am also posting this to the kernel mailing list. I am not
subscribed to LKML, so if anybody there has advice about this, I
would be pleased if you could CC me.

Petter
 
On Mon, 2004-06-07 at 10:21, Petter Larsen wrote:
> Hello
> 
> I can see several postings on this mailing list where people have
> problems mounting ext3 partitions with data=journal mode.
> 
> See URLs:
> https://www.redhat.com/archives/ext3-users/2004-March/msg00000.html
> https://www.redhat.com/archives/ext3-users/2004-March/msg00050.html
> 
> We are going to use ext3 on a Compact Flash disk in true IDE mode. We
> need this filesystem to be as safe and consistent as possible. We
> cannot tolerate any garbage in the files after a crash or sudden power
> failure. We have therefore decided to use ext3 with data=journal mode.
> 
> Can I rely on this?
> We use kernel 2.6.5 on PowerPC 8260, and may be using newer kernels
> later in the project.
> 
> 
> Best regards
> --
> Petter Larsen
> cand. scient.
> moreCom as
> 913 17 222
> 
> 
> _______________________________________________
> Ext3-users mailing list
> Ext3-users@redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
-- 
Petter Larsen
cand. scient.
moreCom as
913 17 222


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-15 18:09 ` mode data=journal in ext3. Is it safe to use? Petter Larsen
@ 2004-06-15 18:20   ` Eugene Crosser
  2004-06-17  8:36     ` Petter Larsen
  2004-06-16  7:34   ` Oleg Drokin
  2004-06-16 15:49   ` Timothy Miller
  2 siblings, 1 reply; 32+ messages in thread
From: Eugene Crosser @ 2004-06-15 18:20 UTC (permalink / raw)
  To: Petter Larsen; +Cc: ext3, ext3, Nicolas.Kowalski, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 956 bytes --]

On Tue, 2004-06-15 at 20:09 +0200, Petter Larsen wrote:

> Can any of you confirm whether data=journal mode in ext3 is
> safe to use in Linux kernel 2.6.x?
> 
> We need a filesystem with very strong consistency and integrity, so
> it would be desirable to journal both data and metadata.
> 
> But if this mode can corrupt the filesystem, as both Phil White and
> Nicolas Kowalski have experienced, it may be more advisable to use
> data=ordered mode instead.
> 
> Data integrity is much more important for us than speed.

I ran ext3 with data=journal on 2.6.6smp for about a week on a heavily
loaded system (I mean it).  I did not ever experience filesystem
corruption (related to the fs code).  I did, however, hit complete
system lockup once.  It *may* have been unrelated to the fs code.

(If you use quota, it *will* lock.  The author is working on a fix.
Above, I am referring to a lockup with quota off).

Eugene

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-15 18:09 ` mode data=journal in ext3. Is it safe to use? Petter Larsen
  2004-06-15 18:20   ` Eugene Crosser
@ 2004-06-16  7:34   ` Oleg Drokin
  2004-06-17  8:27     ` Petter Larsen
  2004-06-16 15:49   ` Timothy Miller
  2 siblings, 1 reply; 32+ messages in thread
From: Oleg Drokin @ 2004-06-16  7:34 UTC (permalink / raw)
  To: pla, linux-kernel

Hello!

Petter Larsen <pla@morecom.no> wrote:

PL> Can any of you confirm whether data=journal mode in ext3 is
PL> safe to use in Linux kernel 2.6.x?
PL> We need a filesystem with very strong consistency and integrity, so
PL> it would be desirable to journal both data and metadata.

Actually, data=journal mode would gain you almost no extra consistency compared
to data=ordered mode (the only additional consistency you get is a
correct mtime on files that have their pages overwritten, I think).
You have zero control over transaction boundaries in ext3, so you still need
to design your applications in such a way that they have their own
sort of transactions (if this is needed).

PL> Data integrity is much more important for us than speed.

It is not clear what sort of extra data integrity you expect from data
journaling mode, or why you think it is there.

Garbage in files should not happen in data=ordered mode, as data pages are
written before metadata updates are committed.

Bye,
    Oleg


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-15 18:09 ` mode data=journal in ext3. Is it safe to use? Petter Larsen
  2004-06-15 18:20   ` Eugene Crosser
  2004-06-16  7:34   ` Oleg Drokin
@ 2004-06-16 15:49   ` Timothy Miller
  2004-06-17  0:51     ` Daniel Pittman
                       ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Timothy Miller @ 2004-06-16 15:49 UTC (permalink / raw)
  To: Petter Larsen; +Cc: ext3, ext3, Nicolas.Kowalski, linux-kernel



Petter Larsen wrote:

> 
> Data integrity is much more important for us than speed.
> 


You might want to consider ReiserFS or one of the others which were 
designed with journaling in mind.  And I hope you're using RAID1 or RAID5.



* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-16 15:49   ` Timothy Miller
@ 2004-06-17  0:51     ` Daniel Pittman
  2004-06-17  3:02       ` Tim Connors
  2004-06-17  5:35       ` Hans Reiser
  2004-06-17  8:29     ` Petter Larsen
       [not found]     ` <1805.216.148.213.196.1087426691.squirrel@www.code-visions.com>
  2 siblings, 2 replies; 32+ messages in thread
From: Daniel Pittman @ 2004-06-17  0:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ext3-users

On 17 Jun 2004, Timothy Miller wrote:
> Petter Larsen wrote:
>
>> Data integrity is much more important for us than speed.
>
> You might want to consider ReiserFS or one of the others which were
> designed with journaling in mind.  And I hope you're using RAID1 or
> RAID5.

I must admit, that isn't quite the response that I would have expected
for those requirements. :)

ReiserFS, XFS and (presumably) JFS all have considerably better
performance than ext3, for most tasks, because they were indeed designed
with journaling in mind.

OTOH, ReiserFS had an extremely long period of instability, and was
built by a group who felt that a working fsck was something you put
together after you got the filesystem working.

This, combined with the occasional "ReiserFS 3 ate my data" reports and
the reluctance of the developers to adapt to the 4K kernel stacks in
2.6.recent, would leave me hesitant to recommend it as "more
trustworthy" than ext3.


XFS, with the "null out data on recovery" mode, is less reliable than
ext3, full stop. It routinely destroys data in real world situations, a
secure, but irritating, choice.


ext3 remains the only journaling filesystem that I would, personally,
put any great degree of faith in, since it is still developed in a
cautious and safe fashion, and has a focus on getting the tools to
verify correctness in place before enabling kernel-side features.


Obviously, your mileage may vary on these topics, as presumably have
your experiences.

Regards,
        Daniel
-- 
Advertising may be described as the science of arresting the human
intelligence long enough to get money from it.
        -- Stephen Leacock



* Re:  mode data=journal in ext3. Is it safe to use?
  2004-06-17  0:51     ` Daniel Pittman
@ 2004-06-17  3:02       ` Tim Connors
  2004-06-17  5:35       ` Hans Reiser
  1 sibling, 0 replies; 32+ messages in thread
From: Tim Connors @ 2004-06-17  3:02 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: linux-kernel, Ext3-users

Daniel Pittman <daniel@rimspace.net> said on Thu, 17 Jun 2004 10:51:54 +1000:
> XFS, with the "null out data on recovery" mode, is less reliable than
> ext3, full stop. It routinely destroys data in real world situations, a
> secure, but irritating, choice.

And please tell me -- the point of journalling is to reduce fsck times
upon failure - particularly important if you have 14TB of raid (yes,
we had to fsck after a recent downtime, and it had been > 180 days -
took half the day). What is the point of journalling if you have to
compare and restore against backup every time the power fails? This is
slower than a mere fsckage.

FYI, I think jfs has the same behaviour as xfs - I do notice a
distinct lack of usage of a /lost+found, which has been important to
me in the past.

> ext3 remains the only journaling filesystem that I would, personally,
> put any great degree of faith in, since it is still developed in a
> cautious and safe fashion, and has a focus on getting the tools to
> verify correctness in place before enabling kernel-side features.
> 
> 
> Obviously, your mileage may vary on these topics, as presumably have
> your experiences.

Sounds about right :)

Next time I reformat/get a new drive, I'll be going back to ext3 -
never caused any problems for me.

-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
Single White Stick-Figure, L12, enjoys long walks by the shore,
cooking up a nice menudo, and bashing small animals with sticks. My
meat sword is enormous. Seeks female Accordian Thief for relationship
and buffs.               -- Riff @ some game forum


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17  0:51     ` Daniel Pittman
  2004-06-17  3:02       ` Tim Connors
@ 2004-06-17  5:35       ` Hans Reiser
  2004-06-17 10:08         ` Dave Jones
  1 sibling, 1 reply; 32+ messages in thread
From: Hans Reiser @ 2004-06-17  5:35 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: linux-kernel, Ext3-users

Daniel Pittman wrote:

>OTOH, ReiserFS had an extremely long period of instability, 
>
we were stable before ext3 was...

>and was
>built by a group who felt that a working fsck was something you put
>together after you got the filesystem working.
>  
>
Well, if you have a total of two guys working on a filesystem, and 
plenty not working yet in the filesystem, why the hell would you start 
to work on fsck before the main body of code is working and performing 
well enough that anybody would want to use it?  Surely my task ordering 
was correct for a two man team.

With Reiser4 we had funding for an fsck guy, and as a result fsck is 
working at ship.  With V3, we had no funding at all until it started to 
work.

>This, combined with the occasional "ReiserFS 3 ate my data" reports and
>  
>
like ext2/ext3, we are now able to say that almost all such reports are 
hardware (for V3 not V4, V4 gained some bugs when we ported to -mm and 
its radix trees, and is still not shipped as a result).

>the reluctance of the developers to adapt to the 4K kernel stacks in
>2.6.recent,
>
do you use them?  I don't know real users who do, or else I would be 
quicker to care.

On the one hand, you complain about how we were unstable, and on the 
other hand you complain about how we aren't willing to destabilize the 
code to add new features to what is no longer the development branch.  
Seems pretty inconsistent logically to me.


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-16  7:34   ` Oleg Drokin
@ 2004-06-17  8:27     ` Petter Larsen
  2004-06-17 17:09       ` Oleg Drokin
  0 siblings, 1 reply; 32+ messages in thread
From: Petter Larsen @ 2004-06-17  8:27 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel, ext3

Hello

My comments are inline.

> PL> Can any of you confirm whether data=journal mode in ext3 is
> PL> safe to use in Linux kernel 2.6.x?
> PL> We need a filesystem with very strong consistency and integrity, so
> PL> it would be desirable to journal both data and metadata.
> 
> OLEG> Actually, data=journal mode would gain you almost no extra consistency compared
> to data=ordered mode (the only additional consistency you get is a
> correct mtime on files that have their pages overwritten, I think).
> You have zero control over transaction boundaries in ext3, so you still need
> to design your applications in such a way that they have their own
> sort of transactions (if this is needed).

So your conclusion is that data=journal mode is useless if you do not
want a correct mtime?

There would be little sense in developing the data=journal mode if this
is the only benefit, don't you think?

From Linux/Documentation/filesystems/ext3.txt:

data=journal            All data are committed into the journal prior
                        to being written into the main file system.

data=ordered    (*)     All data are forced directly out to the main
                        file system prior to its metadata being
                        committed to the journal.
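
To make the two options quoted above concrete: the mode is chosen per mount with a data= option, and a mounted filesystem can be checked by reading its option string. The following is only a rough sketch. The helper name ext3_data_mode, the sample device names and the sample mount lines are hypothetical, and falling back to "ordered" when no data= option is listed is an assumption based on the (*) default marker in the excerpt.

```python
def ext3_data_mode(mounts_text, mount_point):
    """Return the data= mode of an ext3 mount, or None if it is not found.

    mounts_text is expected to look like /proc/mounts or /etc/fstab:
    device, mount point, filesystem type, comma-separated options.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, where, fstype, options = fields[:4]
        if where != mount_point or fstype != "ext3":
            continue
        for opt in options.split(","):
            if opt.startswith("data="):
                return opt[len("data="):]
        # Assumption: no explicit data= option means the documented default.
        return "ordered"
    return None


# Hypothetical sample input, not taken from a real system.
sample = """\
/dev/hda1 / ext3 rw,data=journal 0 0
/dev/hda2 /var ext3 rw 0 0
"""

print(ext3_data_mode(sample, "/"))
print(ext3_data_mode(sample, "/var"))
```

On a live system the text would come from a file such as /proc/mounts. Whether a given kernel lists the default mode explicitly there can vary, which is why the fallback is labelled as an assumption.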

My problem is that ext3 in the latest kernels, 2.6.x and the latest
2.4.x, is not well documented around the web. Whitepapers and the like
are pretty old. Much has changed in ext3, I believe, since it was first
introduced by Dr. Tweedie. The first release journaled both data
and metadata; see also the transcript of Dr. Tweedie's talk at the Ottawa
Linux Symposium, 20th July 2000:
http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html

There he says that they are journaling both metadata and data, but that
the design goal is not to do that. So can this be interpreted to mean
that data=journal mode is only there for historic reasons?
 

> PL> Data integrity is much more important for us than speed.
> 
> OLEG> It is not clear what sort of extra data integrity you expect from data
> journaling mode, or why you think it is there.

I would believe that the goal of a mode like data=journal is to gain
extra data integrity, because it also journals data. Why should it not? I
would believe that it makes sense to have these different modes so people
can choose the best mode for their applications.

> OLEG> Garbage in files should not happen in data ordered mode as data pages are
> written first before metadata updates are committed.

Are you sure?


Petter




* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-16 15:49   ` Timothy Miller
  2004-06-17  0:51     ` Daniel Pittman
@ 2004-06-17  8:29     ` Petter Larsen
  2004-06-17 19:30       ` Daniel Egger
       [not found]       ` <87wu26mto2.fsf@enki.rimspace.net>
       [not found]     ` <1805.216.148.213.196.1087426691.squirrel@www.code-visions.com>
  2 siblings, 2 replies; 32+ messages in thread
From: Petter Larsen @ 2004-06-17  8:29 UTC (permalink / raw)
  To: Timothy Miller; +Cc: ext3, linux-kernel

> > 
> > Data integrity is much more important for us than speed.
> > 
> 
> 
> You might want to consider ReiserFS or one of the others which were 
> designed with journaling in mind.  And I hope you're using RAID1 or RAID5.

We are using ext3 on a compact flash disk in an embedded device. So we
are not using RAID systems.

Best regards
-- 
Petter Larsen
cand. scient.
moreCom as
913 17 222


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-15 18:20   ` Eugene Crosser
@ 2004-06-17  8:36     ` Petter Larsen
  0 siblings, 0 replies; 32+ messages in thread
From: Petter Larsen @ 2004-06-17  8:36 UTC (permalink / raw)
  To: Eugene Crosser; +Cc: ext3, linux-kernel

> > 
> > Data integrity is much more important for us than speed.
> 
> I ran ext3 with data=journal on 2.6.6smp for about a week on a heavily
> loaded system (I mean it).  I did not ever experience filesystem
> corruption (related to the fs code).  I did, however, hit complete
> system lockup once.  It *may* have been unrelated to the fs code.
> 
> (If you use quota, it *will* lock.  The author is working on a fix.
> Above, I am referring to a lockup with quota off).
> 
> Eugene

Good to hear. But there was one lockup that may have been related to
the ext3 fs code, since you are not sure that it was unrelated?

Are you going to test it more?

We are not going to use quota, we are using ext3 on a compact flash disk
in an embedded device.


-- 
Petter Larsen
cand. scient.
moreCom as
913 17 222


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17  5:35       ` Hans Reiser
@ 2004-06-17 10:08         ` Dave Jones
  2004-06-17 16:55           ` Hans Reiser
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Jones @ 2004-06-17 10:08 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Daniel Pittman, linux-kernel, Ext3-users

On Wed, Jun 16, 2004 at 10:35:50PM -0700, Hans Reiser wrote:

 > >the reluctance of the developers to adapt to the 4K kernel stacks in
 > >2.6.recent,
 > >
 > do you use them?  I don't know real users who do, or else I would be 
 > quicker to care.

The Fedora Core 2 kernel (and what will be RHEL4) is currently
using 4K stacks.  This makes up quite a large userbase.

 > On the one hand, you complain about how we were unstable, and on the 
 > other hand you complain about how we aren't willing to destabilize the 
 > code to add new features to what is no longer the development branch.  
 > Seems pretty inconsistent logically to me.

If you really are reluctant to fix it, there's always the option of
marking CONFIG_REISER4 as dependent on CONFIG_BROKEN if CONFIG_4KSTACKS
is selected.

		Dave



* Re: mode data=journal in ext3. Is it safe to use?
       [not found]     ` <1805.216.148.213.196.1087426691.squirrel@www.code-visions.com>
@ 2004-06-17 11:23       ` Petter Larsen
  2004-06-17 16:26         ` Andreas Dilger
  0 siblings, 1 reply; 32+ messages in thread
From: Petter Larsen @ 2004-06-17 11:23 UTC (permalink / raw)
  To: Phil White; +Cc: ext3, linux-kernel


> I was never able to resolve the problems I had with data=journal with the
> 2.4 kernel.  I did *not* try the 2.6 kernel though, so I can't give you
> any data points there.  In the end, I settled for data=ordered, and have
> never seen the problems I described in my original posts.  Also, to give
> you some background, I had been using ReiserFS before switching to ext3,
> and I experienced a lot of corruption with Reiser (my company makes linux
> based appliances which sometimes get turned off while under heavy IO). 
> Since ReiserFS doesn't do data journalling (metadata only), we
> consistently ended up with corrupt files.  After this, I decided to try
> ext3 with data=journal, and I never even got far enough with load testing
> to try the 'hard reset' test.  It would consistently crash in the fs code
> under heavy load.

This should be considered a serious bug, don't you think? Have you
reported this to the kernel list? I have the list on the CC now, but it
should probably be filed as a bug report.


> 
> We have since had no problems with data=ordered, and since it writes data
> blocks before writing metadata to the journal, we don't see corrupt files
> anymore (even on hard resets).

Ok

> 
> If data integrity (within the file) is important to you in the face of a
> crash or power loss, do NOT use ReiserFS or ext3 data=writeback.   If your
> application never overwrites data in files, you will be just fine using
> data=ordered (appending to files or creating new files is pretty much
> guaranteed to never cause corruption).  If you need to overwrite data in
> files, you need to use data=journal (and probably beg people to fix it) or
> rewrite your application to use some other method (i.e. copy the file,
> delete the old one) and just use data=ordered.
> 

So data=journal would give better data integrity than data=ordered (if it
works as intended). But if data=journal does not work correctly, we may be
better off using data=ordered and designing our applications around it. The
problem is that we cannot do this consistently, because we have a mix of both
open source applications and applications we developed ourselves.

But think of your scenario of copying, deleting and making a new file with the
new content. First we copy the contents of the file, then we make our
modifications. When we are done, we delete the original file. Then we hit a
crash. The contents of the file that we had in our process are gone, and the
original file is deleted. This is not a good idea. But if we write the new
file first as fileX.new and then delete fileX, and then hit a crash, we would
at least have the correct file written as fileX.new.

But we would be best off if we could trust the filesystem.

In practice there are probably many more systems out there that use
data=ordered, because this is the default, and it therefore gets tested best.

Journaling both data and metadata was what Dr. Tweedie did in the first public releases,
but the goal was not to do it. 

It is not easy to know what is the best thing to do.

We use this ext3 filesystem on a compact flash in an embedded system. 

Petter


* Re: mode data=journal in ext3. Is it safe to use?
@ 2004-06-17 14:56 Ken Ryan
  2004-06-17 16:06 ` Timothy Miller
  2004-06-19 14:49 ` Petter Larsen
  0 siblings, 2 replies; 32+ messages in thread
From: Ken Ryan @ 2004-06-17 14:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: pla

> > > 
> > > Data integrity is much more important for us than speed.
> > > 
> > 
> > 
> > You might want to consider ReiserFS or one of the others which were 
> > designed with journaling in mind.  And I hope you're using RAID1 or RAID5.
> 
> We are using ext3 on a compact flash disk in an embedded device. So we
> are not using RAID systems.

[I'm not subscribed, hopefully this threads]

Um, is this a new application or have you done this before?

It's my understanding that very few (or no) CF devices do wear-levelling internally.
Using a journal, especially a true data journal, seems like *the* way to wear out your
flash as quickly as possible.

If you've had success using ext2 in read/write mode on flash/CF in a shipping product,
I for one would like to know more details!

		ken





* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 14:56 Ken Ryan
@ 2004-06-17 16:06 ` Timothy Miller
  2004-06-17 17:20   ` Hans Reiser
  2004-06-19 14:49 ` Petter Larsen
  1 sibling, 1 reply; 32+ messages in thread
From: Timothy Miller @ 2004-06-17 16:06 UTC (permalink / raw)
  To: Ken Ryan; +Cc: linux-kernel, pla

Doesn't Reiser4 do wear-leveling for flash?

Ken Ryan wrote:
>> > > Data integrity is much more important for us than speed.
>> >
>> > You might want to consider ReiserFS or one of the others which were
>> > designed with journaling in mind.  And I hope you're using RAID1 or
>> > RAID5.
>>
>> We are using ext3 on a compact flash disk in an embedded device. So we
>> are not using RAID systems.
> 
> 
> [I'm not subscribed, hopefully this threads]
> 
> Um, is this a new application or have you done this before?
> 
> It's my understanding that very few (or no) CF devices do wear-levelling
> internally. Using a journal, especially a true data journal, seems like
> *the* way to wear out your flash as quickly as possible.
> 
> If you've had success using ext2 in read/write mode on flash/CF in a
> shipping product, I for one would like to know more details!
> 
>         ken
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 



* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 11:23       ` Petter Larsen
@ 2004-06-17 16:26         ` Andreas Dilger
  0 siblings, 0 replies; 32+ messages in thread
From: Andreas Dilger @ 2004-06-17 16:26 UTC (permalink / raw)
  To: Petter Larsen; +Cc: Phil White, ext3, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1244 bytes --]

On Jun 17, 2004  13:23 +0200, Petter Larsen wrote:
> But think of your scenario of copy, delete and make a new file with the
> new content. First we copy the contents of the file, then we do our
> modifications. When we are done we delete the original file. Then we hit
> a crash. The content we had of the file in our process is gone, the
> original file is deleted. This is not a good idea. But if we write the new
> file first as fileX.new and then delete fileX, hit a crash then we would
> have at least the correct file written as fileX.new.

The rename operation is guaranteed to be atomic.  You implement updates as:
1) create new file
2) write data to new file
3) rename new file over old filename

If the system crashes at any time you are guaranteed that the old filename
has valid data in it.  Even if you use data=journal mode while overwriting
the old filename directly you wouldn't be guaranteed to have valid data
unless your application was only e.g. writing aligned records to fixed file
offsets, and those records were <= 4kB in size.
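
The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not code from this thread: the helper name atomic_write and the ".tmp-" prefix are hypothetical, and the fsync calls are an added assumption (the steps above do not mention them), intended to push the new data to the device before the rename and the directory update after it on a Linux/POSIX system.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` via create, write, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # 1) Create the new file in the same directory, so that the final
    #    rename never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            # 2) Write the data to the new file.
            tmp.write(data)
            tmp.flush()
            # Assumption: force the data out before the rename.
            os.fsync(tmp.fileno())
        # 3) Rename the new file over the old filename.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temporary file; the old file is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also sync the directory so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A reader that opens the filename at any moment sees either the complete old contents or the complete new contents, which is the property the rename guarantee provides; the temporary file is the only thing that can be left half-written.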

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/                 http://members.shaw.ca/golinux/


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 10:08         ` Dave Jones
@ 2004-06-17 16:55           ` Hans Reiser
  0 siblings, 0 replies; 32+ messages in thread
From: Hans Reiser @ 2004-06-17 16:55 UTC (permalink / raw)
  To: Dave Jones; +Cc: Daniel Pittman, linux-kernel, Ext3-users, Chris Mason

Dave Jones wrote:

>On Wed, Jun 16, 2004 at 10:35:50PM -0700, Hans Reiser wrote:
>
> > >the reluctance of the developers to adapt to the 4K kernel stacks in
> > >2.6.recent,
> > >
> > do you use them?  I don't know real users who do, or else I would be 
> > quicker to care.
>
>The Fedora Core 2 kernel (and what will be RHEL4) is currently
>using 4K stacks.  This makes up quite a large userbase.
>  
>
Sigh.  I guess we have to support it then.

Chris, are you up to doing it?

> > On the one hand, you complain about how we were unstable, and on the 
> > other hand you complain about how we aren't willing to destabilize the 
> > code to add new features to what is no longer the development branch.  
> > Seems pretty inconsistent logically to me.
>
>If you really are reluctant to fix it, there's always the option of
>marking CONFIG_REISER4 as dependent on CONFIG_BROKEN if CONFIG_4KSTACKS
>is selected.
>
>		Dave
>
>
>
>  
>



* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17  8:27     ` Petter Larsen
@ 2004-06-17 17:09       ` Oleg Drokin
  2004-06-18  9:41         ` Helge Hafting
  0 siblings, 1 reply; 32+ messages in thread
From: Oleg Drokin @ 2004-06-17 17:09 UTC (permalink / raw)
  To: Petter Larsen; +Cc: linux-kernel, ext3

Hello!

On Thu, Jun 17, 2004 at 10:27:17AM +0200, Petter Larsen wrote:
> > PL> Can any of you confirm whether data=journal mode in ext3 is
> > PL> safe to use in Linux kernel 2.6.x?
> > PL> We need a filesystem with very strong consistency and integrity, so
> > PL> it would be desirable to journal both data and metadata.
> > OLEG> Actually, data=journal mode would gain you almost no extra consistency compared
> > to data=ordered mode (the only additional consistency you get is a
> > correct mtime on files that have their pages overwritten, I think).
> > You have zero control over transaction boundaries in ext3, so you still need
> > to design your applications in such a way that they have their own
> > sort of transactions (if this is needed).
> So your conclusion is that data=journal mode is useless if you do not
> want a correct mtime?

Well, yes.

> There would be little sense in developing the data=journal mode if this
> is the only benefit, don't you think?
> From Linux/Documentation/filesystems/ext3.txt:
> data=journal            All data are committed into the journal prior
>                         to being written into the main file system.
> data=ordered    (*)     All data are forced directly out to the main
>                         file system prior to its metadata being
>                         committed to the journal.
> My problem is that ext3 in the latest kernels, 2.6.x and the latest
> 2.4.x, is not well documented around the web. Whitepapers and the like
> are pretty old. Much has changed in ext3, I believe, since it was first
> introduced by Dr. Tweedie. The first release journaled both data
> and metadata; see also the transcript of Dr. Tweedie's talk at the Ottawa
> Linux Symposium, 20th July 2000:
> http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html
> There he says that they are journaling both metadata and data, but that
> the design goal is not to do that. So can this be interpreted to mean
> that data=journal mode is only there for historic reasons?

Maybe so. Also, fsync-heavy loads on real disk devices with large journals
tend to benefit from journaled data mode.

> > PL> Data integrity is much more important for us than speed.
> > 
> > OLEG> It is not clear what sort of extra data integrity you expect from data
> > journaling mode, or why you think it is there.
> I would believe that the goal of a mode like data=journal is to gain
> extra data integrity, because it also journals data. Why should it not? I

Well, actually I bet you do not care whether the data goes through the journal
or not, as long as it is not lost.
In ordered journaling mode, data is written first, before the metadata
updates. Mostly the same happens in data journaling mode, except that there
the data is written into the journal, and if the transaction was not committed,
then after a reboot it won't be copied to where it should be. The same scenario
in ordered mode will result in the data getting to where it should be, but due
to the lack of metadata updates you won't see it. (This is the append case;
for an overwrite it will be a little bit different, but you still have no
control over how much will be overwritten.)

> would believe that it makes sense to have these different modes so people
> can choose the best mode for their applications.

True.

> > OLEG> Garbage in files should not happen in data ordered mode as data pages are
> > written first before metadata updates are committed.
> Are you sure?

If you can reproduce garbage in files in ordered journal mode, that would be a
bug that should be fixed.

Bye,
    Oleg


* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 16:06 ` Timothy Miller
@ 2004-06-17 17:20   ` Hans Reiser
  2004-06-17 19:15     ` Ken Ryan
  2004-06-17 19:43     ` Daniel Egger
  0 siblings, 2 replies; 32+ messages in thread
From: Hans Reiser @ 2004-06-17 17:20 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Ken Ryan, linux-kernel, pla

Timothy Miller wrote:

> Doesn't Reiser4 do wear-leveling for flash?

No, we don't.  We do have wandering logs, so it would be feasible to 
code, but bitmap blocks and super blocks get written to the same 
locations repeatedly.

Actually, most compact flash devices DO do wear leveling, from what I 
have heard.

>
> Ken Ryan wrote:
>
>>>>> Data integrity is much more important for us than speed.
>>>>
>>>> You might want to consider ReiserFS or one of the others which were
>>>> designed with journaling in mind.  And I hope you're using RAID1 or
>>>> RAID5.
>>>
>>> We are using ext3 on a compact flash disk in an embedded device. So we
>>> are not using RAID systems.
>>
>>
>>
>> [I'm not subscribed, hopefully this threads]
>>
>> Um, is this a new application or have you done this before?
>>
>> It's my understanding that very few (or no) CF devices do 
>> wear-levelling internally.
>> Using a journal, especially a true data journal, seems like *the* way 
>> to wear out your
>> flash as quickly as possible.
>>
>> If you've had success using ext2 in read/write mode on flash/CF in a 
>> shipping product,
>> I for one would like to know more details!
>>
>>         ken
>>
>>
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe 
>> linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 17:20   ` Hans Reiser
@ 2004-06-17 19:15     ` Ken Ryan
  2004-06-18  6:18       ` Hans Reiser
  2004-06-17 19:43     ` Daniel Egger
  1 sibling, 1 reply; 32+ messages in thread
From: Ken Ryan @ 2004-06-17 19:15 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Timothy Miller, linux-kernel, pla

Hans Reiser wrote:

> Timothy Miller wrote:
>
>> Doesn't Reiser4 do wear-leveling for flash?
>
>
> No, we don't.  We do have wandering logs, so it would be feasible to 
> code, but bitmap blocks and super blocks get written to the same 
> locations repeatedly.
>
> Actually, most compact flash devices DO do wear leveling, from what I 
> have heard.


The ones I've seen, only sort of.  They'll allocate writes from available
erased pages to try to distribute their use, but if you have a disk that's,
say, 70% read-only data and 30% read-write, then the wear-levelling will
only happen on that 30% of the disk.  True wear levelling will actually
scrub read-only or rarely-written data, forcing it to get off its duff so
the flash cells it's sitting on can get some exercise, and give the more
worn cells a rest (that scrub helps ECC fix soft errors from weak cells
too).  True wear-levelling is really hard, and obviously requires budgeting
extra bandwidth and storage devices for safely shuffling around data that
the application has no intention of moving (picture losing power in the
middle of a scrub).  It's not worth it for the consumer CF usage model of
"take photos until the card is full, then copy them all to the PC and wipe
the card clean".
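
The difference between levelling only over the erased-page pool and "true" levelling can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: a 100-block device, 70 blocks of read-only data, one erase per write, and a crude static policy that every `swap_every` writes moves the coldest read-only block onto the most worn pool block. The function name and all parameters are hypothetical; no real CF controller is being described.

```python
def simulate(total_blocks=100, static_blocks=70, writes=30_000,
             static_leveling=False, swap_every=50):
    """Return per-block erase counts after `writes` rewrites of hot data.

    Toy model (all numbers are illustrative assumptions):
    - blocks [0, static_blocks) initially hold read-only data,
    - the remaining blocks form the pool used for read-write data,
    - every write erases the least-worn block of the pool, i.e.
      levelling happens over the erased-page pool only.
    With static_leveling=True, every `swap_every` writes the coldest
    read-only block is relocated onto the most-worn pool block, so the
    barely used cells under the read-only data join the pool.
    """
    erases = [0] * total_blocks
    static = set(range(static_blocks))
    pool = set(range(static_blocks, total_blocks))
    for n in range(1, writes + 1):
        # Pool-only levelling: always write to the least-worn pool block.
        target = min(pool, key=erases.__getitem__)
        erases[target] += 1
        if static_leveling and n % swap_every == 0:
            cold = min(static, key=erases.__getitem__)
            worn = max(pool, key=erases.__getitem__)
            if erases[worn] > erases[cold]:
                # Relocating the read-only data costs one erase on the
                # worn block; the lightly used block joins the pool.
                erases[worn] += 1
                static.remove(cold)
                static.add(worn)
                pool.remove(worn)
                pool.add(cold)
    return erases
```

Comparing `max(simulate())` with `max(simulate(static_leveling=True))` shows the effect: without relocation, all wear lands on the 30 read-write blocks, while the relocating variant spreads it over the whole device at the cost of some extra erases.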

[Yes, I tend to see this from the inside-out: I'm actually an FPGA/ASIC
weenie, not a kernel hacker.  One of my current projects is part of a
controller chip for a solid-state storage system with ${bignum} NAND flash
chips.  Alas, my specialty is video and graphics, so I'm still coming up
the learning curve on storage systems.]

               ken




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17  8:29     ` Petter Larsen
@ 2004-06-17 19:30       ` Daniel Egger
       [not found]       ` <87wu26mto2.fsf@enki.rimspace.net>
  1 sibling, 0 replies; 32+ messages in thread
From: Daniel Egger @ 2004-06-17 19:30 UTC (permalink / raw)
  To: Petter Larsen; +Cc: Timothy Miller, ext3, linux-kernel

On 17.06.2004, at 10:29, Petter Larsen wrote:

> We are using ext3 on a compact flash disk in an embedded device. So we
> are not using RAID systems.

An excellent way to kill such media. Hopefully YMMV.

Servus,
       Daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 17:20   ` Hans Reiser
  2004-06-17 19:15     ` Ken Ryan
@ 2004-06-17 19:43     ` Daniel Egger
  2004-06-17 19:59       ` Ken Ryan
  1 sibling, 1 reply; 32+ messages in thread
From: Daniel Egger @ 2004-06-17 19:43 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Timothy Miller, linux-kernel, pla, Ken Ryan

On 17.06.2004, at 19:20, Hans Reiser wrote:

> Actually, most compact flash devices DO do wear leveling, from what I 
> have heard.

Care to mention sources? I'd be surprised if they did simply because
it'll cost money that could be earned otherwise. Also I think you
confuse bad block remapping with wear leveling and even the former
I haven't experienced so far.

CF disks were designed for a simple usage pattern: start with an empty disk,
write data onto it up to a certain level, read it a few times, and empty
the disk again. So except for the organizational blocks and "the end" of
the disk, which tends to get hit rarely, write utilization is well
distributed.

Servus,
       Daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 19:43     ` Daniel Egger
@ 2004-06-17 19:59       ` Ken Ryan
  0 siblings, 0 replies; 32+ messages in thread
From: Ken Ryan @ 2004-06-17 19:59 UTC (permalink / raw)
  To: Daniel Egger; +Cc: Hans Reiser, Timothy Miller, linux-kernel, pla

Daniel Egger wrote:

> On 17.06.2004, at 19:20, Hans Reiser wrote:
>
>> Actually, most compact flash devices DO do wear leveling, from what I 
>> have heard.
>
>
> Care to mention sources? I'd be surprised if they did simply because
> it'll cost money that could be earned otherwise. Also I think you
> confuse bad block remapping with wear leveling and even the former
> I haven't experienced so far.
>
> CF disks were designed for simply the reason of having an empty disk,
> writing data onto it up to a certain level, reading it a few times
> and emptying the disk again. So except for the organizational blocks
> and "the end" of a disk which tends to get rarely hit there're a
> well distributed write utilization.
>
> Servus,
>       Daniel


For example:

Just bop over to the SanDisk website, go to the OEM section, and download
the manual/datasheet for CF devices.  The wearlevel command itself isn't
supported (I'm ignorant of flash on IDE; I assume it is intended to mean
full scrub-style wear levelling), but they note they roll simplified wear
levelling into the erased page pool.

Doing that is an easy way to get part of the way there without needing a
lot of infrastructure.  And for the fill-read-empty usage model it's
perfectly fine.

                ken



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 19:15     ` Ken Ryan
@ 2004-06-18  6:18       ` Hans Reiser
  0 siblings, 0 replies; 32+ messages in thread
From: Hans Reiser @ 2004-06-18  6:18 UTC (permalink / raw)
  To: Ken Ryan; +Cc: Timothy Miller, linux-kernel, pla

Ken Ryan wrote:

> Hans Reiser wrote:
>
>> Timothy Miller wrote:
>>
>>> Doesn't Reiser4 do wear-leveling for flash?
>>
>>
>>
>> No, we don't.  We do have wandering logs, so it would be feasible to 
>> code, but bitmap blocks and super blocks get written to the same 
>> locations repeatedly.
>>
>> Actually, most compact flash devices DO do wear leveling, from what I 
>> have heard.
>
>
>
> The ones I've seen, only sort of.  They'll allocate writes from 
> available erased pages to try to distribute their use, but if you
> have a disk that's, say, 70% read-only data and 30% read-write then 
> the wear-levelling will only happen on that
> 30% of the disk.  True wear levelling will actually scrub read-only or 
> rarely-written data, forcing it to get off its
> duff so the flash cells they're sitting on can get some exercise, and 
> give the more worn cells a rest (that scrub
> helps ECC fix soft errors from weak cells too).  True wear-levelling 
> is really hard, and obviously requires
> budgeting extra bandwidth and storage devices for safely shuffling 
> around data that the application has no
> intention of moving (picture losing power in the middle of a scrub).  
> It's not worth it for the consumer CF
> usage model of "take photos until the card is full, then copy them all 
> to the PC and wipe the card clean".
>
> [Yes, I tend to see this from the inside-out: I'm actually an 
> FPGA/ASIC weenie not a kernel hacker.  One of my current
> projects is part of a controller chip for a solid-state storage system 
> with ${bignum} NAND flash chips.  Alas, my specialty
> is video and graphics, so I'm still coming up the learning curve on 
> storage systems].
>
>               ken
>
>
>
>
>
Interesting.  Thanks for educating me. 

No existing general purpose filesystem that I know of will address your 
needs.  We could of course write one if someone paid for it....

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 17:09       ` Oleg Drokin
@ 2004-06-18  9:41         ` Helge Hafting
  2004-06-18 10:15           ` Oleg Drokin
  0 siblings, 1 reply; 32+ messages in thread
From: Helge Hafting @ 2004-06-18  9:41 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Petter Larsen, linux-kernel, ext3

Oleg Drokin wrote:

>Hello!
>
>On Thu, Jun 17, 2004 at 10:27:17AM +0200, Petter Larsen wrote:
>  
>
>>>PL> Can anybody of you acknowledge or not if mode data=journal in ext3 is
>>>PL> safe to use in Linux kernel 2.6.x?
>>>PL> Wee need to have a very consistent and integrity for our filesystem, and
>>>PL> it would then be desired to journal both data and metadata.
>>>OLEG> Actually data=journal mode would gain you mostly zero extra consistency compared
>>>to data=ordered mode. (the only more consistency bit that you get is
>>>correct mtime on files that have their pages overwritten, I think).
>>>You have zero control over transaction boundaries in ext3, so you still need
>>>to design your applications in such a way that they have their own
>>>sort of transactions (if this is needed).
>>>      
>>>
>>So your conclusion is that data=journal mode is useless if you do not
>>want a correct mtime?
>>    
>>
>
>Well, yes.
>
>  
>
>>It would be a littles sense in developing the data=journal mode if this
>>is the only benefit, don't you think?
>>From the Linux/Documentation/filesystems/ext3.txt
>>data=journal            All data are committed into the journal prior
>>                        to being written into the main file system.
>>data=ordered    (*)     All data are forced directly out to the main file
>>                        system prior to its metadata being committed to
>>                        the journal.
>>My problem is that ext3 in the latest kernel, 2.6.x and the latest
>>2.4.x, are not well documented around the web. Whitepapers and so are
>>pretty old. Much have changed I belive in ext3 since it was first
>>introduced by Dr. Tweedie. The first release was journaling both data
>>and metadata, se also the transcript from Dr. Tweedie from the Ottawa
>>Linux Symposium 20th July 2000.
>>http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html
>>There he says that they are journaling both metadata and data, but that
>>the design goal is not to do that. So can this be interpreted that mode
>>data=journal is only there for historic reasons?
>>    
>>
>
>May be so. Also fsync heavy loads on real disk devices with large journals
>tend to benefit from journaled data mode as well.
>
>  
>
>>>PL> Data integrity is much more important for us than speed.
>>>
>>>OLEG> It is not clear what sort of extra data integrity do you expect from data
>>>journaling mode and why do you think it is there.
>>>      
>>>
>>I would belive that the goal for such a mode data=journal would gain
>>extra data integrity because it also journals data. Why should it not? I
>>    
>>
>
>Well, actually I bet you do not care if the data goes through journal or not
>as long as it is not lost.
>In case of ordered journaling mode, data is written first before metadata
>updates, mostly the same happens with data journal mode, only with the latter
>case date is written into journal and if transaction was not committed, after
>a reboot it won't be copied to where it should be, same scenario in ordered
>journal mode will result in data getting where it should be, but due to
>lack of metadata updates, you won't see it. (this is in case of append,
>for overwrite it will be a little bit different, but still you have no
>control over how much of stuff will be overwritten).
>
>  
>
>>would belive that it makes sense to have these different modes so people
>>can choose the best mode for there applications.
>>    
>>
>
>True.
>
>  
>
>>>OLEG> Garbage in files should not happen in data ordered mode as data pages are
>>>written first before metadata updates are committed.
>>>      
>>>
>>Are you sure?
>>    
>>
>
>If you can reproduce a garbage in files in ordered journal mode, that would be a
>bug that should be fixed then.
>  
>
Hard to _produce_, but consider:
1. Write data to an existing file
2. Sync metadata
3. Data is forced out because of ordered mode, and a power failure happens
    in the middle of this. The file now has a block with a mix of new and
    old data; it may even be unreadable due to a bad sector checksum.

With data journalling you either get the old data (because the crash
happened during a write to the journal) or the new data (the crash happened
during the data write, and the data is restored from the good copy in the
journal).
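
The two outcomes can be sketched as a toy model. It assumes, hypothetically, a block of four sectors, a crash that stops an in-place write after some number of sectors, and a journal whose copy only counts once a commit record exists. The function names are invented for illustration and none of this is ext3 code.

```python
SECTORS_PER_BLOCK = 4  # assumed block layout for this toy model


def crash_in_place(old, new, sectors_written):
    """Overwrite in ordered mode: the block is rewritten in place, so a
    crash after `sectors_written` sectors leaves new and old sectors mixed."""
    return new[:sectors_written] + old[sectors_written:]


def crash_with_journal(old, new, sectors_journaled, committed):
    """Data-journal mode: the new block goes to the journal first.

    Without a commit record (or with an incomplete copy) recovery ignores
    the journal entry and the old block is untouched; with a committed,
    complete copy recovery replays it over the final location."""
    if committed and sectors_journaled == len(new):
        return list(new)   # replayed from the good journal copy
    return list(old)       # journal entry discarded


old_block = ["old"] * SECTORS_PER_BLOCK
new_block = ["new"] * SECTORS_PER_BLOCK
mixed = crash_in_place(old_block, new_block, 2)
```

Here `mixed` holds two new and two old sectors, while `crash_with_journal` only ever returns the complete old block or the complete new one. The model says nothing about writes larger than one block, which is where the granularity argument in the replies applies.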

Helge Hafting

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-18  9:41         ` Helge Hafting
@ 2004-06-18 10:15           ` Oleg Drokin
  2004-06-18 11:30             ` Paulo Marques
  0 siblings, 1 reply; 32+ messages in thread
From: Oleg Drokin @ 2004-06-18 10:15 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Petter Larsen, linux-kernel, ext3

Hello!

On Fri, Jun 18, 2004 at 11:41:23AM +0200, Helge Hafting wrote:

> >If you can reproduce a garbage in files in ordered journal mode, that 
> >would be a
> >bug that should be fixed then.
> Hard to _produce_, but consider:
> 1. Write data to an existing file
> 2. Sync metadata
> 3. data is forced out because of ordered mode, a powerout crash happens
>    in the middle of this. The file now has a block with a mix of new 
> and old,

Well, this is not much worse than having two blocks, one from the old file
and one from the new, after a crash.

>    it may even be unreadable due to a bad sector checksum.

Well, in data journaled mode you may get an unreadable journal; is this much
better? (Also, the original question was about CF flash media, so no bad
sector problems, I presume.)

> With data journalling you either get the old data (because the crash 
> happened
> during a write to the journal) or new data (crash happened during data 
> write,

Well, while with data journaling mode your granularity is one block,
with data ordered it is one sector.

> the data is restored from the good copy in the journal.)

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-18 10:15           ` Oleg Drokin
@ 2004-06-18 11:30             ` Paulo Marques
  2004-06-18 12:05               ` Oleg Drokin
  2004-06-19 19:16               ` mode data=journal in ext3. Is it safe to use? Bernd Eckenfels
  0 siblings, 2 replies; 32+ messages in thread
From: Paulo Marques @ 2004-06-18 11:30 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Helge Hafting, Petter Larsen, linux-kernel@vger.kernel.org, ext3

On Fri, 2004-06-18 at 11:15, Oleg Drokin wrote:
> Hello!
> 
> On Fri, Jun 18, 2004 at 11:41:23AM +0200, Helge Hafting wrote:
> 
> > >If you can reproduce a garbage in files in ordered journal mode, that 
> > >would be a
> > >bug that should be fixed then.
> > Hard to _produce_, but consider:
> > 1. Write data to an existing file
> > 2. Sync metadata
> > 3. data is forced out because of ordered mode, a powerout crash happens
> >    in the middle of this. The file now has a block with a mix of new 
> > and old,
> 
> Well, this is not much worse than having two blocks, one from old file
> and one from new after a crash.

Agree. If the application needs consistency it must do some journaling
itself. At least, until the time when an application can say "start
transaction" "commit transaction" to the file system itself.

> >    it may even be unreadable due to a bad sector checksum.
> 
> Well, in data journaled mode you may get unreadable journal, is this much
> better? (Also original question was about CF flash media, so no bad sector
> problems I presume).

You got it wrong here. The sentence was "bad sector checksum", not "bad
sector". If the sector was "half written", then the checksum would not
match.

If the journal is "half written" then it is just discarded (or at least
it should be).

> > With data journalling you either get the old data (because the crash 
> > happened
> > during a write to the journal) or new data (crash happened during data 
> > write,
> 
> Well, while with data journaling mode your granularity is one block,
> with data ordered it is one sector.

Imagine that you request a 2 MB write to an ext3 filesystem with a 1 MB
journal. There is *no way* the filesystem can do the write as an atomic
operation. (There would be if the filesystem wrote the data to free
blocks and updated the metadata through the journal.)

The point is, there is no concept of "atomic operation" at the file
system level, so the application must do journaling itself if it wants
to have some concept of "transactions".
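
One possible shape for such application-level journaling is an append-only log of checksummed records, where a half-written tail is simply discarded on recovery. The following is a minimal sketch, not a recommendation of a specific design; the record framing (length plus CRC32) and the function names are assumptions made for this example.

```python
import os
import struct
import zlib

# Assumed framing: 4-byte payload length, 4-byte CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(path, payload: bytes) -> None:
    """Append one framed record and force it out to the device."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def read_records(path):
    """Return all intact records, stopping at the first torn or corrupt one."""
    records = []
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return records
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # half-written tail: treat it as never written
        records.append(payload)
        pos = start + length
    return records
```

A record then acts as the application's unit of "transaction": after a crash, `read_records` yields only the records that were written completely.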

From my experience with CF cards, there are some brands that do
wear-leveling (I know that at least the TwinMOS ones do, and probably
SanDisk too) and others that don't (Kingmax).

With a bad CF card and an ext3 filesystem you can get bad sectors within a
couple of hours of intensive writing.

A good CF card will sustain "normal use" (2 writes per minute on average)
with an ext3 filesystem for months (maybe years, I haven't gotten that
far yet :)

Just my two cents,

-- 
Paulo Marques - www.grupopie.com
"In a world without walls and fences who needs windows and gates?"


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-18 11:30             ` Paulo Marques
@ 2004-06-18 12:05               ` Oleg Drokin
  2004-06-21 17:42                 ` mode data=journal in ext3. Is it safe to use? Conclusion Petter Larsen
  2004-06-19 19:16               ` mode data=journal in ext3. Is it safe to use? Bernd Eckenfels
  1 sibling, 1 reply; 32+ messages in thread
From: Oleg Drokin @ 2004-06-18 12:05 UTC (permalink / raw)
  To: Paulo Marques
  Cc: Helge Hafting, Petter Larsen, linux-kernel@vger.kernel.org, ext3

Hello!

On Fri, Jun 18, 2004 at 12:30:55PM +0100, Paulo Marques wrote:
> > > Hard to _produce_, but consider:
> > > 1. Write data to an existing file
> > > 2. Sync metadata
> > > 3. data is forced out because of ordered mode, a powerout crash happens
> > >    in the middle of this. The file now has a block with a mix of new 
> > > and old,
> > Well, this is not much worse than having two blocks, one from old file
> > and one from new after a crash.
> Agree. If the application needs consistency it must do some journaling
> itself. At least, until the time when an application can say "start
> transaction" "commit transaction" to the file system itself.

Right, this is my point.

> > >    it may even be unreadable due to a bad sector checksum.
> > Well, in data journaled mode you may get unreadable journal, is this much
> > better? (Also original question was about CF flash media, so no bad sector
> > problems I presume).
> You got it wrong here. The sentence was "bad sector checksum", not "bad
> sector". If the sector was "half written", then the checksum would not
> match.

In any case, a bad sector checksum is a hardware bug. A sector write is
supposed to be atomic: it either happens or it does not.

> If the journal is "half written" then it is just discarded (or at least
> it should be).

Well, if there is a bad sector checksum inside a journal block, ext3 won't be
all that happy about it for sure (and neither will most other journaling
filesystems, I am sure).

> > > With data journalling you either get the old data (because the crash 
> > > happened
> > > during a write to the journal) or new data (crash happened during data 
> > > write,
> > Well, while with data journaling mode your granularity is one block,
> > with data ordered it is one sector.
> Imagine that you request a 2Mb write to an ext3 filesystem with an 1Mb
> journal. There is *no way* the filesystem can do the write in an atomic
> operation. (there would be if the filesystem wrote the data to free
> blocks and updated the metadata through the journal)

True.
Even if you write 512 KB of data and have a 1 MB journal, there is still no
atomicity guarantee.

> The point is, there is no concept of "atomic operation" at the file
> system level, so the application must do journaling itself if it wants
> to have some concept of "transactions".

Well, if you go with less than 1 block size updates (that do not cross block
boundaries), this can be done atomically. (with help of fsync and stuff).

> From my experience with CF cards, there are some brands that do
> wear-leveling (I know that at least the TwinMOS ones do, and probably
> SanDisk too) and others that don't (Kingmax). 
> With a bad CF card and an ext3 filesystem you can get bad sectors in a
> couple of hours doing some intensive writing. 

Well, for flash memory there is jffs2, it does (data) journalling and supports
compression. And it can even work over conventional block devices via mtd block
emulation, I think. Basically jffs2 is one large fs-sized journal as I
understand it.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-17 14:56 Ken Ryan
  2004-06-17 16:06 ` Timothy Miller
@ 2004-06-19 14:49 ` Petter Larsen
  1 sibling, 0 replies; 32+ messages in thread
From: Petter Larsen @ 2004-06-19 14:49 UTC (permalink / raw)
  To: Ken Ryan; +Cc: linux-kernel

> > 
> > We are using ext3 on a compact flash disk in an embedded device. So we
> > are not using RAID systems.
> 

> Um, is this a new application or have you done this before?
> 
> It's my understanding that very few (or no) CF devices do wear-levelling internally.
> Using a journal, especially a true data journal, seems like *the* way to wear out your
> flash as quickly as possible.
> 
> If you've had success using ext2 in read/write mode on flash/CF in a shipping product,
> I for one would like to know more details!
> 
> 		ken

From our data sheet:

    Wear Leveling is an intrinsic part of the operation of 	
    SanDisk products using NAND memory.

But for sure, we will use a Compact Flash card that DOES do wear leveling,
and that also shuffles read-only data around the disk.

This will be for production, yes.

Petter

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-18 11:30             ` Paulo Marques
  2004-06-18 12:05               ` Oleg Drokin
@ 2004-06-19 19:16               ` Bernd Eckenfels
  1 sibling, 0 replies; 32+ messages in thread
From: Bernd Eckenfels @ 2004-06-19 19:16 UTC (permalink / raw)
  To: linux-kernel

In article <1087558255.25904.14.camel@pmarqueslinux> you wrote:
> The point is, there is no concept of "atomic operation" at the file
> system level, so the application must do journaling itself if it wants
> to have some concept of "transactions".

Well, there can be rules like "writes after flush with size less than X are
atomic", with X being something between sector size, block size, or data
journal size.

However, most Unix programs which do not do journalling and rely on some
stable atomic behaviour work by generating new files and renaming them.
And for this, metadata journalling in ordered mode is fine.
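
The write-new-file-and-rename pattern can be sketched as follows. This is a minimal illustration for POSIX systems; the function name `atomic_replace` and the temp-file prefix are made up for the example, and the durability it gives still depends on the filesystem and the hardware honouring `fsync`.

```python
import os
import tempfile


def atomic_replace(path, data: bytes) -> None:
    """Replace `path` so that readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # file contents on disk before the rename
        os.replace(tmp, path)      # rename(2) semantics on POSIX
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The old file is never modified in place, so a crash at any point leaves either the previous version or the complete new one under the final name.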

So only append-only logfiles may need some special treatment; this looks
like a common source of null bytes in a file. And it is only a problem when
the file is not a temp file (syslog).

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use? Conclusion
  2004-06-18 12:05               ` Oleg Drokin
@ 2004-06-21 17:42                 ` Petter Larsen
  0 siblings, 0 replies; 32+ messages in thread
From: Petter Larsen @ 2004-06-21 17:42 UTC (permalink / raw)
  To: ext3, linux-kernel; +Cc: albertogli


I will summarise this thread and try to set the picture of what has been
discussed and concluded.

1. ext3 with mode data=journal in kernel 2.6.x is probably working as
intended. One person has responded that they use this mode heavily on 2.6.6
without corruption related to the fs code. Since nobody has said that
they have seen faults, we should believe that it is safe. It is in a
stable kernel...

2. Mode data=journal will not gain much more than correct mtime compared
to mode data=ordered.

3. Applications that need a very consistent filesystem, e.g. consistent
writes, need to achieve this by implementing their own
transaction/journaling system. Alberto Bertogli has written a library
that can assist with this. See URL,
http://users.auriga.wearlab.de/~alb/libjio/. I have not used it so I
cannot say for sure how good it is, but it seems like a nice start and
worth taking a look at.

4.  Because mode data=journal does not gain much, it would be better to
use mode data=ordered and have the application do any form of
transaction/journaling itself. Mode data=ordered is the default in ext3 and
probably the most used, and therefore also the best tested.

5. If, and only if, your updates are smaller than one block (and do not
cross block boundaries), these write operations can be done
atomically (with the help of fsync and stuff; from Oleg and others).

6. Wear leveling on a Compact Flash card:
Wear leveling is an important task. SanDisk has Industrial Grade support
for some of their CF cards; see these links.
http://www.sandisk.com/pressrelease/020522_toughness.htm
http://www.sandisk.com/pressrelease/021112_igapps.htm
http://www.sandisk.com/pdf/oem/WPaperWearLevelv1.0.pdf
We are in the telecommunications and networking business and need this
kind of Compact Flash card. From their site:
* Enhanced error correction and sophisticated wear leveling technology
* Card level MTBF >3 million hours
* 2 million program/erase cycle endurance per block 

We are not bound to SanDisk. We could use any supplier that meets these
criteria.

I do not know the wear leveling algorithm in detail, so how it shuffles
read-only data around the disk (or whether it does), and how it behaves
if we create partitions on this CF disk (partitions are probably
transparent to the wear leveling algorithm), are issues we need to
find out about.
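
Point 5 above can be sketched in code: check that an update stays inside one block, write it, and fsync. This is only an illustration. The 4096-byte default block size is an assumption (the real value depends on how the filesystem was created), the function names are invented, and whether such a write is really atomic is the claim made in this thread, not something the code can guarantee.

```python
import os


def fits_in_one_block(offset: int, length: int, block_size: int = 4096) -> bool:
    """True if the byte range [offset, offset + length) lies in one block."""
    if length <= 0 or offset < 0:
        return False
    return offset // block_size == (offset + length - 1) // block_size


def write_within_block(path, offset: int, data: bytes,
                       block_size: int = 4096) -> None:
    """Overwrite `data` at `offset` and fsync, refusing any write that
    would cross a block boundary."""
    if not fits_in_one_block(offset, len(data), block_size):
        raise ValueError("write would cross a block boundary")
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, data, offset)
        if written != len(data):
            raise OSError("short write")
        os.fsync(fd)  # do not report success before the data is on the device
    finally:
        os.close(fd)
```

An application with fixed-size records could, for example, choose a record size that divides the block size so that no record ever straddles two blocks.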

Thanks for all your replies (there are 32 threads :-) spread across the
ext3 ML and the LKML, and a couple of private ones). It has helped me a lot.

Best regards
-- 
Petter Larsen
cand. scient.
moreCom as
913 17 222

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
       [not found]       ` <87wu26mto2.fsf@enki.rimspace.net>
@ 2004-06-27 14:17         ` Petter Larsen
  2004-06-28  0:22           ` Daniel Pittman
  0 siblings, 1 reply; 32+ messages in thread
From: Petter Larsen @ 2004-06-27 14:17 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: ext3, linux-kernel

> > We are using ext3 on a compact flash disk in an embedded device. So we
> > are not using RAID systems.
> 
> Watch out - even with the internal wear leveling the CF disk will do,
> ext3 is still a pretty heavy filesystem to use there.
> 
>      Daniel

Well, which filesystem would you then use for read-write on this CF?


-- 
Petter Larsen
cand. scient.
moreCom as
913 17 222

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: mode data=journal in ext3. Is it safe to use?
  2004-06-27 14:17         ` Petter Larsen
@ 2004-06-28  0:22           ` Daniel Pittman
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Pittman @ 2004-06-28  0:22 UTC (permalink / raw)
  To: Petter Larsen; +Cc: ext3, linux-kernel

On 28 Jun 2004, Petter Larsen wrote:
>>> We are using ext3 on a compact flash disk in an embedded device. So we
>>> are not using RAID systems.
>>
>> Watch out - even with the internal wear leveling the CF disk will do,
>> ext3 is still a pretty heavy filesystem to use there.
>
> Well, which filesystem would you then used for read-write on this CF?

My recommendation would be to look at running your system out of memory,
and writing back to flash on a scheduled basis, and at shutdown.

That way the write load is minimized, but you still have a persistent
store.
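
As a rough illustration of that approach, here is a minimal sketch of a store that lives in RAM and touches the flash only on a timer and at shutdown. The class name, the JSON file format, and the helper `start_periodic_flush` are all assumptions made for this example; a real system would pick its own format and interval.

```python
import json
import os
import threading


class WriteBackStore:
    """Keep state in memory; write it to `path` only when flush() is called."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()
        self._dirty = False
        self.data = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key, value):
        with self._lock:
            self.data[key] = value
            self._dirty = True

    def flush(self):
        """Write to flash only if something changed; return True if written."""
        with self._lock:
            if not self._dirty:
                return False
            tmp = self.path + ".tmp"
            with open(tmp, "w", encoding="utf-8") as f:
                json.dump(self.data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self.path)  # old or new file, never a torn one
            self._dirty = False
            return True


def start_periodic_flush(store, interval_s, stop_event):
    """Flush every `interval_s` seconds until `stop_event` is set, then
    do one final write-back (the "at shutdown" part)."""
    def loop():
        while not stop_event.wait(interval_s):
            store.flush()
        store.flush()
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The trade-off is explicit: anything changed since the last flush is lost on a sudden power failure, so the interval bounds how much data can be lost in exchange for far fewer writes to the card.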
        Daniel
-- 
Anyone who goes to a psychiatrist ought to have his head examined.
        -- Samuel Goldwyn

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2004-06-28  0:22 UTC | newest]

Thread overview: 32+ messages
     [not found] <40FB8221D224C44393B0549DDB7A5CE83E31B1@tor.lokal.lan>
2004-06-15 18:09 ` mode data=journal in ext3. Is it safe to use? Petter Larsen
2004-06-15 18:20   ` Eugene Crosser
2004-06-17  8:36     ` Petter Larsen
2004-06-16  7:34   ` Oleg Drokin
2004-06-17  8:27     ` Petter Larsen
2004-06-17 17:09       ` Oleg Drokin
2004-06-18  9:41         ` Helge Hafting
2004-06-18 10:15           ` Oleg Drokin
2004-06-18 11:30             ` Paulo Marques
2004-06-18 12:05               ` Oleg Drokin
2004-06-21 17:42                 ` mode data=journal in ext3. Is it safe to use? Conclusion Petter Larsen
2004-06-19 19:16               ` mode data=journal in ext3. Is it safe to use? Bernd Eckenfels
2004-06-16 15:49   ` Timothy Miller
2004-06-17  0:51     ` Daniel Pittman
2004-06-17  3:02       ` Tim Connors
2004-06-17  5:35       ` Hans Reiser
2004-06-17 10:08         ` Dave Jones
2004-06-17 16:55           ` Hans Reiser
2004-06-17  8:29     ` Petter Larsen
2004-06-17 19:30       ` Daniel Egger
     [not found]       ` <87wu26mto2.fsf@enki.rimspace.net>
2004-06-27 14:17         ` Petter Larsen
2004-06-28  0:22           ` Daniel Pittman
     [not found]     ` <1805.216.148.213.196.1087426691.squirrel@www.code-visions.com>
2004-06-17 11:23       ` Petter Larsen
2004-06-17 16:26         ` Andreas Dilger
2004-06-17 14:56 Ken Ryan
2004-06-17 16:06 ` Timothy Miller
2004-06-17 17:20   ` Hans Reiser
2004-06-17 19:15     ` Ken Ryan
2004-06-18  6:18       ` Hans Reiser
2004-06-17 19:43     ` Daniel Egger
2004-06-17 19:59       ` Ken Ryan
2004-06-19 14:49 ` Petter Larsen
