* safe flash filesystem
@ 2001-06-21 10:54 Abraham vd Merwe
2001-06-21 13:43 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Abraham vd Merwe @ 2001-06-21 10:54 UTC (permalink / raw)
To: MTD for Linux
Hi!
We're developing a product that needs a small part of the flash memory to
contain a very scaled down file system containing configuration files. This
file system has to be extremely reliable. Speed is not an issue, but the
validity of any data written to it is of utmost importance.
So before I go reinvent the wheel, I'd like to know if there's something
which already does this, i.e. does things like keep duplicate copies of
everything around, have all sorts of checks and balances to check for
damaged parts of the flash, no caching and some sort of wear levelling, etc?
--
Regards
Abraham
Have you noticed that all you need to grow healthy, vigorous grass is a
crack in your sidewalk?
__________________________________________________________
Abraham vd Merwe - 2d3D, Inc.
Device Driver Development, Outsourcing, Embedded Systems
Cell: +27 82 565 4451 Snailmail:
Tel: +27 21 761 7549 Block C, Antree Park
Fax: +27 21 761 7648 Doncaster Road
Email: abraham@2d3d.co.za Kenilworth, 7700
Http: http://www.2d3d.com South Africa
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 10:54 safe flash filesystem Abraham vd Merwe
@ 2001-06-21 13:43 ` Vipin Malik
2001-06-21 13:57 ` Abraham vd Merwe
0 siblings, 1 reply; 30+ messages in thread
From: Vipin Malik @ 2001-06-21 13:43 UTC (permalink / raw)
To: Abraham vd Merwe, MTD for Linux
>
>We're developing a product that needs a small part of the flash memory to
>contain a very scaled down file system containing configuration files. This
>file system has to be extremely reliable. Speed is not an issue, but the
>validity of any data written to it is of utmost importance.
>
>So before I go reinvent the wheel, I'd like to know if there's something
>which already does this, i.e. does things like keep duplicate copies of
>everything around, have all sorts of checks and balances to check for
>damaged parts of the flash, no caching and some sort of wear levelling, etc?
Normally you could just use a "small" JFFS2 partition- or even a "full" JFFS2
partition with just your config database file on it.
HOWEVER, at this time JFFS2 is NOT power fail "roll back and recover" safe-
even for write()'s less than PAGE_SIZE.
Read the gory details in "JFFS: A practical guide" at:
http://www.embeddedlinuxworks.com/articles/jffs_guide.html
There is another very serious problem with JFFS(2): it may block read/write
accesses for tens of seconds while it garbage collects (GC), especially on a
new full fs. (Watch for a new article, coming very soon, on read/write
latencies for periodic tasks reading from or writing to a JFFS fs.)
In my opinion a "small" embedded power fail safe database utility is
needed that would solve the power fail issue as well as provide caching
support for read/write with a read/write log on another (separate) device
to get around the latency problems.
I have started a project to define the features required for this. Please
read details at:
http://www.embeddedlinuxworks.com/articles/db_project.html
I need the same thing for my project. I am sure there are others out there
that need this capability for their embedded systems. It would be nice if
all could collaborate and come up with open source software that addresses
this need.
Regards,
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 13:43 ` Vipin Malik
@ 2001-06-21 13:57 ` Abraham vd Merwe
2001-06-21 14:29 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Abraham vd Merwe @ 2001-06-21 13:57 UTC (permalink / raw)
To: Vipin Malik; +Cc: MTD for Linux
Hi Vipin!
> I have started a project to define the features required for this. Please
> read details at:
> http://www.embeddedlinuxworks.com/articles/db_project.html
>
> I need the same thing for my project. I am sure there are others out there
> that need
> this capability for their embedded systems. It would be nice if all could
> collaborate and come
> up with an open source software that addresses this need.
Exciting stuff. It does address the issues I'm interested in. More
specifically I agree that a driver level system is needed and not a user
layer (otherwise we should be using Sleepycat's db3 which is perfect for
transactional data except that it's not free).
If it's a file system people will actually use it instead of just talk about
it.
Doing this sort of thing properly is a huge task, however, and should
be carefully planned.
I'm really interested in helping (if we're talking driver, not user level
process - I don't want to develop a database :P)
--
Regards
Abraham
Don't let people drive you crazy when you know it's in walking distance.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 13:57 ` Abraham vd Merwe
@ 2001-06-21 14:29 ` Vipin Malik
2001-06-21 14:35 ` Abraham vd Merwe
0 siblings, 1 reply; 30+ messages in thread
From: Vipin Malik @ 2001-06-21 14:29 UTC (permalink / raw)
To: Abraham vd Merwe; +Cc: MTD for Linux
>Exciting stuff. It does address the issues I'm interested in. More
>specifically I agree that a driver level system is needed and not a user
>layer (otherwise we should be using Sleepycat's db3 which is perfect for
>transactional data except that it's not free).
Yeah, that "except is not free" means it costs >100K USD (for the
transaction version) if I remember the quote they gave me.
Plus, it's too "thick" for the typical use in embedded systems to store
configuration variables and a few logs and data values.
>Doing this sort of thing properly though is a huge task however and should
>be carefully planned.
>
>I'm really interested in helping (if we're talking driver, not user level
>process - I don't want to develop a database :P)
I guess I should change the name away from a "database", as I really don't
want to do anything more drastic than maybe a linear "recno" type flat file
database or something very simple. Something that you or I may have
probably implemented anyway to solve our needs to store these config
variables in our respective systems.
I did not have a full featured commercial "database" type database in mind,
or even anything close.
I agree that this functionality should be present in the JFFS2 layer
itself, but have been unable to convince the powers that be.
David W. says that he'll entertain a diff -u if we (I) implement the
feature, but I don't think that I have the time or the capability to change
JFFS2 to implement this functionality all by myself. Plus the issue of the
blocked access will remain (even if we solve the power fail issue) and the
solution to that may not be acceptable for inclusion in the regular
JFFS2 fs.
Maybe you may want to subscribe to the development list on the
www.EmbeddedLinuxWorks site and we can take this discussion there. I am
looking for user input to define the feature set of this "config system"
(not database :) And I want to make it LGPL (if it does become a lib or
task) so that users can link to it without releasing the source of their
own code.
I can't believe that you and I are the only folks that may be interested in
this. Maybe there is an existing solution- we just don't know about it- or
others have not thought about this issue just yet ;)
Regards,
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 14:29 ` Vipin Malik
@ 2001-06-21 14:35 ` Abraham vd Merwe
2001-06-21 15:05 ` Vipin Malik
` (3 more replies)
0 siblings, 4 replies; 30+ messages in thread
From: Abraham vd Merwe @ 2001-06-21 14:35 UTC (permalink / raw)
To: Vipin Malik; +Cc: MTD for Linux
Hi Vipin!
> Yeah, that "except is not free" means it costs >100K USD (for the
> transaction version) if I remember the quote they gave me.
$125K iirc (;
> I guess I should change the name away from a "database", as I really don't
> want to do anything more drastic than maybe a linear "recno" type flat file
> database or something very simple. Something that you or I may have
> probably implemented anyway to solve our needs to store these config
> variables in our respective systems.
That's exactly what I've started writing today (:
> I agree that this functionality should be present in the JFFS2 layer
> itself, but have been unable to convince the powers to be.
>
> David W. says that he'll entertain a diff -u if we (I) implement the
> feature, but I don't think that I have the time or the capability to change
> JFFS2 to implement this functionality all by myself. Plus the issue of the
> blocked access will remain (even if we solve the power fail issue) and the
> solution to that may be not be acceptable for inclusion to the regular
> JFFS2 fs.
I don't know if merging something like this with jffs2 would solve the
problem like you said. I was more thinking of a completely different user
MTD driver to provide an uncached block device and slap a file system on top
of that. Or we can sync() all the time from the file system.
I have to agree that it's probably better to write a library/utilities first
to do a preliminary thing. That way we'd get a functional thing quite fast
and figure out what we did wrong in the first place.
> Maybe you may want to subscribe to the development list on the
> www.EmbeddedLinuxWorks site and we can take this discussion there. I am
Where do I subscribe?
> looking for user input to define the feature set of this "config system"
> (not database :) And I want to make it LGPL (if it does become a lib or
> task) so that users can link to it without releasing the source of their
> own code.
I'm in the fortunate position of being able to work on this fulltime for
the next week or so, so if we can figure out a useful specification for this
in a short time, I'm really keen on helping to implement this and LGPL/BSD
license is just fine.
> I can't believe that you and I are the only folks that may be interested in
> this. Maybe there is an existing solution- we just don't know about it- or
> other's have not thought about this issue just yet ;)
That's why I mailed here in the first place :P
--
Regards
Abraham
Matrimony is the root of all evil.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 14:35 ` Abraham vd Merwe
@ 2001-06-21 15:05 ` Vipin Malik
2001-06-21 15:36 ` Chris Read
2001-06-21 15:09 ` Joakim Tjernlund
` (2 subsequent siblings)
3 siblings, 1 reply; 30+ messages in thread
From: Vipin Malik @ 2001-06-21 15:05 UTC (permalink / raw)
To: Abraham vd Merwe; +Cc: MTD for Linux
>
>That's exactly what I've started writing today (:
Have you written a spec for it?
>I don't know if merging something like this with jffs2 would solve the
>problem like you said. I was more thinking of a completely different user
>MTD driver to provide an uncached block device and slap a file system on top
>of that. Or we can sync() all the time from the file system.
Nooooo.... ;)
JFFS2 already provides for:
1. Interface to MTD
2. Flash wear levelling
3. Compression/decompression on the fly
4. "always sync()" data to flash before your write() returns functionality
5. handling of erase partitions, GC, a file system interface etc.
6. tested for power fail reliability of the fs metadata.
7. Extensive usage by others even if they do not need this (our)
functionality- hence minimal hidden bugs.
My initial feel is that I really don't think that reinventing the wheel is
the right answer.
What we need is a "layer" on top of JFFS2 to provide the 2 features that it
lacks. Namely:
1. Roll back and recover to last data if your write did not complete and
power failed
2. 0 latency writes. Reads are no problem as they can always be cached in
memory by reading the entire (it's not a database) database on startup.
Alan Cox called this "transactional level" functionality.
An implementation based on a transaction cache solves the issue of having
to duplicate all the members and CRC them thus supporting quite a large
database without the need for twice the space on the flash device, as well
as the issue of "roll back and recover" or "complete transaction" on the
next power up if the complete transaction is available. There is a reason
that transactional logs are the preferred choice even for large databases
that need to support fail safe.
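The "roll back and recover" or "complete transaction" behaviour described above can be sketched with an append-only log in which a change only takes effect once a commit entry follows it. This is a minimal sketch under stated assumptions, not an existing JFFS2 or library API: the entry layout and the names `entry_check`, `log_append`, `recover`, `NKEYS` and `LOG_MAX` are all hypothetical, the in-memory array stands in for a log file on flash, and the toy checksum stands in for a real CRC.

```c
#include <stddef.h>
#include <stdint.h>

#define NKEYS   8    /* hypothetical: number of config variables */
#define LOG_MAX 32   /* hypothetical: log capacity in entries */

enum { ENT_SET = 1, ENT_COMMIT = 2 };

struct log_entry {
    uint32_t type;    /* ENT_SET or ENT_COMMIT */
    uint32_t key;     /* index of the config variable (ENT_SET only) */
    uint32_t value;   /* new value (ENT_SET only) */
    uint32_t check;   /* checksum over the three fields above */
};

/* Toy checksum standing in for a real CRC; rejects all-0 and all-1 entries. */
static uint32_t entry_check(const struct log_entry *e)
{
    return (e->type * 31u + e->key) * 31u + e->value + 0x5a5a5a5au;
}

/* Append one entry; returns 0 on success, -1 if the log is full. */
int log_append(struct log_entry *log, size_t *n,
               uint32_t type, uint32_t key, uint32_t value)
{
    if (*n >= LOG_MAX)
        return -1;
    struct log_entry e = { type, key, value, 0 };
    e.check = entry_check(&e);
    log[(*n)++] = e;
    return 0;
}

/*
 * Replay the log into cfg[]. SET entries are staged and only applied when
 * a COMMIT follows, so a transaction cut short by power failure is dropped.
 * Returns the number of committed transactions applied.
 */
size_t recover(const struct log_entry *log, size_t n, uint32_t cfg[NKEYS])
{
    uint32_t staged[NKEYS];
    uint8_t  dirty[NKEYS] = { 0 };
    size_t   committed = 0;

    for (size_t i = 0; i < n; i++) {
        const struct log_entry *e = &log[i];

        if (e->check != entry_check(e))
            break;                      /* torn or unwritten entry: stop */
        if (e->type == ENT_SET && e->key < NKEYS) {
            staged[e->key] = e->value;
            dirty[e->key] = 1;
        } else if (e->type == ENT_COMMIT) {
            for (size_t k = 0; k < NKEYS; k++) {
                if (dirty[k]) {
                    cfg[k] = staged[k];
                    dirty[k] = 0;
                }
            }
            committed++;
        } else {
            break;                      /* unknown entry type: stop */
        }
    }
    return committed;
}
```

On startup the application would call `recover()` once to rebuild the in-memory config. Because only the log has to be appended to, the data does not need to be stored twice; the sketch does not attempt to address the write latency issue.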
>I have to agree that it's probably better to write a library/utilities first
>to do a preliminary thing. That way we'd get a functional thing quite fast
>and figure out what we did wrong in the first place.
My thoughts exactly.
> > Maybe you may want to subscribe to the development list on the
> > www.EmbeddedLinuxWorks site and we can take this discussion there. I am
>
>Where do I subscribe?
http://www.embeddedlinuxworks.com/lists.html
> > looking for user input to define the feature set of this "config system"
> > (not database :) And I want to make it LGPL (if it does become a lib or
> > task) so that users can link to it without releasing the source of their
> > own code.
>
>I'm in the fortunate position of being able to work on this fulltime for
>the next week or so, so if we can figure out a useful specification for this
>in a short time, I'm really keen on helping to implement this and LGPL/BSD
>license is just fine.
That suits me just fine. We need to start working on a spec first. Maybe a
clearly defined requirement spec, then a design document. If you like I can
elaborate on the above thoughts a bit and start something- or feel free to
send me something if you like.
> > I can't believe that you and I are the only folks that may be
> interested in
> > this. Maybe there is an existing solution- we just don't know about it- or
> > other's have not thought about this issue just yet ;)
>
>That's why I mailed here in the first place :P
But I haven't heard from anyone else yet :(
Regards,
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem
2001-06-21 14:35 ` Abraham vd Merwe
2001-06-21 15:05 ` Vipin Malik
@ 2001-06-21 15:09 ` Joakim Tjernlund
2001-06-21 15:34 ` Vipin Malik
2001-06-21 15:11 ` Herman Oosthuysen
2001-06-21 21:26 ` safe flash filesystem Russ Dill
3 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2001-06-21 15:09 UTC (permalink / raw)
To: Abraham vd Merwe, Vipin Malik; +Cc: MTD for Linux
> Hi Vipin!
>
> > Yeah, that "except is not free" means it costs >100K USD (for the
> > transaction version) if I remember the quote they gave me.
>
> $125K iirc (;
>
Check out the Mird DB at http://www.mirar.org/mird/
I would be very interested to hear what you think
of this DB.
Joakim Tjernlund
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 14:35 ` Abraham vd Merwe
2001-06-21 15:05 ` Vipin Malik
2001-06-21 15:09 ` Joakim Tjernlund
@ 2001-06-21 15:11 ` Herman Oosthuysen
2001-06-21 17:54 ` Tim Riker
2001-06-21 21:26 ` safe flash filesystem Russ Dill
3 siblings, 1 reply; 30+ messages in thread
From: Herman Oosthuysen @ 2001-06-21 15:11 UTC (permalink / raw)
To: Abraham vd Merwe, Vipin Malik; +Cc: MTD for Linux
Hi guys,
We are currently exploring a product by Tevero in Norway:
http://www.tevero.no/products/fdc/ to use instead of the still buggy JFFS2.
Price is USD2500, which isn't bad. It appears to be the cheapest commercial
FFS available.
Cheers,
Herman
http://www.WirelessNetworksInc.com
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem
2001-06-21 15:09 ` Joakim Tjernlund
@ 2001-06-21 15:34 ` Vipin Malik
2001-06-21 19:34 ` Joakim Tjernlund
2001-06-21 19:47 ` Joakim Tjernlund
0 siblings, 2 replies; 30+ messages in thread
From: Vipin Malik @ 2001-06-21 15:34 UTC (permalink / raw)
To: joakim.tjernlund, Abraham vd Merwe; +Cc: MTD for Linux
Thanks for the link. At first blush it seems like something that will
provide a great place to start- if it does not provide all the
functionality already.
I'll examine it a bit more and give you my thoughts. Are you using this db
or have any experience with it?
The author does not specify the license- besides stating that it is "free".
Is it GPL or LGPL?
Regards,
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem
2001-06-21 15:05 ` Vipin Malik
@ 2001-06-21 15:36 ` Chris Read
0 siblings, 0 replies; 30+ messages in thread
From: Chris Read @ 2001-06-21 15:36 UTC (permalink / raw)
To: 'Vipin Malik', 'Abraham vd Merwe'; +Cc: 'MTD for Linux'
I would also be very interested in this.
The ability to retain consistency after multiple power outages
is crucial to many of the types of project upon which I work.
The problem can be quite complex if you get a power fail in a garbage
collection started as a result of a power fail during a previous GC.
Chris Read
CLR Associates Limited
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem
@ 2001-06-21 16:05 Vipin Malik
0 siblings, 0 replies; 30+ messages in thread
From: Vipin Malik @ 2001-06-21 16:05 UTC (permalink / raw)
To: 'chris.read@clrassociates.co.uk',
'Abraham vd Merwe'
Cc: 'MTD for Linux'
>I would also be very interested in this.
>The ability to retain consistency after multiple power outages
>is crucial to many of the types of project upon which I work.
>The problem can be quite complex if you get a power fail in a garbage
>collection started as a result of a power fail during a previous GC.
Well then subscribe to the "Dev list" at the following address. This
discussion has now been taken there.
You may also want to read:
http://www.embeddedLinuxWorks.com/articles/jffs_guide.html and
http://www.embeddedLinuxWorks.com/articles/db_project.html
Regards,
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 15:11 ` Herman Oosthuysen
@ 2001-06-21 17:54 ` Tim Riker
2001-06-21 19:43 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Tim Riker @ 2001-06-21 17:54 UTC (permalink / raw)
To: Herman Oosthuysen; +Cc: Abraham vd Merwe, Vipin Malik, MTD for Linux
Hmm... would it not be easier to just use ext2 on a CF card?
less $$ and more flexibility no?
Herman Oosthuysen wrote:
>
> Hi guys,
>
> We are currently exploring a product by Tevero in Norway:
> http://www.tevero.no/products/fdc/ to use instead of the still buggy JFFS2.
> Price is USD2500, which isn't bad. It appears to be the cheapest commercial
> FFS available.
>
> Cheers,
>
> Herman
> http://www.WirelessNetworksInc.com
>
--
Tim Riker - http://rikers.org/ - short SIGs! <g>
All I need to know I could have learned in Kindergarten
... if I'd just been paying attention.
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem
2001-06-21 15:34 ` Vipin Malik
@ 2001-06-21 19:34 ` Joakim Tjernlund
2001-06-21 19:47 ` Joakim Tjernlund
1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2001-06-21 19:34 UTC (permalink / raw)
To: Vipin Malik, Abraham vd Merwe; +Cc: MTD for Linux
It
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 19:43 ` Vipin Malik
@ 2001-06-21 19:35 ` Tim Riker
2001-06-21 19:56 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Tim Riker @ 2001-06-21 19:35 UTC (permalink / raw)
To: Vipin Malik; +Cc: Herman Oosthuysen, Abraham vd Merwe, MTD for Linux
ok,
what about reiserfs on CF then?
Vipin Malik wrote:
>
> Tim Riker wrote:
>
> > Hmm... would it not be easier to just use ext2 on a CF card?
> >
> > less $$ and more flexibility no?
> >
>
> ext2 on CF takes about 3-5 power fails before it falls _flat_ on its face!
> Pretty ugly too.
>
> I would not use it.
>
> Vipin
--
Tim Riker - http://rikers.org/ - short SIGs! <g>
All I need to know I could have learned in Kindergarten
... if I'd just been paying attention.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 17:54 ` Tim Riker
@ 2001-06-21 19:43 ` Vipin Malik
2001-06-21 19:35 ` Tim Riker
0 siblings, 1 reply; 30+ messages in thread
From: Vipin Malik @ 2001-06-21 19:43 UTC (permalink / raw)
To: Tim Riker; +Cc: Herman Oosthuysen, Abraham vd Merwe, MTD for Linux
Tim Riker wrote:
> Hmm... would it not be easier to just use ext2 on a CF card?
>
> less $$ and more flexibility no?
>
ext2 on CF takes about 3-5 power fails before it falls _flat_ on its face!
Pretty ugly too.
I would not use it.
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem
2001-06-21 15:34 ` Vipin Malik
2001-06-21 19:34 ` Joakim Tjernlund
@ 2001-06-21 19:47 ` Joakim Tjernlund
1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2001-06-21 19:47 UTC (permalink / raw)
To: Vipin Malik; +Cc: MTD for Linux
Sorry for the previous incomplete mail, I slipped with my fingers :-(
It's not GPL or LGPL; it's under a JPEG-like license (according to the
author in an e-mail to me).
We are looking into this DB and at first glance it looks OK, but
the guy who knows DBs here is on vacation and I don't know that
much about DBs, so I cannot be more specific.
Joakim
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 19:35 ` Tim Riker
@ 2001-06-21 19:56 ` Vipin Malik
2001-06-21 21:17 ` Kyle Harris
0 siblings, 1 reply; 30+ messages in thread
From: Vipin Malik @ 2001-06-21 19:56 UTC (permalink / raw)
To: Tim Riker; +Cc: Herman Oosthuysen, Abraham vd Merwe, MTD for Linux
Tim Riker wrote:
> ok,
>
> what about reiserfs on CF then?
>
I have tested one *major* brand of IDE flash devices and 2 brands of CF
devices, in more than 20K power fail tests.
Both suffer from low level failures, which cause the IDE driver layer to
"give up" with "unrecoverable errors". The CF is so bad that I just gave up
on the testing after a few hundred cycles.
The IDE flash was better, but not much.
Will reiserfs be happy if the underlying IDE /dev/hdxx driver returns
"unrecoverable error" from the IDE device?
Your call.
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 19:56 ` Vipin Malik
@ 2001-06-21 21:17 ` Kyle Harris
2001-07-03 23:53 ` On the "safe filesystem" and write() topic Bjorn Wesen
0 siblings, 1 reply; 30+ messages in thread
From: Kyle Harris @ 2001-06-21 21:17 UTC (permalink / raw)
To: Vipin Malik, MTD for Linux
Hey,
I've read through several posts and Vipin's jffs_guide. It appears that
JFFS, at this time, is about the most reliable open source fs for
embedded systems, even though it still has some problems. When JFFS
fails, is the filesystem still usable? My question is this: what if you
save only a small datafile (< 1K) and write it alternately to 2
different JFFS partitions (or even the same partition)? At boot, you
read from both and get the latest, valid copy. This way if one is bad
you still have a backup. How reliable would this be?
Just wondering... Kyle.
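The alternating two-copy scheme asked about above could look roughly like the following. It is a minimal sketch, assuming ordinary files on one or two mounted partitions; the record layout, `CFG_MAGIC`, `cfg_save` and `cfg_load` are hypothetical names, and sequence-number wraparound is ignored for brevity. It says nothing about how reliable the underlying filesystem is, only how a reader can tell a torn copy from an intact one.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CFG_MAGIC 0x43464731u   /* hypothetical magic value */
#define CFG_MAX   1024          /* payload limit; the post assumes < 1K */

struct cfg_rec {
    uint32_t magic;
    uint32_t seq;               /* incremented on every save */
    uint32_t len;               /* bytes used in data[] */
    uint32_t crc;               /* CRC-32 of the record with crc set to 0 */
    uint8_t  data[CFG_MAX];
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow but tiny. */
static uint32_t crc32_calc(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xffffffffu;

    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Save one copy to paths[seq & 1], so consecutive saves alternate files. */
int cfg_save(const char *paths[2], uint32_t seq, const void *data, uint32_t len)
{
    struct cfg_rec rec;

    if (len > CFG_MAX)
        return -1;
    memset(&rec, 0, sizeof rec);
    rec.magic = CFG_MAGIC;
    rec.seq = seq;
    rec.len = len;
    memcpy(rec.data, data, len);
    rec.crc = crc32_calc(&rec, sizeof rec);   /* crc field is still 0 here */

    FILE *f = fopen(paths[seq & 1u], "wb");
    if (!f)
        return -1;
    int ok = fwrite(&rec, 1, sizeof rec, f) == sizeof rec;
    ok = ok && fflush(f) == 0 && fsync(fileno(f)) == 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Read and validate one copy; returns 0 if it is intact. */
static int cfg_load_one(const char *path, struct cfg_rec *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(out, 1, sizeof *out, f);
    fclose(f);
    if (n != sizeof *out || out->magic != CFG_MAGIC || out->len > CFG_MAX)
        return -1;

    uint32_t stored = out->crc;
    out->crc = 0;
    uint32_t actual = crc32_calc(out, sizeof *out);
    out->crc = stored;
    return stored == actual ? 0 : -1;
}

/* Load the intact copy with the highest seq; returns -1 if neither is valid. */
int cfg_load(const char *paths[2], struct cfg_rec *out)
{
    struct cfg_rec a, b;
    int a_ok = cfg_load_one(paths[0], &a) == 0;
    int b_ok = cfg_load_one(paths[1], &b) == 0;

    if (!a_ok && !b_ok)
        return -1;
    if (a_ok && (!b_ok || a.seq > b.seq))
        *out = a;
    else
        *out = b;
    return 0;
}
```

Because each save overwrites only the older of the two files, a power failure in the middle of a save can damage at most that copy, and `cfg_load()` falls back to the other one.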
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 14:35 ` Abraham vd Merwe
` (2 preceding siblings ...)
2001-06-21 15:11 ` Herman Oosthuysen
@ 2001-06-21 21:26 ` Russ Dill
2001-06-22 8:22 ` Abraham vd Merwe
[not found] ` <20010622102154.E1828@crystal.2d3d.co.za>
3 siblings, 2 replies; 30+ messages in thread
From: Russ Dill @ 2001-06-21 21:26 UTC (permalink / raw)
To: MTD for Linux
If it's just a config file, why make all this so complicated?
struct node {
u32 magic;
char valid;
u32 version;
u32 data_crc;
u32 hdr_crc;
char data[DATA_SIZE];
};
Set aside 2-4 eraseblocks (preferably parameter blocks). On mount, to
find the valid config, walk through the flash and find the valid node
with matching CRCs and the highest version (watch for wraparound).
When writing a new config, if there is space left in the current erase
block, put it after the last one; after finishing writing it, set the
previous config's valid field to zero (flash lets you do this). If the
eraseblock is full, write in the next eraseblock, and when you are done,
erase the previous eraseblock.
All of this can be done in userspace or with a userspace library, just
mmap an mtd, and then use the erase ioctls.
Databases and logs... that's another story.
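A minimal sketch of the mount-time scan described above, assuming the nodes have already been read (or mmap'ed) from the MTD device into an array. DATA_SIZE, NODE_MAGIC, the CRC-32 routine and every helper name are illustrative assumptions, not something given in this thread, and uint32_t stands in for the kernel's u32. The header CRC here deliberately skips the valid field, so zeroing that field later does not invalidate an otherwise good node; the scan itself relies on the CRCs and the version rather than on the valid flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_SIZE  64           /* assumed payload size */
#define NODE_MAGIC 0x434f4e46u  /* assumed marker value */

struct node {
    uint32_t magic;
    uint8_t  valid;      /* 0xff while current, zeroed once superseded */
    uint32_t version;
    uint32_t data_crc;
    uint32_t hdr_crc;
    char     data[DATA_SIZE];
};

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_buf(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xffffffffu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Header CRC over magic, version and data_crc only (not valid). */
uint32_t node_hdr_crc(const struct node *n)
{
    uint32_t fields[3] = { n->magic, n->version, n->data_crc };
    return crc32_buf(fields, sizeof fields);
}

/* Build a node image in RAM, ready to be programmed into flash. */
void node_fill(struct node *n, uint32_t version, const char *cfg)
{
    assert(strlen(cfg) < DATA_SIZE);
    memset(n, 0xff, sizeof *n);          /* erased flash reads as all ones */
    n->magic = NODE_MAGIC;
    n->version = version;
    memset(n->data, 0, DATA_SIZE);
    strcpy(n->data, cfg);
    n->data_crc = crc32_buf(n->data, DATA_SIZE);
    n->hdr_crc = node_hdr_crc(n);
}

int node_ok(const struct node *n)
{
    return n->magic == NODE_MAGIC &&
           n->hdr_crc == node_hdr_crc(n) &&
           n->data_crc == crc32_buf(n->data, DATA_SIZE);
}

/* Serial-number style comparison, so version 0 is newer than 0xffffffff.
 * Relies on the usual two's complement conversion to int32_t. */
int version_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Return the newest node whose CRCs match, or NULL if none is usable. */
const struct node *find_current(const struct node *nodes, size_t count)
{
    const struct node *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (!node_ok(&nodes[i]))
            continue;
        if (!best || version_newer(nodes[i].version, best->version))
            best = &nodes[i];
    }
    return best;
}
```

Picking by version rather than by the valid flag means a power failure between writing the new node and zeroing the old one still resolves to the newer copy, and a node with a bad CRC simply falls back to its predecessor.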
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-21 21:26 ` safe flash filesystem Russ Dill
@ 2001-06-22 8:22 ` Abraham vd Merwe
[not found] ` <20010622102154.E1828@crystal.2d3d.co.za>
1 sibling, 0 replies; 30+ messages in thread
From: Abraham vd Merwe @ 2001-06-22 8:22 UTC (permalink / raw)
To: MTD for Linux
[-- Attachment #1: Type: text/plain, Size: 963 bytes --]
Hi Russ!
> If it's just a config file, why make all this so complicated?
>
> struct node {
>
> u32 magic;
> char valid;
> u32 version;
> u32 data_crc;
> u32 hdr_crc;
> char data[DATA_SIZE];
> };
Yes, this is something along the lines I was thinking of. But what complicates
things is if you start taking things like avoiding damaged blocks into
account, wear levelling (this is fairly easy to solve) and keeping the flash
unfragmented.
--
Regards
Abraham
Walking on water wasn't built in a day.
-- Jack Kerouac
__________________________________________________________
Abraham vd Merwe - 2d3D, Inc.
Device Driver Development, Outsourcing, Embedded Systems
Cell: +27 82 565 4451 Snailmail:
Tel: +27 21 761 7549 Block C, Antree Park
Fax: +27 21 761 7648 Doncaster Road
Email: abraham@2d3d.co.za Kenilworth, 7700
Http: http://www.2d3d.com South Africa
[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
[not found] ` <20010622102154.E1828@crystal.2d3d.co.za>
@ 2001-06-22 17:23 ` Russ Dill
2001-06-25 7:45 ` Abraham vd Merwe
0 siblings, 1 reply; 30+ messages in thread
From: Russ Dill @ 2001-06-22 17:23 UTC (permalink / raw)
To: Abraham vd Merwe, linux-mtd
Abraham vd Merwe wrote:
> Hi Russ!
>
>
>>If it's just a config file, why make all this so complicated?
>>
>>struct node {
>>
>> u32 magic;
>> char valid;
>> u32 version;
>> u32 data_crc;
>> u32 hdr_crc;
>> char data[DATA_SIZE];
>>};
>>
>
> Yes, this is something in the lines I was thinking of. But what complicates
> things is if you start taking things like avoiding damaged blocks into
> account, wear levelling (this is fairly easy to solve) and keeping the flash
> unfragmented.
>
>
If you only erase blocks when you need to, you always have at least N-1
eraseblocks of previous data (where N is the number of eraseblocks
used). A CRC can be done after the store to see if the node written is
OK; if not, write it again (in the next node). Since it's a small amount
of data (maybe 4-8k) and written linearly, wear leveling and
fragmentation are not a problem. Let's say 4 parameter blocks of 16k
apiece are used: that would be 1 erase cycle per 8 configs written, which
would allow 800,000 configs to be written on standard flash. If a config
were written at a rate of once an hour, it would last 93 years. If it
were on 2 128k standard blocks, then you would have 3.2M configs
written, which at the same rate would last about 332 years. Remember,
you are only performing an erase cycle after a block fills up, not for
every write.
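The estimate above can be rechecked with a few lines of arithmetic. This is only a sketch under stated assumptions: an 8k config (the upper end of the 4-8k range) and 100,000 erase cycles per block, which is the endurance implied by the 800,000 figure; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HOURS_PER_YEAR (24u * 365u)

/* Configs that can be written before any block exceeds its erase budget,
 * given that a block is erased only after it has been filled. */
uint64_t total_config_writes(uint32_t blocks, uint32_t block_bytes,
                             uint32_t config_bytes, uint32_t erase_cycles)
{
    assert(config_bytes > 0 && config_bytes <= block_bytes);
    uint64_t per_block = block_bytes / config_bytes;
    return (uint64_t)blocks * per_block * erase_cycles;
}

/* Whole years the flash lasts at a steady number of writes per hour. */
uint64_t lifetime_years(uint64_t writes, uint32_t writes_per_hour)
{
    assert(writes_per_hour > 0);
    return writes / writes_per_hour / HOURS_PER_YEAR;
}
```

With these assumptions, total_config_writes(4, 16 * 1024, 8 * 1024, 100000) gives 800,000 and total_config_writes(2, 128 * 1024, 8 * 1024, 100000) gives 3,200,000; at one write per hour that works out to roughly 91 and 365 years, the same ballpark as the 93 and 332 years estimated above.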
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-22 17:23 ` Russ Dill
@ 2001-06-25 7:45 ` Abraham vd Merwe
2001-06-25 7:59 ` Russ Dill
0 siblings, 1 reply; 30+ messages in thread
From: Abraham vd Merwe @ 2001-06-25 7:45 UTC (permalink / raw)
To: Russ Dill; +Cc: MTD for Linux
[-- Attachment #1: Type: text/plain, Size: 1927 bytes --]
Hi Russ!
> > Yes, this is something in the lines I was thinking of. But what complicates
> > things is if you start taking things like avoiding damaged blocks into
> > account, wear levelling (this is fairly easy to solve) and keeping the flash
> > unfragmented.
> >
> if you only eraseblocks when you need to, you always have at least N-1
> eraseblocks of previous data, (where N is the number of eraseblocks
> used). A CRC can be done after the store to see if the node written is
> ok, if not, write it again (in the next node). since its a small amount
> of data (maybe 4-8k) and written linearly, wear leveling and
> fragmentation is not a problem. Let's say 4 parameter blocks of 16k
> apiece are used, that would be 1 erase cycle per 8 configs written, this
> would allow 800,000 configs to be written on standard flash. If a config
> was written at a rate of once an hour, it would last 93 years. If it
> were on 2 128k standard blocks, then you would have 3.2M configs
> written, which at the same rate, would last about 332 years. Remember,
> you are only performing an erase cycle after a block fills up, not for
> every write.
True, but once the flash fills up you have to start moving things around to
erase entire blocks and then the whole 4k-8k thing doesn't hold anymore.
But anyhow, like you said, it's not the most complicated thing in the world.
--
Regards
Abraham
You don't have to know how the computer works, just how to work the computer.
__________________________________________________________
Abraham vd Merwe - 2d3D, Inc.
Device Driver Development, Outsourcing, Embedded Systems
Cell: +27 82 565 4451 Snailmail:
Tel: +27 21 761 7549 Block C, Antree Park
Fax: +27 21 761 7648 Doncaster Road
Email: abraham@2d3d.co.za Kenilworth, 7700
Http: http://www.2d3d.com South Africa
[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-25 7:45 ` Abraham vd Merwe
@ 2001-06-25 7:59 ` Russ Dill
2001-06-25 14:11 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Russ Dill @ 2001-06-25 7:59 UTC (permalink / raw)
To: Abraham vd Merwe; +Cc: MTD for Linux
Abraham vd Merwe wrote:
>
> Hi Russ!
>
> > > Yes, this is something in the lines I was thinking of. But what complicates
> > > things is if you start taking things like avoiding damaged blocks into
> > > account, wear levelling (this is fairly easy to solve) and keeping the flash
> > > unfragmented.
> > >
> > if you only eraseblocks when you need to, you always have at least N-1
> > eraseblocks of previous data, (where N is the number of eraseblocks
> > used). A CRC can be done after the store to see if the node written is
> > ok, if not, write it again (in the next node). since its a small amount
> > of data (maybe 4-8k) and written linearly, wear leveling and
> > fragmentation is not a problem. Let's say 4 parameter blocks of 16k
> > apiece are used, that would be 1 erase cycle per 8 configs written, this
> > would allow 800,000 configs to be written on standard flash. If a config
> > was written at a rate of once an hour, it would last 93 years. If it
> > were on 2 128k standard blocks, then you would have 3.2M configs
> > written, which at the same rate, would last about 332 years. Remember,
> > you are only performing an erase cycle after a block fills up, not for
> > every write.
>
> True, but once the flash fills up you have to start moving things around to
> erase entire blocks and then the whole 4k-8k thing doesn't hold anymore.
>
> But anyhow, like you said, it's not the most complicated thing in the world.
You are overcomplicating things. There is one config file, and the flash
is filled linearly, so once a block is full of written configs (only one
of which is the current, valid config), the next eraseblock is
erased. There is no moving things around: once we fill a block, all the
other blocks have much older versions of the config, and we couldn't care
less.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem
2001-06-25 7:59 ` Russ Dill
@ 2001-06-25 14:11 ` Vipin Malik
0 siblings, 0 replies; 30+ messages in thread
From: Vipin Malik @ 2001-06-25 14:11 UTC (permalink / raw)
To: Russ Dill, Abraham vd Merwe; +Cc: MTD for Linux, elw_dev_list
At 12:59 AM 6/25/2001 -0700, Russ Dill wrote:
>Abraham vd Merwe wrote:
> >
> > Hi Russ!
> >
> > > > Yes, this is something in the lines I was thinking of. But what
> complicates
> > > > things is if you start taking things like avoiding damaged blocks into
> > > > account, wear levelling (this is fairly easy to solve) and keeping
> the flash
> > > > unfragmented.
> > > >
> > > if you only eraseblocks when you need to, you always have at least N-1
> > > eraseblocks of previous data, (where N is the number of eraseblocks
> > > used). A CRC can be done after the store to see if the node written is
> > > ok, if not, write it again (in the next node). since its a small amount
> > > of data (maybe 4-8k) and written linearly, wear leveling and
> > > fragmentation is not a problem. Let's say 4 parameter blocks of 16k
> > > apiece are used, that would be 1 erase cycle per 8 configs written, this
> > > would allow 800,000 configs to be written on standard flash. If a config
> > > was written at a rate of once an hour, it would last 93 years. If it
> > > were on 2 128k standard blocks, then you would have 3.2M configs
> > > written, which at the same rate, would last about 332 years. Remember,
> > > you are only performing an erase cycle after a block fills up, not for
> > > every write.
> >
> > True, but once the flash fills up you have to start moving things around to
> > erase entire blocks and then the whole 4k-8k thing doesn't hold anymore.
> >
> > But anyhow, like you said, it's not the most complicated thing in the
> world.
>
>you are overcomplicating things, there is one config file, and the flash
>is filled linearly, so once a block is full of written configs (only one
>of which being the current, valid config), the next eraseblock is
>erased. There is no moving things around, once we fill a block, all the
>other blocks have much older versions of the config, and we could care
>less
Russ, you are assuming a very trivial implementation (i.e. to a trivial
requirement), where the solution is to duplicate the entire config file and
rewrite it *every time*, even if just one of the config variables changed.
(Is my interpretation correct?)
While this may be what is required of a _few_ designs out there, it is very
difficult to extend, especially if you now want to store a "few" data values
whose values update more frequently than your config values. How are you going
to handle this?
IMHO, this approach is trying to reinvent the wheel, thinking it will be easier
this time (compared to JFFS, which does essentially the same thing) because some
"features" are not required.
This may very well be true for a particular case this time, but it sure
won't work in most cases, or at least, I would suspect, in quite a lot of
them. How many embedded systems out there don't generate "data" value
updates, as compared to only requiring (mostly static) config files?
I would be interested in hearing what the typical requirement is of the
folks reading this.
This is really not a JFFS/MTD discussion per se and we run the risk of
polluting this list. If you care, just reply to me and the
elw_dev_list@embeddedLinuxWorks.com where this discussion is already going on
(or subscribe at:
http://www.embeddedlinuxworks.com/cgi-bin/signup/signup-dev.cgi)
(Russ,) I've written a first cut, requirement spec for what I think would
be required of most embedded systems that store config data as well as
regular data value updates (and logs). Have you seen it?
Regards,
Vipin
^ permalink raw reply [flat|nested] 30+ messages in thread
* On the "safe filesystem" and write() topic
2001-06-21 21:17 ` Kyle Harris
@ 2001-07-03 23:53 ` Bjorn Wesen
2001-07-04 14:10 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Wesen @ 2001-07-03 23:53 UTC (permalink / raw)
To: MTD for Linux; +Cc: Kyle Harris, Vipin Malik
Hi,
I designed the JFFS specifications, log layout and GC method in the first
place and me and Finn put a lot of thought into it while implementing so
please consider some of these late night ramblings:
The initial requirement was that a small partition of configuration files
(the /etc directory to be more specific) should be able to reside in flash
and be completely safe from inconvenient power-outs or crashes.
It is my opinion (of course) that JFFS solves this in a manner as good as
possible given the standard Linux VFS API. This means that when you
rewrite a configuration file, you write the new one to another file and do
a rename over the old once you're ready. Technically JFFS is based on a
log structure consisting of VFS operations, and this is the best you can
do while not involving the application more than what standard VFS gives
you. VFS operations are not "transactions" in the high-level sense though.
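The write-then-rename pattern described above might look roughly like this in C. It is a sketch, not code from JFFS or from the configuration daemon mentioned in this thread: replace_config() and its parameters are hypothetical names, and the fsync() call is an added precaution for filesystems that do not write synchronously.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write buf to tmp_path, flush it, then rename it over path.
 * Returns 0 on success, -1 on failure. tmp_path must be on the same
 * filesystem as path, otherwise rename() cannot replace it atomically. */
int replace_config(const char *path, const char *tmp_path,
                   const char *buf, size_t len)
{
    assert(path && tmp_path && buf);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted before writing: retry */
        if (n < 0)
            goto fail;
        buf += n;                   /* short write: continue with the rest */
        len -= (size_t)n;
    }
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }
    /* The old file stays intact until this single step succeeds. */
    return rename(tmp_path, path) == 0 ? 0 : -1;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```

A crash at any point before the rename leaves the old config untouched, plus at worst a stray temporary file that can be deleted at boot.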
In our embedded products this is handled by a configuration handling
daemon similar to linuxconf, which caches parameters and knows how to
rewrite configuration files atomically (just like any other sane Unix
program does it). There is no need for any transactional semantics for
small configuration files. We sell a lot of these products and I certainly
disagree with Vipin's comment on his website that it's impossible to use
JFFS in embedded products :) Log-files are not usually kept in flash and
if they are they don't need anything more advanced than normal rotation
and if a crash occurs, it's no big deal if the last line gets cut off
completely or in the middle...
It is difficult (if not impossible) in any consistent way to handle the
case with random write()'s inside an already existing file. The filesystem
needs to "roll back" to any pre-existing state but it then needs to
know what the desired state would be. What we do now is make sure the
filesystem itself is never corrupt even if a file was being written.
The problems arise from the vague definition of what the desired state
would be - is it the data before the last write(), and what happens if you
receive a signal ? Writes to mmap'ed pages can't use that mechanism, and
you'll be stuck with using write()'s when you really probably want to use
libc wrappers like fwrite and fprintf.
I agree that if you need a binary database which is big so that you cannot
rewrite it when you update something, you'll need to rethink. Either just
split the database in smaller files, or you'll need a transaction marker
API down to the filesystem (an ioctl pair was suggested somewhere I
think). I don't think trying to tweak write() would lead to anything
generally useful though.
The kernel-level transactional extension would probably be quite difficult
to get consistent also, because Linux VFS does not know about it yet (this
is eventually changing with the integration of the general journalling
layer I guess). I get a headache thinking about it, perhaps it's possible
perhaps it's not; perhaps this code already exists in the other journalling
filesystems, perhaps it does not.
With regards to Kyle's question below though, the answer is certainly that
he can do as he says but use the rename() operation and keep them on a
single partition. There is no need for anything more advanced..
(All this assumes other more technical problems are solved of course like
the nasty surprises we've had with some flashes getting bits halfway
erased...)
/BW
On Thu, 21 Jun 2001, Kyle Harris wrote:
> I've read thru several posts and Vipin's jffs_guide. It appears that
> JFFS, at this time, is about the most reliable open source fs for
> embedded systems, even though it still has some problems. When JFFS
> fails, is the filesystem still usable? My question is this. What if you
> save only a small datafile (< 1K) and write it alternately to 2
> different JFFS partitions (or even the same partition). At boot, you
> read from both and get the latest, valid copy. This way if one is bad
> you still have a backup. How reliable would this be?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: On the "safe filesystem" and write() topic
2001-07-03 23:53 ` On the "safe filesystem" and write() topic Bjorn Wesen
@ 2001-07-04 14:10 ` Vipin Malik
2001-07-05 18:16 ` Bjorn Wesen
0 siblings, 1 reply; 30+ messages in thread
From: Vipin Malik @ 2001-07-04 14:10 UTC (permalink / raw)
To: Bjorn Wesen, MTD for Linux; +Cc: Kyle Harris
Hi,
At 01:53 AM 7/4/2001 +0200, Bjorn Wesen wrote:
>I designed the JFFS specifications, log layout and GC method in the first
>place and me and Finn put a lot of thought into it while implementing so
>please consider some of these late night ramblings:
Definitely! Thoughts, discussions, suggestions most welcome and thank you
for reading my ramblings!
>The initial requirement was that a small partition of configuration files
>(the /etc directory to be more specific) should be able to reside in flash
>and be completely safe from inconvenient power-outs or crashes.
>
>It is my opinion (of course) that JFFS solves this in a manner as good as
>possible given the standard Linux VFS API. This means that when you
>rewrite a configuration file, you write the new one to another file and do
>a rename over the old once you're ready.
Agreed. Of course as long as the config files are small and relatively few and
not changing that often. Your example of config files in /etc fits the
bill perfectly.
> Technically JFFS is based on a
>log structure consisting of VFS operations, and this is the best you can
>do while not involving the application more than what standard VFS gives
>you. VFS operations are not "transactions" in the high-level sense though.
Agreed again.
>In our embedded products this is handled by a configuration handling
>daemon similar to linuxconf, which caches parameters and knows how to
>rewrite configuration files atomically (just like any other sane Unix
>program does it). There is no need for any transactional semantics for
>small configuration files.
This is surely the preferred way to do it for such files. As a matter of
fact it is most preferred for small config files. I think that I need to
explicitly mention it in one of my ramblings on my site ;)
> We sell a lot of these products and I certainly
>disagree with Vipin's comment on his website that it's impossible to use
>JFFS in embedded products :)
Wait a minute! Where did I say that in context of config files. And if I
did I need to go and correct it (so please send me an email).
I think I surely said it in context of JFFS (not JFFS2) losing integrity
(including files at random) during power fail tests and I stand behind
those results till proven otherwise. Have you guys tested the JFFS fs under
power fail? What version are you using and what were your results?
> Log-files are not usually kept in flash and
>if they are they don't need anything more advanced than normal rotation
>and if a crash occurs, it's no big deal if the last line gets cut off
>completely or in the middle...
Again agreed. Log files are of course the "append" type, and a simple
scan of the log file on startup will enable one to detect and remove this
last half-written offending line.
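Such a startup scan can be quite small. A rough sketch, assuming a plain text log in which every complete record ends with a newline; trim_partial_line() is a hypothetical helper name, and reading backwards one byte at a time is chosen for brevity rather than speed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Cut off a trailing line that does not end in '\n'.
 * Returns the number of bytes removed, or -1 on error. */
off_t trim_partial_line(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    off_t end = st.st_size;
    off_t keep = end;
    char c;
    /* Walk backwards until the last kept byte is a newline. */
    while (keep > 0) {
        if (pread(fd, &c, 1, keep - 1) != 1) {
            close(fd);
            return -1;
        }
        if (c == '\n')
            break;
        keep--;
    }
    if (keep != end && ftruncate(fd, keep) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    return end - keep;
}
```

Calling this once on each log file before it is reopened for appending keeps a torn record from being glued to the next one.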
>It is difficult (if not impossible) in any consistent way to handle the
>case with random write()'s inside an already existing file. The filesystem
>needs to "roll back" to any pre-existing state but it then needs to
>know what the desired state would be. What we do now is make sure the
>filesystem itself is never corrupt even if a file was under writing.
JFFS2 does that (not getting corrupt) under random power fail. JFFS
attempts to do that, but there is a bug in the latest version in CVS that
causes files to disappear at random in power fail testing. This happened
anywhere after 600+ to 1300+ power fails. I've mentioned this specifically
in my "JFFS: A Practical guide" on my site.
It's quite possible that *I* introduced this bug myself when I was mucking
around with JFFS trying to fix other problems. But considering the fact
that when I started testing JFFS, it would never last more than 10 power
cycles without a failed mount on power up, and other issues like leaking
memory to the point that the kernel panicked (again on mount after a power
fail)- to the point when I left it with my patches, that I get at least
600+ (and once 1300+) async power fails without any problem, which version
would you rather go with? With the maturing of JFFS2, IMHO folks should be
encouraged to migrate to JFFS2 if possible (I am). Is there anything that
JFFS gives you that you don't get with JFFS2?
>The problems arise from the vague definition of what the desired state
>would be - is it the data before the last write(), and what happens if you
>receive a signal ?
Isn't it the same case as what happens when you get a power fail? (please
pardon my lack of understanding of signals in kernels. Can the execution
that was interrupted with a signal ever resume at the interrupted point?)
> Writes to mmap'ed pages can't use that mechanism, and
>you'll be stuck with using write()'s when you really probably want to use
>libc wrappers like fwrite and fprintf.
That's true, but it's a tradeoff: If the task wants reliable writes to the
fs, it must not use any lib calls. As a matter of fact, that's the last
thing you want to use anyway as these wrappers buffer the program's writes,
defeating the purpose of the default mechanism of O_SYNC of the JFFS(2) fs.
>I agree that if you need a binary database which is big so that you cannot
>rewrite it when you update something, you'll need to rethink. Either just
>split the database in smaller files, or you'll need a transaction marker
>API down to the filesystem (an ioctl pair was suggested somewhere I
>think). I don't think trying to tweak write() would lead to anything
>generally useful though.
See, we agree on all the same points :)
The main issue here is not only a BIG database, but also one with a lot of
points in it that are being updated frequently. Each file has an overhead
(and there is a max # of files limit on the fs). How reasonable is it to put
5000 8-byte files on a 1MB JFFS(2) fs? That data would occupy <50KB in a
single (db) file, vs at least 5000*64 (file overhead) + 5000*8 = 360KB as
separate files, assuming that you can even fit 5000 files on your partition.
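The same comparison as a small calculation, for anyone who wants to plug in their own numbers. The 64-byte per-file overhead is the "at least" figure from the paragraph above; treating 1MB as 1024*1024 bytes and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Space used when every record lives in its own file. */
size_t bytes_as_separate_files(size_t records, size_t record_bytes,
                               size_t per_file_overhead)
{
    return records * (per_file_overhead + record_bytes);
}

/* Space used by the raw records packed into one file. */
size_t bytes_as_single_file(size_t records, size_t record_bytes)
{
    return records * record_bytes;
}

/* Whole percent of the partition taken up by `used` bytes. */
unsigned percent_of_partition(size_t used, size_t partition_bytes)
{
    assert(partition_bytes > 0);
    return (unsigned)(used * 100 / partition_bytes);
}
```

For 5000 records of 8 bytes this gives 360,000 bytes as separate files against 40,000 bytes in one file, or about 34% versus 3% of a 1MB partition, before counting any overhead the single file itself carries.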
>The kernel-level transactional extension would probably be quite difficult
>to get consistent also, because Linux VFS does not know about it yet (this
>is eventually changing with the integration of the general journalling
>layer I guess). I get a headache thinking about it, perhaps it's possible
>perhaps it's not; perhaps this code already exists in the other journalling
>filesystems, perhaps it does not.
I cannot speak intelligently about this so I'll keep my mouth shut :)
>With regards to Kyle's question below though, the answer is certainly that
>he can do as he says but use the rename() operation and keep them on a
>single partition. There is no need for anything more advanced..
For a lot of solutions, this is certainly true. OTOH, the current blocking
times of JFFS2 (I didn't do this test on JFFS, but no reason to be different
methinks) make putting any config or db directly on the fs unreasonable (if
you've been following my jitter tests recently, JFFS2 can block for 10's of
seconds when it is getting quite full).
>(All this assumes other more technical problems are solved of course like
>the nasty surprises we've had with some flashes getting bits halfway
>erased...)
This "flipping bits" syndrome (TM Vipin Malik :) is solved reliably for
JFFS2. JFFS2 has passed 15K+ power fails without any failures that I could
detect or was looking for. IMHO it cannot be solved reliably for JFFS
because JFFS does not handle (or know about) erase sectors. I've solved it
by re-reading the same sector 4 times. See the big note above
scan_for_partially_erased_sectors() (or something like that) in jffs/intrep.c
To a large extent, we (I) have allowed the thought of having
transactions in JFFS(2) lapse. Maybe this is not such a bad thing after all
and with each discussion I better appreciate the cons of having
transactions in the fs. Anyway, there is a new project that is being
started on developing (or modifying an existing embedded db (mird)) to
provide for this transaction level processing for embedded systems on
JFFS(2). In addition to providing transactions it will also provide a
caching layer that will allow the transaction log to be put on *another*
non-volatile medium if such is available in your system. The big advantage
of this will be 0 latency, transaction protected, power fail safe writes
available to programs that use this interface. As a freebie it will also
provide for key/value type store/retrieve from a (small) hash database.
Read more about it at:
http://www.embeddedlinuxworks.com/articles/db_project.html
To sign up for the development mailing list, go to:
http://www.embeddedlinuxworks.com/cgi-bin/signup/signup-dev.cgi
Thanks for reading and your thoughts.
Regards,
Vipin
http://www.EmbeddedLinuxWorks.com
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: On the "safe filesystem" and write() topic
2001-07-04 14:10 ` Vipin Malik
@ 2001-07-05 18:16 ` Bjorn Wesen
2001-07-06 13:40 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Wesen @ 2001-07-05 18:16 UTC (permalink / raw)
To: Vipin Malik; +Cc: jffs-dev, MTD for Linux
On Wed, 4 Jul 2001, Vipin Malik wrote:
> I think I surely said it in context of JFFS (not JFFS2) losing integrity
> (including files at random) during power fail tests and I stand behind
> those results till proven otherwise. Have you guys tested the JFFS fs under
> power fail? What version are you using and what were your results?
We've tested it but probably not in more than a couple of hundred
cycles; I've never seen that floating bit error before, perhaps it's just
some flash chips that get bitten by that and it might depend on the
hardware as well (resident charge in capacitors etc).
> would you rather go with? With the maturing of JFFS2, IMHO folks should be
> encouraged to migrate to JFFS2 if possible (I am). Is there anything that
> JFFS gives you that you don't get with JFFS2?
All products on sale from Axis still run 2.0.. next generation will be 2.4
and some sort of JFFS, and it will be JFFS2 if the bugs are sorted out (no
theoretical reason why JFFS2 shouldn't be perfect of course, it's just a
matter of finetuning :) Well apart from compression-code and
latency; after all you cannot have synchronous writes and compression
and still expect the application not to be blocked..
(The rest of the system should not be blocked though, that's just a matter
of being able to yield due to need_resched inside the
compression code)
> >The problems arise from the vague definition of what the desired state
> >would be - is it the data before the last write(), and what happens if you
> >receive a signal ?
>
> Isn't it the same case as what happens when you get a power fail? (please
> pardon my lack of understanding of signals in kernels. Can the execution
> that was interrupted with a signal ever resume at the interrupted point?)
Depends on the system call and underlying filesystem; for a
normal read/write, they probably just return the number of chars
read/written up to the point of the signal (just as they can by the
API). And hence my comment that it's no use trying to enforce atomic
behaviour for entire write() chunks. Your app can catch a signal, return
from a half-written write and then crash before you can write() the
"missing" chars.
So if you want to do the "atomic write" you need to disable all signal
checking inside the write paths, which means going back to the non-generic
write VFS functions and coincidentally you'll need to block the rest of
the system as well (see 2'nd above paragraph) because you can't reschedule
without a signal-check.
It's simply not a tenable scenario :)
I'd much rather see the "start transaction/end transaction" ioctl's than
trying to make write be atomic.
> > Writes to mmap'ed pages can't use that mechanism, and
> >you'll be stuck with using write()'s when you really probably want to use
> >libc wrappers like fwrite and fprintf.
>
> That's true, but it's a tradeoff: If the task wants reliable writes to the
> fs, it must not use any lib calls. As a matter of fact, that's the last
> thing you want to use anyway as these wrappers buffer the program's writes,
> defeating the purpose of the default mechanism of O_SYNC of the JFFS(2) fs.
I think that's a non sequitur, especially given that the individual write
itself is not atomic anyway. It can't matter if you do fprintf or a
write() in a loop (since that's exactly what fprintf does eventually
anyway).
As long as writes are enforced to be sequential, I think that's
enough. Does not JFFS2 queue writes internally anyway BTW ? And if you
have O_SYNC (assuming JFFS adheres to it) when fprintf returns you can be
as guaranteed that the data has been written as if you'd done it yourself
with a write().
> points in it that are being updated frequently. Each file has an overhead
> (as well as a max # of files limit on the fs). How reasonable is it to put
> 5000, 8 byte files on a 1MB JFFS(2) fs? (this file would only occupy <50KB
> in a single (db) file) vs at least 5000*64(file overhead)+5000*8 = 360KB as
> separate files, assuming that you can even fit 5000 files on your partition.
I think either a transaction mechanism or an entirely different flash
filesystem (not VFS-based) need to be used if that is a common usage
scenario.
> >The kernel-level transactional extension would probably be quite difficult
> >to get consistent also, because Linux VFS does not know about it yet (this
> >is eventually changing with the integration of the general journalling
> >layer I guess). I get a headache thinking about it, perhaps it's possible
> >perhaps it's not; perhaps this code already exists in the other journalling
> >filesystems, perhaps it does not.
>
> I cannot speak intelligently about this so I'll keep my mouth shut :)
IIRC the main holding points against merging reiserfs before was that it
really should wait until VFS is made aware of journalling concepts in
order to avoid "half way" solutions, and that in turn was dependent on the
ext3 developers etc...
Thing is, I think JFFS2 uses the generic file writing in VFS which means
that VFS itself fetches and updates pages in the page-cache (or
similar) which means an overall more complex situation for JFFS which
wants to write this transactionally without inter-process dependencies
etc..
I.e. suppose process A is writing to file X while B is reading from it,
and writing to file Y at the same time. A starts a transaction and
writes. If VFS does not know about transactions, it will simply put the
writes in the page-cache so B might read them and write to file Y. So if a
crash occurs, yes, file X is intact but Y is screwed up.
So the writes need to be queued up in JFFS or VFS or you need to guarantee
that only the process doing the writes has access to the file at the same
time. This is a major obstacle, and I don't know how it's solved in
reiser, JFS and XFS (if they support user-level transactions at
all) without patching VFS and the page-cache.
> any config or db directly on the fs unreasonable. (if you've been following
> my jitter tests recently, JFFS2 can block for 10's of seconds when it
> getting quite full).
Probably possible but that's an implementation problem not a theoretical
problem. In a "run time" phase (flash is almost all dirty, space exists and
writes are coming in) there should never need to be more latency than what
it takes to GC the same amount of space as you want to write.
And as I wrote above somewhere, while the writing process needs to be
blocked (in O_SYNC) there is no reason to block other processes from
scheduling in, unless I've missed something major...
> transactions in the fs. Anyway, there is a new project that is being
> started on developing (or modifying an existing embedded db (mird)) to
> provide for this transaction level processing for embedded systems on
> JFFS(2). In addition to providing transactions it will also provide a
One alternative is a completely user-mode flash DB. Have a daemon which
has access to a raw flash device and implements a transactional database
on that device. No need for a kernel system really..
> caching layer that will allow the transaction log to be put on *another*
> non-volatile medium if such is available in your system. The big advantage
Why would this be necessary ?
/BW
* Re: On the "safe filesystem" and write() topic
2001-07-05 18:16 ` Bjorn Wesen
@ 2001-07-06 13:40 ` Vipin Malik
2001-07-07 9:25 ` Bjorn Wesen
0 siblings, 1 reply; 30+ messages in thread
From: Vipin Malik @ 2001-07-06 13:40 UTC (permalink / raw)
To: Bjorn Wesen; +Cc: jffs-dev, MTD for Linux
Hi,
> > Have you guys tested the JFFS fs under
> > power fail? What version are you using and what were your results?
>
>We've tested it but probably not in more than a couple of hundred
>cycles; I've never seen that floating bit error before, perhaps it's just
>some flash chips that get bitten by that and it might depend on the
>hardware as well (resident charge in capacitors etc).
I believe that David also mentioned that he has seen that error.
Its detection is directly proportional to the probability of power failing in
the middle of a sector erase. So the larger the number of sector erases that
one does, as well as the larger the number of power failures one induces, the
higher the probability of seeing it. With a few hundred tests, I'm not
surprised that you haven't seen it.
>Well apart from compression-code and
>latency; after all you cannot both have synchronous writes, compression
>and expecting the application to not be blocked..
HeHe, well, maybe the fs can (will or may?) block, but in all realistic
situations it's unacceptable for a real world embedded app to block for
multiple seconds while the fs is "busy". Where does the app store any data
value updates it's generating (especially if they have to be stored
immediately in a non-volatile manner)?
>(The rest of the system should not be blocked though, that's just a matter
>of being able to yield due to need_resched inside the
>compression code)
My latest tests indicate that this is already the case. A POSIX RT task
(not interacting with JFFS2) does not block (for too long) even if the
underlying JFFS2 fs is blocked for >40 seconds!
> > >The problems arise from the vague definition of what the desired state
> > >would be - is it the data before the last write(), and what happens if you
> > >receive a signal ?
> >
> > Isn't it the same case as what happens when you get a power fail? (please
> > pardon my lack of understanding of signals in kernels. Can the execution
> > that was interrupted with a signal ever resume at the interrupted point?)
>
>Depends on the system call and underlying filesystem; for a
>normal read/write, they probably just return the number of chars
>read/written up to the point of the signal (just as they can by the
>API). And hence my comment that it's no use trying to enforce atomic
>behaviour for entire write() chunks. Your app can catch a signal, return
>from a half-written write and then crash before you can write() the
>"missing" chars.
I guess you are right. This is best handled as an "out of band" solution,
i.e. with ioctl transactions, or a transaction db etc.
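
For reference, the usual way an application copes with a write() that
returns early is to loop until everything has been accepted. The sketch
below shows that loop in Python; write_all is a name made up for this
example. Note that it does not make the write atomic, which is exactly the
point made above: a crash between two iterations still leaves a
half-written record.

```python
import os


def write_all(fd, data):
    """Call os.write() repeatedly until every byte of data has been accepted."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() may accept fewer bytes than requested (a short write),
        # so continue from where the previous call stopped.
        written = os.write(fd, view[total:])
        total += written
    return total
```

A caller would use it as write_all(fd, record) in place of a single
os.write(fd, record).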
>As long as writes are enforced to be sequential, I think that's
>enough. Does not JFFS2 queue writes internally anyway BTW ? And if you
>have O_SYNC (assuming JFFS adheres to it) when fprintf returns you can be
>as guaranteed that the data has been written as if you'd done it yourself
>with a write().
Hmm, I was under the impression that lib fprintf, fread, fwrite etc. all
work with some delimiter, usually '\n' and especially in the case of
fprintf(), the data is buffered till a '\n' is detected. I assumed (perhaps
incorrectly) that a similar mechanism may be at play with the lib file i/o
calls as well.
> > points in it that are being updated frequently. Each file has an overhead
> > (as well a max # of files limit on the fs). How reasonable is it to put
> > 5000, 8 byte files on a 1MB JFFS(2) fs? (this file would only occupy <50KB
> > in a single (db) file) vs at least 5000*64(file overhead)+5000*8 = 360KB as
> > separate files, assuming that you can even fit 5000 files on your partition.
>
>I think either a transaction mechanism or an entirely different flash
>filesystem (not VFS-based) need to be used if that is a common usage
>scenario.
That's why we are looking at using a transaction db (mird) to provide this
functionality rather than hack JFFS2 (and/or the VFS) to support it.
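
The space estimate quoted above can be checked with a few lines of
arithmetic. The 64-byte per-file overhead and the 5000 x 8-byte data points
are the figures from this thread; the function names are invented for the
example, and real JFFS2 overhead may differ.

```python
def separate_files_bytes(count, payload_bytes, per_file_overhead_bytes):
    """Space needed when every data point is stored as its own file."""
    return count * (per_file_overhead_bytes + payload_bytes)


def single_file_bytes(count, payload_bytes):
    """Space needed when all data points share one (db) file, ignoring db overhead."""
    return count * payload_bytes


# Figures from the thread: 5000 data points of 8 bytes, 64 bytes overhead each.
as_files = separate_files_bytes(5000, 8, 64)   # 360000 bytes, i.e. the 360KB above
as_one_db = single_file_bytes(5000, 8)         # 40000 bytes, under the 50KB quoted
```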
> > any config or db directly on the fs unreasonable. (if you've been following
> > my jitter tests recently, JFFS2 can block for 10's of seconds when it
> > getting quite full).
>
>Probably possible but that's an implementation problem not a theoretical
>problem. In a "run time" phase (flash is almost all dirty, space exist and
>writes are coming in) there should never need to be more latency that what
>it takes to GC the same amount of space as you want to write.
When the rubber meets the road, implementation problems and theoretical
problems are indistinguishable :)
The reality is that JFFS2 can block for 10's of seconds on a reasonably
powerful processor (a 133MHz 486).
Tweaking may get that down to a few seconds, but unless there is a design
or implementation bug in JFFS2, there will always be some processing
required to GC when there is no more ready free space on the flash. At this
time a task updating variables on the FS will block. The question is: How
long a block is acceptable? IMHO, anything more than a few hundred ms will
be unacceptable to a reasonable percentage of embedded applications. I know
it is unacceptable for my application. I generate data updates 5 times a
second and I want that data stored reliably on the flash fs, as well
as not be blocked for more than 200ms.
>One alternative is a completely user-mode flash DB. Have a deamon which
>have access to a raw flash device and implements a transactional database
>on that device. No need for a kernel system really..
The biggest problem with this is that one has to reinvent all the major
flash interface features of JFFS2. Not an elegant solution IMHO.
> > caching layer that will allow the transaction log to be put on *another*
> > non-volatile medium if such is available in your system. The big advantage
>
>Why would this be necessary ?
To provide for 0 latency writes for tasks updating data values, when the
underlying fs is blocked and cannot accept any more writes for another
"few" (at the moment >40) seconds.
Vipin
* Re: On the "safe filesystem" and write() topic
2001-07-06 13:40 ` Vipin Malik
@ 2001-07-07 9:25 ` Bjorn Wesen
2001-07-07 13:06 ` Vipin Malik
0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Wesen @ 2001-07-07 9:25 UTC (permalink / raw)
To: Vipin Malik; +Cc: jffs-dev, MTD for Linux
On Fri, 6 Jul 2001, Vipin Malik wrote:
> >latency; after all you cannot both have synchronous writes, compression
> >and expecting the application to not be blocked..
>
> HeHe, well, maybe the fs can (will or may?) block, but in all realistic
> situations it's unacceptable for a real world embedded app to block for
> multiple seconds while the fs is "busy". Where does the app store any data
You might not like it but you cannot have it any other way :)
Fact: flash chip sectors take a long time to erase (1-2 seconds)
Fact: you need to erase to make room for new data
Hence, if you need the app to do synchronous writing, it will need to
wait.
> Hmm, I was under the impression that lib fprintf, fread, fwrite etc. all
> work with some delimiter, usually '\n' and specially in the case of
No, but now that I think about it they are not synchronous either (since
they buffer and return).
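
To make the buffering point concrete: with a buffered library layer, the
data only leaves the process after an explicit flush, and it is only handed
to the storage device after a sync. A minimal sketch in Python, whose file
objects buffer much like C stdio does (durable_append is a made-up helper
name, and whether fsync really reaches the flash depends on the filesystem
and driver):

```python
import os


def durable_append(path, text):
    """Append text, returning only after flush() and fsync() have completed."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)          # lands in the user-space buffer first
        f.flush()              # push the buffer down to the kernel
        os.fsync(f.fileno())   # ask the kernel to push it on to the device
```

Without the flush() and fsync() calls, write() returning says nothing about
the data being on the medium.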
> >problem. In a "run time" phase (flash is almost all dirty, space exist and
> >writes are coming in) there should never need to be more latency that what
> >it takes to GC the same amount of space as you want to write.
>
> When the rubber meets the road, implementation problems and theoretical
> problems are indistinguishable :)
> The reality is that JFFS2 can block for 10's of seconds on a reasonable
> powerful processor (a 133MHz 486).
Yes but that might BE the time it takes to make room for the data you want
to write..
> time a task updating variables on the FS will block. The question is: How
> long a block is acceptable? IMHO, anything more than a few hundred ms will
> be unacceptable to a reasonable percentage of embedded applications. I know
Then you can't use flash chips in your embedded application :)
> > > caching layer that will allow the transaction log to be put on *another*
> > > non-volatile medium if such is available in your system. The big advantage
> >
> >Why would this be necessary ?
>
> To provide for 0 latency writes for tasks updating data values, when the
> underlying fs is blocked and cannot accept any more writes for another
> "few" (at the moment >40) seconds.
So what happens when that gets full and needs to be erased ? All you'd do
is interleave the writes and postpone the problem a bit. If you mean that
the transactional log will "never" get full and require erasing, then yes,
that would work but I doubt the "never" constraint :)
Some flash chip configurations might allow you to erase one sector while
writing to another; this is transiently good if you only write one sector
worth of information during the time it takes to erase the other
sector. As soon as you go over that you hit the latency again.
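
Put as a number: if erasing can overlap with writing, the long-run write
rate is still capped at about one sector per erase time. A rough sketch of
that bound follows; the 64 KiB sector size is an assumed figure for
illustration only (it is not from this thread), while the 2 second erase
time is the upper end of the 1-2 seconds mentioned above. Programming time
and GC copying are ignored.

```python
def max_sustained_write_rate(sector_bytes, erase_seconds):
    """Upper bound in bytes/second when each sector must be erased before reuse."""
    return sector_bytes / erase_seconds


# Assumed 64 KiB sectors, 2 s per erase: at most 32768 bytes/second sustained.
limit = max_sustained_write_rate(64 * 1024, 2.0)
```

Any application that writes faster than this for long enough will catch up
with the eraser and hit the latency again.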
/BW
* Re: On the "safe filesystem" and write() topic
2001-07-07 9:25 ` Bjorn Wesen
@ 2001-07-07 13:06 ` Vipin Malik
0 siblings, 0 replies; 30+ messages in thread
From: Vipin Malik @ 2001-07-07 13:06 UTC (permalink / raw)
To: Bjorn Wesen; +Cc: jffs-dev, MTD for Linux
At 11:25 AM 7/7/2001 +0200, Bjorn Wesen wrote:
> > HeHe, well, maybe the fs can (will or may?) block, but in all realistic
> > situations it's unacceptable for a real world embedded app to block for
> > multiple seconds while the fs is "busy". Where does the app store any data
>
>You might not like it but you cannot have it any other way :)
>
>Fact: flash chip sectors takes long to erase (1-2 seconds)
>
>Fact: you need to erase to make room for new data
>
>Hence, if you need the app to do synchronous writing, it will need to
>wait.
>
>
> > time a task updating variables on the FS will block. The question is: How
> > long a block is acceptable? IMHO, anything more than a few hundred ms will
> > be unacceptable to a reasonable percentage of embedded applications. I know
>
>Then you can't use flash chips in your embedded application :)
>
> > > > caching layer that will allow the transaction log to be put on *another*
> > > > non-volatile medium if such is available in your system. The big advantage
> > >
> > >Why would this be necessary ?
> >
> > To provide for 0 latency writes for tasks updating data values, when the
> > underlying fs is blocked and cannot accept any more writes for another
> > "few" (at the moment >40) seconds.
>
>So what happens when that gets full and need to be erased ? All you'd do
>is interleave the writes and postpone the problem a bit. If you mean that
>the transactional log will "never" get full and require erasing, then yes,
>that would work but I doubt the "never" constraint :)
That's why, if 0 latency writes are important to a design, the designers
must put this cache on a 0-erase-latency non-volatile medium like battery
backed RAM or FRAM.
Then a simple equation will help one size it for one's particular need, namely:
C_KB = size_of_nonvolatile_cache_device_required;
t1 = max_block_time_of_flash_FS_sec;
t2 = time_to_xfer_C_KB_to_flash_FS_sec;
NEW_KB_PER_SEC = max_new_data_generating_rate_KB_per_sec;
C_KB = (t1+t2) * NEW_KB_PER_SEC;
That's what the 0 latency write, transaction protected, embedded database
project on the dev list is for.
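
As a worked instance of the sizing equation above, here is the same formula
as a small function. The input values in the example are illustrative
assumptions, apart from the 40 second block time, which is the figure
measured earlier in this thread.

```python
def cache_size_kb(t1_sec, t2_sec, new_kb_per_sec):
    """C_KB = (t1 + t2) * NEW_KB_PER_SEC, as in the equation above.

    t1_sec:          max block time of the flash FS, in seconds
    t2_sec:          time to transfer the cached data to the flash FS, in seconds
    new_kb_per_sec:  max rate at which new data is generated, in KB/s
    """
    return (t1_sec + t2_sec) * new_kb_per_sec


# Assumed inputs: FS blocked for 40 s, 5 s to drain the cache, 1 KB/s of new data.
needed_kb = cache_size_kb(40, 5, 1.0)   # 45.0 KB of battery backed RAM or FRAM
```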
Vipin
end of thread, other threads:[~2001-07-07 12:49 UTC | newest]
Thread overview: 30+ messages
2001-06-21 10:54 safe flash filesystem Abraham vd Merwe
2001-06-21 13:43 ` Vipin Malik
2001-06-21 13:57 ` Abraham vd Merwe
2001-06-21 14:29 ` Vipin Malik
2001-06-21 14:35 ` Abraham vd Merwe
2001-06-21 15:05 ` Vipin Malik
2001-06-21 15:36 ` Chris Read
2001-06-21 15:09 ` Joakim Tjernlund
2001-06-21 15:34 ` Vipin Malik
2001-06-21 19:34 ` Joakim Tjernlund
2001-06-21 19:47 ` Joakim Tjernlund
2001-06-21 15:11 ` Herman Oosthuysen
2001-06-21 17:54 ` Tim Riker
2001-06-21 19:43 ` Vipin Malik
2001-06-21 19:35 ` Tim Riker
2001-06-21 19:56 ` Vipin Malik
2001-06-21 21:17 ` Kyle Harris
2001-07-03 23:53 ` On the "safe filesystem" and write() topic Bjorn Wesen
2001-07-04 14:10 ` Vipin Malik
2001-07-05 18:16 ` Bjorn Wesen
2001-07-06 13:40 ` Vipin Malik
2001-07-07 9:25 ` Bjorn Wesen
2001-07-07 13:06 ` Vipin Malik
2001-06-21 21:26 ` safe flash filesystem Russ Dill
2001-06-22 8:22 ` Abraham vd Merwe
[not found] ` <20010622102154.E1828@crystal.2d3d.co.za>
2001-06-22 17:23 ` Russ Dill
2001-06-25 7:45 ` Abraham vd Merwe
2001-06-25 7:59 ` Russ Dill
2001-06-25 14:11 ` Vipin Malik
-- strict thread matches above, loose matches on Subject: below --
2001-06-21 16:05 Vipin Malik