* safe flash filesystem @ 2001-06-21 10:54 Abraham vd Merwe 2001-06-21 13:43 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Abraham vd Merwe @ 2001-06-21 10:54 UTC (permalink / raw) To: MTD for Linux [-- Attachment #1: Type: text/plain, Size: 1134 bytes --] Hi! We're developing a product that needs a small part of the flash memory to contain a very scaled down file system containing configuration files. This file system has to be extremely reliable. Speed is not an issue, but the validity of any data written to it is of utmost importance. So before I go reinvent the wheel, I'd like to know if there's something which already does this, i.e. does things like keep duplicate copies of everything around, have all sorts of checks and balances to check for damaged parts of the flash, no caching and some sort of wear levelling, etc? -- Regards Abraham Have you noticed that all you need to grow healthy, vigorous grass is a crack in your sidewalk? __________________________________________________________ Abraham vd Merwe - 2d3D, Inc. Device Driver Development, Outsourcing, Embedded Systems Cell: +27 82 565 4451 Snailmail: Tel: +27 21 761 7549 Block C, Antree Park Fax: +27 21 761 7648 Doncaster Road Email: abraham@2d3d.co.za Kenilworth, 7700 Http: http://www.2d3d.com South Africa [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 10:54 safe flash filesystem Abraham vd Merwe @ 2001-06-21 13:43 ` Vipin Malik 2001-06-21 13:57 ` Abraham vd Merwe 0 siblings, 1 reply; 30+ messages in thread From: Vipin Malik @ 2001-06-21 13:43 UTC (permalink / raw) To: Abraham vd Merwe, MTD for Linux > >We're developing a product that needs a small part of the flash memory to >contain a very scaled down file system containing configuration files. This >file system has to be extremely reliable. Speed is not an issue, but the >validity of any data written to it is of utmost importance. > >So before I go reinvent the wheel, I'd like to know if there's something >which already does this, i.e. does things like keep duplicate copies of >everything around, have all sorts of checks and balances to check for >damaged parts of the flash, no caching and some sort of wear levelling, etc? Normally you could just use a "small" JFFS2 partition- or even a "full" JFFS2 partition with just your config database file on it. HOWEVER, at this time JFFS2 is NOT power fail "roll back and recover" safe- even for write()'s less than PAGE_SIZE. Read gory details at: JFFS: A practical guide at: http://www.embeddedlinuxworks.com/articles/jffs_guide.html There is another very serious problem with JFFS(2). It may block read/write accesses for tens of seconds while it GCs- especially on a new, full fs. (watch for a new article on read/write latencies on periodic tasks reading/writing from/to a JFFS fs very soon). In my opinion a "small" embedded power fail safe database utility is needed that would solve the power fail issue as well as provide caching support for read/write with a read/write log on another (separate) device to get around the latency problems. I have started a project to define the features required for this. Please read details at: http://www.embeddedlinuxworks.com/articles/db_project.html I need the same thing for my project.
I am sure there are others out there that need this capability for their embedded systems. It would be nice if all could collaborate and come up with an open source software that addresses this need. Regards, Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 13:43 ` Vipin Malik @ 2001-06-21 13:57 ` Abraham vd Merwe 2001-06-21 14:29 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Abraham vd Merwe @ 2001-06-21 13:57 UTC (permalink / raw) To: Vipin Malik; +Cc: MTD for Linux [-- Attachment #1: Type: text/plain, Size: 1528 bytes --] Hi Vipin! > I have started a project to define the features required for this. Please > read details at: > http://www.embeddedlinuxworks.com/articles/db_project.html > > I need the same thing for my project. I am sure there are others out there > that need > this capability for their embedded systems. It would be nice if all could > collaborate and come > up with an open source software that addresses this need. Exciting stuff. It does address the issues I'm interested in. More specifically I agree that a driver level system is needed and not a user layer (otherwise we should be using Sleepycat's db3 which is perfect for transactional data except that it's not free). If it's a file system people will actually use it instead of just talk about it. Doing this sort of thing properly though is a huge task however and should be carefully planned. I'm really interested in helping (if we're talking driver, not user level process - I don't want to develop a database :P) -- Regards Abraham Don't let people drive you crazy when you know it's in walking distance. __________________________________________________________ Abraham vd Merwe - 2d3D, Inc. Device Driver Development, Outsourcing, Embedded Systems Cell: +27 82 565 4451 Snailmail: Tel: +27 21 761 7549 Block C, Antree Park Fax: +27 21 761 7648 Doncaster Road Email: abraham@2d3d.co.za Kenilworth, 7700 Http: http://www.2d3d.com South Africa [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 13:57 ` Abraham vd Merwe @ 2001-06-21 14:29 ` Vipin Malik 2001-06-21 14:35 ` Abraham vd Merwe 0 siblings, 1 reply; 30+ messages in thread From: Vipin Malik @ 2001-06-21 14:29 UTC (permalink / raw) To: Abraham vd Merwe; +Cc: MTD for Linux >Exciting stuff. It does address the issues I'm interested in. More >specifically I agree that a driver level system is needed and not a user >layer (otherwise we should be using Sleepycat's db3 which is perfect for >transactional data except that it's not free). Yeah, that "except is not free" means it costs >100K USD (for the transaction version) if I remember the quote they gave me. Plus, it's too "thick" for the typical use in embedded systems to store configuration variables and a few logs and data values. >Doing this sort of thing properly though is a huge task however and should >be carefully planned. > >I'm really interested in helping (if we're talking driver, not user level >process - I don't want to develop a database :P) I guess I should change the name away from a "database", as I really don't want to do anything more drastic than maybe a linear "recno" type flat file database or something very simple. Something that you or I may have probably implemented anyway to solve our needs to store these config variables in our respective systems. I did not have a full featured commercial "database" type database in mind, or even anything close. I agree that this functionality should be present in the JFFS2 layer itself, but have been unable to convince the powers that be. David W. says that he'll entertain a diff -u if we (I) implement the feature, but I don't think that I have the time or the capability to change JFFS2 to implement this functionality all by myself. Plus the issue of the blocked access will remain (even if we solve the power fail issue) and the solution to that may not be acceptable for inclusion in the regular JFFS2 fs.
Maybe you may want to subscribe to the development list on the www.EmbeddedLinuxWorks site and we can take this discussion there. I am looking for user input to define the feature set of this "config system" (not database :) And I want to make it LGPL (if it does become a lib or task) so that users can link to it without releasing the source of their own code. I can't believe that you and I are the only folks that may be interested in this. Maybe there is an existing solution- we just don't know about it- or others have not thought about this issue just yet ;) Regards, Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 14:29 ` Vipin Malik @ 2001-06-21 14:35 ` Abraham vd Merwe 2001-06-21 15:05 ` Vipin Malik ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Abraham vd Merwe @ 2001-06-21 14:35 UTC (permalink / raw) To: Vipin Malik; +Cc: MTD for Linux [-- Attachment #1: Type: text/plain, Size: 3015 bytes --] Hi Vipin! > Yeah, that "except is not free" means it costs >100K USD (for the > transaction version) if I remember the quote they gave me. $125K iirc (; > I guess I should change the name away from a "database", as I really don't > want to do anything more drastic than maybe a linear "recno" type flat file > database or something very simple. Something that you or I may have > probably implemented anyway to solve our needs to store these config > variables in our respective systems. That's exactly what I've started writing today (: > I agree that this functionality should be present in the JFFS2 layer > itself, but have been unable to convince the powers to be. > > David W. says that he'll entertain a diff -u if we (I) implement the > feature, but I don't think that I have the time or the capability to change > JFFS2 to implement this functionality all by myself. Plus the issue of the > blocked access will remain (even if we solve the power fail issue) and the > solution to that may be not be acceptable for inclusion to the regular > JFFS2 fs. I don't know if merging something like this with jffs2 would solve the problem like you said. I was more thinking of a completely different user MTD driver to provide an uncached block device and slap a file system on top of that. Or we can sync() all the time from the file system. I have to agree that it's probably better to write a library/utilities first to do a preliminary thing. That way we'd get a functional thing quite fast and figure out what we did wrong in the first place. 
> Maybe you may want to subscribe to the development list on the > www.EmbeddedLinuxWorks site and we can take this discussion there. I am Where do I subscribe? > looking for user input to define the feature set of this "config system" > (not database :) And I want to make it LGPL (if it does become a lib or > task) so that users can link to it without releasing the source of their > own code. I'm in the fortunate position of being able to work on this fulltime for the next week or so, so if we can figure out a useful specification for this in a short time, I'm really keen on helping to implement this and LGPL/BSD license is just fine. > I can't believe that you and I are the only folks that may be interested in > this. Maybe there is an existing solution- we just don't know about it- or > other's have not thought about this issue just yet ;) That's why I mailed here in the first place :P -- Regards Abraham Matrimony is the root of all evil. __________________________________________________________ Abraham vd Merwe - 2d3D, Inc. Device Driver Development, Outsourcing, Embedded Systems Cell: +27 82 565 4451 Snailmail: Tel: +27 21 761 7549 Block C, Antree Park Fax: +27 21 761 7648 Doncaster Road Email: abraham@2d3d.co.za Kenilworth, 7700 Http: http://www.2d3d.com South Africa [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 14:35 ` Abraham vd Merwe @ 2001-06-21 15:05 ` Vipin Malik 2001-06-21 15:36 ` Chris Read 2001-06-21 15:09 ` Joakim Tjernlund ` (2 subsequent siblings) 3 siblings, 1 reply; 30+ messages in thread From: Vipin Malik @ 2001-06-21 15:05 UTC (permalink / raw) To: Abraham vd Merwe; +Cc: MTD for Linux > >That's exactly what I've started writing today (: Have you written a spec for it? >I don't know if merging something like this with jffs2 would solve the >problem like you said. I was more thinking of a completely different user >MTD driver to provide an uncached block device and slap a file system on top >of that. Or we can sync() all the time from the file system. Nooooo.... ;) JFFS2 already provides for: 1. Interface to MTD 2. Flash wear levelling 3. Compression/decompression on the fly 4. "always sync()" data to flash before your write() returns functionality 5. handling of erase partitions, GC, a file system interface etc. 6. tested for power fail reliability of the fs metadata. 7. Extensive usage by others even if they do not need this (our) functionality- hence minimal hidden bugs. My initial feel is that I really don't think that reinventing the wheel is the right answer. What we need is a "layer" on top of JFFS2 to provide the 2 features that it lacks. Namely: 1. Roll back and recover to last data if your write did not complete and power failed 2. 0 latency writes. Reads are no problem as they can always be cached in memory by reading the entire (it's not a database) database on startup. Alan Cox called this "transactional level" functionality. An implementation based on a transaction cache solves the issue of having to duplicate all the members and CRC them thus supporting quite a large database without the need for twice the space on the flash device, as well as the issue of "roll back and recover" or "complete transaction" on the next power up if the complete transaction is available.
There is a reason that transactional logs are the preferred choice even for large databases that need to support fail safe. >I have to agree that it's probably better to write a library/utilities first >to do a preliminary thing. That way we'd get a functional thing quite fast >and figure out what we did wrong in the first place. My thoughts exactly. > > Maybe you may want to subscribe to the development list on the > > www.EmbeddedLinuxWorks site and we can take this discussion there. I am > >Where do I subscribe? http://www.embeddedlinuxworks.com/lists.html > > looking for user input to define the feature set of this "config system" > > (not database :) And I want to make it LGPL (if it does become a lib or > > task) so that users can link to it without releasing the source of their > > own code. > >I'm in the fortunate position of being able to work on this fulltime for >the next week or so, so if we can figure out a useful specification for this >in a short time, I'm really keen on helping to implement this and LGPL/BSD >license is just fine. That suits me just fine. We need to start working on a spec first. Maybe a clearly defined requirement spec, then a design document. If you like I can elaborate on the above thoughts a bit and start something- or feel free to send me something if you like. > > I can't believe that you and I are the only folks that may be > interested in > > this. Maybe there is an existing solution- we just don't know about it- or > > other's have not thought about this issue just yet ;) > >That's why I mailed here in the first place :P But I haven't heard from anyone else yet :( Regards, Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
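The transaction-log idea in the message above (append a record, then on the next power-up either complete it or discard it) might be sketched roughly as follows. This is only an illustration in Python, not code from JFFS2 or from the proposed config system. The record layout (length, CRC32, payload) and the names append_record and replay are assumptions invented for the example.

```python
import os
import struct
import zlib

# Hypothetical record layout: 4-byte payload length, 4-byte CRC32, then the payload.
HEADER = struct.Struct("<II")

def append_record(log_path, payload):
    """Append one transaction record and force it out to the device."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(log_path, "ab") as log:
        log.write(record)
        log.flush()
        os.fsync(log.fileno())

def replay(log_path):
    """Return the payloads of all complete records.

    Replay stops at the first torn or corrupt record, which is treated as a
    transaction that never committed (the "roll back" case).
    """
    complete = []
    try:
        with open(log_path, "rb") as log:
            data = log.read()
    except FileNotFoundError:
        return complete
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # write was interrupted: ignore this record and everything after it
        complete.append(payload)
        offset = start + length
    return complete
```

On startup a caller would run replay() once and apply the returned records to its in-memory copy of the configuration. A record cut short by a power failure fails the length or CRC check and is simply not applied.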
* RE: safe flash filesystem 2001-06-21 15:05 ` Vipin Malik @ 2001-06-21 15:36 ` Chris Read 0 siblings, 0 replies; 30+ messages in thread From: Chris Read @ 2001-06-21 15:36 UTC (permalink / raw) To: 'Vipin Malik', 'Abraham vd Merwe'; +Cc: 'MTD for Linux' I would also be very interested in this. The ability to retain consistency after multiple power outages is crucial to many of the types of project upon which I work. The problem can be quite complex if you get a power fail in a garbage collection started as a result of a power fail during a previous GC. Chris Read CLR Associates Limited ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem 2001-06-21 14:35 ` Abraham vd Merwe 2001-06-21 15:05 ` Vipin Malik @ 2001-06-21 15:09 ` Joakim Tjernlund 2001-06-21 15:34 ` Vipin Malik 2001-06-21 15:11 ` Herman Oosthuysen 2001-06-21 21:26 ` safe flash filesystem Russ Dill 3 siblings, 1 reply; 30+ messages in thread From: Joakim Tjernlund @ 2001-06-21 15:09 UTC (permalink / raw) To: Abraham vd Merwe, Vipin Malik; +Cc: MTD for Linux > Hi Vipin! > > > Yeah, that "except is not free" means it costs >100K USD (for the > > transaction version) if I remember the quote they gave me. > > $125K iirc (; > Check out the Mird DB at http://www.mirar.org/mird/ I would be very interested to hear what you think of this DB. Joakim Tjernlund ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem 2001-06-21 15:09 ` Joakim Tjernlund @ 2001-06-21 15:34 ` Vipin Malik 2001-06-21 19:34 ` Joakim Tjernlund 2001-06-21 19:47 ` Joakim Tjernlund 0 siblings, 2 replies; 30+ messages in thread From: Vipin Malik @ 2001-06-21 15:34 UTC (permalink / raw) To: joakim.tjernlund, Abraham vd Merwe; +Cc: MTD for Linux Thanks for the link. At first blush it seems like something that will provide a great place to start- if it does not provide all the functionality already. I'll examine it a bit more and give you my thoughts. Are you using this db or have any experience with it? The author does not specify the license- besides stating that it is "free". Is it GPL or LGPL? Regards, Vipin At 05:09 PM 6/21/2001 +0200, Joakim Tjernlund wrote: > > Hi Vipin! > > > > > Yeah, that "except is not free" means it costs >100K USD (for the > > > transaction version) if I remember the quote they gave me. > > > > $125K iirc (; > > >Check out the Mird DB at http://www.mirar.org/mird/ > >I would be very interested to hear what you think >of this DB. > > Joakim Tjernlund ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem 2001-06-21 15:34 ` Vipin Malik @ 2001-06-21 19:34 ` Joakim Tjernlund 2001-06-21 19:47 ` Joakim Tjernlund 1 sibling, 0 replies; 30+ messages in thread From: Joakim Tjernlund @ 2001-06-21 19:34 UTC (permalink / raw) To: Vipin Malik, Abraham vd Merwe; +Cc: MTD for Linux It ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem 2001-06-21 15:34 ` Vipin Malik 2001-06-21 19:34 ` Joakim Tjernlund @ 2001-06-21 19:47 ` Joakim Tjernlund 1 sibling, 0 replies; 30+ messages in thread From: Joakim Tjernlund @ 2001-06-21 19:47 UTC (permalink / raw) To: Vipin Malik; +Cc: MTD for Linux Sorry for the previous incomplete mail, I slipped with my fingers :-( It's not GPL or LGPL, it's under a JPEG-like license (according to the author in an e-mail to me). We are looking into this DB and at first glance it looks OK, but the guy who knows DBs here is on vacation and I don't know that much about DBs so I cannot be more specific. Joakim ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 14:35 ` Abraham vd Merwe 2001-06-21 15:05 ` Vipin Malik 2001-06-21 15:09 ` Joakim Tjernlund @ 2001-06-21 15:11 ` Herman Oosthuysen 2001-06-21 17:54 ` Tim Riker 2001-06-21 21:26 ` safe flash filesystem Russ Dill 3 siblings, 1 reply; 30+ messages in thread From: Herman Oosthuysen @ 2001-06-21 15:11 UTC (permalink / raw) To: Abraham vd Merwe, Vipin Malik; +Cc: MTD for Linux Hi guys, We are currently exploring a product by Tevero in Norway: http://www.tevero.no/products/fdc/ to use instead of the still buggy JFFS2. Price is USD2500, which isn't bad. It appears to be the cheapest commercial FFS available. Cheers, Herman http://www.WirelessNetworksInc.com ----- Original Message ----- From: Abraham vd Merwe <abraham@2d3d.co.za> To: Vipin Malik <mtd-linux@embeddedlinuxworks.com> Cc: MTD for Linux <linux-mtd@lists.infradead.org> Sent: Thursday, June 21, 2001 8:35 AM Subject: Re: safe flash filesystem ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 15:11 ` Herman Oosthuysen @ 2001-06-21 17:54 ` Tim Riker 2001-06-21 19:43 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Tim Riker @ 2001-06-21 17:54 UTC (permalink / raw) To: Herman Oosthuysen; +Cc: Abraham vd Merwe, Vipin Malik, MTD for Linux Hmm... would it not be easier to just use ext2 on a CF card? less $$ and more flexibility no? Herman Oosthuysen wrote: > > Hi guys, > > We are currently exploring a product by Tevero in Norway: > http://www.tevero.no/products/fdc/ to use instead of the still buggy JFFS2. > Price is USD2500, which isn't bad. It appears to be the cheapest commercial > FFS available. > > Cheers, > > Herman > http://www.WirelessNetworksInc.com > > ----- Original Message ----- > From: Abraham vd Merwe <abraham@2d3d.co.za> > To: Vipin Malik <mtd-linux@embeddedlinuxworks.com> > Cc: MTD for Linux <linux-mtd@lists.infradead.org> > Sent: Thursday, June 21, 2001 8:35 AM > Subject: Re: safe flash filesystem > > ______________________________________________________ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/ -- Tim Riker - http://rikers.org/ - short SIGs! <g> All I need to know I could have learned in Kindergarten ... if I'd just been paying attention. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 17:54 ` Tim Riker @ 2001-06-21 19:43 ` Vipin Malik 2001-06-21 19:35 ` Tim Riker 0 siblings, 1 reply; 30+ messages in thread From: Vipin Malik @ 2001-06-21 19:43 UTC (permalink / raw) To: Tim Riker; +Cc: Herman Oosthuysen, Abraham vd Merwe, MTD for Linux Tim Riker wrote: > Hmm... would it not be easier to just use ext2 on a CF card? > > less $$ and more flexibility no? > ext2 on CF takes about 3-5 power fails before it falls _flat_ on its face! Pretty ugly too. I would not use it. Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 19:43 ` Vipin Malik @ 2001-06-21 19:35 ` Tim Riker 2001-06-21 19:56 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Tim Riker @ 2001-06-21 19:35 UTC (permalink / raw) To: Vipin Malik; +Cc: Herman Oosthuysen, Abraham vd Merwe, MTD for Linux ok, what about reiserfs on CF then? Vipin Malik wrote: > > Tim Riker wrote: > > > Hmm... would it not be easier to just use ext2 on a CF card? > > > > less $$ and more flexibility no? > > > > ext2 on CF takes about 3-5 power fails before it falls _flat_ on its face! > Pretty ugly too. > > I would not use it. > > Vipin -- Tim Riker - http://rikers.org/ - short SIGs! <g> All I need to know I could have learned in Kindergarten ... if I'd just been paying attention. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 19:35 ` Tim Riker @ 2001-06-21 19:56 ` Vipin Malik 2001-06-21 21:17 ` Kyle Harris 0 siblings, 1 reply; 30+ messages in thread From: Vipin Malik @ 2001-06-21 19:56 UTC (permalink / raw) To: Tim Riker; +Cc: Herman Oosthuysen, Abraham vd Merwe, MTD for Linux Tim Riker wrote: > ok, > > what about reiserfs on CF then? > I have tested one *major* brand of IDE flash devices and 2 brands of CF devices, in more than 20K power fail tests. Both suffer from low level failures, which cause the IDE driver layer to "give up" with "unrecoverable errors". The CF is so bad that I just gave up on the testing after a few hundred cycles. The IDE flash was better, but not much. Will reiserfs be happy if the underlying IDE /dev/hdxx driver returns "unrecoverable error" from the IDE device? Your call. Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: safe flash filesystem 2001-06-21 19:56 ` Vipin Malik @ 2001-06-21 21:17 ` Kyle Harris 2001-07-03 23:53 ` On the "safe filesystem" and write() topic Bjorn Wesen 0 siblings, 1 reply; 30+ messages in thread From: Kyle Harris @ 2001-06-21 21:17 UTC (permalink / raw) To: Vipin Malik, MTD for Linux Hey, I've read thru several posts and Vipin's jffs_guide. It appears that JFFS, at this time, is about the most reliable open source fs for embedded systems, even though it still has some problems. When JFFS fails, is the filesystem still usable? My question is this. What if you save only a small datafile (< 1K) and write it alternately to 2 different JFFS partitions (or even the same partition). At boot, you read from both and get the latest, valid copy. This way if one is bad you still have a backup. How reliable would this be? Just wondering... Kyle. ^ permalink raw reply [flat|nested] 30+ messages in thread
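Kyle's two-copy scheme above (write alternately to two files, then pick the latest valid copy at boot) could look something like the sketch below. It is a rough Python illustration under stated assumptions, not a tested flash design. The slot layout (CRC32, sequence number, data) and the names read_slot, load_latest and save are hypothetical, and sequence-number wraparound is ignored.

```python
import os
import struct
import zlib

# Hypothetical slot layout: 4-byte CRC32 over everything that follows,
# then a 4-byte sequence number, then the data itself.
MIN_SLOT_SIZE = 8

def read_slot(path):
    """Return (sequence, data) for a valid slot, or None if missing or corrupt."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return None
    if len(raw) < MIN_SLOT_SIZE:
        return None
    (crc,) = struct.unpack_from("<I", raw, 0)
    body = raw[4:]
    if zlib.crc32(body) != crc:
        return None
    (seq,) = struct.unpack_from("<I", body, 0)
    return seq, body[4:]

def load_latest(paths):
    """Pick the valid copy with the highest sequence number, or None."""
    valid = [slot for slot in (read_slot(p) for p in paths) if slot is not None]
    return max(valid, key=lambda slot: slot[0]) if valid else None

def save(paths, data):
    """Overwrite the slot that does NOT hold the newest valid copy."""
    slots = [read_slot(p) for p in paths]
    valid = [slot for slot in slots if slot is not None]
    next_seq = max(slot[0] for slot in valid) + 1 if valid else 1
    # Prefer an invalid slot; otherwise reuse the one with the lowest sequence.
    target = min(range(len(paths)), key=lambda i: slots[i][0] if slots[i] else -1)
    body = struct.pack("<I", next_seq) + data
    with open(paths[target], "wb") as f:
        f.write(struct.pack("<I", zlib.crc32(body)) + body)
        f.flush()
        os.fsync(f.fileno())
```

Because save() never touches the newest valid copy, a power failure during a write can at worst leave the slot being written invalid, and load_latest() then falls back to the other one.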
* On the "safe filesystem" and write() topic 2001-06-21 21:17 ` Kyle Harris @ 2001-07-03 23:53 ` Bjorn Wesen 2001-07-04 14:10 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Bjorn Wesen @ 2001-07-03 23:53 UTC (permalink / raw) To: MTD for Linux; +Cc: Kyle Harris, Vipin Malik Hi, I designed the JFFS specifications, log layout and GC method in the first place and me and Finn put a lot of thought into it while implementing so please consider some of these late night ramblings: The initial requirement was that a small partition of configuration files (the /etc directory to be more specific) should be able to reside in flash and be completely safe from inconvenient power-outs or crashes. It is my opinion (of course) that JFFS solves this in a manner as good as possible given the standard Linux VFS API. This means that when you rewrite a configuration file, you write the new one to another file and do a rename over the old once you're ready. Technically JFFS is based on a log structure consisting of VFS operations, and this is the best you can do while not involving the application more than what standard VFS gives you. VFS operations are not "transactions" in the high-level sense though. In our embedded products this is handled by a configuration handling daemon similar to linuxconf, which caches parameters and knows how to rewrite configuration files atomically (just like any other sane Unix program does it). There is no need for any transactional semantics for small configuration files. We sell a lot of these products and I certainly disagree with Vipin's comment on his website that it's impossible to use JFFS in embedded products :) Log-files are not usually kept in flash and if they are they don't need anything more advanced than normal rotation and if a crash occurs, it's no big deal if the last line gets cut off completely or in the middle... 
It is difficult (if not impossible) in any consistent way to handle the case with random write()'s inside an already existing file. The filesystem needs to "roll back" to any pre-existing state but it then needs to know what the desired state would be. What we do now is make sure the filesystem itself is never corrupt even if a file was under writing. The problems arise from the vague definition of what the desired state would be - is it the data before the last write(), and what happens if you receive a signal? Writes to mmap'ed pages can't use that mechanism, and you'll be stuck with using write()'s when you really probably want to use libc wrappers like fwrite and fprintf. I agree that if you need a binary database which is big so that you cannot rewrite it when you update something, you'll need to rethink. Either just split the database into smaller files, or you'll need a transaction marker API down to the filesystem (an ioctl pair was suggested somewhere I think). I don't think trying to tweak write() would lead to anything generally useful though. The kernel-level transactional extension would probably be quite difficult to get consistent also, because Linux VFS does not know about it yet (this is eventually changing with the integration of the general journalling layer I guess). I get a headache thinking about it, perhaps it's possible perhaps it's not; perhaps this code already exists in the other journalling filesystems, perhaps it does not. With regards to Kyle's question below though, the answer is certainly that he can do as he says but use the rename() operation and keep them on a single partition. There is no need for anything more advanced.. (All this assumes other more technical problems are solved of course like the nasty surprises we've had with some flashes getting bits halfway erased...) /BW On Thu, 21 Jun 2001, Kyle Harris wrote: > I've read thru several posts and Vipin's jffs_guide. 
It appears that > JFFS, at this time, is about the most reliable open source fs for > embedded systems, even though it still has some problems. When JFFS > fails, is the filesystem still usable? My question is this. What if you > save only a small datafile (< 1K) and write it alternately to 2 > different JFFS partitions (or even the same partition). At boot, you > read from both and get the latest, valid copy. This way if one is bad > you still have a backup. How reliable would this be? ^ permalink raw reply [flat|nested] 30+ messages in thread
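The "write the new one to another file and do a rename over the old" approach described in the message above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name `replace_config` and its parameters are hypothetical, the temporary file is assumed to live on the same filesystem as the target, and whether the rename survives a power cut depends on the filesystem underneath (the thread argues that JFFS handles this case, but the sketch itself cannot guarantee it).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `data` so that a
 * reader sees either the complete old file or the complete new one.
 * `tmp_path` must be on the same filesystem as `path`.
 * Returns 0 on success, -1 on failure (the old file is left untouched). */
int replace_config(const char *path, const char *tmp_path,
                   const void *data, size_t len)
{
    assert(path != NULL && tmp_path != NULL && data != NULL);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* A config file is small: treat a short write as a failure and keep
     * the old file instead of retrying. */
    ssize_t n = write(fd, data, len);
    if (n < 0 || (size_t)n != len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* rename() swaps the new file in over the old name in one step. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A caller would use something like `replace_config("/etc/app.conf", "/etc/app.conf.new", buf, buflen)`, where both paths are made-up examples. If the call fails at any point, the previous configuration is still in place under the original name.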
* Re: On the "safe filesystem" and write() topic 2001-07-03 23:53 ` On the "safe filesystem" and write() topic Bjorn Wesen @ 2001-07-04 14:10 ` Vipin Malik 2001-07-05 18:16 ` Bjorn Wesen 0 siblings, 1 reply; 30+ messages in thread From: Vipin Malik @ 2001-07-04 14:10 UTC (permalink / raw) To: Bjorn Wesen, MTD for Linux; +Cc: Kyle Harris Hi, At 01:53 AM 7/4/2001 +0200, Bjorn Wesen wrote: >I designed the JFFS specifications, log layout and GC method in the first >place and me and Finn put a lot of thought into it while implementing so >please consider some of these late night ramblings: Definitely! Thoughts, discussions, suggestions most welcome and thank you for reading my ramblings! >The initial requirement was that a small partition of configuration files >(the /etc directory to be more specific) should be able to reside in flash >and be completely safe from inconvenient power-outs or crashes. > >It is my opinion (of course) that JFFS solves this in a manner as good as >possible given the standard Linux VFS API. This means that when you >rewrite a configuration file, you write the new one to another file and do >a rename over the old once you're ready. Agreed. Of course as long as the config files are small and relatively few and not changing that often. Your example of config files in /etc fits the bill perfectly. > Technically JFFS is based on a >log structure consisting of VFS operations, and this is the best you can >do while not involving the application more than what standard VFS gives >you. VFS operations are not "transactions" in the high-level sense though. Agreed again. >In our embedded products this is handled by a configuration handling >daemon similar to linuxconf, which caches parameters and knows how to >rewrite configuration files atomically (just like any other sane Unix >program does it). There is no need for any transactional semantics for >small configuration files. This is surely the preferred way to do it for such files. 
As a matter of fact it is most preferred for small config files. I think that I need to explicitly mention it in one of my ramblings on my site ;) > We sell a lot of these products and I certainly >disagree with Vipin's comment on his website that it's impossible to use >JFFS in embedded products :) Wait a minute! Where did I say that in the context of config files? And if I did I need to go and correct it (so please send me an email). I think I surely said it in the context of JFFS (not JFFS2) losing integrity (including files at random) during power fail tests and I stand behind those results till proven otherwise. Have you guys tested the JFFS fs under power fail? What version are you using and what were your results? > Log-files are not usually kept in flash and >if they are they don't need anything more advanced than normal rotation >and if a crash occurs, it's no big deal if the last line gets cut off >completely or in the middle... Again agreed. Log files are of course the "append" type, and a simple scan of the log file on startup will enable one to detect and remove this last half written offending line. >It is difficult (if not impossible) in any consistent way to handle the >case with random write()'s inside an already existing file. The filesystem >needs to "roll back" to any pre-existing state but it then needs to >know what the desired state would be. What we do now is make sure the >filesystem itself is never corrupt even if a file was under writing. JFFS2 does that (not getting corrupt) under random power fail. JFFS attempts to do that, but there is a bug in the latest version in CVS that causes files to disappear at random in power fail testing. This happened anywhere after 600+ to 1300+ power fails. I've mentioned this specifically in my "JFFS: A Practical guide" on my site. It's quite possible that *I* introduced this bug myself when I was mucking around with JFFS trying to fix other problems. 
But considering the fact that when I started testing JFFS, it would never last more than 10 power cycles without a failed mount on power up, and other issues like leaking memory to the point that the kernel panicked (again on mount after a power fail)- to the point when I left it with my patches, that I get at least 600+ (and once 1300+) async power fails without any problem, which version would you rather go with? With the maturing of JFFS2, IMHO folks should be encouraged to migrate to JFFS2 if possible (I am). Is there anything that JFFS gives you that you don't get with JFFS2? >The problems arise from the vague definition of what the desired state >would be - is it the data before the last write(), and what happens if you >receive a signal ? Isn't it the same case as what happens when you get a power fail? (please pardon my lack of understanding of signals in kernels. Can the execution that was interrupted with a signal ever resume at the interrupted point?) > Writes to mmap'ed pages can't use that mechanism, and >you'll be stuck with using write()'s when you really probably want to use >libc wrappers like fwrite and fprintf. That's true, but it's a tradeoff: If the task wants reliable writes to the fs, it must not use any lib calls. As a matter of fact, that's the last thing you want to use anyway as these wrappers buffer the programs writes, defeating the purpose of the default mechanism of O_SYNC of the JFFS(2) fs. >I agree that if you need a binary database which is big so that you cannot >rewrite it when you update something, you'll need to rethink. Either just >split the database in smaller files, or you'll need a transaction marker >API down to the filesystem (an ioctl pair was suggested somewhere I >think). I don't think trying to tweak write() would lead to anything >generally useful though. See, we agree on all the same points :) The main issue here is not only a BIG database, but also one with a lot of points in it that are being updated frequently. 
Each file has an overhead (as well as a max # of files limit on the fs). How reasonable is it to put 5000 8-byte files on a 1MB JFFS(2) fs? (this data would only occupy <50KB in a single (db) file) vs at least 5000*64(file overhead)+5000*8 = 360KB as separate files, assuming that you can even fit 5000 files on your partition. >The kernel-level transactional extension would probably be quite difficult >to get consistent also, because Linux VFS does not know about it yet (this >is eventually changing with the integration of the general journalling >layer I guess). I get a headache thinking about it, perhaps it's possible >perhaps it's not; perhaps this code already exists in the other journalling >filesystems, perhaps it does not. I cannot speak intelligently about this so I'll keep my mouth shut :) >With regards to Kyle's question below though, the answer is certainly that >he can do as he says but use the rename() operation and keep them on a >single partition. There is no need for anything more advanced.. For a lot of solutions, this is certainly true. OTOH, the current blocking times of JFFS2 (I didn't do this test on JFFS, but no reason to be different methinks) make putting any config or db directly on the fs unreasonable. (if you've been following my jitter tests recently, JFFS2 can block for 10's of seconds when it is getting quite full). >(All this assumes other more technical problems are solved of course like >the nasty surprises we've had with some flashes getting bits halfway >erased...) This "flipping bits" syndrome (TM Vipin Malik :) is solved reliably for JFFS2. JFFS2 has passed 15K+ power fails without any failures that I could detect or was looking for. IMHO it cannot be solved reliably for JFFS because JFFS does not handle (or know about) erase sectors. I've solved it by re-reading the same sector 4 times. 
See the big note above scan_for_partially_erased_sectors() (or something like that) in jffs/intrep.c. To a large extent, we (I) have allowed the thought of having transactions in JFFS(2) lapse. Maybe this is not such a bad thing after all and with each discussion I better appreciate the cons of having transactions in the fs. Anyway, there is a new project that is being started on developing (or modifying an existing embedded db (mird)) to provide for this transaction level processing for embedded systems on JFFS(2). In addition to providing transactions it will also provide a caching layer that will allow the transaction log to be put on *another* non-volatile medium if such is available in your system. The big advantage of this will be 0 latency, transaction protected, power fail safe writes available to programs that use this interface. As a freebie it will also provide for key/value type store/retrieve from a (small) hash database. Read more about it at: http://www.embeddedlinuxworks.com/articles/db_project.html To sign up for the development mailing list, go to: http://www.embeddedlinuxworks.com/cgi-bin/signup/signup-dev.cgi Thanks for reading and your thoughts. Regards, Vipin http://www.EmbeddedLinuxWorks.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: On the "safe filesystem" and write() topic 2001-07-04 14:10 ` Vipin Malik @ 2001-07-05 18:16 ` Bjorn Wesen 2001-07-06 13:40 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Bjorn Wesen @ 2001-07-05 18:16 UTC (permalink / raw) To: Vipin Malik; +Cc: jffs-dev, MTD for Linux On Wed, 4 Jul 2001, Vipin Malik wrote: > I think I surely said it in context of JFFS (not JFFS2) loosing integrity > (including files at random) during power fail tests and I stand behind > those results till proven otherwise. Have you guys tested the JFFS fs under > power fail? What version are you using and what were your results? We've tested it but probably not in more than a couple of hundred cycles; I've never seen that floating bit error before, perhaps it's just some flash chips that get bitten by that and it might depend on the hardware as well (resident charge in capacitors etc). > would you rather go with? With the maturing of JFFS2, IMHO folks should be > encouraged to migrate to JFFS2 if possible (I am). Is there anything that > JFFS gives you that you don't get with JFFS2? All products on sale from Axis still run 2.0.. next generation will be 2.4 and some sort of JFFS, and it will be JFFS2 if the bugs are sorted out (no theoretical reason why JFFS2 shouldn't be perfect of course, it's just a matter of finetuning :) Well apart from compression-code and latency; after all you cannot both have synchronous writes, compression and expecting the application to not be blocked.. (The rest of the system should not be blocked though, that's just a matter of being able to yield due to need_resched inside the compression code) > >The problems arise from the vague definition of what the desired state > >would be - is it the data before the last write(), and what happens if you > >receive a signal ? > > Isn't it the same case as what happens when you get a power fail? (please > pardon my lack of understanding of signals in kernels. 
Can the execution > that was interrupted with a signal ever resume at the interrupted point?) Depends on the system call and underlying filesystem; for a normal read/write, they probably just return the number of chars read/written up to the point of the signal (just as they can by the API). And hence my comment that it's no use trying to enforce atomic behaviour for entire write() chunks. Your app can catch a signal, return from a half-written write and then crash before you can write() the "missing" chars. So if you want to do the "atomic write" you need to disable all signal checking inside the write paths, which means going back to the non-generic write VFS functions and coincidentally you'll need to block the rest of the system as well (see the second paragraph above) because you can't reschedule without a signal-check. It's simply not a tenable scenario :) I'd much rather see the "start transaction/end transaction" ioctl's than trying to make write be atomic. > > Writes to mmap'ed pages can't use that mechanism, and > >you'll be stuck with using write()'s when you really probably want to use > >libc wrappers like fwrite and fprintf. > > That's true, but it's a tradeoff: If the task wants reliable writes to the > fs, it must not use any lib calls. As a matter of fact, that's the last > thing you want to use anyway as these wrappers buffer the programs writes, > defeating the purpose of the default mechanism of O_SYNC of the JFFS(2) fs. I think that's a non sequitur, especially given that the individual write itself is not atomic anyway. It can't matter if you do fprintf or a write() in a loop (since that's exactly what fprintf does eventually anyway). As long as writes are enforced to be sequential, I think that's enough. Does not JFFS2 queue writes internally anyway BTW? And if you have O_SYNC (assuming JFFS adheres to it) when fprintf returns you can be as guaranteed that the data has been written as if you'd done it yourself with a write(). 
> points in it that are being updated frequently. Each file has an overhead > (as well a max # of files limit on the fs). How reasonable is it to put > 5000, 8 byte files on a 1MB JFFS(2) fs? (this file would only occupy <50KB > in a single (db) file) vs at least 5000*64(file overhead)+5000*8 = 360KB as > separate files, assuming that you can even fit 5000 files on your partition. I think either a transaction mechanism or an entirely different flash filesystem (not VFS-based) needs to be used if that is a common usage scenario. > >The kernel-level transactional extension would probably be quite difficult > >to get consistent also, because Linux VFS does not know about it yet (this > >is eventually changing with the integration of the general journalling > >layer I guess). I get a headache thinking about it, perhaps it's possible > >perhaps it's not; perhaps this code already exist in the other journalling > >filesystems, perhaps it does not. > > I cannot speak intelligently about this so I'll keep my mouth shut :) IIRC the main holding point against merging reiserfs before was that it really should wait until VFS is made aware of journalling concepts in order to avoid "half way" solutions, and that in turn was dependent on the ext3 developers etc... Thing is, I think JFFS2 uses the generic file writing in VFS which means that VFS itself fetches and updates pages in the page-cache (or similar) which means an overall more complex situation for JFFS which wants to write this transactionally without inter-process dependencies etc.. I.e. suppose process A is writing to file X while B is reading from it, and writing to file Y at the same time. A starts a transaction and writes. If VFS does not know about transactions, it will simply put the writes in the page-cache so B might read them and write to file Y. So if a crash occurs, yes, file X is intact but Y is screwed up. 
So the writes need to be queued up in JFFS or VFS or you need to guarantee that only the process doing the writes has access to the file at the same time. This is a major obstacle, and I don't know how it's solved in reiser, JFS and XFS (if they support user-level transactions at all) without patching VFS and the page-cache. > any config or db directly on the fs unreasonable. (if you've been following > my jitter tests recently, JFFS2 can block for 10's of seconds when it > getting quite full). Probably possible but that's an implementation problem not a theoretical problem. In a "run time" phase (flash is almost all dirty, space exists and writes are coming in) there should never need to be more latency than what it takes to GC the same amount of space as you want to write. And as I wrote above somewhere, while the writing process needs to be blocked (in O_SYNC) there is no reason to block other processes from scheduling in, unless I've missed something major... > transactions in the fs. Anyway, there is a new project that is being > started on developing (or modifying an existing embedded db (mird)) to > provide for this transaction level processing for embedded systems on > JFFS(2). In addition to providing transactions it will also provide a One alternative is a completely user-mode flash DB. Have a daemon which has access to a raw flash device and implements a transactional database on that device. No need for a kernel system really.. > caching layer that will allow the transaction log to be put on *another* > non-volatile medium if such is available in your system. The big advantage Why would this be necessary? /BW ^ permalink raw reply [flat|nested] 30+ messages in thread
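The point made in the message above, that write() may return after only part of the buffer has been written (for example when a signal arrives), is why applications conventionally wrap it in a loop. A minimal sketch follows; the name `write_all` and the `done` out-parameter are hypothetical, not something from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `len` bytes are out.
 * Returns 0 on success, -1 on error. In both cases *done holds the number
 * of bytes that actually reached the file descriptor. */
int write_all(int fd, const void *buf, size_t len, size_t *done)
{
    assert(buf != NULL && done != NULL);

    const char *p = buf;
    *done = 0;
    while (*done < len) {
        ssize_t n = write(fd, p + *done, len - *done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any byte was written */
            return -1;
        }
        *done += (size_t)n;    /* short write: go round again for the rest */
    }
    return 0;
}
```

Note what the loop does not give you: if the process crashes or power fails between two iterations, the file holds only the first part of the record. That gap is exactly why the message argues for explicit transaction markers rather than trying to make a single write() atomic.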
* Re: On the "safe filesystem" and write() topic 2001-07-05 18:16 ` Bjorn Wesen @ 2001-07-06 13:40 ` Vipin Malik 2001-07-07 9:25 ` Bjorn Wesen 0 siblings, 1 reply; 30+ messages in thread From: Vipin Malik @ 2001-07-06 13:40 UTC (permalink / raw) To: Bjorn Wesen; +Cc: jffs-dev, MTD for Linux Hi, > > Have you guys tested the JFFS fs under > > power fail? What version are you using and what were your results? > >We've tested it but probably not in more than a couple of hundred >cycles; I've never seen that floating bit error before, perhaps it's just >some flash chips that get bitten by that and it might depend on the >hardware as well (resident charge in capacitors etc). I believe that David also mentioned that he has seen that error. Its detection is very proportional to the probability of power failing in the middle of a sector erase. So the larger the number of sector erases that one does, and the larger the number of power fails one does, the higher the probability of seeing it. With a few hundred tests, I'm not surprised that you haven't seen it. >Well apart from compression-code and >latency; after all you cannot both have synchronous writes, compression >and expecting the application to not be blocked.. HeHe, well, maybe the fs can (will or may?) block, but in all realistic situations it's unacceptable for a real world embedded app to block for multiple seconds while the fs is "busy". Where does the app store any data value updates it's generating (especially if they have to be stored immediately in a non-volatile manner)? >(The rest of the system should not be blocked though, that's just a matter >of being able to yield due to need_resched inside the >compression code) My latest tests indicate that this is already the case. A POSIX RT task (not interacting with JFFS2) does not block (for too long) even if the underlying JFFS2 fs is blocked for >40 seconds! 
> > >The problems arise from the vague definition of what the desired state > > >would be - is it the data before the last write(), and what happens if you > > >receive a signal ? > > > > Isn't it the same case as what happens when you get a power fail? (please > > pardon my lack of understanding of signals in kernels. Can the execution > > that was interrupted with a signal ever resume at the interrupted point?) > >Depends on the system call and underlying filesystem; for a >normal read/write, they probably just return the number of chars >read/written up to the point of the signal (just as they can by the >API). And hence my comment that it's no use trying to enforce atomic >behaviour for entire write() chunks. Your app can catch a signal, return >from a half-written write and then crash before you can write() the >"missing" chars. I guess you are right. This is best handled as an "out of band" solution- i.e. with ioctl transactions, or a transaction db etc. >As long as writes are enforced to be sequential, I think that's >enough. Does not JFFS2 queue writes internally anyway BTW ? And if you >have O_SYNC (assuming JFFS adheres to it) when fprintf returns you can be >as guaranteed that the data has been written as if you'd done it yourself >with a write(). Hmm, I was under the impression that lib fprintf, fread, fwrite etc. all work with some delimiter, usually '\n' and specially in the case of fprintf(), the data is buffered till a '\n' is detected. I assumed (perhaps incorrectly) that a similar mechanism may be at play with the lib file i/o calls as well. > > points in it that are being updated frequently. Each file has an overhead > > (as well a max # of files limit on the fs). How reasonable is it to put > > 5000, 8 byte files on a 1MB JFFS(2) fs? (this file would only occupy <50KB > > in a single (db) file) vs at least 5000*64(file overhead)+5000*8 = > 360KB as > > separate files, assuming that you can even fit 5000 files on your > partition. 
> >I think either a transaction mechanism or an entirely different flash >filesystem (not VFS-based) need to be used if that is a common usage >scenario. That's why we are looking at using a transaction db (mird) to provide this functionality rather than hack JFFS2 (and/or the VFS) to support it. > > any config or db directly on the fs unreasonable. (if you've been > following > > my jitter tests recently, JFFS2 can block for 10's of seconds when it > > getting quite full). > >Probably possible but that's an implementation problem not a theoretical >problem. In a "run time" phase (flash is almost all dirty, space exist and >writes are coming in) there should never need to be more latency that what >it takes to GC the same amount of space as you want to write. When the rubber meets the road, implementation problems and theoretical problems are indistinguishable :) The reality is that JFFS2 can block for 10's of seconds on a reasonably powerful processor (a 133MHz 486). Tweaking may get that down to a few seconds, but unless there is a design or implementation bug in JFFS2, there will always be some processing required to GC when there is no more ready free space on the flash. At this time a task updating variables on the FS will block. The question is: How long a block is acceptable? IMHO, anything more than a few hundred ms will be unacceptable to a reasonable percentage of embedded applications. I know it is unacceptable for my application. I generate data updates 5 times a second and I want that data stored reliably on the flash fs, as well as not be blocked for more than 200ms. >One alternative is a completely user-mode flash DB. Have a deamon which >have access to a raw flash device and implements a transactional database >on that device. No need for a kernel system really.. The biggest problem with this is that one has to reinvent all the major flash interface features of JFFS2. Not an elegant solution IMHO. 
> > caching layer that will allow the transaction log to be put on *another* > > non-volatile medium if such is available in your system. The big advantage > >Why would this be necessary ? To provide for 0 latency writes for tasks updating data values, when the underlying fs is blocked and cannot accept any more writes for another "few" (at the moment >40) seconds. Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: On the "safe filesystem" and write() topic 2001-07-06 13:40 ` Vipin Malik @ 2001-07-07 9:25 ` Bjorn Wesen 2001-07-07 13:06 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Bjorn Wesen @ 2001-07-07 9:25 UTC (permalink / raw) To: Vipin Malik; +Cc: jffs-dev, MTD for Linux On Fri, 6 Jul 2001, Vipin Malik wrote: > >latency; after all you cannot both have synchronous writes, compression > >and expecting the application to not be blocked.. > > HeHe, well, maybe the fs can (will or may?) block, but in all realistic > situations it's unacceptable for a real world embedded app to block for > multiple seconds while the fs is "busy". Where does the app store any data You might not like it but you cannot have it any other way :) Fact: flash chip sectors takes long to erase (1-2 seconds) Fact: you need to erase to make room for new data Hence, if you need the app to do synchronous writing, it will need to wait. > Hmm, I was under the impression that lib fprintf, fread, fwrite etc. all > work with some delimiter, usually '\n' and specially in the case of No, but now that I think about it they are not synchronous either (since they buffer and return). > >problem. In a "run time" phase (flash is almost all dirty, space exist and > >writes are coming in) there should never need to be more latency that what > >it takes to GC the same amount of space as you want to write. > > When the rubber meets the road, implementation problems and theoretical > problems are indistinguishable :) > The reality is that JFFS2 can block for 10's of seconds on a reasonable > powerful processor (a 133MHz 486). Yes but that might BE the time it takes to make room for the data you want to write.. > time a task updating variables on the FS will block. The question is: How > long a block is acceptable? IMHO, anything more than a few hundred ms will > be unacceptable to a reasonable percentage of embedded applications. 
I know Then you can't use flash chips in your embedded application :) > > > caching layer that will allow the transaction log to be put on *another* > > > non-volatile medium if such is available in your system. The big advantage > > > >Why would this be necessary ? > > To provide for 0 latency writes for tasks updating data values, when the > underlying fs is blocked and cannot accept any more writes for another > "few" (at the moment >40) seconds. So what happens when that gets full and need to be erased ? All you'd do is interleave the writes and postpone the problem a bit. If you mean that the transactional log will "never" get full and require erasing, then yes, that would work but I doubt the "never" constraint :) Some flash chip configurations might allow you to erase one sector while writing to another; this is transiently good if you only write one sector worth of information during the time it takes to erase the other sector. As soon as you go over that you hit the latency again. /BW ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: On the "safe filesystem" and write() topic 2001-07-07 9:25 ` Bjorn Wesen @ 2001-07-07 13:06 ` Vipin Malik 0 siblings, 0 replies; 30+ messages in thread From: Vipin Malik @ 2001-07-07 13:06 UTC (permalink / raw) To: Bjorn Wesen; +Cc: jffs-dev, MTD for Linux At 11:25 AM 7/7/2001 +0200, Bjorn Wesen wrote: > > HeHe, well, maybe the fs can (will or may?) block, but in all realistic > > situations it's unacceptable for a real world embedded app to block for > > multiple seconds while the fs is "busy". Where does the app store any data > >You might not like it but you cannot have it any other way :) > >Fact: flash chip sectors takes long to erase (1-2 seconds) > >Fact: you need to erase to make room for new data > >Hence, if you need the app to do synchronous writing, it will need to >wait. > > > > time a task updating variables on the FS will block. The question is: How > > long a block is acceptable? IMHO, anything more than a few hundred ms will > > be unacceptable to a reasonable percentage of embedded applications. I > know > >Then you can't use flash chips in your embedded application :) > > > > > caching layer that will allow the transaction log to be put on > *another* > > > > non-volatile medium if such is available in your system. The big > advantage > > > > > >Why would this be necessary ? > > > > To provide for 0 latency writes for tasks updating data values, when the > > underlying fs is blocked and cannot accept any more writes for another > > "few" (at the moment >40) seconds. > >So what happens when that gets full and need to be erased ? All you'd do >is interleave the writes and postpone the problem a bit. If you mean that >the transactional log will "never" get full and require erasing, then yes, >that would work but I doubt the "never" constraint :) That's why if 0 latency writes are important to a design, they must put this cache on a 0-erase-latency non-volatile medium like a battery backed RAM or FRAM. 
Then a simple equation will help one size it for one's particular need, namely:

C_KB = size_of_nonvolatile_cache_device_required;
t1 = max_block_time_of_flash_FS_sec;
t2 = time_to_xfer_C_KB_to_flash_FS_sec;
NEW_KB_PER_SEC = max_new_data_generating_rate_KB_per_sec;

C_KB = (t1+t2) * NEW_KB_PER_SEC;

That's what the 0 latency write, transaction protected, embedded database project on the dev list is for.

Vipin

^ permalink raw reply [flat|nested] 30+ messages in thread
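The sizing rule C_KB = (t1+t2) * NEW_KB_PER_SEC from the message above is simple enough to write as a one-line function. The sketch below is only an illustration; the function name and the sample numbers in the note after it are assumptions, not measurements from the thread.

```c
#include <assert.h>

/* Size (in KB) of the non-volatile cache needed to absorb new data while the
 * flash filesystem is blocked (t1) and while the cache is drained to it (t2).
 * Implements C_KB = (t1 + t2) * NEW_KB_PER_SEC as stated in the message. */
double cache_size_kb(double t1_block_sec, double t2_xfer_sec,
                     double new_kb_per_sec)
{
    assert(t1_block_sec >= 0.0 && t2_xfer_sec >= 0.0 && new_kb_per_sec >= 0.0);
    return (t1_block_sec + t2_xfer_sec) * new_kb_per_sec;
}
```

For example, with a hypothetical worst-case block time of 40 seconds, 2 seconds to drain the cache, and 0.5 KB/s of new data, `cache_size_kb(40.0, 2.0, 0.5)` gives 21 KB. Since t2 itself grows with the cache size, a real design would probably add some margin on top of this figure.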
* Re: safe flash filesystem 2001-06-21 14:35 ` Abraham vd Merwe ` (2 preceding siblings ...) 2001-06-21 15:11 ` Herman Oosthuysen @ 2001-06-21 21:26 ` Russ Dill 2001-06-22 8:22 ` Abraham vd Merwe [not found] ` <20010622102154.E1828@crystal.2d3d.co.za> 3 siblings, 2 replies; 30+ messages in thread From: Russ Dill @ 2001-06-21 21:26 UTC (permalink / raw) To: MTD for Linux

If it's just a config file, why make all this so complicated?

struct node {
	u32 magic;
	char valid;
	u32 version;
	u32 data_crc;
	u32 hdr_crc;
	char data[DATA_SIZE];
};

Set aside 2-4 eraseblocks (preferably parameter blocks). On mount, to find the valid config, walk through the flash and find the valid node with the matching CRCs and the highest version (watch for wraparound).

On writing a new config, if there is space left in the current eraseblock, put it after the last one; after finishing writing it, set the previous config's valid field to zero (flash lets you do this). If the eraseblock is full, write in the next eraseblock, and when you are done, erase the previous eraseblock.

All of this can be done in userspace or with a userspace library: just mmap an mtd, and then use the erase ioctls.

Databases and logs... that's another story.

^ permalink raw reply [flat|nested] 30+ messages in thread
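The mount-time scan described in the message above (find the valid node with matching CRCs and the highest version, allowing for wraparound) might look roughly like the sketch below. Several things here are assumptions made for illustration and do not come from the thread: the `NODE_MAGIC` and `DATA_SIZE` values, the convention that 0xFF means "current" and 0x00 means "superseded", the choice to leave `valid` out of the header CRC so it can be cleared later, and the use of a plain CRC-32. The flash is modelled as an in-memory array; real MTD access (mmap plus erase ioctls) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_MAGIC 0x434F4E46u  /* assumed marker value */
#define DATA_SIZE  64           /* assumed payload size */

struct node {
    uint32_t magic;
    uint8_t  valid;     /* assumed: 0xFF = current, 0x00 = superseded */
    uint32_t version;
    uint32_t data_crc;
    uint32_t hdr_crc;   /* covers magic, version, data_crc (not valid) */
    char     data[DATA_SIZE];
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); results can be chained. */
static uint32_t crc32_update(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t node_hdr_crc(const struct node *n)
{
    uint32_t c = crc32_update(0, &n->magic, sizeof n->magic);
    c = crc32_update(c, &n->version, sizeof n->version);
    return crc32_update(c, &n->data_crc, sizeof n->data_crc);
}

/* Fill in the header fields and CRCs for a node whose data is already set. */
void node_seal(struct node *n, uint32_t version)
{
    n->magic = NODE_MAGIC;
    n->valid = 0xFF;
    n->version = version;
    n->data_crc = crc32_update(0, n->data, DATA_SIZE);
    n->hdr_crc = node_hdr_crc(n);
}

static int node_ok(const struct node *n)
{
    return n->magic == NODE_MAGIC && n->valid != 0 &&
           n->hdr_crc == node_hdr_crc(n) &&
           n->data_crc == crc32_update(0, n->data, DATA_SIZE);
}

/* Return the newest intact node among `count` slots, or NULL if none. */
const struct node *find_current(const struct node *slots, size_t count)
{
    assert(slots != NULL || count == 0);
    const struct node *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const struct node *n = &slots[i];
        if (!node_ok(n))
            continue;
        /* Signed difference so that version 0 counts as newer than
         * 0xFFFFFFFF (relies on two's-complement conversion). */
        if (best == NULL || (int32_t)(n->version - best->version) > 0)
            best = n;
    }
    return best;
}
```

Erased flash reads as all 0xFF, so an empty slot fails the magic check and is skipped. A node that was only partly written before a power cut fails one of the CRC checks, and the scan falls back to the previous node, whose `valid` byte has not yet been cleared at that point.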
* Re: safe flash filesystem 2001-06-21 21:26 ` safe flash filesystem Russ Dill @ 2001-06-22 8:22 ` Abraham vd Merwe [not found] ` <20010622102154.E1828@crystal.2d3d.co.za> 1 sibling, 0 replies; 30+ messages in thread From: Abraham vd Merwe @ 2001-06-22 8:22 UTC (permalink / raw) To: MTD for Linux

Hi Russ!

> If its just a config file, why make all this so complicated?
>
> struct node {
>
>     u32 magic;
>     char valid;
>     u32 version;
>     u32 data_crc;
>     u32 hdr_crc;
>     char data[DATA_SIZE];
> };

Yes, this is something along the lines I was thinking of. But what complicates things is when you start taking into account things like avoiding damaged blocks, wear levelling (this is fairly easy to solve) and keeping the flash unfragmented.

--
Regards
 Abraham

Walking on water wasn't built in a day. -- Jack Kerouac
[parent not found: <20010622102154.E1828@crystal.2d3d.co.za>]
* Re: safe flash filesystem [not found] ` <20010622102154.E1828@crystal.2d3d.co.za> @ 2001-06-22 17:23 ` Russ Dill 2001-06-25 7:45 ` Abraham vd Merwe 0 siblings, 1 reply; 30+ messages in thread From: Russ Dill @ 2001-06-22 17:23 UTC (permalink / raw) To: Abraham vd Merwe, linux-mtd

Abraham vd Merwe wrote:

> Hi Russ!
>
>>If its just a config file, why make all this so complicated?
>>
>>struct node {
>>
>>    u32 magic;
>>    char valid;
>>    u32 version;
>>    u32 data_crc;
>>    u32 hdr_crc;
>>    char data[DATA_SIZE];
>>};
>
> Yes, this is something in the lines I was thinking of. But what complicates
> things is if you start taking things like avoiding damaged blocks into
> account, wear levelling (this is fairly easy to solve) and keeping the flash
> unfragmented.

If you only erase blocks when you need to, you always have at least N-1 eraseblocks of previous data (where N is the number of eraseblocks used). A CRC can be done after the store to see if the node written is OK; if not, write it again (in the next node). Since it's a small amount of data (maybe 4-8k) and written linearly, wear leveling and fragmentation are not a problem.

Let's say 4 parameter blocks of 16k apiece are used. That would be 1 erase cycle per 8 configs written, which would allow 800,000 configs to be written on standard flash. If a config was written at a rate of once an hour, it would last 93 years. If it were on 2 128k standard blocks, then you would have 3.2M configs written, which at the same rate would last about 332 years. Remember, you are only performing an erase cycle after a block fills up, not for every write.
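The endurance estimate above can be reproduced with a few lines of C. This is a sketch under assumptions the message does not state explicitly: a rating of 100,000 erase cycles per block (implied by 800,000 = 8 x 100,000), an 8 KB config node, and 365-day years. The function names are made up for the example.

```c
#include <stdint.h>

#define HOURS_PER_YEAR (24.0 * 365.0)

/* Number of configs that can be written before every block has reached
 * its rated erase count, assuming nodes never straddle a block boundary
 * and each block is erased once per full rotation through all blocks. */
static uint64_t configs_before_wearout(uint32_t blocks, uint32_t block_kb,
                                       uint32_t config_kb,
                                       uint32_t erase_cycles)
{
    uint64_t per_rotation = (uint64_t)blocks * (block_kb / config_kb);
    return per_rotation * erase_cycles;
}

/* Lifetime in years at a steady write rate. */
static double lifetime_years(uint64_t configs, double writes_per_hour)
{
    return (double)configs / writes_per_hour / HOURS_PER_YEAR;
}
```

Under these assumptions the config counts match the message (800,000 for four 16 KB blocks, 3.2M for two 128 KB blocks), while the lifetimes at one write per hour come out at roughly 91 and 365 years rather than the quoted 93 and 332. Either way the conclusion stands: wear is not the limiting factor at this write rate.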
* Re: safe flash filesystem 2001-06-22 17:23 ` Russ Dill @ 2001-06-25 7:45 ` Abraham vd Merwe 2001-06-25 7:59 ` Russ Dill 0 siblings, 1 reply; 30+ messages in thread From: Abraham vd Merwe @ 2001-06-25 7:45 UTC (permalink / raw) To: Russ Dill; +Cc: MTD for Linux [-- Attachment #1: Type: text/plain, Size: 1927 bytes --] Hi Russ! > > Yes, this is something in the lines I was thinking of. But what complicates > > things is if you start taking things like avoiding damaged blocks into > > account, wear levelling (this is fairly easy to solve) and keeping the flash > > unfragmented. > > > if you only eraseblocks when you need to, you always have at least N-1 > eraseblocks of pevious data, (where N is the number of eraseblocks > used). A CRC can be done after the store to see if the node written is > ok, if not, write it again (in the next node). since its a small amount > of data (maybe 4-8k) and written linearly, wear leveling and > fragmentation is not a problem. Lets say 4 parameter blocks of 16k a > peice are used, that would be 1 erase cycle per 8 configs written, this > would allow 800,000 configs to be written on standard flash. If a config > was written at a rate of once an hour, it would last 93 years. If it > were on 2 128k standard blocks, then you wolud have 3.2M configs > written, which at the same rate, would last about 332 years. Remember, > you are only performing an erase cycle after a block fills up, not for > every write. True, but once the flash fills up you have to start moving things around to erase entire blocks and then the whole 4k-8k thing doesn't hold anymore. But anyhow, like you said, it's not the most complicated thing in the world. -- Regards Abraham You don't have to know how the computer works, just how to work the computer. __________________________________________________________ Abraham vd Merwe - 2d3D, Inc. 
* Re: safe flash filesystem 2001-06-25 7:45 ` Abraham vd Merwe @ 2001-06-25 7:59 ` Russ Dill 2001-06-25 14:11 ` Vipin Malik 0 siblings, 1 reply; 30+ messages in thread From: Russ Dill @ 2001-06-25 7:59 UTC (permalink / raw) To: Abraham vd Merwe; +Cc: MTD for Linux Abraham vd Merwe wrote: > > Hi Russ! > > > > Yes, this is something in the lines I was thinking of. But what complicates > > > things is if you start taking things like avoiding damaged blocks into > > > account, wear levelling (this is fairly easy to solve) and keeping the flash > > > unfragmented. > > > > > if you only eraseblocks when you need to, you always have at least N-1 > > eraseblocks of pevious data, (where N is the number of eraseblocks > > used). A CRC can be done after the store to see if the node written is > > ok, if not, write it again (in the next node). since its a small amount > > of data (maybe 4-8k) and written linearly, wear leveling and > > fragmentation is not a problem. Lets say 4 parameter blocks of 16k a > > peice are used, that would be 1 erase cycle per 8 configs written, this > > would allow 800,000 configs to be written on standard flash. If a config > > was written at a rate of once an hour, it would last 93 years. If it > > were on 2 128k standard blocks, then you wolud have 3.2M configs > > written, which at the same rate, would last about 332 years. Remember, > > you are only performing an erase cycle after a block fills up, not for > > every write. > > True, but once the flash fills up you have to start moving things around to > erase entire blocks and then the whole 4k-8k thing doesn't hold anymore. > > But anyhow, like you said, it's not the most complicated thing in the world. you are overcomplicating things, there is one config file, and the flash is filled linearly, so once a block is full of written configs (only one of which being the current, valid config), the next eraseblock is erased. 
There is no moving things around, once we fill a block, all the other blocks have much older versions of the config, and we could care less ^ permalink raw reply [flat|nested] 30+ messages in thread
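A small RAM-only simulation of the linear fill described here may make the write path concrete. Everything in it is hypothetical: the `flash_sim` structure, the tiny block and slot sizes, and the function names exist only for this sketch, and it leaves out the CRC check and the step that zeroes the previous node's valid field. It follows the rule from the message: when the current block is full, move to the next eraseblock and erase it before writing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS      4   /* hypothetical: eraseblocks set aside */
#define SLOTS_PER_BLOCK 2   /* hypothetical: configs that fit in one block */
#define SLOT_SIZE       16  /* hypothetical: tiny slots keep the demo small */

struct flash_sim {
    uint8_t  cells[NUM_BLOCKS][SLOTS_PER_BLOCK][SLOT_SIZE];
    unsigned erase_count[NUM_BLOCKS];
    size_t   block;   /* block currently being filled */
    size_t   slot;    /* next free slot in that block */
};

/* Erasing sets every byte of the block to 0xFF, as on real NOR flash. */
static void sim_erase(struct flash_sim *f, size_t b)
{
    memset(f->cells[b], 0xFF, sizeof f->cells[b]);
    f->erase_count[b]++;
}

/* Start with a completely blank device and zeroed counters. */
static void sim_init(struct flash_sim *f)
{
    memset(f, 0, sizeof *f);
    memset(f->cells, 0xFF, sizeof f->cells);
}

/* Append one config and return the block it landed in. Nothing is ever
 * copied or moved: a full block just means advancing to the next one,
 * whose contents are by then the oldest on the device. */
static size_t sim_write_config(struct flash_sim *f, const char *cfg)
{
    if (f->slot == SLOTS_PER_BLOCK) {
        f->block = (f->block + 1) % NUM_BLOCKS;
        sim_erase(f, f->block);
        f->slot = 0;
    }
    uint8_t *dst = f->cells[f->block][f->slot];
    size_t n = strlen(cfg);
    if (n > SLOT_SIZE - 1)
        n = SLOT_SIZE - 1;
    memcpy(dst, cfg, n);
    dst[n] = 0;
    f->slot++;
    return f->block;
}
```

For simplicity the simulation erases a block on every advance, even on the first pass when it is still blank; a real implementation would probably skip that. After the first rotation each block is erased once per NUM_BLOCKS * SLOTS_PER_BLOCK configs written, which is the ratio the endurance estimate earlier in the thread relies on.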
* Re: safe flash filesystem 2001-06-25 7:59 ` Russ Dill @ 2001-06-25 14:11 ` Vipin Malik 0 siblings, 0 replies; 30+ messages in thread From: Vipin Malik @ 2001-06-25 14:11 UTC (permalink / raw) To: Russ Dill, Abraham vd Merwe; +Cc: MTD for Linux, elw_dev_list At 12:59 AM 6/25/2001 -0700, Russ Dill wrote: >Abraham vd Merwe wrote: > > > > Hi Russ! > > > > > > Yes, this is something in the lines I was thinking of. But what > complicates > > > > things is if you start taking things like avoiding damaged blocks into > > > > account, wear levelling (this is fairly easy to solve) and keeping > the flash > > > > unfragmented. > > > > > > > if you only eraseblocks when you need to, you always have at least N-1 > > > eraseblocks of pevious data, (where N is the number of eraseblocks > > > used). A CRC can be done after the store to see if the node written is > > > ok, if not, write it again (in the next node). since its a small amount > > > of data (maybe 4-8k) and written linearly, wear leveling and > > > fragmentation is not a problem. Lets say 4 parameter blocks of 16k a > > > peice are used, that would be 1 erase cycle per 8 configs written, this > > > would allow 800,000 configs to be written on standard flash. If a config > > > was written at a rate of once an hour, it would last 93 years. If it > > > were on 2 128k standard blocks, then you wolud have 3.2M configs > > > written, which at the same rate, would last about 332 years. Remember, > > > you are only performing an erase cycle after a block fills up, not for > > > every write. > > > > True, but once the flash fills up you have to start moving things around to > > erase entire blocks and then the whole 4k-8k thing doesn't hold anymore. > > > > But anyhow, like you said, it's not the most complicated thing in the > world. 
>
>you are overcomplicating things, there is one config file, and the flash
>is filled linearly, so once a block is full of written configs (only one
>of which being the current, valid config), the next eraseblock is
>erased. There is no moving things around, once we fill a block, all the
>other blocks have much older versions of the config, and we could care
>less

Russ, you are assuming a very trivial implementation (i.e. of a trivial requirement), where the solution is to duplicate the entire config file and rewrite it *every time*, even if just one of the config variables changed. (Is my interpretation correct?)

While this may be what is required of a _few_ designs out there, it is very difficult to extend, especially if you now want to store a "few" data values whose values update more frequently than your config values. How are you going to handle this?

IMHO, this approach is trying to reinvent the wheel, thinking it will be easier this time (compared to JFFS, which does essentially the same thing) because some "features" are not required. This may very well be true for a particular case this time, but it sure won't work in most cases, or at least, I suspect, in quite a lot of them. How many embedded systems out there don't generate "data" value updates, as compared to only requiring (mostly static) config files?

I would be interested in hearing what the typical requirement is of the folks reading this.

This is really not a JFFS/MTD discussion per se and we run the risk of polluting this list. If you care, just reply to me and to elw_dev_list@embeddedLinuxWorks.com, where this discussion is already going on (or subscribe at: http://www.embeddedlinuxworks.com/cgi-bin/signup/signup-dev.cgi).

(Russ,) I've written a first-cut requirement spec for what I think would be required of most embedded systems that store config data as well as regular data value updates (and logs). Have you seen it?
Regards, Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: safe flash filesystem @ 2001-06-21 16:05 Vipin Malik 0 siblings, 0 replies; 30+ messages in thread From: Vipin Malik @ 2001-06-21 16:05 UTC (permalink / raw) To: 'chris.read@clrassociates.co.uk', 'Abraham vd Merwe' Cc: 'MTD for Linux' >I would also be very interested in this. >The ability to retain consistency after multiple power outages >is crucial to many of the types of project upon which I work. >The problem can be quite complex if you get a power fail in a garbage >collection started as a result of a power fail during a previous GC. Well then subscribe to the "Dev list" at the following address. This discussion has now been taken there. You may also want to read: http://www.embeddedLinuxWorks.com/articles/jffs_guide.html and http://www.embeddedLinuxWorks.com/articles/db_project.html Regards, Vipin ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2001-07-07 12:49 UTC | newest]
Thread overview: 30+ messages
2001-06-21 10:54 safe flash filesystem Abraham vd Merwe
2001-06-21 13:43 ` Vipin Malik
2001-06-21 13:57 ` Abraham vd Merwe
2001-06-21 14:29 ` Vipin Malik
2001-06-21 14:35 ` Abraham vd Merwe
2001-06-21 15:05 ` Vipin Malik
2001-06-21 15:36 ` Chris Read
2001-06-21 15:09 ` Joakim Tjernlund
2001-06-21 15:34 ` Vipin Malik
2001-06-21 19:34 ` Joakim Tjernlund
2001-06-21 19:47 ` Joakim Tjernlund
2001-06-21 15:11 ` Herman Oosthuysen
2001-06-21 17:54 ` Tim Riker
2001-06-21 19:43 ` Vipin Malik
2001-06-21 19:35 ` Tim Riker
2001-06-21 19:56 ` Vipin Malik
2001-06-21 21:17 ` Kyle Harris
2001-07-03 23:53 ` On the "safe filesystem" and write() topic Bjorn Wesen
2001-07-04 14:10 ` Vipin Malik
2001-07-05 18:16 ` Bjorn Wesen
2001-07-06 13:40 ` Vipin Malik
2001-07-07 9:25 ` Bjorn Wesen
2001-07-07 13:06 ` Vipin Malik
2001-06-21 21:26 ` safe flash filesystem Russ Dill
2001-06-22 8:22 ` Abraham vd Merwe
[not found] ` <20010622102154.E1828@crystal.2d3d.co.za>
2001-06-22 17:23 ` Russ Dill
2001-06-25 7:45 ` Abraham vd Merwe
2001-06-25 7:59 ` Russ Dill
2001-06-25 14:11 ` Vipin Malik
-- strict thread matches above, loose matches on Subject: below --
2001-06-21 16:05 Vipin Malik