* [Fwd: power down]
@ 1999-12-06 23:41 Vipin Malik
1999-12-07 15:47 ` Bob Canup
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Vipin Malik @ 1999-12-06 23:41 UTC (permalink / raw)
To: MTD
Bob Canup wrote:
>
> The reason that I said that expecting anything to work during power down
> is wishful thinking is this: once the voltage to a digital chip goes
> below the minimum specification of the chip, the behavior of the chip
> becomes indeterminate.
That's why the stuff you need to protect during a power down (SRAM, say)
has its own backup battery, and writes to the SRAM are shut off as soon as
the system voltage falls below the operational threshold.
>
> For example: the old Western Digital 1791 double density disk controller
> chip would sometimes glitch in such a way during power down that it
> would write to the floppy - you could see the floppy light blink when
> this happened.
Someone's buggy design does not mean that a better way does not exist.
Obviously the chip was buggy if it exhibited this behavior.
>
> Unless chips are specifically designed to handle power down conditions
> this sort of thing happens. For example - any competently designed
> Flash memory has to refuse to write if the voltage is below spec.
This is true. Flash chips will not initiate a write if power is not
within specs. So this helps design a system that CAN survive random
power downs.
>
> As to flushing the buffers and doing a shutdown when a power fail
> condition occurs - I believe that Linux already has code to handle a
> power down such as I described. What I have described is very similar to
> a UPS signaling the kernel that power is going down. Linux can do an
> ordered shutdown when it receives the signal.
Unfortunately the times involved are orders of magnitude different. An
embedded system may not have more than a few hundred milliseconds at
best. A UPS will provide a few minutes of power at worst. If the lowest
layers (in this case MTD) cannot guarantee handling of power downs, how
will the upper layer help?
>
> Qualifying digital circuitry with a POWER GOOD signal is very similar to
> protecting the circuitry with a typical 'SCR over voltage crowbar
> circuit': it makes the engineer feel good - but it doesn't actually do
> much of anything.
I'm sorry. I do not agree with this one bit. A reset signal generated by
a low voltage detector can gate (stop) writes to SRAM within
sub-nanosecond intervals. I don't see how the SCR analogy is relevant here.
>
> Why doesn't the crowbar work? After all, it is a text book circuit. The
> answer is that the SCR is a power device which takes on the order of 10
> microseconds to turn on while the delicate chips are destroyed by a few
> nanoseconds of over voltage. The result is that the SCR never turns on -
> the fuse blows because the weakest digital chip shorts the power supply
> to ground. One could "protect" SCR's with digital chips, but not the
> other way around.
>
> Another example of "feel good engineering" is the power on self test
> which most computers have. One can only test non critical sections of
> the machine: if anything critical is broken the POST won't run - and a
> tech will have to figure out what is wrong. It's a bit like asking
> yourself "Am I alive?" If you can ask the question the answer is always
> "Yes".
Actually this can be a very good answer to why we are on this planet! If
we weren't we would not be asking the question!! Anyway not relevant
here :)
<desperate plea>
Come on guys (and gals). Am I championing a lost cause here? Have we
given up on power down reliability of nonvolatile data in embedded
systems under Linux?
Is anyone interested in this!? Lurkers please respond. How many people
read this list anyway?
</desperate plea>
>
> To unsubscribe, send "unsubscribe mtd" to majordomo@infradead.org
Vipin Malik
Daniel Industries
vmalik@danielind.com
All content my views and not my employers etc. etc.
To unsubscribe, send "unsubscribe mtd" to majordomo@infradead.org
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Fwd: power down]
1999-12-06 23:41 [Fwd: power down] Vipin Malik
@ 1999-12-07 15:47 ` Bob Canup
1999-12-20 4:03 ` Stuart Lynne
1999-12-07 20:36 ` Jon Burford
1999-12-08 15:10 ` David Woodhouse
2 siblings, 1 reply; 18+ messages in thread
From: Bob Canup @ 1999-12-07 15:47 UTC (permalink / raw)
To: MTD
Vipin Malik wrote:
> Bob Canup wrote:
> >
I don't think that you understand what we're trying to tell you. There is a
difference in philosophy.
If you are running a flash as a normal read - write imitation of a disk there
are severe time limitations as to how long the flash is going to work because
of the limit on write cycles which flash technology has. As has been pointed
out in an earlier post - one write a second will ruin a flash chip in a few
weeks - which is not very long for an embedded system to work.
Because of this limitation most of the people in this group who do design
with flash use it in a Write Rarely Read Mostly manner. The only time the
flash is written to is when there is a firmware upgrade. This is also the
manner in which flash chips are used on conventional PC motherboards - if you
lose power during a firmware upgrade - you are in trouble - nor do I see any
practical method of handling that problem.
If you are trying to use the flash in a data - logging application where the
file system has to be read - write to store data you are very quickly going
to run into the write cycle limitations of the technology. I don't think that
flash is the correct technology to use in such an application.
We use our DOC2000 in read only mode - with things like /var in volatile ram
disk - we have found this to be a satisfactory way of doing things.
Now - as to the issue of a POWER GOOD signal. The inverse of a POWER GOOD
signal is *POWER BAD. The reason for sending this signal to a chip is to tell
it that it won't function properly if it attempts to perform its operation.
But if the power is bad the chip is not working properly so it can't respond
properly to the *POWER BAD signal. This is the equivalent of saying to a dead
man "You're dead". That is true - but it does little good to tell him that.
That is the reason that there are no POWER GOOD input pins on anybody's
chips. In addition the analog detectors which generate the *POWER BAD signal
do not respond in nanoseconds - that is the reason for my analogy to the SCR
crowbar. By the time that the analog detectors respond you have been
operating the chips out of spec for a long time by digital standards.
Now because of the Yin and Yang nature of reality there can be some use to a
properly designed power fail detection circuit - but the way they are mostly
used is as a placebo.
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [Fwd: power down]
@ 1999-12-07 16:36 Oron Ogdan
0 siblings, 0 replies; 18+ messages in thread
From: Oron Ogdan @ 1999-12-07 16:36 UTC (permalink / raw)
To: MTD
Bob Canup wrote :
>I don't think that you understand what we're trying to tell you. There is a
>difference in philosophy.
>
>If you are running a flash as a normal read - write imitation of a disk there
>are severe time limitations as to how long the flash is going to work because
>of the limit on write cycles which flash technology has. As has been pointed
>out in an earlier post - one write a second will ruin a flash chip in a few
>weeks - which is not very long for an embedded system to work.
This is actually not correct. With good wear leveling the calculations are
different. When you implement a correct wear leveling mechanism such as the
one DiskOnChip implements, writing again and again to sector number 1 will
spread the writes and erase cycles all over the media, fairly evenly.
If you take, for example, an 8MB flash and write a page (512 bytes) every
second, you will get something like 512 years until the device starts to
wear out. An 8MB flash has 16384 pages and each supports a minimum of 1M
erase cycles. Of course there is some overhead of garbage collection and of
writing and erasing pages when implementing flash management algorithms,
but even if we take the overhead as 100% it's still above two hundred years.
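
The arithmetic above can be checked with a short sketch. It assumes that
"8MB" means 8 * 1024 * 1024 bytes, that wear leveling is ideal (every page
is used equally often), and that a year has 365 days. The function name
lifetime_years and the overhead_fraction parameter are made up for
illustration and do not come from any M-Systems software.

```python
# Back-of-the-envelope flash lifetime under ideal wear leveling.
# Figures taken from the message: 8 MB device, 512-byte pages,
# 1,000,000 erase cycles per page, one page write per second.
# Assumptions: 8 MB = 8 * 1024 * 1024 bytes, 365-day year.

DEVICE_BYTES = 8 * 1024 * 1024
PAGE_BYTES = 512
ERASE_CYCLES_PER_PAGE = 1_000_000
WRITES_PER_SECOND = 1
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def lifetime_years(overhead_fraction: float = 0.0) -> float:
    """Years until every page has used up its rated erase cycles.

    overhead_fraction is the extra flash-management traffic per user
    write (1.0 means 100% overhead: two physical writes per user write).
    """
    pages = DEVICE_BYTES // PAGE_BYTES
    total_page_writes = pages * ERASE_CYCLES_PER_PAGE
    physical_writes_per_second = WRITES_PER_SECOND * (1.0 + overhead_fraction)
    return total_page_writes / physical_writes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"pages:         {DEVICE_BYTES // PAGE_BYTES}")
    print(f"no overhead:   {lifetime_years(0.0):.0f} years")
    print(f"100% overhead: {lifetime_years(1.0):.0f} years")
```

Under these assumptions the result is about 520 years with no overhead and
about 260 years with 100% overhead, in line with the rough figures quoted.
Without wear leveling the same page would absorb every write, and the
picture would be very different.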
Oron Ogdan
European Technical Manager
M-Systems Flash Disk Pioneers
email : orono@m-sys.com
efax, evoice : +1.(603).761.5426
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Fwd: power down]
1999-12-06 23:41 [Fwd: power down] Vipin Malik
1999-12-07 15:47 ` Bob Canup
@ 1999-12-07 20:36 ` Jon Burford
1999-12-13 14:49 ` Adi Linden
1999-12-08 15:10 ` David Woodhouse
2 siblings, 1 reply; 18+ messages in thread
From: Jon Burford @ 1999-12-07 20:36 UTC (permalink / raw)
To: Vipin Malik, MTD
I am actually extremely interested in this issue, although I am not very
qualified to present possible solutions. I am primarily a systems and
software guy and have been constructing an embedded linux system which boots
off an M-Systems DOC2000 and runs mostly out of ram disk. The board I am
using has a watchdog timer which could spuriously reset the board (just like
hitting the reset button on your PC). Power failures are also a reality I
must deal with. I must at least make an attempt to guarantee that the
system will always come back up (the damaged DOC2000 filesystem will be
repaired by e2fsck upon subsequent boot up). To give you an idea of
what/when I am doing flash writes, I am running postgres whose db files are
in flash and am doing about a 20-100 byte record insert per minute (on
average). The log files in /var/log/* are also in flash. There are no
custom apps which write often to syslog and I am not running mail (although
I am running apache which I could, but haven't yet turned off logging for).
I mount the DOC2000 on /usr, but write only to the logs and db files (I have
'chattr i' on all other files in /usr). What I would like to get an opinion
on is:
1) What is the probability that e2fsck will not be able to repair the
filesystem?
2) What is the probability that I will damage the boot sector and lilo will
not be able to begin to boot at all?
3) Since I use a pretty standard 5/12 V switching power supply and embedded
PC board (a 40W compact version of a standard PC power supply w/o fan), do I
have any hope in making HW or SW changes to possibly reduce or fix this
problem?
Any suggestions or insight much appreciated.
Regards,
Jon
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Fwd: power down]
1999-12-06 23:41 [Fwd: power down] Vipin Malik
1999-12-07 15:47 ` Bob Canup
1999-12-07 20:36 ` Jon Burford
@ 1999-12-08 15:10 ` David Woodhouse
2 siblings, 0 replies; 18+ messages in thread
From: David Woodhouse @ 1999-12-08 15:10 UTC (permalink / raw)
To: Vipin Malik; +Cc: MTD
vmalik@danielind.com said:
> <desperate plea> Come on guys (and gals). Am I championing a lost
> cause here? Have we given up on power down reliability of nonvolatile
> data in embedded systems under Linux?
Power down reliability of such data is an absolute must. I expect embedded
systems using NFTL and ext3 to be absolutely reliable even if the plug is
pulled half-way through a write.
This means we end up with two separate journalling / logging mechanisms - one
at the block device level (ext3) and one at the flash media level (NFTL). As I
said - ideally, we ditch that in the end and have a journalling/log-structured
filesystem directly on the flash media.
> Is anyone interested in this!? Lurkers please respond. How many people
> read this list anyway? </desperate plea>
infradead /var/lib/majordomo/lists $ wc -l mtd
64 mtd
--
dwmw2
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Fwd: Power Down]
@ 1999-12-08 20:42 Vipin Malik
0 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 1999-12-08 20:42 UTC (permalink / raw)
To: MTD
Bob Canup wrote:
>
> Watch dogs are generally there to catch the problem of a run-away
> machine - this ought to be a very rare occurrence.
>
> According to Vipin's statistics about 1 in 250 random power failures
> during writes to a DOC2000...
Actually it was an M-Systems IDE2000 (1.3") IDE flash hard drive, NOT the
DOC2000! Just making that clear.
>...results in a bad sector on the device. Since
> you are required to run the chip in RW mode the only way I see to avoid
> the problem is a UPS on the front end - with signaling to indicate power
> failure so that an ordered shutdown could occur.
>
> As far as the problem of a bad sector which he discussed I have not seen
> any solutions other than the erase and start over one he originally came
> up with - which for the reasons he discussed - is unacceptable.
>
> The first step toward solving a problem is understanding exactly what
> the problem is. My theory is that if you interrupt a sector write while
> it is in progress the data and the error checking code don't match -
> thus you get a bad sector. Any other theories?
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Fwd: Power Down]
@ 1999-12-08 20:48 Vipin Malik
0 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 1999-12-08 20:48 UTC (permalink / raw)
To: MTD
Oron Ogdan wrote:
>
> Since DiskOnChip is meant to be used in embedded systems, what we in
> M-Systems do is provide a hard disk emulation on flash which is
> resistant to power failures. We power cycle the media for several
> months to check our algorithms are indeed power fail resistant.
I did not test the DOC. If you do the same with the IDE2000 line of
products, then I am telling you, there IS A PROBLEM there!
The type of problems that I saw were loss of sectors at the lowest
level. Trying to read those resulted in various errors from the disk.
Some were:
CRC error, Unrecoverable error etc.
The only way to recover from these errors was to do a byte write to the
sector. This would make the sector readable but erase the entire 512
byte sector.
This was even before we got to e2fsck!
Unfortunately, since it could be ANY sector on the raw disk, on the file
system it could be a sector that contained 4 inodes, maybe even 4
directories, thus resulting in the loss of a multitude of files, not only
the last file that was being written to when power failed. This is
catastrophic, and not acceptable.
I went back and forth with a couple of guys from m-sys on this but it
never went anywhere.
BTW, this behaviour was observed on multiple disks (>2).
>
> That means that the NFTL structures on the media
> are resistant to any power loss during any stage of the algorithm.
> The only thing that can happen is that you will have what's called
> orphan units that need to be scanned for and released on mount.
>
> But this only protects the logical / physical mapping. It does not
> guarantee that no damage was caused to the file system on those
> logical sectors.
>
> The only way to be resistant to power failures in the file system level
> is to use a log-structured file system. I heard ext3 is log-structured
> but I am sure one of the Linux guys here knows more about this. Any
> other file system that takes power failures into account, (I am afraid
> to guess NFTL ????).
> Some of our customers use their own home brewed LFSs and do it
> successfully.
>
> Oron
>
> -----Original Message-----
> From: Bob Canup [mailto:rcanup@go2fax.com]
> Sent: Tuesday, December 07, 1999 9:19 PM
> To: MTD
> Subject: Re:Power Down
>
> Watch dogs are generally there to catch the problem of a run-away
> machine - this ought to be a very rare occurrence.
>
> According to Vipin's statistics about 1 in 250 random power failures
> during writes to a DOC2000 results in a bad sector on the device. Since
> you are required to run the chip in RW mode the only way I see to avoid
> the problem is a UPS on the front end - with signaling to indicate power
> failure so that an ordered shutdown could occur.
>
> As far as the problem of a bad sector which he discussed I have not seen
> any solutions other than the erase and start over one he originally came
> up with - which for the reasons he discussed - is unacceptable.
>
> The first step toward solving a problem is understanding exactly what
> the problem is. My theory is that if you interrupt a sector write while
> it is in progress the data and the error checking code don't match -
> thus you get a bad sector. Any other theories?
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Fwd: Power Down]
@ 1999-12-08 21:32 Vipin Malik
1999-12-09 11:10 ` David Woodhouse
0 siblings, 1 reply; 18+ messages in thread
From: Vipin Malik @ 1999-12-08 21:32 UTC (permalink / raw)
To: MTD
David Woodhouse wrote:
>
> Orono@m-sys.com said:
> > The only way to be resistant to power failures in the file system
> > level is to use a log-structured file system.
>
> > I heard ext3 is log-structured but I am sure one of the Linux guys here
> > knows more about this,
>
> ext3 is a journalling filesystem. I believe that it's not log-structured.
Is ext3 already available in beta form? Is it included in the latest
2.3.x kernels?
>
> Journalling is sufficient for protection from power failures. Stephen,
> would you care to elaborate on the difference?
What is the overhead of Journalling? (CPU AND flash space). Of course if
that solves the problem, then that is the most important thing (for me
at least).
>
> > Any other file system that takes power failures into account, (I am afraid
> > to guess NFTL ????).
>
> I believe that NTFS is also journalling but not log-structured.
>
> > Some of our customers use their own home brewed LFSs and do it successfully.
>
> Personally, I'm inclined to believe that we should run a filesystem directly
> on the flash device - rather than faking a block device and running a 'normal'
> filesystem on top of that.
I think that would probably be the best idea also, since when repairing
low level partitions separately from a higher level f/s, it could get
really difficult to prove that the result is reliable.
The only problem with this approach is that it is not the usual "linux"
layered approach. But maybe that approach does not work well with
embedded systems and is a hit we are willing to accept?
Another problem with the usual "layered" approach is that the VFS uses a
lot of buffering in its layer. This is usually desirable for read/write
performance, however for logs etc. on FLASH this is probably NOT
desirable.
When I write a log, I want it written and saved. I guess one could do
sync() etc. every time, but that is not very elegant.
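
One way to get "written and saved" without a global sync() is to call
fsync() on just the log file after each record. The sketch below is a
user-space illustration in Python, not anything from the MTD code;
append_log_record is a hypothetical helper name. It also assumes that the
block device or flash layer underneath really commits the data when asked,
which is exactly the point under discussion in this thread.

```python
import os


def append_log_record(path: str, record: str) -> None:
    """Append one line to a log file and flush it past the buffer cache."""
    # O_APPEND makes every write land at the current end of the file.
    # O_BINARY only exists on some platforms, hence the getattr fallback.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        data = (record.rstrip("\n") + "\n").encode("utf-8")
        written = 0
        # os.write may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
        # fsync() asks the kernel to push this file's dirty buffers to the
        # device. Unlike sync(), it only concerns this one file.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    append_log_record("example.log", "power-fail test record")
```

The cost is one forced device write per record, so on flash this trades
write performance (and erase cycles) for the guarantee that a record is
out of the kernel's buffers before the call returns.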
>
> I've been rushed off my feet here with other things for a while, but as soon as
> I get back to it, after I've fixed the NFTL and DiskOnChip Millennium support,
> that's what I'm intending to look at.
Do you have an estimate for the time frame that you are looking at?
I am willing to help out with coding, as I am extremely interested in
this (and have access to lots of different embedded hardware with FLASH,
SRAM etc. on it, and if needed I could possibly buy more :).
>
> If you're running ext2 on a DiskOnChip you should mount it with the noatime
> option if you can - this will eliminate a lot of write activity.
>
> --
> dwmw2
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Fwd: Power Down]
@ 1999-12-08 21:36 Vipin Malik
1999-12-08 23:02 ` Bob Canup
0 siblings, 1 reply; 18+ messages in thread
From: Vipin Malik @ 1999-12-08 21:36 UTC (permalink / raw)
To: MTD
Bob Canup wrote:
>
> It is obvious that a physical medium such as a disk is vulnerable to
> having a bad sector created by the process that I described. The proof
> is simple: pop out a diskette while you are writing to it and you stand
> a good chance of creating a sector in which the CRC and data are out of
> sync. When you attempt to read the sector you will get a bad CRC.
>
> This occurs in a diskette because the writing process is a serial event;
> it is spread over time. So there is a window in which an interruption
> can create a bad sector.
>
> Let us assume the the DOC writes all of the bytes in a page including
> the ECC code in parallel, let us also assume that you have an internal
> bit which marks a sector as good when that process has completed. There
> nevertheless is a time during the 'burn' of the bits where we are in an
> analog state of changing the bits. If power is lost at that time - some
> of the bits will not have changed to their proper state. Even if the
> page is not marked as good an attempt to read the page will result in an
> ECC and data which do not match and the result is a bad sector. The
> sector may be easily recovered by erasing it and starting over - but as
> long as there is an analog aspect to changing the states - the bits will
> not all change at the same instant and a window for corruption exists.
Ah! But having a CRC on the *ENTIRE* sector gets around this problem.
Unless ALL the bits are burned in, the CRC will not match on a read.
As to what happens the next time power comes back on, I guess that one
does not erase the "good" sector till the new one is completely written.
This way, at least you have the last (old) data still available.
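
The idea of keeping the old sector until the new one is completely written
can be sketched as a two-slot scheme. This is a toy simulation in Python
under several assumptions: a "slot" is just a byte string, a torn write is
modelled as a prefix of the new data followed by erased (0xFF) bytes, and
the names TwoSlotStore, encode_sector and decode_sector are invented for
this example. It is not how NFTL or any M-Systems product is implemented.

```python
import struct
import zlib

SECTOR_DATA = 512
HEADER = struct.Struct("<II")  # (sequence number, CRC32 of the payload)
ERASED_SEQ = 0xFFFFFFFF        # what an erased (all 0xFF) header reads as


def encode_sector(seq: int, payload: bytes) -> bytes:
    """Build header + payload; the CRC covers the whole payload."""
    if len(payload) != SECTOR_DATA:
        raise ValueError("payload must be exactly one sector")
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def decode_sector(raw: bytes):
    """Return (seq, payload) if the slot holds a complete sector, else None."""
    if len(raw) != HEADER.size + SECTOR_DATA:
        return None
    seq, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    if seq == ERASED_SEQ or zlib.crc32(payload) != crc:
        return None
    return seq, payload


class TwoSlotStore:
    """Never overwrites the newest good copy; writes go to the other slot."""

    def __init__(self) -> None:
        erased = b"\xff" * (HEADER.size + SECTOR_DATA)
        self.slots = [erased, erased]

    def _valid(self):
        # List of (seq, slot index, payload) for slots whose CRC matches.
        found = []
        for index, raw in enumerate(self.slots):
            decoded = decode_sector(raw)
            if decoded is not None:
                found.append((decoded[0], index, decoded[1]))
        return sorted(found)

    def read(self):
        """Payload of the newest complete copy, or None if there is none."""
        valid = self._valid()
        return valid[-1][2] if valid else None

    def write(self, payload: bytes, fail_after=None) -> None:
        """Write a new copy; fail_after simulates power loss mid-write."""
        valid = self._valid()
        if valid:
            seq, target = valid[-1][0] + 1, 1 - valid[-1][1]
        else:
            seq, target = 1, 0
        raw = encode_sector(seq, payload)
        if fail_after is not None:
            # Only the first fail_after bytes were programmed; rest is erased.
            raw = raw[:fail_after] + b"\xff" * (len(raw) - fail_after)
        self.slots[target] = raw


if __name__ == "__main__":
    store = TwoSlotStore()
    store.write(b"A" * SECTOR_DATA)
    store.write(b"B" * SECTOR_DATA, fail_after=HEADER.size + SECTOR_DATA - 2)
    print(store.read() == b"A" * SECTOR_DATA)
```

After the interrupted second write the CRC of the new copy does not match,
so a read falls back to the old data, and the next write reuses the torn
slot. Real flash adds constraints this sketch ignores, such as erase block
granularity and bits that can only be programmed in one direction.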
>
> Vipin's original post said that he saw bad sectors about once in every
> 250 power down cycles. Oran is telling us that can't occur.
I mailed a detailed post about this a few mails ago. Please refer to
that.
>
> Of course if my analysis is correct then you are safe to erase the bad
> sector - it was the last one being written; the file system would then
> be left in a state in which e2fsck could hopefully repair it.
Unfortunately, if the lower layer driver does not know WHAT goes in that
sector (inodes, data, etc), it could end up erasing a sector with inodes
in it. This is something that e2fsck will not be able to recover from
(at least without potentially killing a bunch of files on the system).
>
> Or am I off in left field with this?
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Fwd: Power Down]
@ 1999-12-08 21:39 Vipin Malik
0 siblings, 0 replies; 18+ messages in thread
From: Vipin Malik @ 1999-12-08 21:39 UTC (permalink / raw)
To: MTD
David Woodhouse wrote:
>
> rcanup@go2fax.com said:
> > Vipin's original post said that he saw bad sectors about once in
> > every 250 power down cycles. Oran is telling us that can't occur.
>
> At the block device level, that definitely shouldn't occur. The ext2 may of
> course get confused, but that's why you should be using ext3 on it if you ever
> expect it to lose power.
I can guarantee you that it did with at least 3 or 4 IDE2000 disks that I
tested, and also with a couple of compact flash disks. The compact flash
disks were so bad (problems on about 1 in 4 power downs) that I
just gave up on those as completely unacceptable.
>
> > Of course if my analysis is correct then you are safe to erase the bad
> > sector - it was the last one being written; the file system would then
> > be left in a state in which e2fsck could hopefully repair it.
>
> With NFTL, that's definitely the way it's designed. You write the data, write
> the ECC checksum, and then mark it valid. The old version of that block remains
> on the media until it's later reclaimed.
>
> If you are interrupted during a write, it's obvious and can be fixed.
Is this functionality already there in the drivers on the web site? Has
this been tested with power downs during writes to flash?
>
> --
> dwmw2
>
* Re: [Fwd: Power Down]
1999-12-08 21:36 Vipin Malik
@ 1999-12-08 23:02 ` Bob Canup
1999-12-09 11:02 ` David Woodhouse
1999-12-09 14:56 ` Bob Canup
0 siblings, 2 replies; 18+ messages in thread
From: Bob Canup @ 1999-12-08 23:02 UTC (permalink / raw)
To: MTD
Vipin Malik wrote:
> Bob Canup wrote:
> >
> > It is obvious that a physical medium such as a disk is vulnerable to
> > having a bad sector created by the process that I described. The proof
> > is simple: pop out a diskette while you are writing to it and you stand
> > a good chance of creating a sector in which the CRC and data are out of
> > sync. When you attempt to read the sector you will get a bad CRC.
> >
> > This occurs in a diskette because the writing process is a serial event;
> > it is spread over time. So there is a window in which an interruption
> > can create a bad sector.
> >
> > Let us assume that the DOC writes all of the bytes in a page including
> > the ECC code in parallel, let us also assume that you have an internal
> > bit which marks a sector as good when that process has completed. There
> > nevertheless is a time during the 'burn' of the bits where we are in an
> > analog state of changing the bits. If power is lost at that time - some
> > of the bits will not have changed to their proper state. Even if the
> > page is not marked as good an attempt to read the page will result in an
> > ECC and data which do not match and the result is a bad sector. The
> > sector may be easily recovered by erasing it and starting over - but as
> > long as there is an analog aspect to changing the states - the bits will
> > not all change at the same instant and a window for corruption exists.
>
> Ah! But having a CRC on the *ENTIRE* sector gets around this problem.
> Unless ALL the bits are burned in, the CRC will not match on a read.
> As to what happens the next time power comes back on, I guess that one
> does not erase the "good" sector till the new one is completely written.
> This way, at least you have the last (old) data still available.
>
> >
What I was trying to do was outline the cause of the bad sector / CRC error
problem during power loss. Certainly a CRC on the entire sector reports the
problem - that is why one uses CRCs. You can identify a bad sector - the
question is what do you do about it?
You are worried about losing inodes and directories to bad sectors. Let me
pose a question: suppose that the power failed just before the write to the
inode or directory sector - that there was no bad sector created - that the
data was just never written to the recording media - do you see any way to
recover from that problem?
I can't see any difference between erasing a bad sector after the next power
up and the case of a slightly earlier power failure where the data was never
written in the first place; you wind up with an identical file system in both
cases. If you can survive the case of the earlier power failure failing to
record the sector then you can survive the case of the bad sector being
erased.
The only difference that I see is that in the case of the bad sector you
know something happened, in the case of the data never being recorded you
don't.
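
A minimal sketch of the detection step discussed above, in Python for brevity
and not taken from any MTD or DiskOnChip driver. It assumes a 512-byte payload
followed by a 4-byte CRC32, and simulates the 'burn' being cut short by leaving
one bit in the erased (1) state. The layout and the names make_sector,
sector_is_good and leave_bit_unprogrammed are hypothetical.

```python
import zlib

SECTOR_SIZE = 512  # assumed payload size, for illustration only


def make_sector(payload: bytes) -> bytes:
    """Return the payload followed by its CRC32 (4 bytes, big-endian)."""
    if len(payload) != SECTOR_SIZE:
        raise ValueError("payload must be exactly one sector")
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def sector_is_good(raw: bytes) -> bool:
    """True only if the stored CRC matches the stored payload."""
    payload, stored = raw[:SECTOR_SIZE], raw[SECTOR_SIZE:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


def leave_bit_unprogrammed(raw: bytes, index: int, bit: int) -> bytes:
    """Simulate power loss mid-write: one bit stays in the erased (1) state."""
    damaged = bytearray(raw)
    damaged[index] |= 1 << bit
    return bytes(damaged)


good = make_sector(bytes(range(256)) * 2)
torn = leave_bit_unprogrammed(good, 0, 0)  # byte 0 should be 0x00, reads 0x01
print(sector_is_good(good), sector_is_good(torn))
```

The check only says that the sector is bad; whether it is then safe to erase
it is the file system question raised in this message.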
* Re: [Fwd: Power Down]
1999-12-08 23:02 ` Bob Canup
@ 1999-12-09 11:02 ` David Woodhouse
1999-12-09 14:56 ` Bob Canup
1 sibling, 0 replies; 18+ messages in thread
From: David Woodhouse @ 1999-12-09 11:02 UTC (permalink / raw)
To: Bob Canup; +Cc: MTD
rcanup@go2fax.com said:
> You can identify a bad sector - the question is what do you do about
> it?
Use a journalling filesystem. Make sure that any sector that's being written
is disposable - if it gets wiped, then you can just do without it. Only remove
your old copy of the filesystem data that's in that sector when you have a 100%
guarantee that your write has hit the media correctly.
--
dwmw2
* Re: [Fwd: Power Down]
1999-12-08 21:32 Vipin Malik
@ 1999-12-09 11:10 ` David Woodhouse
0 siblings, 0 replies; 18+ messages in thread
From: David Woodhouse @ 1999-12-09 11:10 UTC (permalink / raw)
To: Vipin Malik; +Cc: MTD
vmalik@danielind.com said:
> > I've been rushed off my feet here with other things for a while, but
> > as soon as I get back to it, after I've fixed the NFTL and DiskOnChip
> > Millennium support, that's what I'm intending to look at.
> Do you have an estimate for the time frame that you are looking at?
We have an EU research project coming to an end in February/March, and I'm
running our end of it.
I'm hoping to snatch a few days in the next week or two to merge in all the MTD
fixes I've been sent and look at the problems which have been reported, and
make a new release.
After that, I'm unlikely to be able to be very productive until mid-to-late
Feb.
> I am willing to help out with coding, as I am extremely interested in
> this (and have access to lots of different embedded hardware with
> FLASH, SRAM etc. on it, and if needed I could possibly buy more :).
I have documentation on NFTL and the DiskOnChip hardware which I am permitted
to release under NDA to someone who is serious about contributing to the
development - your assistance would be greatly appreciated because I was
hoping to have that completed by now.
There is also some work to be done with FTL on NOR flash, I believe. It should
work, but apparently there are some problems with it. I don't currently have
access to any NOR flash devices so haven't tested it recently.
Also, someone needs to provide new PCMCIA bulkmem drivers which use the MTD
system.
There's plenty of work to go round.
--
dwmw2
* Re: [Fwd: Power Down]
1999-12-08 23:02 ` Bob Canup
1999-12-09 11:02 ` David Woodhouse
@ 1999-12-09 14:56 ` Bob Canup
1999-12-20 4:22 ` Stuart Lynne
1 sibling, 1 reply; 18+ messages in thread
From: Bob Canup @ 1999-12-09 14:56 UTC (permalink / raw)
To: MTD
Bob Canup wrote:
> Vipin Malik wrote:
>
> > Bob Canup wrote:
> > >
> >
> I can't see any difference between erasing a bad sector after the next power
> up and the case of a slightly earlier power failure where the data was never
> written in the first place; you wind up with an identical file system in both
> cases. If you can survive the case of the earlier power failure failing to
> record the sector then you can survive the case of the bad sector being
> erased.
>
> The only difference that I see is that in the case of the bad sector you
> know something happened, in the case of the data never being recorded you
> don't.
>
On the way home last night it occurred to me that if you are doing a
read-modify-write cycle, it does make a difference where the error occurs.
In this case if the sector is never written the old data is still valid.
It seems to me the wear leveling necessary in a flash can be a benefit here; if
I understand correctly we don't overwrite the physical pages - only when a page
is 'committed' does the chip change its mapping so that it appears that the
sector was overwritten.
* Re: [Fwd: power down]
1999-12-07 20:36 ` Jon Burford
@ 1999-12-13 14:49 ` Adi Linden
1999-12-13 19:07 ` Jon Burford
0 siblings, 1 reply; 18+ messages in thread
From: Adi Linden @ 1999-12-13 14:49 UTC (permalink / raw)
To: Jon Burford; +Cc: Vipin Malik, MTD
> I mount the DOC2000 on /usr, but write only to the logs and db files (I have
> 'chattr i' on all other files in /usr). What I would like to get an opinion
> on is:
If you mount the doc2000 on /usr and have everything else in ramdisk,
perhaps mount /usr read-only. It won't corrupt that way.
TTYL,
Adi
* Re: [Fwd: power down]
1999-12-13 14:49 ` Adi Linden
@ 1999-12-13 19:07 ` Jon Burford
0 siblings, 0 replies; 18+ messages in thread
From: Jon Burford @ 1999-12-13 19:07 UTC (permalink / raw)
To: Adi Linden; +Cc: Vipin Malik, MTD
I would have NEVER thought of that! Too bad I need to write to this
partition (as detailed in the rest of my message that you did not include
here). I have a database on board which needs persistent storage. And yes,
I have thought of the previously mentioned options of running everything in
ramdisk and periodically moving files to the flash. Of course, odds say
this is less susceptible to corruption, but I don't have enough ram (I am
using the biggest SIMM my board will take already) for this.
-Jon
----- Original Message -----
From: Adi Linden <adi@adis.on.ca>
To: Jon Burford <jburford@xsilogy.com>
Cc: Vipin Malik <vmalik@danielind.com>; MTD <mtd@imladris.mvhi.com>
Sent: Monday, December 13, 1999 6:49 AM
Subject: Re: [Fwd: power down]
>
> > I mount the DOC2000 on /usr, but write only to the logs and db files (I have
> > 'chattr i' on all other files in /usr). What I would like to get an opinion
> > on is:
>
> If you mount the doc2000 on /usr and have everything else in ramdisk,
> perhaps mount /usr read-only. It won't corrupt that way.
>
> TTYL,
> Adi
>
>
>
* Re: [Fwd: power down]
1999-12-07 15:47 ` Bob Canup
@ 1999-12-20 4:03 ` Stuart Lynne
0 siblings, 0 replies; 18+ messages in thread
From: Stuart Lynne @ 1999-12-20 4:03 UTC (permalink / raw)
To: mtd
In article <384D2C2D.6A6F4CE6@go2fax.com>, Bob Canup <rcanup@go2fax.com> wrote:
>Vipin Malik wrote:
>
>> Bob Canup wrote:
>> >
>
>I don't think that you understand what we're trying to tell you. There is a
>difference in philosophy.
>
>If you are running a flash as a normal read - write imitation of a disk there
>are severe time limitations as to how long the flash is going to work because
>of the limit on write cycles which flash technology has. As has been pointed
>out in an earlier post - one write a second will ruin a flash chip in a few
>weeks - which is not very long for an embedded system to work.
Assuming load levelling across a 4 MB flash drive, if you average 4 KB per
write and write once per second continuously you will write each sector
about 40 times per day.
AMD specs a minimum program/erase cycles of 100,000 per sector (flash
sector) and 1,000,000 per device. To reach 100,000 writes at 50 per day
would take over 5 years.
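
A quick way to rerun this arithmetic, as a sketch under stated assumptions
rather than a statement about any particular part: ideal load levelling, binary
units (4 MB = 4 * 1024 * 1024 bytes, 4 KB = 4096 bytes), and each 4 KB write
counted as one program/erase cycle on one 4 KB unit. Real erase sectors are
usually larger, so treat the numbers as rough. With these inputs the per-unit
rate comes out nearer 84 writes per day than 40 or 50, which still puts the
100,000 cycle figure at a few years rather than a few weeks. The function
names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def writes_per_unit_per_day(device_bytes: int, write_bytes: int,
                            writes_per_second: float = 1.0) -> float:
    """Daily writes landing on each unit, assuming perfectly even levelling."""
    units = device_bytes // write_bytes
    return writes_per_second * SECONDS_PER_DAY / units


def years_to_reach(cycle_limit: int, cycles_per_day: float) -> float:
    """Years until one unit reaches the given program/erase cycle count."""
    return cycle_limit / cycles_per_day / 365.0


per_day = writes_per_unit_per_day(4 * 1024 * 1024, 4 * 1024)
print(round(per_day, 1))                            # 84.4
print(round(years_to_reach(100_000, per_day), 1))   # 3.2
print(round(years_to_reach(100_000, 50), 1))        # 5.5 (the 50/day figure)
```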
>Because of this limitation most of the people in this group who do design
>with flash use it in a Write Rarely Read Mostly manner. The only time the
>flash is written to is when there is a firmware upgrade. This is also the
>manner in which flash chips are used on conventional PC motherboards - if you
>lose power during a firmware upgrade - you are in trouble - nor do I see any
>practical method of handling that problem.
Well if you have control over your design simply ensure that you have two
banks and can boot from either of them. Upgrading involves booting from
one to upgrade the other and then selecting the new bank as the default
boot.
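
The two-bank scheme can be sketched as a small simulation. This is only an
illustration in Python, not boot loader code: the Bank and Board classes, the
valid flag and the fail_after parameter (which simulates power loss part way
through the write) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bank:
    image: bytes = b""
    valid: bool = False


@dataclass
class Board:
    banks: List[Bank] = field(default_factory=lambda: [Bank(), Bank()])
    default: int = 0

    def boot_bank(self) -> int:
        """Boot the default bank if it is valid, else fall back to the other."""
        if self.banks[self.default].valid:
            return self.default
        other = 1 - self.default
        if self.banks[other].valid:
            return other
        raise RuntimeError("no valid firmware bank")

    def upgrade(self, image: bytes, fail_after: Optional[int] = None) -> None:
        """Write the image into the bank we are not running from, then switch."""
        target = 1 - self.boot_bank()
        bank = self.banks[target]
        bank.valid = False                   # invalidate before touching it
        if fail_after is not None:
            bank.image = image[:fail_after]  # power lost part way through
            return
        bank.image = image
        bank.valid = True                    # image complete
        self.default = target                # last step: select the new bank


board = Board()
board.banks[0] = Bank(b"v1", True)
board.upgrade(b"v2", fail_after=1)   # interrupted: still boots bank 0
board.upgrade(b"v2")                 # completed: now boots bank 1
print(board.boot_bank())
```

The running bank is never modified, so an interrupted upgrade leaves the board
bootable from the old image.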
--
Stuart Lynne <sl@fireplug.net> __O
<http://www.thinlinux.org> _-\<,_ 604-461-7532
PGP Fingerprint: 28 E2 A0 15 99 62 9A 00 (_)/ (_) 88 EC A3 EE 2D 1C 15 68
* Re: [Fwd: Power Down]
1999-12-09 14:56 ` Bob Canup
@ 1999-12-20 4:22 ` Stuart Lynne
0 siblings, 0 replies; 18+ messages in thread
From: Stuart Lynne @ 1999-12-20 4:22 UTC (permalink / raw)
To: mtd
In article <384FC311.1E57CBD6@go2fax.com>, Bob Canup <rcanup@go2fax.com> wrote:
>Bob Canup wrote:
>
>> Vipin Malik wrote:
>>
>> > Bob Canup wrote:
>> > >
>> >
>> I can't see any difference between erasing a bad sector after the next power
>> up and the case of a slightly earlier power failure where the data was never
>> written in the first place; you wind up with an identical file system in both
>> cases. If you can survive the case of the earlier power failure failing to
>> record the sector then you can survive the case of the bad sector being
>> erased.
>>
>> The only difference that I see is that in the case of the bad sector you
>> know something happened, in the case of the data never being recorded you
>> don't.
>>
>
>On the way home last night it occurred to me that if you are doing a
>read-modify-write cycle, it does make a difference where the error occurs.
>In this case if the sector is never written the old data is still valid.
>
>It seems to me the wear leveling necessary in a flash can be a benefit here; if
>I understand correctly we don't overwrite the physical pages - only when a page
>is 'committed' does the chip change its mapping so that it appears that the
>sector was overwritten.
The typical algorithm for writing new blocks into a flash system emulating a
disk device assumes that for each block you can store you have a logical
sector number and status.
- find an empty block
- write your data into it
- update the old data block status to say the logical block is being moved
- update the new data block status to show where the logical block now is
- update the old data block status to invalidate the entry
- start a sector erase if you need to
To read a block you look for an entry that shows the new location. If you can't
find one but did find one that was marked as being moved you fall back to that.
This gets complicated because you have to handle both sector erases and
block writes not completing from power failures. Which probably means that
you don't want to do anything to freshly erased sectors. I.e. a freshly
erased sector is identified simply because it is in its default erased
state. You can't dedicate sectors to any particular use (because you need to
load balance).
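
The write and read steps above can be sketched as a small in-memory simulation.
This is Python for illustration only, not driver code. The status encoding
(chosen so that every transition only clears bits, as flash programming does),
the fail_at parameter that simulates power loss before a given step, and the
recover() pass run at power-up are assumptions of this sketch, not part of the
algorithm as described. Sector erase is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Each transition FREE -> VALID -> MOVING -> INVALID only clears bits.
FREE, VALID, MOVING, INVALID = 0b1111, 0b0111, 0b0011, 0b0000


class PowerFail(Exception):
    """Raised by the simulation to cut a write short."""


@dataclass
class Block:
    status: int = FREE
    logical: Optional[int] = None
    data: Optional[bytes] = None


class FlashDisk:
    def __init__(self, nblocks: int) -> None:
        self.blocks: List[Block] = [Block() for _ in range(nblocks)]

    def _find(self, logical: int, status: int) -> Optional[Block]:
        for blk in self.blocks:
            if blk.logical == logical and blk.status == status:
                return blk
        return None

    def read(self, logical: int) -> Optional[bytes]:
        # Prefer the entry showing the new location, else the one being moved.
        blk = self._find(logical, VALID) or self._find(logical, MOVING)
        return blk.data if blk is not None else None

    def recover(self) -> None:
        # Power-up pass (assumed): finish any move whose new copy is valid.
        for blk in self.blocks:
            if blk.status == MOVING and self._find(blk.logical, VALID):
                blk.status = INVALID

    def write(self, logical: int, data: bytes,
              fail_at: Optional[int] = None) -> None:
        def step(n: int) -> None:
            if fail_at == n:
                raise PowerFail(n)

        old = self._find(logical, VALID) or self._find(logical, MOVING)
        # An empty block is one still entirely in its erased state.
        new = next((b for b in self.blocks
                    if b.status == FREE and b.data is None), None)
        if new is None:
            raise RuntimeError("no empty block; a sector erase is needed")
        step(1)
        new.data = data                # write the data
        step(2)
        if old is not None:
            old.status = MOVING        # old entry: being moved
        step(3)
        new.logical = logical
        new.status = VALID             # new entry: logical block lives here
        step(4)
        if old is not None:
            old.status = INVALID       # old entry: invalidated


disk = FlashDisk(4)
disk.write(7, b"old")
try:
    disk.write(7, b"new", fail_at=3)   # power fails after 'being moved'
except PowerFail:
    pass
disk.recover()
print(disk.read(7))                    # b'old'
```

A block left with data but no status after an interrupted write is skipped by
the empty-block search here and would be reclaimed by a later sector erase.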
--
Stuart Lynne <sl@fireplug.net> __O
<http://www.thinlinux.org> _-\<,_ 604-461-7532
PGP Fingerprint: 28 E2 A0 15 99 62 9A 00 (_)/ (_) 88 EC A3 EE 2D 1C 15 68
end of thread, other threads:[~1999-12-20 4:19 UTC | newest]
Thread overview: 18+ messages
1999-12-06 23:41 [Fwd: power down] Vipin Malik
1999-12-07 15:47 ` Bob Canup
1999-12-20 4:03 ` Stuart Lynne
1999-12-07 20:36 ` Jon Burford
1999-12-13 14:49 ` Adi Linden
1999-12-13 19:07 ` Jon Burford
1999-12-08 15:10 ` David Woodhouse
-- strict thread matches above, loose matches on Subject: below --
1999-12-07 16:36 Oron Ogdan
1999-12-08 20:42 [Fwd: Power Down] Vipin Malik
1999-12-08 20:48 Vipin Malik
1999-12-08 21:32 Vipin Malik
1999-12-09 11:10 ` David Woodhouse
1999-12-08 21:36 Vipin Malik
1999-12-08 23:02 ` Bob Canup
1999-12-09 11:02 ` David Woodhouse
1999-12-09 14:56 ` Bob Canup
1999-12-20 4:22 ` Stuart Lynne
1999-12-08 21:39 Vipin Malik