* Mean Time Between Failure - UBI clarifications
From: Navaneethan P @ 2011-02-24 12:39 UTC
To: linux-mtd

Hi Linux-mtd users,

In our product, we are using 128MB of NAND Flash (Samsung / Micron). The whole NAND flash is configured as a single MTD partition. We are using UBI over the MTD partition.

With this input, we want to calculate the Mean Time Between Failures (MTBF) of our product. In this context:

1) We want to treat a 'bitflip' as a failure. Is our understanding correct, or should we only consider a bad block as a failure?

2) Is there any standard way to find out the number of bitflips from UBI? If not, is it advisable to modify the UBI subsystem of the Linux kernel to add a bitflip counter?

3) Is there any standard software / approach which can be used to find out the reliability / MTBF / MTTF (Mean Time To Failure) of our NAND Flash?

Could someone clarify these points? Thanks in advance.

Regards,
Navaneethan
* Re: Mean Time Between Failure - UBI clarifications
From: Ricard Wanderlof @ 2011-02-24 14:04 UTC
To: Navaneethan P
Cc: linux-mtd@lists.infradead.org

On Thu, 24 Feb 2011, Navaneethan P wrote:

> Hi Linux-mtd users,
>
> In our product, we are using 128MB of NAND Flash (Samsung / Micron).
> The whole NAND flash is configured as a single MTD partition. We are
> using UBI over the MTD partition.
>
> With this input, we wanted to calculate the Mean Time between failures
> (MTBF) of our product. In this context,
>
> 1) We wanted to term 'bitflip' as a failure. Is our understanding
> correct or should we only consider a bad block as a failure?

I'd say it's a failure in the sense that the raw data from the flash is not what you expect, but UBI handles this transparently, so it's not a failure from the user's point of view. Furthermore, bitflips are inherent to the design of NAND flash, and they do not indicate that there is actually anything abnormal about a particular bit.

A bad block is more of a failure in that it can contain bits which are unreliable, or stuck at a particular bit level. At least this is the case for blocks that have been detected as bad at the factory and marked as such, but they are not really part of the equation since they should not be used anyway.

The ordinary way for a block to 'fail' is when the number of erase/write cycles performed on the block causes it to physically wear out. A worn-out block has lower data retention (i.e. larger susceptibility to bitflips) than other blocks. Usually, if an erase or write operation times out (i.e. the on-chip erase/write algorithm on the flash times out before the operation is completed, and indicates a failure status to the host), the block is considered 'bad'.
However, note that it is not necessarily an either-or situation. The block might not suddenly go dead. Instead, its data retention characteristics and erase/write cycle times can get worse and worse as the block is erased and rewritten. At some point, the on-chip algorithm on the flash signals that an erase or write took too long, but the characteristics of the block might be far below spec before then. It's up to you as a user to decide when the block is 'bad' in this case.

> 2) Is there any standard way to findout the number of bitflips from
> the UBI? If no, is it suggested to modify the UBI subsystem of the
> Linux kernel to get the bit flip counter?

mtd supplies statistics counters that might help. For each mtd partition there is one counter which is increased every time a read operation requires ECC to correct a bit (i.e. a correctable single-bit error), and one counter for ECC failures (two-bit errors).

I don't know about UBI, someone else probably does.

> 3) Is there any standard software / approach which can be used to find
> out the reliability / MTBF / MTTF (Mean Time To Failure) of our NAND
> Flash?

The manufacturers provide some data; however, my experience has been that it is very difficult to get any form of reliability information.

One way would be to take the spec for the number of erase/write cycles that the flash can handle (probably 100 000 for your flash), and calculate how much data will be written to the flash over a certain amount of time. When you reach 100 000 writes to any given block, that can be counted as a failure.

/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
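The wear-out estimate suggested in the message above (rated erase/write cycles versus the amount of data written over time) can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the function name is made up for illustration, the eraseblock size, daily write volume, and write amplification factor are hypothetical inputs that must come from the datasheet and from measurements on the real product, and ideal wear levelling (UBI spreading erases evenly over all blocks) is assumed. It ignores bad-block reserve and data-retention effects.

```python
def years_until_wearout(flash_bytes, block_bytes, rated_cycles,
                        bytes_written_per_day, write_amplification=1.0):
    """Estimate the years until all eraseblocks reach their rated cycle count.

    Assumes ideal wear levelling, i.e. erases are spread evenly over
    every eraseblock. All inputs are illustrative, not measured values.
    """
    if bytes_written_per_day <= 0 or write_amplification <= 0:
        raise ValueError("write volume and amplification must be positive")

    blocks = flash_bytes // block_bytes
    # Total number of block erases the whole chip is rated for.
    erase_budget = blocks * rated_cycles
    # Each eraseblock's worth of (amplified) data costs one erase.
    erases_per_day = bytes_written_per_day * write_amplification / block_bytes
    return erase_budget / erases_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical figures: 128 MiB flash, 128 KiB eraseblocks,
    # 100 000 rated cycles, 1 GiB written per day, amplification of 2.
    years = years_until_wearout(128 * 2**20, 128 * 2**10, 100_000,
                                2**30, write_amplification=2.0)
    print(f"estimated time to wear-out: {years:.1f} years")
```

With these hypothetical inputs the estimate comes out at roughly 17 years. Without wear levelling, the most frequently rewritten block would reach its limit far sooner, so the result should be read as an optimistic upper bound rather than an MTBF figure.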
* Re: Mean Time Between Failure - UBI clarifications
From: Artem Bityutskiy @ 2011-02-25 12:36 UTC
To: Ricard Wanderlof
Cc: Navaneethan P, linux-mtd@lists.infradead.org

On Thu, 2011-02-24 at 15:04 +0100, Ricard Wanderlof wrote:
> I don't know about UBI, someone else probably does.

No, UBI does not collect bit-flip statistics. It could be changed though; someone could send a patch.

But I agree that bit-flips are the wrong metric for MTBF.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
* Re: Mean Time Between Failure - UBI clarifications
From: Ricard Wanderlof @ 2011-02-25 12:40 UTC
To: Artem Bityutskiy
Cc: linux-mtd@lists.infradead.org

On Fri, 25 Feb 2011, Artem Bityutskiy wrote:

> On Thu, 2011-02-24 at 15:04 +0100, Ricard Wanderlof wrote:
>> I don't know about UBI, someone else probably does.
>
> No, UBI does not collect bit-flips statistics. It could be changed
> though, someone could send a patch.

But don't bit flips eventually trigger bit scrubbing in UBI?

/Ricard
* Re: Mean Time Between Failure - UBI clarifications
From: Artem Bityutskiy @ 2011-02-25 12:45 UTC
To: Ricard Wanderlof
Cc: linux-mtd@lists.infradead.org

On Fri, 2011-02-25 at 13:40 +0100, Ricard Wanderlof wrote:
> On Fri, 25 Feb 2011, Artem Bityutskiy wrote:
>
> > On Thu, 2011-02-24 at 15:04 +0100, Ricard Wanderlof wrote:
> >> I don't know about UBI, someone else probably does.
> >
> > No, UBI does not collect bit-flips statistics. It could be changed
> > though, someone could send a patch.
>
> But don't bit flips eventually trigger bit scrubbing in UBI?

Yes, they do, but after scrubbing we forget about them. We could, however, store a bit-flip counter in the UBI headers, so "paranoid" users could get a per-eraseblock bit-flip count.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
* Re: Mean Time Between Failure - UBI clarifications
From: Navaneethan P @ 2011-02-28 15:22 UTC
To: Ricard Wanderlof
Cc: Stefan.Bigler, navaneethan.p, linux-mtd@lists.infradead.org, Sundararajan.Somasundaram, Shenbaga.Nathan

Thank you for your response.

On Thu, Feb 24, 2011 at 7:34 PM, Ricard Wanderlof <ricard.wanderlof@axis.com> wrote:
> mtd supplies statistics counters that might help. For each mtd partition
> there is one counter which is increased every time a read operation requires
> ECC to correct a bit (i.e. a correctable single bit error), and one counter
> for ECC failures (two-bit errors).

I checked the mtd directory in sysfs and also the mtd source code. However, I could not find the mtd statistics counters in the source available to me (mtd-2.6.28). Could you please point me to where the statistics counters are implemented in the mtd source?

Thanks and Regards,
Navaneethan P
* Re: Mean Time Between Failure - UBI clarifications
From: Ricard Wanderlof @ 2011-02-28 15:32 UTC
To: Navaneethan P
Cc: Stefan.Bigler@keymile.com, navaneethan.p@aricent.com, Ricard Wanderlöf, linux-mtd@lists.infradead.org, Sundararajan.Somasundaram@aricent.com, Shenbaga.Nathan@aricent.com

On Mon, 28 Feb 2011, Navaneethan P wrote:

> Thank you for your response.
>
> On Thu, Feb 24, 2011 at 7:34 PM, Ricard Wanderlof
> <ricard.wanderlof@axis.com> wrote:
>> mtd supplies statistics counters that might help. For each mtd partition
>> there is one counter which is increased every time a read operation requires
>> ECC to correct a bit (i.e. a correctable single bit error), and one counter
>> for ECC failures (two-bit errors).
>
> I checked the mtd directory of sysfs and also the source code of mtd.
> However I could not find the mtd statistics counter from the available
> source mtd-2.6.28. Could you please guide me as to where the
> statistics counters are implemented inside mtd source.

They're in mtd->ecc_stats. See drivers/mtd/nand/nand_base.c for instance. The ECCGETSTATS case in mtd_ioctl (drivers/mtd/mtdchar.c) returns the statistics to userspace via the ECCGETSTATS ioctl.

/Ricard
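Reading the counters mentioned above from userspace could look roughly like the following Python sketch. It relies on ECCGETSTATS being defined in the kernel's mtd-abi.h header as _IOR('M', 18, struct mtd_ecc_stats), with the struct holding four 32-bit counters (corrected, failed, badblocks, bbtblocks). The helper names, the device path /dev/mtd0, and the use of the asm-generic ioctl number layout (as on x86 and ARM; some architectures encode the direction bits differently) are assumptions of this sketch, not something taken from the thread.

```python
import fcntl
import os
import struct

# struct mtd_ecc_stats: four __u32 fields in this order:
# corrected, failed, badblocks, bbtblocks.
ECC_STATS_FORMAT = "=IIII"
ECC_STATS_SIZE = struct.calcsize(ECC_STATS_FORMAT)


def ior(ioc_type, nr, size):
    """Build an _IOR() request number using the asm-generic bit layout.

    Assumption: direction in bits 30-31, size in bits 16-29, type in
    bits 8-15, number in bits 0-7. Not valid on every architecture.
    """
    ioc_read = 2
    return (ioc_read << 30) | (size << 16) | (ord(ioc_type) << 8) | nr


# ECCGETSTATS is _IOR('M', 18, struct mtd_ecc_stats).
ECCGETSTATS = ior("M", 18, ECC_STATS_SIZE)


def parse_ecc_stats(raw):
    """Decode the raw bytes of a struct mtd_ecc_stats into a dict."""
    corrected, failed, badblocks, bbtblocks = struct.unpack(ECC_STATS_FORMAT, raw)
    return {
        "corrected": corrected,   # reads that needed ECC correction
        "failed": failed,         # reads with uncorrectable errors
        "badblocks": badblocks,   # blocks marked bad
        "bbtblocks": bbtblocks,   # blocks reserved for bad block tables
    }


def read_ecc_stats(device="/dev/mtd0"):
    """Query the ECC statistics of an MTD character device (path is an example)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        raw = fcntl.ioctl(fd, ECCGETSTATS, bytes(ECC_STATS_SIZE))
    finally:
        os.close(fd)
    return parse_ecc_stats(raw)


if __name__ == "__main__":
    print(read_ecc_stats())
```

Sampling these counters periodically and logging the deltas gives a per-partition trend of corrected and uncorrectable reads. As noted earlier in the thread, this is mtd-level data; UBI itself does not keep bit-flip statistics.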