* Mean Time Between Failure - UBI clarifications
From: Navaneethan P @ 2011-02-24 12:39 UTC
To: linux-mtd
Hi Linux-mtd users,
In our product, we are using 128MB of NAND Flash (Samsung / Micron).
The whole NAND flash is configured as a single MTD partition. We are
using UBI over the MTD partition.
Given this setup, we want to calculate the Mean Time Between Failures
(MTBF) of our product. In this context:
1) We want to count a 'bitflip' as a failure. Is that understanding
correct, or should we only consider a bad block a failure?
2) Is there any standard way to find out the number of bitflips from
UBI? If not, is it advisable to modify the UBI subsystem of the
Linux kernel to add a bitflip counter?
3) Is there any standard software or approach that can be used to find
out the reliability / MTBF / MTTF (Mean Time To Failure) of our NAND
flash?
Could someone clarify these points?
Thanks in advance.
Regards,
Navaneethan
* Re: Mean Time Between Failure - UBI clarifications
From: Ricard Wanderlof @ 2011-02-24 14:04 UTC
To: Navaneethan P; +Cc: linux-mtd@lists.infradead.org
On Thu, 24 Feb 2011, Navaneethan P wrote:
> Hi Linux-mtd users,
>
>
> In our product, we are using 128MB of NAND Flash (Samsung / Micron).
> The whole NAND flash is configured as a single MTD partition. We are
> using UBI over the MTD partition.
>
> With this input, we wanted to calculate the Mean Time between failures
> (MTBF) of our product. In this context,
>
> 1) We wanted to term 'bitflip' as a failure. Is our understanding
> correct or should we only consider a bad block as a failure?
I'd say it's a failure in the sense that the raw data from the flash is
not what you expect, but ECC corrects it and UBI handles this
transparently, so it's not a failure from the user's point of view.
Furthermore, bitflips are inherent to the design of NAND flash, and a
bitflip does not indicate that there is actually anything abnormal about
a particular bit.
A bad block is more of a failure in that it can contain bits which are
unreliable, or stuck at a particular level. At least this is the case
for blocks that were detected as bad at the factory and marked as such,
but those are not really part of the equation, since they should never
be used anyway.
The usual way for a block to 'fail' is that the number of erase/write
cycles performed on it causes it to wear out physically. A worn-out
block has lower data retention (i.e. greater susceptibility to bitflips)
than other blocks. Usually, if an erase or write operation times out (i.e.
the on-chip erase/write algorithm gives up before the operation has
completed and reports a failure status to the host), the block is
considered 'bad'. Note, however, that this is not necessarily an
either-or situation. The block might not suddenly go dead; instead, its
data retention and erase/write times can get progressively worse as the
block is erased and rewritten. At some point the on-chip algorithm
signals that an erase or write took too long, but the characteristics of
the block may have been far below spec well before then.
It's up to you as a user to decide when the block is 'bad' in this case.
> 2) Is there any standard way to findout the number of bitflips from
> the UBI? If no, is it suggested to modify the UBI subsystem of the
> Linux kernel to get the bit flip counter?
MTD supplies statistics counters that might help. For each MTD partition
there is one counter which is incremented every time a read operation
requires ECC to correct a bit (i.e. a correctable single-bit error), and
one counter for ECC failures (two-bit errors).
I don't know about UBI; someone else probably does.
> 3) Is there any standard software / approach which can be used to find
> out the reliability / MTBF / MTTF (Mean Time To Failure) of our NAND
> Flash?
The manufacturers provide some data; however, in my experience it is very
difficult to get any real reliability information from them.
One approach is to take the specified number of erase/write cycles that
the flash can handle (probably 100 000 for your flash), and calculate how
much data will be written to the flash over a given period of time. When
any given block reaches 100 000 erase/write cycles, that can be counted
as a failure.
/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
* Re: Mean Time Between Failure - UBI clarifications
From: Artem Bityutskiy @ 2011-02-25 12:36 UTC
To: Ricard Wanderlof; +Cc: Navaneethan P, linux-mtd@lists.infradead.org
On Thu, 2011-02-24 at 15:04 +0100, Ricard Wanderlof wrote:
> I don't know about UBI, someone else probably does.
No, UBI does not collect bit-flip statistics. That could be changed,
though; someone could send a patch.
But I agree that bit-flips are the wrong metric for MTBF.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
* Re: Mean Time Between Failure - UBI clarifications
From: Ricard Wanderlof @ 2011-02-25 12:40 UTC
To: Artem Bityutskiy; +Cc: linux-mtd@lists.infradead.org
On Fri, 25 Feb 2011, Artem Bityutskiy wrote:
> On Thu, 2011-02-24 at 15:04 +0100, Ricard Wanderlof wrote:
>> I don't know about UBI, someone else probably does.
>
> No, UBI does not collect bit-flips statistics. It could be changed
> though, someone could send a patch.
But don't bit flips eventually trigger bit scrubbing in UBI?
/Ricard
* Re: Mean Time Between Failure - UBI clarifications
From: Artem Bityutskiy @ 2011-02-25 12:45 UTC
To: Ricard Wanderlof; +Cc: linux-mtd@lists.infradead.org
On Fri, 2011-02-25 at 13:40 +0100, Ricard Wanderlof wrote:
> On Fri, 25 Feb 2011, Artem Bityutskiy wrote:
>
> > On Thu, 2011-02-24 at 15:04 +0100, Ricard Wanderlof wrote:
> >> I don't know about UBI, someone else probably does.
> >
> > No, UBI does not collect bit-flips statistics. It could be changed
> > though, someone could send a patch.
>
> But don't bit flips eventually trigger bit scrubbing in UBI?
Yes, they do, but after scrubbing we forget about them.
We could, however, store a bit-flips counter in the UBI headers, so that
"paranoid" users could get a per-eraseblock bit-flips count.
* Re: Mean Time Between Failure - UBI clarifications
From: Navaneethan P @ 2011-02-28 15:22 UTC
To: Ricard Wanderlof
Cc: Stefan.Bigler, navaneethan.p, linux-mtd@lists.infradead.org,
Sundararajan.Somasundaram, Shenbaga.Nathan
Thank you for your response.
On Thu, Feb 24, 2011 at 7:34 PM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
> mtd supplies statistics counters that might help. For each mtd partition
> there is one counter which is increased every time a read operation requires
> ECC to correct a bit (i.e. a correctable single bit error), and one counter
> for ECC failures (two-bit errors).
I checked the mtd directory in sysfs and also the MTD source code.
However, I could not find the MTD statistics counters in the source
available to me (mtd-2.6.28). Could you please point me to where the
statistics counters are implemented in the MTD source?
Thanks and Regards,
Navaneethan P
* Re: Mean Time Between Failure - UBI clarifications
From: Ricard Wanderlof @ 2011-02-28 15:32 UTC
To: Navaneethan P
Cc: Stefan.Bigler@keymile.com, navaneethan.p@aricent.com,
Ricard Wanderlöf, linux-mtd@lists.infradead.org,
Sundararajan.Somasundaram@aricent.com,
Shenbaga.Nathan@aricent.com
On Mon, 28 Feb 2011, Navaneethan P wrote:
> Thank you for your response.
>
> On Thu, Feb 24, 2011 at 7:34 PM, Ricard Wanderlof
> <ricard.wanderlof@axis.com> wrote:
>> mtd supplies statistics counters that might help. For each mtd partition
>> there is one counter which is increased every time a read operation requires
>> ECC to correct a bit (i.e. a correctable single bit error), and one counter
>> for ECC failures (two-bit errors).
>
> I checked the mtd directory of sysfs and also the source code of mtd.
> However I could not find the mtd statistics counter from the available
> source mtd-2.6.28. Could you please guide me as to where the
> statistics counters are implemented inside mtd source.
They're in mtd->ecc_stats; see drivers/mtd/nand/nand_base.c, for
instance. drivers/mtd/mtdchar.c:mtd_ioctl (case ECCGETSTATS) copies the
statistics to userspace for the ECCGETSTATS ioctl.
/Ricard