From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Martin Townsend <mtownsend1973@gmail.com>
Cc: Ricard Wanderlof <ricard.wanderlof@axis.com>,
Richard Weinberger <richard@nod.at>,
"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: UBIFS question
Date: Thu, 17 Mar 2016 15:55:44 +0100
Message-ID: <20160317155544.3b43bbb9@bbrezillon>
In-Reply-To: <CABatt_zxp-6s++zLeWfijdTT4fevgTPk97sue-3GGj_6wGgF0w@mail.gmail.com>

Hi Martin,

On Thu, 17 Mar 2016 12:54:43 +0000
Martin Townsend <mtownsend1973@gmail.com> wrote:
> Hi Ricard, Richard
>
> On Thu, Mar 17, 2016 at 11:43 AM, Ricard Wanderlof
> <ricard.wanderlof@axis.com> wrote:
> >
> >> > We expect the flash devices to start failing quicker than normally
> >> > expected due to the environment in which they will be operating in, so
> >> > sudden NAND blocks turning bad will eventually happen and what we
> >> > would like to do is try and capture this as soon as possible.
> >> > The boards are not accessible as they will be located in very remote
> >> > locations so detecting these failures before the system locks up would
> >> > be an advantage so we can report home with the information and fail
> >> > over to the other filesystem (providing that hasn't also been
> >> > corrupted).
> >>
> >> Dealing with sudden bad NAND blocks is almost impossible.
> >> Unless you have a copy of each block.
> >> NAND is not expected to gain bad blocks without an indication like
> >> correctable bitflips.
>
> I'm not interested in dealing with sudden bad NAND blocks, I accept
> this will more than likely happen at some point but what I am
> interested in is early detection. Once the system has booted most
> files will be cached to memory and the product that the flash devices
> are in is designed to run for many months without being power cycled
> so what I'm looking to do is monitor the health of the flash devices.
> Ideally I would like to know FEC counts but I doubt I will get this
> information :) But checking LEBs, pages etc for bad checksums would be
> great.
>
> >
> > Yes, although the NAND flash documentation sometimes reads like blocks can
> > suddenly 'go bad' for no special reason, in practice it is due to
> > excessive erase/write cycles, i.e. its a wear problem.
> >
> > However, I don't know, if you are operating the flash in an environment
> > where there is cosmic radiation that can actually damage the chip for
> > instance, then of course any part of the chip could fail randomly with a
> > fairly high probability. But NAND bad block management is not designed to
> > take care of that case, which is why bad block detection is only done
> > during block erasure (i.e. when a block fails to erase).
> >
> I'm not sure how much I can say I'm afraid as I'm under NDA but assume
> that it is going to be operating in an environment where it's
> receiving more cosmic radiation than expected. So I could look at the
> bad block detection code to get some ideas? I don't necessary want to
> mark blocks as bad I just want to detect them so I have an idea that
> the flash is failing.

I guess you're more worried about bitflips than about blocks becoming
bad (which, AFAIK, can only happen when writing or erasing a block, not
when reading it).

If bitflip detection/prevention is what you're looking for, I guess
ubihealthd (developed by Richard) could help [1][2].

[1] https://lwn.net/Articles/663751/
[2] https://lkml.org/lkml/2015/3/29/31
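
Independently of ubihealthd, and only as a rough sketch: recent kernels
expose per-device ECC counters under /sys/class/mtd/mtdX/, which may be
close to the "FEC counts" asked about above. The attribute names below
come from the kernel's sysfs-class-mtd ABI; the helper names
(read_mtd_ecc_stats, counters_increased) and the device name "mtd0" are
made up for illustration, and older kernels may not provide every file.
Note that these counters only move when the flash is actually read, so
something still has to read the device periodically.

```python
import os

# Attribute names as documented in the kernel's sysfs-class-mtd ABI.
# Assumption: the running kernel exposes them; missing ones map to None.
ECC_ATTRS = ("corrected_bits", "ecc_failures", "bad_blocks", "bitflip_threshold")


def read_mtd_ecc_stats(mtd_name, sysfs_root="/sys/class/mtd"):
    """Return a dict of ECC-related counters for one MTD device."""
    stats = {}
    for attr in ECC_ATTRS:
        path = os.path.join(sysfs_root, mtd_name, attr)
        try:
            with open(path) as f:
                stats[attr] = int(f.read().strip())
        except (OSError, ValueError):
            # File absent (older kernel) or content not an integer.
            stats[attr] = None
    return stats


def counters_increased(previous, current,
                       keys=("corrected_bits", "ecc_failures", "bad_blocks")):
    """Return {counter: delta} for every counter that grew between samples."""
    deltas = {}
    for key in keys:
        old, new = previous.get(key), current.get(key)
        if old is not None and new is not None and new > old:
            deltas[key] = new - old
    return deltas
```

A monitoring loop could sample read_mtd_ecc_stats("mtd0") at intervals
and report home whenever counters_increased() returns a non-empty dict.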
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com