Re: UBIFS corruption after power cut - possibly unstable bits issue?

linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: Richard Weinberger <richard@nod.at>
To: Tim Harvey <tharvey@gateworks.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-mtd@lists.infradead.org
Subject: Re: UBIFS corruption after power cut - possibly unstable bits issue?
Date: Tue, 27 Oct 2015 20:52:46 +0100	[thread overview]
Message-ID: <562FD60E.9020807@nod.at> (raw)
In-Reply-To: <CAJ+vNU1GV1GxYfLgHh2ZerAGjRxi=azXHLg_dNO=BaUrkkDU1w@mail.gmail.com>

Tim,

Am 27.10.2015 um 20:01 schrieb Tim Harvey:
> I'm not understanding what is making you say that the issue I
> encountered is 'not' the unstable bits issue described at
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
> understanding is that the 'unstable bit' issue refers to bits which
> are truly unstable and can read either way each and every read due to
> not getting properly erased/written.

You are right. I was sorting out the unstable bits issue a bit too
early. I'm sorry.
Let's double check. Can you enable UBI verbose logging while testing?
Such that we can see which blocks were written/erased while the power cut
happened?

> If I understand what you are saying you are thinking that my issue is
> instead the result of a never-used PEB that had bit-flips from the
> manufacturer in which case the bits would read the same every time?
> How can we know this PEB was never before used and isn't one that was
> being erased/written during a power cut?

I've seen bit flips on cheap SLC NANDs which came out of a sudden.
According to the FAE I was talking to this is legit for NAND
as long the flipping bits are fixable by the ECC engine.

> In my test scenario where the rootfs is mounted from the kernel
> read-only, but later mounted read-write by userspace (yet not being
> specifically written to by userspace) then power-cut should 'any' NAND
> writes would be occurring at all? And if not as I suspect, then how
> could a subsequent boot end up using a PEB that may have been never
> previously used and have bit-flips from the manufacturer?

UBIFS's has a wandering journal. During the remount it moved maybe.
But for a more expressive analysis I'd need a nanddump to find out which
blocks are in which role.
Can you share the nanddump?

> Should we be doing an erase block on every NAND block during our board
> manufacturing process to avoid this?

Sorry, I don't understand this sentence.
Do you mean a full erasure of the whole NAND?
If so, it would not help as the bit flips can come later.
(Without writing/erasing the block)
The root cause is that your NFC cannot correct bit flips on empty pages.

> It sounds like this 'unexpected bit-flips on erased pages from the
> mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
> Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
> be updated to refer to this known issue as well as the unstable bit
> issue?

As I said the root cause is that some NFCs cannot correct bit flips on empty
pages.
Instead of putting warnings to ubifs.html I'd love to see a solution on the
said drivers or MTD core.

> I can add some debugging to find out - what specifically would be
> helpful to add?

A hexdump of the buffer would be a good start.

> Thanks for the help!

Thanks for sharing your issues. This is the only way
to address them.
That said, as far on no board I had access to I was able to reproduce the unstable bits
issue. It was always something else.

Thanks,
//richard

next prev parent reply	other threads:[~2015-10-27 19:53 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-26 19:37 UBIFS corruption after power cut - possibly unstable bits issue? Tim Harvey
2015-10-26 20:01 ` Richard Weinberger
2015-10-26 20:31   ` Tim Harvey
2015-10-26 21:41     ` Richard Weinberger
2015-10-27 19:01       ` Tim Harvey
2015-10-27 19:52         ` Richard Weinberger [this message]
2015-11-02 20:27           ` Tim Harvey
2015-11-02 20:31             ` Tim Harvey
2015-11-02 21:31               ` Richard Weinberger
2015-11-02 22:11                 ` Brian Norris
2015-11-03 13:38               ` Boris Brezillon
2015-11-16 15:01                 ` Tim Harvey
2015-11-30 21:58                   ` Tim Harvey
2015-12-01  9:12                     ` Boris Brezillon
2015-11-03  9:10             ` Artem Bityutskiy
2015-11-03 10:06   ` Michal Suchanek
2015-11-03 10:18     ` Ricard Wanderlof
2015-11-03 10:43     ` Artem Bityutskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=562FD60E.9020807@nod.at \
    --to=richard@nod.at \
    --cc=adrian.hunter@intel.com \
    --cc=dedekind1@gmail.com \
    --cc=linux-mtd@lists.infradead.org \
    --cc=tharvey@gateworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).