Re: UBIFS corruption after power cut - possibly unstable bits issue?


From: Richard Weinberger <richard@nod.at>
To: Tim Harvey <tharvey@gateworks.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-mtd@lists.infradead.org
Subject: Re: UBIFS corruption after power cut - possibly unstable bits issue?
Date: Tue, 27 Oct 2015 20:52:46 +0100
Message-ID: <562FD60E.9020807@nod.at>
In-Reply-To: <CAJ+vNU1GV1GxYfLgHh2ZerAGjRxi=azXHLg_dNO=BaUrkkDU1w@mail.gmail.com>

Tim,

Am 27.10.2015 um 20:01 schrieb Tim Harvey:
> I'm not understanding what is making you say that the issue I
> encountered is 'not' the unstable bits issue described at
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
> understanding is that the 'unstable bit' issue refers to bits which
> are truly unstable and can read either way each and every read due to
> not getting properly erased/written.

You are right. I ruled out the unstable bits issue a bit too
early. I'm sorry.
Let's double check. Can you enable UBI verbose logging while testing,
so that we can see which blocks were being written or erased when the
power cut happened?

> If I understand what you are saying you are thinking that my issue is
> instead the result of a never-used PEB that had bit-flips from the
> manufacturer in which case the bits would read the same every time?
> How can we know this PEB was never before used and isn't one that was
> being erased/written during a power cut?

I've seen bit flips on cheap SLC NANDs which appeared all of a sudden.
According to the FAE I was talking to, this is legitimate for NAND
as long as the flipped bits can be corrected by the ECC engine.
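Tim's question turns on whether a flipped bit reads the same way on every read (a stable flip) or changes from one read to the next (the "unstable bits" case). One way to tell them apart is to read the same erased page several times and compare the results. The sketch below assumes those raw reads are already in memory; classify_erased_bits() and struct flip_stats are hypothetical names, not part of MTD or UBI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: counts of suspicious bits in an erased page. */
struct flip_stats {
    size_t stable;   /* bits that read 0 in every sample */
    size_t unstable; /* bits that read 0 in some samples and 1 in others */
};

/* Count set bits in one byte. */
static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v) {
        v &= (uint8_t)(v - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Compare nreads (>= 1) raw reads of the same erased page, each len bytes.
 * An erased page should read all 0xFF, so any 0 bit is a flip.
 */
struct flip_stats classify_erased_bits(const uint8_t *const *reads,
                                       size_t nreads, size_t len)
{
    struct flip_stats st = {0, 0};

    for (size_t i = 0; i < len; i++) {
        uint8_t all_and = 0xFF; /* bit stays set only if 1 in every read */
        uint8_t all_or = 0x00;  /* bit set if 1 in at least one read */

        for (size_t r = 0; r < nreads; r++) {
            all_and &= reads[r][i];
            all_or |= reads[r][i];
        }
        /* 0 in every read: a stable flip */
        st.stable += popcount8((uint8_t)~all_or);
        /* 1 in some reads, 0 in others: an unstable bit */
        st.unstable += popcount8((uint8_t)(all_or & ~all_and));
    }
    return st;
}
```

A non-zero unstable count matches the unstable-bits description; a page with only stable flips reads the same way every time, which is the case Richard suspects here.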

> In my test scenario where the rootfs is mounted from the kernel
> read-only, but later mounted read-write by userspace (yet not being
> specifically written to by userspace) then power-cut should 'any' NAND
> writes would be occurring at all? And if not as I suspect, then how
> could a subsequent boot end up using a PEB that may have been never
> previously used and have bit-flips from the manufacturer?

UBIFS has a wandering journal. It may have moved during the remount.
But for a more thorough analysis I'd need a nanddump to find out which
blocks serve which role.
Can you share the nanddump?

> Should we be doing an erase block on every NAND block during our board
> manufacturing process to avoid this?

Sorry, I don't understand this sentence.
Do you mean a full erasure of the whole NAND?
If so, it would not help, as the bit flips can appear later, even
without writing or erasing the block.
The root cause is that your NAND flash controller (NFC) cannot correct
bit flips on empty (erased) pages.

> It sounds like this 'unexpected bit-flips on erased pages from the
> mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
> Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
> be updated to refer to this known issue as well as the unstable bit
> issue?

As I said, the root cause is that some NFCs cannot correct bit flips on empty
pages.
Instead of putting warnings into ubifs.html, I'd love to see a fix in the
affected drivers or in the MTD core.
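A fix of the kind Richard describes generally works like this: when the controller reports an uncorrectable ECC error, count the 0 bits in what should be an all-0xFF erased chunk. If that count is at or below the ECC strength, treat the chunk as erased, report the flips as corrected, and hand back clean 0xFF bytes. Later Linux kernels added a NAND core helper for this purpose, nand_check_erased_ecc_chunk(); the code below is an independent, simplified sketch and not that helper. check_erased_chunk() is a hypothetical name, and a real driver would also have to check the ECC/OOB bytes, not only the data area.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count bits that are 0 in one byte (an erased byte reads 0xFF). */
static unsigned zero_bits(uint8_t v)
{
    unsigned n = 0;
    for (unsigned b = 0; b < 8; b++)
        if (!(v & (1u << b)))
            n++;
    return n;
}

/*
 * Hypothetical helper: decide whether a chunk that failed ECC is really
 * an erased chunk with a few bit flips.
 * Returns the number of flipped bits (and rewrites the buffer to 0xFF)
 * if it does not exceed threshold, or -1 if the chunk holds real data.
 */
int check_erased_chunk(uint8_t *data, size_t len, int threshold)
{
    int flips = 0;

    for (size_t i = 0; i < len; i++) {
        flips += (int)zero_bits(data[i]);
        if (flips > threshold)
            return -1; /* too many 0 bits: a genuine uncorrectable error */
    }
    /* Hand upper layers (UBI/UBIFS) a clean erased chunk. */
    memset(data, 0xFF, len);
    return flips;
}
```

With a threshold of 4 bits, a chunk with two 0 bits is reported as erased with two corrected flips, while a chunk of real data fails the check and stays an uncorrectable error.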

> I can add some debugging to find out - what specifically would be
> helpful to add?

A hexdump of the buffer would be a good start.
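Inside the kernel, print_hex_dump() can produce such a dump. For checking a raw page buffer in userspace, a minimal row formatter might look like the sketch below; hexdump_row(), hexdump() and HEXDUMP_ROW_MAX are hypothetical names, and the output format (offset, then 16 hex bytes per row) is just one reasonable choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Room for a 16-digit offset, ':', 16 * " xx" and the NUL terminator. */
#define HEXDUMP_ROW_MAX 80

/* Format the (up to) 16 bytes starting at off as "oooooooo: xx xx ...". */
void hexdump_row(char out[HEXDUMP_ROW_MAX], const uint8_t *buf, size_t len,
                 size_t off)
{
    int n = sprintf(out, "%08zx:", off);

    for (size_t i = off; i < len && i < off + 16; i++)
        n += sprintf(out + n, " %02x", (unsigned)buf[i]);
}

/* Print the whole buffer to stdout, one 16-byte row per line. */
void hexdump(const uint8_t *buf, size_t len)
{
    char row[HEXDUMP_ROW_MAX];

    for (size_t off = 0; off < len; off += 16) {
        hexdump_row(row, buf, len, off);
        puts(row);
    }
}
```

On an erased page, any byte other than ff in the output marks a bit flip, and the offset column shows where it sits.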

> Thanks for the help!

Thanks for sharing your issues. This is the only way
to address them.
That said, on none of the boards I had access to was I able to reproduce
the unstable bits issue. It was always something else.

Thanks,
//richard


Thread overview: 18+ messages
2015-10-26 19:37 UBIFS corruption after power cut - possibly unstable bits issue? Tim Harvey
2015-10-26 20:01 ` Richard Weinberger
2015-10-26 20:31   ` Tim Harvey
2015-10-26 21:41     ` Richard Weinberger
2015-10-27 19:01       ` Tim Harvey
2015-10-27 19:52         ` Richard Weinberger [this message]
2015-11-02 20:27           ` Tim Harvey
2015-11-02 20:31             ` Tim Harvey
2015-11-02 21:31               ` Richard Weinberger
2015-11-02 22:11                 ` Brian Norris
2015-11-03 13:38               ` Boris Brezillon
2015-11-16 15:01                 ` Tim Harvey
2015-11-30 21:58                   ` Tim Harvey
2015-12-01  9:12                     ` Boris Brezillon
2015-11-03  9:10             ` Artem Bityutskiy
2015-11-03 10:06   ` Michal Suchanek
2015-11-03 10:18     ` Ricard Wanderlof
2015-11-03 10:43     ` Artem Bityutskiy
