From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Tim Harvey <tharvey@gateworks.com>
Cc: Richard Weinberger <richard@nod.at>,
	Elie De Brauwer <eliedebrauwer@gmail.com>,
	Artem Bityutskiy <dedekind1@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-mtd@lists.infradead.org,
	Huang Shijie <shijie.huang@arm.com>,
	Brian Norris <computersforpeace@gmail.com>
Subject: Re: UBIFS corruption after power cut - possibly unstable bits issue?
Date: Tue, 1 Dec 2015 10:12:49 +0100
Message-ID: <20151201101249.1fc3448f@bbrezillon>
In-Reply-To: <CAJ+vNU2M5getk6=J-wXDhQf1qwPr7ivRQFtO8pV1695WK7ZR+g@mail.gmail.com>

On Mon, 30 Nov 2015 13:58:34 -0800
Tim Harvey <tharvey@gateworks.com> wrote:

> On Mon, Nov 16, 2015 at 7:01 AM, Tim Harvey <tharvey@gateworks.com> wrote:
> > On Tue, Nov 3, 2015 at 5:38 AM, Boris Brezillon
> > <boris.brezillon@free-electrons.com> wrote:
> >> Hi Tim,
> >>
> >> On Mon, 2 Nov 2015 12:31:11 -0800
> >> Tim Harvey <tharvey@gateworks.com> wrote:
> >>
> >>> On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey@gateworks.com> wrote:
> >>> > [    8.635364] UBIFS (ubi0:0): recovery needed
> >>> > [    8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [    8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [    8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > ^^^^ non correctable ecc error on PEB 2254  - I verified that this was
> >>> > not the first time this PEB has been used
> >>
> >> I suspect one of the bits in PEB 2254 is stuck at 0 (even after
> >> erasing the block the bit stays at 0). Have you tried to erase this
> >> block (flash_erase /dev/mtd2 0x23380000 1) and dump it in raw mode
> >> (nanddump -n -l 0x40000 -s 0x23380000 -f /tmp/dump /dev/mtd2)?
> >
> > Boris,
> >
> > I examined the bad PEB on several boards on which I have now
> > reproduced this issue and found no stuck bits (no 0's following
> > erase, no 1's following erase and raw write all ff's).
> >
> > So in this case it doesn't appear to be a bad block. Incidentally for
> > UBI/UBIFS, what is in charge of detecting bad blocks, how are they
> > detected, and when/how are they marked?
> >
> >>
> >>> >
> >>> > I've cc'd Huang, Elie, and Brian who were involved in the patch to
> >>> > detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> >>> > ideas. I find it interesting that in one case that patch resolves the
> >>> > issue and in the other it does not.
> >>
> >> I posted a slightly reworked version of Huang's patch [1] a while ago
> >> addressing the "account for bitflips in OOB area" problem, but maybe we
> >> could do better (avoid this extra "read in raw mode" step, or use the
> >> generic nand_check_erased_ecc_chunk() function when ECC bytes are
> >> aligned).
> >>
> >> Best Regards,
> >>
> >> Boris
> >>
> >> [1]https://patchwork.ozlabs.org/patch/416543/
> >
> > At this point I likely need to reproduce this problem with additional
> > debugging enabled to show what last erased and/or wrote to the PEBs
> > that are corrupt. I will also try your patch and see if that
> > resolves anything.
> >
> > Regards,
> >
> > Tim
> 
> Boris,
> 
> I tried your patch [1] in a week-long test on 10 IMX6 boards, booting
> over 60K times across temperature ranges, and the patch resolved many
> of the earlier failure-to-mount-rootfs errors (previously around 1% of
> boots failed to mount rootfs). In addition, I saw no NAND corruption
> where I would have expected to see it several times with those
> numbers, so I suspect this may have resolved that as well.
> 
> Can you re-submit your patch for inclusion and/or discussion?

I'm quite busy on other topics lately, but feel free to adapt/resubmit
the patch.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
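
For readers following the stuck-bit check quoted earlier in this message
(erase PEB 2254, then dump it in raw mode), here is a minimal sketch of how
the resulting dump could be inspected. It is not part of the original
thread. The 0x40000 PEB size is taken from the -l argument of the quoted
nanddump command; the function names peb_offset and find_stuck_zero_bits
are hypothetical, and the scan simply assumes that a freshly erased NAND
block should read back as all 0xff bytes.

```python
# Sketch only: helper names are hypothetical, not from mtd-utils or the kernel.

PEB_SIZE = 0x40000  # 256 KiB, the length used in the quoted nanddump command


def peb_offset(peb, peb_size=PEB_SIZE):
    """Byte offset of a PEB from the start of the MTD partition."""
    return peb * peb_size


def find_stuck_zero_bits(data, limit=16):
    """Return up to `limit` (offset, mask) pairs for bytes that are not 0xff.

    `mask` has a 1 for every bit that read back as 0 after the erase,
    which is what a bit stuck at 0 would look like in a raw dump.
    """
    hits = []
    for offset, byte in enumerate(data):
        if byte != 0xFF:
            hits.append((offset, byte ^ 0xFF))
            if len(hits) >= limit:
                break
    return hits


# PEB 2254 with 256 KiB PEBs gives the offset used in the quoted commands.
print(hex(peb_offset(2254)))  # 0x23380000

# In-memory stand-in for a dump: one byte has bit 0 cleared.
sample = bytes([0xFF, 0xFF, 0xFE, 0xFF])
print(find_stuck_zero_bits(sample))  # [(2, 1)]
```

To scan a real dump, the file written by nanddump (/tmp/dump in the quoted
command) could be read with open(path, "rb").read() and passed to
find_stuck_zero_bits. An empty result would match Tim's finding of no
stuck bits.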
