From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Tim Harvey <tharvey@gateworks.com>
Cc: Richard Weinberger <richard@nod.at>,
Elie De Brauwer <eliedebrauwer@gmail.com>,
Artem Bityutskiy <dedekind1@gmail.com>,
Adrian Hunter <adrian.hunter@intel.com>,
linux-mtd@lists.infradead.org,
Huang Shijie <shijie.huang@arm.com>,
Brian Norris <computersforpeace@gmail.com>
Subject: Re: UBIFS corruption after power cut - possibly unstable bits issue?
Date: Tue, 1 Dec 2015 10:12:49 +0100
Message-ID: <20151201101249.1fc3448f@bbrezillon>
In-Reply-To: <CAJ+vNU2M5getk6=J-wXDhQf1qwPr7ivRQFtO8pV1695WK7ZR+g@mail.gmail.com>

On Mon, 30 Nov 2015 13:58:34 -0800
Tim Harvey <tharvey@gateworks.com> wrote:
> On Mon, Nov 16, 2015 at 7:01 AM, Tim Harvey <tharvey@gateworks.com> wrote:
> > On Tue, Nov 3, 2015 at 5:38 AM, Boris Brezillon
> > <boris.brezillon@free-electrons.com> wrote:
> >> Hi Tim,
> >>
> >> On Mon, 2 Nov 2015 12:31:11 -0800
> >> Tim Harvey <tharvey@gateworks.com> wrote:
> >>
> >>> On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey <tharvey@gateworks.com> wrote:
> >>> > [ 8.635364] UBIFS (ubi0:0): recovery needed
> >>> > [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > ^^^^ uncorrectable ECC error on PEB 2254; I verified that this was
> >>> > not the first time this PEB had been used
> >>
> >> I suspect one of the bits in PEB 2254 is stuck at 0 (it stays at 0
> >> even after the block is erased). Have you tried to erase this
> >> block (flash_erase /dev/mtd2 0x23380000 1) and dump it in raw mode
> >> (nanddump -n -l 0x40000 -s 0x23380000 -f /tmp/dump /dev/mtd2)?
> >
> > Boris,
> >
> > I examined the bad PEB on several boards now that I have reproduced
> > this issue, and found no stuck bits (no 0's following erase, no
> > 1's following erase and raw write all ff's).
> >
> > So in this case it doesn't appear to be a bad block. Incidentally, for
> > UBI/UBIFS, what is in charge of detecting bad blocks, how are they
> > detected, and when/how are they marked?
> >
> >>
> >>> >
> >>> > I've cc'd Huang, Elie, and Brian who were involved in the patch to
> >>> > detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> >>> > ideas. I find it interesting that in one case that patch resolves the
> >>> > issue and in the other it does not.
> >>
> >> I posted a slightly reworked version of Huang's patch [1] a while ago
> >> addressing the "account for bitflips in OOB area" problem, but maybe we
> >> could do better (avoid this extra "read in raw mode" step, or use the
> >> generic nand_check_erased_ecc_chunk() function when ECC bytes are
> >> aligned).
> >>
> >> Best Regards,
> >>
> >> Boris
> >>
> >> [1]https://patchwork.ozlabs.org/patch/416543/
> >
> > At this point I likely need to reproduce this problem with additional
> > debugging enabled to show what last erased and/or wrote to the PEBs
> > that are corrupt. I will also try your patch and see if that
> > resolves anything.
> >
> > Regards,
> >
> > Tim
>
> Boris,
>
> I tried your patch [1] in a week-long test on 10 i.MX6 boards,
> booting over 60K times across temperature ranges, and the patch
> resolved many of the previous rootfs mount failures (previously
> around 1% of boots failed to mount the rootfs). In addition, I saw
> no NAND corruption, where I would have expected to see it several times
> with those numbers, so I suspect the patch may have resolved that as well.
>
> Can you re-submit your patch for inclusion and/or discussion?
I've been quite busy with other topics lately, but feel free to adapt
and resubmit the patch.
Best Regards,
Boris
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
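
For reference, the raw-dump check discussed in this thread (erase PEB 2254, dump it with nanddump -n, look for bits that stay at 0) can be scripted. Below is a minimal C sketch, assuming the raw dump (e.g. /tmp/dump) has already been read into memory. count_zero_bits(), chunk_looks_erased() and count_bad_chunks() are hypothetical helpers, and the chunk length and threshold are placeholders you would take from the controller's ECC layout. The per-chunk threshold follows the same idea as nand_check_erased_ecc_chunk() (tolerate a few bitflips in an erased chunk), but this is not that function's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Count bits that read as 0 in a buffer that should be fully erased
 * (all 0xFF). Each 0 bit is either a bitflip or a bit stuck at 0. */
static unsigned count_zero_bits(const uint8_t *buf, size_t len)
{
    unsigned zeros = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t inv = (uint8_t)~buf[i]; /* set bits now mark 0 bits */

        while (inv) {
            zeros += inv & 1u;
            inv >>= 1;
        }
    }
    return zeros;
}

/* Hypothetical helper: treat one ECC chunk as erased if it has at most
 * `threshold` zero bits. Same idea as nand_check_erased_ecc_chunk(),
 * NOT the kernel implementation. */
static int chunk_looks_erased(const uint8_t *chunk, size_t len,
                              unsigned threshold)
{
    return count_zero_bits(chunk, len) <= threshold;
}

/* Hypothetical helper: walk a raw dump of an erased block in
 * chunk_len steps (a trailing partial chunk is checked too) and return
 * how many chunks exceed the threshold. chunk_len is an assumption
 * taken from the ECC layout; 0 is rejected to avoid looping forever. */
static size_t count_bad_chunks(const uint8_t *dump, size_t len,
                               size_t chunk_len, unsigned threshold)
{
    size_t bad = 0;

    if (chunk_len == 0)
        return 0;

    for (size_t off = 0; off < len; off += chunk_len) {
        size_t n = (len - off < chunk_len) ? len - off : chunk_len;

        if (!chunk_looks_erased(dump + off, n, threshold))
            bad++;
    }
    return bad;
}
```

With a threshold of 0 any 0 bit in the freshly erased block is reported; repeating the erase/dump cycle and seeing the same bit position every time would point to a stuck bit rather than a transient bitflip.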