From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from down.free-electrons.com ([37.187.137.238] helo=mail.free-electrons.com)
	by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux))
	id 1a3gzn-0003et-U5
	for linux-mtd@lists.infradead.org; Tue, 01 Dec 2015 09:13:15 +0000
Date: Tue, 1 Dec 2015 10:12:49 +0100
From: Boris Brezillon
To: Tim Harvey
Cc: Richard Weinberger, Elie De Brauwer, Artem Bityutskiy, Adrian Hunter,
	linux-mtd@lists.infradead.org, Huang Shijie, Brian Norris
Subject: Re: UBIFS corruption after power cut - possibly unstable bits issue?
Message-ID: <20151201101249.1fc3448f@bbrezillon>
In-Reply-To:
References: <562E8697.50207@nod.at> <562E9E0B.5030204@nod.at> <562FD60E.9020807@nod.at> <20151103143850.45c4ded9@bbrezillon>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
List-Id: Linux MTD discussion mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

On Mon, 30 Nov 2015 13:58:34 -0800
Tim Harvey wrote:

> On Mon, Nov 16, 2015 at 7:01 AM, Tim Harvey wrote:
> > On Tue, Nov 3, 2015 at 5:38 AM, Boris Brezillon
> > wrote:
> >> Hi Tim,
> >>
> >> On Mon, 2 Nov 2015 12:31:11 -0800
> >> Tim Harvey wrote:
> >>
> >>> On Mon, Nov 2, 2015 at 12:27 PM, Tim Harvey wrote:
> >>> > [ 8.635364] UBIFS (ubi0:0): recovery needed
> >>> > [ 8.676203] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [ 8.692460] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > [ 8.708741] ubi0 warning: ubi_io_read: error -74 (ECC error) while
> >>> > reading 69632 bytes from PEB 2254:192512, read only 69632 bytes, retry
> >>> > ^^^^ non correctable ecc error on PEB 2254 - I verified that this was
> >>> > not the first time this PEB has been used
> >>
> >> I suspect one of the bit in PEB 2254 to be stuck at 0 (even after
> >> erasing the block the bit stays at 0). Have you tried to erase this
> >> block (flash_erase /dev/mtd2 0x23380000 1) and dump it in raw mode
> >> (nanddump -n -l 0x40000 -s 0x23380000 -f /tmp/dump /dev/mtd2)?
> >
> > Boris,
> >
> > I examined the bad PEB on several boards now that I have reproduced
> > this issue with and found no stuck bits (no 0's following erase, no
> > 1's following erase and raw write all ff's).
> >
> > So in this case it doesn't appear to be a bad block. Incidentally for
> > UBI/UBIFS, what is in charge of detecting bad blocks, how are they
> > detected, and when/how are they marked?
> >
> >>
> >>> >
> >>> > I've cc'd Huang, Elie, and Brian who were involved in the patch to
> >>> > detect bit-flips in gpmi-nand.c reads - perhaps they have some more
> >>> > ideas. I find it interesting that in one case that patch resolves the
> >>> > issue and in the other it does not.
> >>
> >> I posted a slightly reworked version of Huang's patch [1] a while ago
> >> addressing the "account for bitflips in OOB area" problem, but maybe we
> >> could do better (avoid this extra "read in raw mode" step, or use the
> >> generic nand_check_erased_ecc_chunk() function when ECC bytes are
> >> aligned).
> >>
> >> Best Regards,
> >>
> >> Boris
> >>
> >> [1]https://patchwork.ozlabs.org/patch/416543/
> >
> > At this point I likely need to reproduce this problem with additional
> > debugging enabled to show what last erased and/or wrote to the PEB's
> > that are corrupt. I will also try your patch as well and see if that
> > resolves anything.
> >
> > Regards,
> >
> > Tim
>
> Boris,
>
> I tried your patch [1] on a week-long test over 10x IMX6 boards
> booting over 60K times across temperature ranges and the patch
> resolved many previous failures to mount rootfs errors (previously I
> would encounter around 1% failure to mount rootfs). In addition I saw
> no nand corruption where I would have expected to see it several times
> with those numbers so I suspect this may have resolved that as well.
>
> Can you re-submit your patch for inclusion and/or discussion?

I'm quite busy on other topics lately, but feel free to adapt/resubmit
the patch.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
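
For anyone repeating the erase-and-dump check quoted above, here is a rough Python sketch of how the raw dump could be inspected. It is only an illustration under stated assumptions, not code from the thread or from the kernel: the 0x40000 eraseblock size is taken from the nanddump -l argument, while the 512-byte chunk size, the threshold parameter and all function names are hypothetical. The last helper mirrors only the general idea behind an "erased chunk with bitflips" check (count the bits that read as 0 and compare against a threshold); it does not reproduce nand_check_erased_ecc_chunk().

```python
PEB_SIZE = 0x40000  # assumption: 256 KiB eraseblock, from "-l 0x40000" in the thread


def peb_offset(peb, peb_size=PEB_SIZE):
    """Byte offset of a physical eraseblock inside the MTD partition."""
    return peb * peb_size


def count_zero_bits(chunk):
    """Number of bits that read as 0; a cleanly erased chunk has none."""
    return sum(bin(b ^ 0xFF).count("1") for b in chunk)


def zero_bits_per_chunk(raw, chunk_size=512):
    """Map chunk offset -> zero-bit count, for chunks that are not all 0xFF.

    chunk_size is a hypothetical value; use the ECC step size of the
    actual NAND controller configuration.
    """
    report = {}
    for off in range(0, len(raw), chunk_size):
        zeros = count_zero_bits(raw[off:off + chunk_size])
        if zeros:
            report[off] = zeros
    return report


def looks_erased(chunk, threshold):
    """Treat a chunk as erased if it has at most `threshold` zero bits.

    threshold is a hypothetical parameter standing in for the ECC strength.
    """
    return count_zero_bits(chunk) <= threshold
```

Reading /tmp/dump in binary mode and passing its contents to zero_bits_per_chunk() would list the chunks of the freshly erased block that contain 0 bits; an empty result is consistent with Tim's report of no stuck bits. peb_offset(2254) gives the 0x23380000 offset used in the flash_erase and nanddump commands.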