From: Stefan Roese
To: Vikram Narayanan
Cc: Huang Shijie, linux-mtd@lists.infradead.org
Date: Sun, 12 May 2013 17:09:17 +0200
Subject: Re: mtd_oobtest fails with GPMI-NAND
Message-ID: <518FB09D.3070101@denx.de>
In-Reply-To: <518F86CE.6060408@gmail.com>

Hi Vikram,

On 05/12/2013 02:10 PM, Vikram Narayanan wrote:
>>>> I might be seeing something similar on my iMX6 board. Here
>>>> mtd_subpagetest sometimes fails.
>>>
>>> AFAIR, subpagetest passed for me when I tested.
>>>
>>> BTW, what kind of errors are you getting? Any logs?
>>
>> Sure. Here are 2 logs with v3.9.1:
>>
>> # insmod mtd_subpagetest.ko dev=3
>> [ 315.947734] mtd_subpagetest: verified up to eraseblock 0
>> [ 317.140658] mtd_subpagetest: error: read failed at 0xd60800
>
>> [ 2561.996470] mtd_subpagetest: error: read failed at 0xe380800
>
> Looks like it is happening at random. Two different physical locations
> on two different runs.

The locations (blocks) where the errors happen are not always
identical, but some blocks recur. Also, this Linux test stops as soon
as it detects the first error.
U-Boot reported multiple failing blocks, and some of them failed most
of the time. So it's not really random.

>>>> What is the current status on your platform? Did you resolve this
>>>> problem? If yes, what did you have to change/fix?
>>>
>>> Unfortunately no. I haven't had enough time to look into this.
>>
>> Too bad. Could you please explain again how you first noticed
>> that you might have a problem with NAND on your imx6 board?
>
> For me, the error (-74) happened when the kernel was finally trying to
> mount the rootfs. It is random as well.
>
>
> As suggested in the above link, I tried to run the oobtest and it was
> failing due to the missing ecc_layout structure. That's how this thread
> was born.
>
>> We first noticed a problem in U-Boot by using the following
>> commands:
>>
>> => nand erase.part ubi; ubi part ubi
>>
>> This works the first time without any problem. But the 2nd time
>> it leads to "uncorrectable errors" (0xfe) while reading from some
>> blocks. And those failing blocks tend to be the same (more or less).
>
>> Perhaps you might want to test this in U-Boot (if you use it)
>> as well.
>
> I'm using u-boot v2012.10, with no extra patches for mtd/ubi layers.
>
> I tried it both ways: I issued "nand erase; ubi part" and "ubi part"
> alone. For me, it didn't give the "uncorrectable errors" you've
> mentioned.
>
> The only error I came across in u-boot is the "fixable bit-flip" issue.
> My colleagues reported it some time back, but I couldn't reproduce it
> and neither could they.

Could you please post a link to this thread?

> Can you please post the logs for the "uncorrectable errors" in u-boot?
> That might give some hints to Huang.

Sure. Please note that I tracked the error (-74) back to the U-Boot imx
NAND driver (mxs_nand.c):

	/* Loop over status bytes, accumulating ECC status. */
	status = nand_info->oob_buf + mxs_nand_aux_status_offset();

	for (i = 0; i < mxs_nand_ecc_chunk_cnt(mtd->writesize); i++) {
		/* 0x00: no bitflips in this chunk */
		if (status[i] == 0x00)
			continue;

		/* 0xff: chunk is erased */
		if (status[i] == 0xff)
			continue;

		/* 0xfe: uncorrectable chunk */
		if (status[i] == 0xfe) {
			failed++;
			continue;
		}

		/* any other value: number of corrected bitflips */
		corrected += status[i];
	}

I could also add some code to the Linux driver to check whether the
error happens at the same place (status == 0xfe), though I'm pretty
sure that this is the case.

Here are the logs from U-Boot:

=> nand erase.part ubi;
NAND erase.part: device 0 offset 0x1100000, size 0x1ef00000
Erasing at 0x1ffe0000 -- 100% complete.
OK
=> ubi part ubi
UBI: mtd1 is detached from ubi0
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size: 131072 bytes (128 KiB)
UBI: logical eraseblock size: 126976 bytes
UBI: smallest flash I/O unit: 2048
UBI: VID header offset: 2048 (aligned 2048)
UBI: data offset: 4096
UBI error: ubi_io_read: error -74 while reading 64 bytes from PEB 3093:0, read 64 bytes
UBI error: ubi_io_read: error -74 while reading 64 bytes from PEB 3956:0, read 64 bytes
UBI error: ubi_read_volume_table: the layout volume was not found
UBI error: ubi_init: cannot attach mtd1
UBI error: ubi_init: UBI error: cannot initialize UBI, error -22
UBI init error 22

Another thing worth mentioning is that this command sequence
(nand erase; nand erase; ubi part) never triggers the problem. So an
additional erase somehow seems to fix it. I'm still not sure why this
is the case.

Vikram, which NAND part/chip are you using again?

Thanks,
Stefan