From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from arroyo.ext.ti.com ([192.94.94.40])
 by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux))
 id 1aELAQ-00072g-9d
 for linux-mtd@lists.infradead.org; Wed, 30 Dec 2015 18:08:11 +0000
From: "Franklin S Cooper Jr."
To: Boris Brezillon
Subject: Re: Testing generic empty page bit flips recovery
References: <5683E5CC.6020008@ti.com> <20151230154055.662b4df8@bbrezillon>
 <5683F960.90901@ti.com> <20151230170207.45e7fb01@bbrezillon>
 <56840911.7080404@ti.com> <20151230175938.69c319cb@bbrezillon>
 <56841842.7060102@ti.com> <20151230185348.4f0ad161@bbrezillon>
Message-ID: <56841D70.6090702@ti.com>
Date: Wed, 30 Dec 2015 12:07:44 -0600
MIME-Version: 1.0
In-Reply-To: <20151230185348.4f0ad161@bbrezillon>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
List-Id: Linux MTD discussion mailing list

On 12/30/2015 11:53 AM, Boris Brezillon wrote:
> On Wed, 30 Dec 2015 11:45:38 -0600
> "Franklin S Cooper Jr." wrote:
>
>>
>> On 12/30/2015 10:59 AM, Boris Brezillon wrote:
>>> On Wed, 30 Dec 2015 10:40:49 -0600
>>> "Franklin S Cooper Jr." wrote:
>>>
>>>> On 12/30/2015 10:02 AM, Boris Brezillon wrote:
>>>>> On Wed, 30 Dec 2015 09:33:52 -0600
>>>>> "Franklin S Cooper Jr." wrote:
>>>>>
>>>>>> On 12/30/2015 08:40 AM, Boris Brezillon wrote:
>>>>>>> Hi Franklin,
>>>>>>>
>>>>>>> On Wed, 30 Dec 2015 08:10:20 -0600
>>>>>>> "Franklin S Cooper Jr." wrote:
>>>>>>>
>>>>>>>> I am trying to follow up on this discussion from this patch
>>>>>>>> set (https://patchwork.ozlabs.org/patch/539059/) which
>>>>>>>> suggested that Michael instead test the generic bitflips
>>>>>>>> recovery that is implemented by Boris "mtd: nand: properly
>>>>>>>> handle bitflips in erased pages" patchset
>>>>>>>> (http://lists.infradead.org/pipermail/linux-mtd/2015-September/061617.html).
>>>>>>>> I would like to test Boris's patchset but first I need to
>>>>>>>> recreate the error that his patch is fixing.
>>>>>>>>
>>>>>>>> The error that the patchset is attempting to fix isn't
>>>>>>>> something I have ever encountered before. Currently I am
>>>>>>>> trying to reproduce this issue on a TI K2E evm that uses the
>>>>>>>> davinci nand driver. I flashed the nand's file-system
>>>>>>>> partition with a ubi filesystem and the board is currently
>>>>>>>> set to boot using the file-system on the nand. After about
>>>>>>>> 60 secs I cut the power from the board and boot the board
>>>>>>>> again. What I would expect is that the board will eventually
>>>>>>>> fail to mount the ubi filesystem but currently the board has
>>>>>>>> run for over 24 hours and powered on and off over 1400 times
>>>>>>>> and it's still mounting the file-system perfectly fine.
>>>>>>>>
>>>>>>>> Any suggestions on a test case that I can use to force the
>>>>>>>> empty page bit flips error?
>>>>>>>>
>>>>>>>>
>>>>>>> The davinci driver seems to support raw accesses, so you can try to
>>>>>>> apply this patch [1] against the mtd-utils tree (not sure it still
>>>>>>> applies cleanly, but it should work with mtd-utils-1.5.1), and use the
>>>>>>> nandflipbits tool:
>>>>>>>
>>>>>>> # flash_erase /dev/mtdX 1
>>>>>>> # nandflipbits /dev/mtdX 1@
>>>>>>> # nanddump -f /tmp/dump -s -l /dev/mtdX
>>>>>>>
>>>>>>> Without the patch, nanddump should complain about uncorrectable errors,
>>>>>>> and if you hexdump /tmp/dump you should see the bitflip.
>>>>>>> If nanddump does not complain after applying my patch, then it means it
>>>>>>> fixes the "bitflips in erased pages" bug.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Boris
>>>>>>>
>>>>>>> [1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html
>>>>>> Hi Boris,
>>>>>>
>>>>>> Thanks for the quick reply.
>>>>>> I built mtd-utils with your
>>>>>> patch and ran the suggested commands on a 4.1 based kernel
>>>>>> without your kernel patchset and I didn't see your expected
>>>>>> output. The 4.1 based kernel hasn't had any changes to
>>>>>> davinci_nand or nand subsystem that would address this
>>>>>> bitflip error.
>>>>>>
>>>>>> I'm currently going to attempt to run the same test on the
>>>>>> latest mainline.
>>>>>>
>>>>>> Here is the output I received when I ran your suggested
>>>>>> commands on the 4.1 based kernel.
>>>>>> root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
>>>>>> Erasing 128 Kibyte @ 0 -- 100 % complete
>>>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
>>>>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
>>>>>> ECC failed: 0
>>>>>> ECC corrected: 0
>>>>>> Number of bad blocks: 0
>>>>>> Number of bbt blocks: 4
>>>>>> Block size 131072, page size 2048, OOB size 64
>>>>>> root@k2e-evm:~# hexdump /tmp/dump
>>>>>> 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
>>>>>> 0000010 ffff ffff ffff ffff ffff ffff ffff ffff
>>>>>> *
>>>>>> 0000800
>>>>>>
>>>>>> Any thoughts on why I'm not seeing the expected error?
>>>>>>
>>>>> Oh, actually this behavior is explained in the commit message:
>>>>>
>>>>> "Currently empty page bit flips are not corrected and report 0 errors."
>>>>>
>>>>> Which explains why you're seeing the bitflip in the dump, but nothing
>>>>> reported by the MTD layer.
>>>>>
>>>>> After applying my patch, the bitflip should simply disappear. You can
>>>>> then try to generate more bitflips than the engine can actually fix
>>>>> (nandflipbits /dev/mtd4 1@0:5@0:49@0:98@0:132@0) and check that MTD
>>>>> reports an uncorrectable error.
>>>> I verified that I am indeed using ecc4bit mode.
>>>>
>>>> I attempted to run the series of nandflipbits as you
>>>> suggested but I get "invalid bit description" error from the
>>>> utility. For some reason I can only use the nandflipbits
>>>> utility for bits 1-7.
>>>> Anything higher and I get the "Invalid bit description" error.
>>> Indeed. I developed that tool a long time ago and didn't remember that
>>> the bit field is encoding the bit offset within a byte. This command
>>> should work.
>>>
>>> nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
>>>
>>>> On the latest master commit I ran nandflipbits for bits 1-7
>>>> at address 0. However, I still didn't receive any error from
>>>> nanddump although I do see the flipped bits from the hexdump
>>>> /tmp/dump output.
>>> How many of them do you see?
>>>
>>>> I then applied your patchset on top of the latest mainline
>>>> and ran nandflipbits for bits 1-7 at address 0.
>>>> I get the below output which seems to be correct.
>>>>
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 2@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 3@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 4@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 5@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 6@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 7@0
>>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>>>>
>>>> ECC failed: 1
>>>> ECC corrected: 18
>>>> Number of bad blocks: 0
>>>> Number of bbt blocks: 4
>>>> Block size 131072, page size 2048, OOB size 64
>>>> Dumping data starting at 0x00000000 and ending at 0x00000800...
>>>> ECC: 4 corrected bitflip(s) at offset 0x00000000
>>>> root@k2e-evm:~# hexdump /tmp/dump
>>>> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
>>>> *
>>>> 0000800
>>> Hm, that's weird. You should get an ECC failure since the ECC strength
>>> is only 4bits/512byte and 8 bits have been flipped.
>>>
>>>> One thing that confuses me is if I repeatedly call nanddump
>>>> I continue to get the "ECC: 4 corrected bitflips" message
>>>> and the "ECC corrected" count increases by 4 each time.
>>>> If these bits are being corrected which is apparent from
>>>> looking at the output of nanddump shouldn't sequential calls
>>>> indicate that no bitflips needed to be corrected since it
>>>> was corrected previously?
>>> Nope, they're corrected on the fly and only in RAM, so each time you
>>> read the page, you'll have to fix the bitflips until you erase and
>>> rewrite the faulty block.
>>>
>>>
>> Hi Boris,
>>
>> Here is the entire output that should answer your questions.
>>
>> In the log I am running the following commands:
>> flash_erase /dev/mtd4 0 0
>> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>> hexdump /tmp/dump
>> ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
>> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>> hexdump /tmp/dump
>>
>> Output on mainline kernel without bitflip correction patches:
>> http://pastebin.com/MgBVxALR
>>
>> Output on mainline kernel with bitflip correction patches:
>> http://pastebin.com/NdKv0NhV
>>
>> For some reason I'm only getting 1 bit being corrected when
>> using the bitflip correction patches. Comparing my logs from
>> before to now the only difference I'm seeing is that ECC
>> failed is increasing but ECC corrected isn't changing.
>>
> That's what I was expecting: your ECC engine is only fixing
> 4bits/512byte, which is why the bitflip in erased page correction fails
> when you have more than 4 bits flipped in a given 512byte block.
>
> Now try to flip only 4 bits instead of 5:
>
> ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46

Here is the output:

root@k2e-evm:~/# ./flash_erase /dev/mtd4 0 1
root@k2e-evm:~/# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
ECC failed: 5
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 4
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00000800...
root@k2e-evm:~/# hexdump /tmp/dump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000800
root@k2e-evm:~/# ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46
root@k2e-evm:~/# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
ECC failed: 5
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 4
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00000800...
ECC: 4 corrected bitflip(s) at offset 0x00000000
root@k2e-evm:~/# hexdump /tmp/dump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000800

Running nanddump again shows that 4 bits were corrected. So it seems
like things are working as expected.

It seems like patches 2-5 from your patchset weren't pulled in because
you and Brian wanted more testing on other platforms. If you're going to
submit a rev 4, please feel free to CC me so I can test the patches out
and add a Tested-by. If not, feel free to add a Tested-by for your
current rev 3 patchset, or if you can bounce those emails my way I can
add it myself. Whichever approach you prefer.

Thank you for your help and let me know if there are any further tests
you would like me to run.
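
For anyone repeating these runs, the arithmetic behind the two nandflipbits invocations can be checked offline before touching the flash. The following is only a rough sketch in Python, not part of mtd-utils or the kernel patches. It assumes the `bit@offset` syntax as it is used in the commands in this thread (colon-separated, bit field limited to one byte, offset in bytes), and the 4 bits per 512 bytes ECC strength and 2048 byte page size reported above. All function names are hypothetical, and it models only the data area of a single erased page in RAM, ignoring the OOB and ECC bytes.

```python
# Sketch only: models the data area of one erased NAND page in RAM.
# ECC_STEP_SIZE and ECC_STRENGTH come from the "4bits/512byte" figure
# quoted in this thread; all function names here are hypothetical.
ECC_STEP_SIZE = 512
ECC_STRENGTH = 4
PAGE_SIZE = 2048  # page size reported by nanddump in this thread


def parse_flips(spec):
    """Parse 'bit@offset[:bit@offset...]' into (byte_offset, bit) pairs."""
    flips = []
    for item in spec.split(":"):
        bit_text, offset_text = item.split("@")
        bit, offset = int(bit_text), int(offset_text)
        if not 0 <= bit <= 7:
            # Assumption: the bit field is a bit offset within one byte.
            raise ValueError("invalid bit description: " + item)
        flips.append((offset, bit))
    return flips


def erased_page_with_flips(spec, page_size=PAGE_SIZE):
    """Return an all-0xFF page with the requested bits toggled."""
    page = bytearray(b"\xff" * page_size)
    for offset, bit in parse_flips(spec):
        page[offset] ^= 1 << bit
    return page


def zero_bits_per_step(page):
    """Count zero bits (bitflips in an erased page) per ECC step."""
    counts = []
    for start in range(0, len(page), ECC_STEP_SIZE):
        chunk = page[start:start + ECC_STEP_SIZE]
        counts.append(sum(bin(byte ^ 0xFF).count("1") for byte in chunk))
    return counts


def is_correctable(spec):
    """True if no ECC step holds more flips than ECC_STRENGTH."""
    page = erased_page_with_flips(spec)
    return all(n <= ECC_STRENGTH for n in zero_bits_per_step(page))


if __name__ == "__main__":
    for spec in ("1@0:5@0:7@30:3@46:5@47", "1@0:5@0:7@30:3@46"):
        counts = zero_bits_per_step(erased_page_with_flips(spec))
        verdict = "correctable" if is_correctable(spec) else "uncorrectable"
        print(spec, counts, verdict)
```

With these assumptions the first description puts 5 flips into the first 512 byte step, which is over the limit, and the second puts 4 there, which is within it. That lines up with the ECC failure seen with five flips and the "4 corrected bitflip(s)" message seen with four.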