* Testing generic empty page bit flips recovery
@ 2015-12-30 14:10 Franklin S Cooper Jr.
2015-12-30 14:40 ` Boris Brezillon
0 siblings, 1 reply; 13+ messages in thread
From: Franklin S Cooper Jr. @ 2015-12-30 14:10 UTC (permalink / raw)
To: boris.brezillon, computersforpeace, linux-mtd
I am trying to follow up on the discussion of this patch
set (https://patchwork.ozlabs.org/patch/539059/), which
suggested that Michael instead test the generic bitflip
recovery implemented by Boris's "mtd: nand: properly
handle bitflips in erased pages" patchset
(http://lists.infradead.org/pipermail/linux-mtd/2015-September/061617.html).
I would like to test Boris's patchset, but first I need to
reproduce the error that it fixes.
The error that the patchset is attempting to fix isn't
something I have ever encountered before. Currently I am
trying to reproduce this issue on a TI K2E EVM that uses the
davinci NAND driver. I flashed the NAND's filesystem
partition with a UBI filesystem, and the board is
set to boot from the filesystem on the NAND. After about
60 seconds I cut the power to the board and boot it
again. I would expect the board to eventually
fail to mount the UBI filesystem, but so far it has
run for over 24 hours and been power-cycled over 1400 times,
and it's still mounting the filesystem perfectly fine.
Any suggestions on a test case that I can use to force the
empty page bit flips error?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Testing generic empty page bit flips recovery
2015-12-30 14:10 Testing generic empty page bit flips recovery Franklin S Cooper Jr.
@ 2015-12-30 14:40 ` Boris Brezillon
2015-12-30 15:33 ` Franklin S Cooper Jr.
0 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 14:40 UTC
To: Franklin S Cooper Jr.; +Cc: computersforpeace, linux-mtd
Hi Franklin,
On Wed, 30 Dec 2015 08:10:20 -0600
"Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> I am trying to follow up on the discussion of this patch
> set (https://patchwork.ozlabs.org/patch/539059/), which
> suggested that Michael instead test the generic bitflip
> recovery implemented by Boris's "mtd: nand: properly
> handle bitflips in erased pages" patchset
> (http://lists.infradead.org/pipermail/linux-mtd/2015-September/061617.html).
> I would like to test Boris's patchset, but first I need to
> reproduce the error that it fixes.
>
> The error that the patchset is attempting to fix isn't
> something I have ever encountered before. Currently I am
> trying to reproduce this issue on a TI K2E EVM that uses the
> davinci NAND driver. I flashed the NAND's filesystem
> partition with a UBI filesystem, and the board is
> set to boot from the filesystem on the NAND. After about
> 60 seconds I cut the power to the board and boot it
> again. I would expect the board to eventually
> fail to mount the UBI filesystem, but so far it has
> run for over 24 hours and been power-cycled over 1400 times,
> and it's still mounting the filesystem perfectly fine.
>
> Any suggestions on a test case that I can use to force the
> empty page bit flips error?
>
>
The davinci driver seems to support raw accesses, so you can try to
apply this patch [1] against the mtd-utils tree (not sure it still
applies cleanly, but it should work with mtd-utils-1.5.1), and use the
nandflipbits tool:
# flash_erase /dev/mtdX <offset> 1
# nandflipbits /dev/mtdX 1@<offset>
# nanddump -f /tmp/dump -s <offset> -l <page-size> /dev/mtdX
Without the patch, nanddump should complain about uncorrectable errors,
and if you hexdump /tmp/dump you should see the bitflip.
If nanddump does not complain after applying my patch, then it means it
fixes the "bitflips in erased pages" bug.
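As a rough illustration of what that sequence does to the page contents, here is a user-space simulation (hypothetical helpers, not the mtd-utils or kernel code): erasing sets every byte to 0xff, and a "1@<offset>" description clears bit 1 of the byte at <offset>. The 2048-byte page size is an assumption taken from the nanddump output later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 2048 /* assumption: 2 KiB pages, as on the K2E EVM */

/* Simulated erase: an erased NAND page reads back as all 0xff. */
static void sim_erase(uint8_t *page)
{
    memset(page, 0xff, SIM_PAGE_SIZE);
}

/* Simulated raw bitflip: toggle bit 'bit' (0-7) of byte 'offset',
 * which is what a "bit@offset" description asks for. */
static void sim_flip(uint8_t *page, size_t offset, unsigned bit)
{
    assert(offset < SIM_PAGE_SIZE && bit < 8);
    page[offset] ^= (uint8_t)(1u << bit);
}

/* Count bits that read as 0; for an erased page this is the number of
 * bitflips the driver would have to notice. */
static unsigned count_zero_bits(const uint8_t *buf, size_t len)
{
    unsigned zeros = 0;

    for (size_t i = 0; i < len; i++)
        for (unsigned b = 0; b < 8; b++)
            if (!(buf[i] & (1u << b)))
                zeros++;
    return zeros;
}
```

After sim_erase() and sim_flip(page, 0, 1), byte 0 holds 0xfd and count_zero_bits() returns 1: the single flipped bit that nanddump should expose.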
Best Regards,
Boris
[1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: Testing generic empty page bit flips recovery
2015-12-30 14:40 ` Boris Brezillon
@ 2015-12-30 15:33 ` Franklin S Cooper Jr.
2015-12-30 15:55 ` Boris Brezillon
2015-12-30 16:02 ` Boris Brezillon
0 siblings, 2 replies; 13+ messages in thread
From: Franklin S Cooper Jr. @ 2015-12-30 15:33 UTC
To: Boris Brezillon; +Cc: computersforpeace, linux-mtd
On 12/30/2015 08:40 AM, Boris Brezillon wrote:
> Hi Franklin,
>
> The davinci driver seems to support raw accesses, so you can try to
> apply this patch [1] against the mtd-utils tree (not sure it still
> applies cleanly, but it should work with mtd-utils-1.5.1), and use the
> nandflipbits tool:
>
> # flash_erase /dev/mtdX <offset> 1
> # nandflipbits /dev/mtdX 1@<offset>
> # nanddump -f /tmp/dump -s <offset> -l <page-size> /dev/mtdX
>
> Without the patch, nanddump should complain about uncorrectable errors,
> and if you hexdump /tmp/dump you should see the bitflip.
> If nanddump does not complain after applying my patch, then it means it
> fixes the "bitflips in erased pages" bug.
>
> Best Regards,
>
> Boris
>
> [1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html
Hi Boris,
Thanks for the quick reply. I built mtd-utils with your
patch and ran the suggested commands on a 4.1-based kernel
without your kernel patchset, and I didn't see the output
you expected. The 4.1-based kernel hasn't had any changes to
davinci_nand or the NAND subsystem that would address this
bitflip error.
Next, I'm going to run the same test on the
latest mainline.
Here is the output I received when I ran your suggested
commands on the 4.1-based kernel:
root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
Erasing 128 Kibyte @ 0 -- 100 % complete
root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 4
Block size 131072, page size 2048, OOB size 64
root@k2e-evm:~# hexdump /tmp/dump
0000000 fffd ffff ffff ffff ffff ffff ffff ffff
0000010 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000800
Any thoughts on why I'm not seeing the expected error?
* Re: Testing generic empty page bit flips recovery
2015-12-30 15:33 ` Franklin S Cooper Jr.
@ 2015-12-30 15:55 ` Boris Brezillon
2015-12-30 16:02 ` Boris Brezillon
1 sibling, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 15:55 UTC
To: Franklin S Cooper Jr.; +Cc: computersforpeace, linux-mtd
On Wed, 30 Dec 2015 09:33:52 -0600
"Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> Hi Boris,
>
> Thanks for the quick reply. I built mtd-utils with your
> patch and ran the suggested commands on a 4.1-based kernel
> without your kernel patchset, and I didn't see the output
> you expected. The 4.1-based kernel hasn't had any changes to
> davinci_nand or the NAND subsystem that would address this
> bitflip error.
>
> Next, I'm going to run the same test on the
> latest mainline.
>
> Here is the output I received when I ran your suggested
> commands on the 4.1-based kernel:
> root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
> Erasing 128 Kibyte @ 0 -- 100 % complete
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
You should probably use a block-aligned offset (in your case a block is
128 KiB), but that's not the problem here.
> ECC failed: 0
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 4
> Block size 131072, page size 2048, OOB size 64
> root@k2e-evm:~# hexdump /tmp/dump
> 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
The bitflip is here: the fffd word at offset 0.
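hexdump with no format options prints the data as 16-bit words in host byte order. Assuming a little-endian host (the usual configuration for the K2E's ARM cores), the first word fffd is byte 0 = 0xfd followed by byte 1 = 0xff, so bit 1 of the first byte is the one that flipped, matching the 1@4096 command (offset 0 of the dump). A small hypothetical decoder:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Build a 16-bit word the way hexdump's default output does on a
 * little-endian host: the first byte becomes the low half. */
static uint16_t le16_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Index (0-15) of the lowest cleared bit in 'word', or -1 if the word
 * is all ones, i.e. still looks erased. */
static int first_cleared_bit(uint16_t word)
{
    for (int b = 0; b < 16; b++)
        if (!(word & (1u << b)))
            return b;
    return -1;
}

/* Render the word the way hexdump prints it ("%04x"). */
static void format_word(uint16_t word, char out[5])
{
    snprintf(out, 5, "%04x", (unsigned)word);
}
```

Fed the bytes {0xfd, 0xff}, le16_word() gives 0xfffd (printed as fffd) and first_cleared_bit() returns 1, i.e. bit 1 of byte 0.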
> 0000010 ffff ffff ffff ffff ffff ffff ffff ffff
> *
> 0000800
>
> Any thoughts on why I'm not seeing the expected error?
>
Is ecc4bit mode really selected (ti,davinci-ecc-bits = 4 in your DT
node)?
You can add a trace there [1] to check that.
[1]http://lxr.free-electrons.com/source/drivers/mtd/nand/davinci_nand.c?v=4.1#L706
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: Testing generic empty page bit flips recovery
2015-12-30 15:33 ` Franklin S Cooper Jr.
2015-12-30 15:55 ` Boris Brezillon
@ 2015-12-30 16:02 ` Boris Brezillon
2015-12-30 16:40 ` Franklin S Cooper Jr.
1 sibling, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 16:02 UTC
To: Franklin S Cooper Jr.; +Cc: computersforpeace, linux-mtd
On Wed, 30 Dec 2015 09:33:52 -0600
"Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> Hi Boris,
>
> Thanks for the quick reply. I built mtd-utils with your
> patch and ran the suggested commands on a 4.1-based kernel
> without your kernel patchset, and I didn't see the output
> you expected. The 4.1-based kernel hasn't had any changes to
> davinci_nand or the NAND subsystem that would address this
> bitflip error.
>
> Next, I'm going to run the same test on the
> latest mainline.
>
> Here is the output I received when I ran your suggested
> commands on the 4.1-based kernel:
> root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
> Erasing 128 Kibyte @ 0 -- 100 % complete
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
> ECC failed: 0
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 4
> Block size 131072, page size 2048, OOB size 64
> root@k2e-evm:~# hexdump /tmp/dump
> 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
> 0000010 ffff ffff ffff ffff ffff ffff ffff ffff
> *
> 0000800
>
> Any thoughts on why I'm not seeing the expected error?
>
Oh, actually this behavior is explained in the commit message:
"Currently empty page bit flips are not corrected and report 0 errors."
Which explains why you're seeing the bitflip in the dump, but nothing
reported by the MTD layer.
After applying my patch, the bitflip should simply disappear. You can
then try to generate more bitflips than the engine can actually fix
(nandflipbits /dev/mtd4 1@0:5@0:49@0:98@0:132@0) and check that MTD
reports an uncorrectable error.
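The idea behind the fix can be sketched as follows. This is a simplified user-space illustration, not the patchset's code (the patchset adds a helper for this, nand_check_erased_ecc_chunk()). When the ECC engine reports a chunk as uncorrectable, count the zero bits in the data and ECC bytes. If there are no more than the threshold, treat the chunk as an erased page with bitflips, restore 0xff and report the count as corrected bitflips; otherwise return -EBADMSG. A threshold of 4 mirrors ecc4bit mode; the ECC byte count per chunk is left to the caller as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bits that read as 0 (an erased buffer has none). */
static int zero_bits(const uint8_t *buf, size_t len)
{
    int n = 0;

    for (size_t i = 0; i < len; i++)
        for (unsigned b = 0; b < 8; b++)
            if (!(buf[i] & (1u << b)))
                n++;
    return n;
}

/*
 * Simplified erased-chunk check, meant to run only after the ECC engine
 * has reported a chunk as uncorrectable.  If data and ECC bytes together
 * hold at most 'threshold' zero bits, the chunk is taken to be an erased
 * page with bitflips: both buffers are reset to 0xff and the number of
 * bitflips is returned as "corrected".  Otherwise -EBADMSG is returned,
 * i.e. the chunk really is uncorrectable.
 */
static int check_erased_chunk(uint8_t *data, size_t datalen,
                              uint8_t *ecc, size_t ecclen, int threshold)
{
    int flips = zero_bits(data, datalen);

    if (flips > threshold)
        return -EBADMSG;
    flips += zero_bits(ecc, ecclen);
    if (flips > threshold)
        return -EBADMSG;

    memset(data, 0xff, datalen);
    memset(ecc, 0xff, ecclen);
    return flips;
}
```

With one flipped bit in an otherwise erased 512-byte chunk and a threshold of 4, the sketch returns 1 and the data reads back as all 0xff; with five flipped bits in the same chunk it returns -EBADMSG.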
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: Testing generic empty page bit flips recovery
2015-12-30 16:02 ` Boris Brezillon
@ 2015-12-30 16:40 ` Franklin S Cooper Jr.
2015-12-30 16:52 ` Steve deRosier
2015-12-30 16:59 ` Boris Brezillon
0 siblings, 2 replies; 13+ messages in thread
From: Franklin S Cooper Jr. @ 2015-12-30 16:40 UTC
To: Boris Brezillon; +Cc: computersforpeace, linux-mtd
On 12/30/2015 10:02 AM, Boris Brezillon wrote:
> Oh, actually this behavior is explained in the commit message:
>
> "Currently empty page bit flips are not corrected and report 0 errors."
>
> Which explains why you're seeing the bitflip in the dump, but nothing
> reported by the MTD layer.
>
> After applying my patch, the bitflip should simply disappear. You can
> then try to generate more bitflips than the engine can actually fix
> (nandflipbits /dev/mtd4 1@0:5@0:49@0:98@0:132@0) and check that MTD
> reports an uncorrectable error.
I verified that I am indeed using ecc4bit mode.
I attempted to run the series of nandflipbits commands as you
suggested, but I get an "Invalid bit description" error from the
utility. For some reason I can only use the nandflipbits
utility for bits 1-7. Anything higher and I get the "Invalid
bit description" error.
On the latest master commit I ran nandflipbits for bits 1-7
at address 0. However, I still didn't receive any error from
nanddump, although I do see the flipped bits in the hexdump
/tmp/dump output.
I then applied your patchset on top of the latest mainline
and ran nandflipbits for bits 1-7 at address 0.
I get the output below, which seems to be correct.
root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@0
root@k2e-evm:~# ./nandflipbits /dev/mtd4 2@0
root@k2e-evm:~# ./nandflipbits /dev/mtd4 3@0
root@k2e-evm:~# ./nandflipbits /dev/mtd4 4@0
root@k2e-evm:~# ./nandflipbits /dev/mtd4 5@0
root@k2e-evm:~# ./nandflipbits /dev/mtd4 6@0
root@k2e-evm:~# ./nandflipbits /dev/mtd4 7@0
root@k2e-evm:~# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
ECC failed: 1
ECC corrected: 18
Number of bad blocks: 0
Number of bbt blocks: 4
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00000800...
ECC: 4 corrected bitflip(s) at offset 0x00000000
root@k2e-evm:~# hexdump /tmp/dump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000800
One thing that confuses me is that if I repeatedly call nanddump,
I continue to get the "ECC: 4 corrected bitflip(s)" message,
and the "ECC corrected" count increases by 4 each time. If
these bits are being corrected, which is apparent from
the nanddump output, shouldn't subsequent calls
indicate that no bitflips needed to be corrected, since they
were corrected previously?
* Re: Testing generic empty page bit flips recovery
2015-12-30 16:40 ` Franklin S Cooper Jr.
@ 2015-12-30 16:52 ` Steve deRosier
2015-12-30 17:02 ` Franklin S Cooper Jr.
2015-12-30 16:59 ` Boris Brezillon
1 sibling, 1 reply; 13+ messages in thread
From: Steve deRosier @ 2015-12-30 16:52 UTC
To: Franklin S Cooper Jr.
Cc: Boris Brezillon, Brian Norris, linux-mtd@lists.infradead.org
On Wed, Dec 30, 2015 at 8:40 AM, Franklin S Cooper Jr. <fcooper@ti.com> wrote:
>
> One thing that confuses me is that if I repeatedly call nanddump,
> I continue to get the "ECC: 4 corrected bitflip(s)" message,
> and the "ECC corrected" count increases by 4 each time. If
> these bits are being corrected, which is apparent from
> the nanddump output, shouldn't subsequent calls
> indicate that no bitflips needed to be corrected, since they
> were corrected previously?
>
Hi Franklin,
I'm making a guess at the source of your confusion, but I've had to
explain this to several colleagues recently, so I'll give it a try.
I think you're expecting the "correction" to actually be written and
fixed on-flash. This is not the case.
Bitflip corrections are applied as the flash data is read. Every time.
So once a bit is flipped on the NAND, it will stay that way on
the physical device until the block is erased and rewritten, which doesn't
happen for something as minor as a single bitflip. So from that point on,
every read will correct the flip and report it to the kernel log.
UBI will move the data once the bitflip threshold is reached, but up until
that point it will keep reading the same bitflip and correcting it.
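A toy model of that behaviour, with hypothetical names: the "flash" array keeps the flipped bit forever, each read corrects only the caller's RAM copy and reports the same count again, and the return value mimics how MTD returns -EUCLEAN once the corrected count reaches its bitflip threshold (the cue UBI uses to scrub the block). Keeping a copy of the good data stands in for the real ECC decoder.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; defined here only if the libc lacks it */
#endif

#define SIM_CHUNK 16 /* tiny chunk, for illustration only */

/* Hypothetical model: 'flash' holds what is physically stored (flipped
 * bits included); 'good' is the data originally written, standing in for
 * what a real ECC decoder would reconstruct from the stored ECC bytes. */
struct sim_nand {
    uint8_t flash[SIM_CHUNK];
    uint8_t good[SIM_CHUNK];
};

/* Number of bit positions that differ between two buffers. */
static int diff_bits(const uint8_t *a, const uint8_t *b, size_t len)
{
    int n = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t x = a[i] ^ b[i];
        while (x) {
            n += x & 1u;
            x >>= 1;
        }
    }
    return n;
}

/* Simulated read: correct into the caller's buffer only.  The flash
 * array is never modified, so the same flips are found and corrected on
 * every read.  Returns -EUCLEAN when the corrected count reaches
 * 'threshold', else 0; '*corrected' gets the count either way. */
static int sim_read(const struct sim_nand *nand, uint8_t *out,
                    int threshold, int *corrected)
{
    *corrected = diff_bits(nand->flash, nand->good, SIM_CHUNK);
    memcpy(out, nand->good, SIM_CHUNK); /* corrected copy lives in RAM */
    return *corrected >= threshold ? -EUCLEAN : 0;
}
```

Reading the same chunk twice reports the same corrected count both times, and the stored byte stays flipped; only adding more flips (or erasing and rewriting the block) changes the result.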
Hope that helps.
- Steve
* Re: Testing generic empty page bit flips recovery
2015-12-30 16:40 ` Franklin S Cooper Jr.
2015-12-30 16:52 ` Steve deRosier
@ 2015-12-30 16:59 ` Boris Brezillon
2015-12-30 17:45 ` Franklin S Cooper Jr.
1 sibling, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 16:59 UTC
To: Franklin S Cooper Jr.; +Cc: computersforpeace, linux-mtd
On Wed, 30 Dec 2015 10:40:49 -0600
"Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> I verified that I am indeed using ecc4bit mode.
>
> I attempted to run the series of nandflipbits commands as you
> suggested, but I get an "Invalid bit description" error from the
> utility. For some reason I can only use the nandflipbits
> utility for bits 1-7. Anything higher and I get the "Invalid
> bit description" error.
Indeed. I developed that tool a long time ago and didn't remember that
the bit field encodes the bit offset within a byte (0-7). This command
should work:
nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
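Based on the description above, a "bit@offset" list can be read as pairs of (bit position 0-7 within the byte, byte offset) separated by colons. A hypothetical parser for that format (not the nandflipbits source) shows why 49@0 is rejected while 1@0:5@0:7@30:3@46:5@47 yields five flips, all within the first 512 bytes:

```c
#include <assert.h>
#include <stdlib.h>

/* One requested flip: bit 'bit' (0-7) of the byte at 'offset'. */
struct flip {
    unsigned bit;
    unsigned long offset;
};

/*
 * Hypothetical parser for "bit@offset[:bit@offset...]".  Numbers are read
 * with strtoul(..., 0), so 0x-prefixed offsets also work (an assumption,
 * not necessarily what nandflipbits accepts).  Returns the number of
 * entries stored in 'out', or -1 for a malformed description, a bit
 * position above 7, or more than 'max' entries.
 */
static int parse_flips(const char *desc, struct flip *out, int max)
{
    const char *p = desc;
    int n = 0;

    while (*p) {
        char *end;
        unsigned long bit, off;

        if (n == max)
            return -1;
        bit = strtoul(p, &end, 0);
        if (end == p || *end != '@' || bit > 7)
            return -1;
        p = end + 1;
        off = strtoul(p, &end, 0);
        if (end == p)
            return -1;
        out[n].bit = (unsigned)bit;
        out[n].offset = off;
        n++;
        if (*end == ':' && end[1] != '\0')
            p = end + 1;          /* another bit@offset follows */
        else if (*end == '\0')
            p = end;              /* end of description */
        else
            return -1;            /* trailing garbage or lone ':' */
    }
    return n;
}
```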
>
> On the latest master commit I ran nandflipbits for bits 1-7
> at address 0. However, I still didn't receive any error from
> nanddump, although I do see the flipped bits in the hexdump
> /tmp/dump output.
How many of them do you see?
>
> I then applied your patchset on top of the latest mainline
> and ran nandflipbits for bits 1-7 at address 0.
> I get the output below, which seems to be correct.
>
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@0
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 2@0
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 3@0
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 4@0
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 5@0
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 6@0
> root@k2e-evm:~# ./nandflipbits /dev/mtd4 7@0
> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>
> ECC failed: 1
> ECC corrected: 18
> Number of bad blocks: 0
> Number of bbt blocks: 4
> Block size 131072, page size 2048, OOB size 64
> Dumping data starting at 0x00000000 and ending at 0x00000800...
> ECC: 4 corrected bitflip(s) at offset 0x00000000
> root@k2e-evm:~# hexdump /tmp/dump
> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
> *
> 0000800
Hm, that's weird. You should get an ECC failure, since the ECC strength
is only 4 bits per 512 bytes and you flipped more than 4 bits in that chunk.
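Because the strength applies per ECC chunk, what matters is how many flips land in the same 512-byte step. A small hypothetical helper that tallies injected flips per step for a 2048-byte page (both sizes taken from this thread; each entry is assumed to be a distinct bit):

```c
#include <assert.h>
#include <stddef.h>

#define SIM_PAGE 2048     /* page size reported by nanddump */
#define SIM_STEP 512      /* bytes covered by one ECC chunk */
#define SIM_STRENGTH 4    /* correctable bits per chunk (ecc4bit) */
#define SIM_STEPS (SIM_PAGE / SIM_STEP)

/*
 * Tally injected bitflips per ECC step.  'offsets' holds the byte offset
 * (within the page) of each flipped bit; each entry is assumed to be a
 * distinct bit.  Fills 'per_step' and returns 1 if every step stays
 * within SIM_STRENGTH (correctable), 0 otherwise.
 */
static int page_correctable(const size_t *offsets, size_t n,
                            int per_step[SIM_STEPS])
{
    int ok = 1;

    for (int s = 0; s < SIM_STEPS; s++)
        per_step[s] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(offsets[i] < SIM_PAGE);
        per_step[offsets[i] / SIM_STEP]++;
    }
    for (int s = 0; s < SIM_STEPS; s++)
        if (per_step[s] > SIM_STRENGTH)
            ok = 0;
    return ok;
}
```

Seven flips in byte 0 (bits 1-7) put 7 flips in step 0, above the strength of 4, so the read should be reported as uncorrectable; the same kind of flips spread over different 512-byte steps would each stay correctable.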
>
> One thing that confuses me is that if I repeatedly call nanddump,
> I continue to get the "ECC: 4 corrected bitflip(s)" message,
> and the "ECC corrected" count increases by 4 each time. If
> these bits are being corrected, which is apparent from
> the nanddump output, shouldn't subsequent calls
> indicate that no bitflips needed to be corrected, since they
> were corrected previously?
Nope, they're corrected on the fly and only in RAM, so each time you
read the page, you'll have to fix the bitflips until you erase and
rewrite the faulty block.
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: Testing generic empty page bit flips recovery
2015-12-30 16:52 ` Steve deRosier
@ 2015-12-30 17:02 ` Franklin S Cooper Jr.
0 siblings, 0 replies; 13+ messages in thread
From: Franklin S Cooper Jr. @ 2015-12-30 17:02 UTC
To: Steve deRosier
Cc: Boris Brezillon, Brian Norris, linux-mtd@lists.infradead.org
On 12/30/2015 10:52 AM, Steve deRosier wrote:
> On Wed, Dec 30, 2015 at 8:40 AM, Franklin S Cooper Jr. <fcooper@ti.com> wrote:
>> One thing that confuses me is that if I repeatedly call nanddump,
>> I continue to get the "ECC: 4 corrected bitflip(s)" message,
>> and the "ECC corrected" count increases by 4 each time. If
>> these bits are being corrected, which is apparent from
>> the nanddump output, shouldn't subsequent calls
>> indicate that no bitflips needed to be corrected, since they
>> were corrected previously?
>>
> Hi Franklin,
>
> I'm making a guess at the source of your confusion, but I've had to
> answer this for several colleagues recently, so I'll give it a try.
>
> I think you're expecting the "correction" to actually be written and
> fixed on-flash. This is not the case.
>
> Bitflip corrections are applied as the flash data is read. Every time.
> So once a bit is flipped on the NAND, it will always stay that way on
> the physical device until erased and rewritten. Which doesn't happen
> for something as minor as a single bit-flip. So from that point on, it
> will always read and correct the flip and report it to the kernel log.
>
> UBI will move the data once the threshold gets hit, but up until that
> point it will continue to read the same bitflip and correct it.
>
> Hope that helps.
>
> - Steve
Steve,
That makes perfect sense. Thanks for explaining.
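The "threshold" Steve mentions is presumably the MTD bitflip_threshold. When the worst ECC step of a read needed at least that many corrections, the MTD read reports -EUCLEAN. UBI reacts by scheduling the eraseblock for scrubbing, which copies the data elsewhere. The sketch below is a simplified model, not the kernel code path. It mirrors the NAND core default of DIV_ROUND_UP(ecc_strength * 3, 4). The threshold can also be tuned per device through /sys/class/mtd/mtdX/bitflip_threshold.

```python
from typing import Optional

EUCLEAN = 117  # Linux errno value: "Structure needs cleaning"


def default_bitflip_threshold(ecc_strength: int) -> int:
    """Default used by the NAND core: DIV_ROUND_UP(ecc_strength * 3, 4)."""
    return (ecc_strength * 3 + 3) // 4


def mtd_read_result(max_bitflips: int, ecc_strength: int,
                    threshold: Optional[int] = None) -> int:
    """Simplified view of what a successful MTD read reports to UBI.

    max_bitflips is the largest number of corrected bits in any single ECC
    step of the read. -EUCLEAN means "data is fine, but scrub this block".
    """
    if ecc_strength == 0:
        return 0  # device without ECC: nothing to report
    if threshold is None:
        threshold = default_bitflip_threshold(ecc_strength)
    return -EUCLEAN if max_bitflips >= threshold else 0
```

With the 4-bit engine used here, the default threshold works out to 3. So the 4 corrected bitflips seen in this thread would make a UBI-managed block a scrubbing candidate. One or two flips would simply be corrected on every read.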
* Re: Testing generic empty page bit flips recovery
2015-12-30 16:59 ` Boris Brezillon
@ 2015-12-30 17:45 ` Franklin S Cooper Jr.
2015-12-30 17:53 ` Boris Brezillon
0 siblings, 1 reply; 13+ messages in thread
From: Franklin S Cooper Jr. @ 2015-12-30 17:45 UTC (permalink / raw)
To: Boris Brezillon; +Cc: computersforpeace, linux-mtd
On 12/30/2015 10:59 AM, Boris Brezillon wrote:
> On Wed, 30 Dec 2015 10:40:49 -0600
> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
>
>>
>> On 12/30/2015 10:02 AM, Boris Brezillon wrote:
>>> On Wed, 30 Dec 2015 09:33:52 -0600
>>> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
>>>
>>>> Hi Boris,
>>>>
>>>> Thanks for the quick reply. I built mtd-utils with your
>>>> patch and ran the suggested commands on a 4.1 based kernel
>>>> without your kernel patchset and I didn't see your expected
>>>> output. The 4.1 based kernel hasn't had any changes to
>>>> davinci_nand or nand subsystem that would address this
>>>> bitflip error.
>>>>
>>>> I'm currently going to attempt to run the same test on the
>>>> latest mainline.
>>>>
>>>> Here is the output I received when I ran your suggested
>>>> commands on the 4.1 based kernel:
>>>> root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
>>>> Erasing 128 Kibyte @ 0 -- 100 % complete
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
>>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
>>>> ECC failed: 0
>>>> ECC corrected: 0
>>>> Number of bad blocks: 0
>>>> Number of bbt blocks: 4
>>>> Block size 131072, page size 2048, OOB size 64
>>>> root@k2e-evm:~# hexdump /tmp/dump
>>>> 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
>>>> 0000010 ffff ffff ffff ffff ffff ffff ffff ffff
>>>> *
>>>> 0000800
>>>>
>>>> Any thoughts on why I'm not seeing the expected error?
>>>>
>>> Oh, actually this behavior is explained in the commit message:
>>>
>>> "Currently empty page bit flips are not corrected and report 0 errors."
>>>
>>> Which explains why you're seeing the bitflip in the dump, but nothing
>>> reported by the MTD layer.
>>>
>>> After applying my patch, the bitflip should simply disappear. You can
>>> then try to generate more bitflips than the engine can actually fix
>>> (nandflipbits /dev/mtd4 1@0:5@0:49@0:98@0:132@0) and check that MTD
>>> reports an uncorrectable error.
>> I verified that I am indeed using ecc4bit mode.
>>
>> I attempted to run the series of nandflipbits commands you
>> suggested, but I get an "invalid bit description" error from the
>> utility. For some reason I can only use the nandflipbits
>> utility for bits 1-7. Anything higher and I get the "Invalid
>> bit description" error.
> Indeed. I developed that tool a long time ago and didn't remember that
> the bit field is encoding the bit offset within a byte. This command
> should work.
>
> nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
>
>> On the latest master commit I ran nandflipbits for bits 1-7
>> at address 0. However, I still didn't receive any error from
>> nanddump, although I do see the flipped bits in the hexdump
>> of /tmp/dump.
> How many of them do you see?
>
>> I then applied your patchset on top of the latest mainline
>> and ran nandflipbits for bits 1-7 at address 0.
>> I get the below output which seems to be correct.
>>
>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@0
>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 2@0
>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 3@0
>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 4@0
>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 5@0
>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 6@0
>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 7@0
>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>>
>> ECC failed: 1
>> ECC corrected: 18
>> Number of bad blocks: 0
>> Number of bbt blocks: 4
>> Block size 131072, page size 2048, OOB size 64
>> Dumping data starting at 0x00000000 and ending at 0x00000800...
>> ECC: 4 corrected bitflip(s) at offset 0x00000000
>> root@k2e-evm:~# hexdump /tmp/dump
>> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
>> *
>> 0000800
> Hm, that's weird. You should get an ECC failure since the ECC strength
> is only 4 bits per 512 bytes and you have flipped 8 bits.
>
>> One thing that confuses me is if I repeatedly call nanddump
>> I continue to get the "ECC: 4 corrected bitflips" message
>> and the "ECC corrected" count increases by 4 each time. If
>> these bits are being corrected which is apparent from
>> looking at the output of nanddump shouldn't sequential calls
>> indicate that no bitflips needed to be corrected since it
>> was corrected previously?
> Nope, they're corrected on the fly and only in RAM, so each time you
> read the page, you'll have to fix the bitflips until you erase and
> rewrite the faulty block.
>
>
Hi Boris,
Here is the entire output that should answer your questions.
In the log I am running the following commands:
flash_erase /dev/mtd4 0 0
./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
hexdump /tmp/dump
./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
hexdump /tmp/dump
Output on mainline kernel without bitflip correction patches:
http://pastebin.com/MgBVxALR
Output on mainline kernel with bitflip correction patches:
http://pastebin.com/NdKv0NhV
For some reason I'm only getting 1 bit corrected when
using the bitflip correction patches. Comparing my earlier logs
with these, the only difference I see is that "ECC failed"
increases while "ECC corrected" doesn't change.
* Re: Testing generic empty page bit flips recovery
2015-12-30 17:45 ` Franklin S Cooper Jr.
@ 2015-12-30 17:53 ` Boris Brezillon
2015-12-30 18:07 ` Franklin S Cooper Jr.
0 siblings, 1 reply; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 17:53 UTC (permalink / raw)
To: Franklin S Cooper Jr.; +Cc: computersforpeace, linux-mtd
On Wed, 30 Dec 2015 11:45:38 -0600
"Franklin S Cooper Jr." <fcooper@ti.com> wrote:
>
>
> On 12/30/2015 10:59 AM, Boris Brezillon wrote:
> > On Wed, 30 Dec 2015 10:40:49 -0600
> > "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> >
> >>
> >> On 12/30/2015 10:02 AM, Boris Brezillon wrote:
> >>> On Wed, 30 Dec 2015 09:33:52 -0600
> >>> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> >>>
> >>>> Hi Boris,
> >>>>
> >>>> Thanks for the quick reply. I built mtd-utils with your
> >>>> patch and ran the suggested commands on a 4.1 based kernel
> >>>> without your kernel patchset and I didn't see your expected
> >>>> output. The 4.1 based kernel hasn't had any changes to
> >>>> davinci_nand or nand subsystem that would address this
> >>>> bitflip error.
> >>>>
> >>>> I'm currently going to attempt to run the same test on the
> >>>> latest mainline.
> >>>>
> >>>> Here is the output I received when I ran your suggested
> >>>> commands on the 4.1 based kernel:
> >>>> root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
> >>>> Erasing 128 Kibyte @ 0 -- 100 % complete
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
> >>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
> >>>> ECC failed: 0
> >>>> ECC corrected: 0
> >>>> Number of bad blocks: 0
> >>>> Number of bbt blocks: 4
> >>>> Block size 131072, page size 2048, OOB size 64
> >>>> root@k2e-evm:~# hexdump /tmp/dump
> >>>> 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
> >>>> 0000010 ffff ffff ffff ffff ffff ffff ffff ffff
> >>>> *
> >>>> 0000800
> >>>>
> >>>> Any thoughts on why I'm not seeing the expected error?
> >>>>
> >>> Oh, actually this behavior is explained in the commit message:
> >>>
> >>> "Currently empty page bit flips are not corrected and report 0 errors."
> >>>
> >>> Which explains why you're seeing the bitflip in the dump, but nothing
> >>> reported by the MTD layer.
> >>>
> >>> After applying my patch, the bitflip should simply disappear. You can
> >>> then try to generate more bitflips than the engine can actually fix
> >>> (nandflipbits /dev/mtd4 1@0:5@0:49@0:98@0:132@0) and check that MTD
> >>> reports an uncorrectable error.
> >> I verified that I am indeed using ecc4bit mode.
> >>
> >> I attempted to run the series of nandflipbits commands you
> >> suggested, but I get an "invalid bit description" error from the
> >> utility. For some reason I can only use the nandflipbits
> >> utility for bits 1-7. Anything higher and I get the "Invalid
> >> bit description" error.
> > Indeed. I developed that tool a long time ago and didn't remember that
> > the bit field is encoding the bit offset within a byte. This command
> > should work.
> >
> > nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
> >
> >> On the latest master commit I ran nandflipbits for bits 1-7
> >> at address 0. However, I still didn't receive any error from
> >> nanddump, although I do see the flipped bits in the hexdump
> >> of /tmp/dump.
> > How many of them do you see?
> >
> >> I then applied your patchset on top of the latest mainline
> >> and ran nandflipbits for bits 1-7 at address 0.
> >> I get the below output which seems to be correct.
> >>
> >> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@0
> >> root@k2e-evm:~# ./nandflipbits /dev/mtd4 2@0
> >> root@k2e-evm:~# ./nandflipbits /dev/mtd4 3@0
> >> root@k2e-evm:~# ./nandflipbits /dev/mtd4 4@0
> >> root@k2e-evm:~# ./nandflipbits /dev/mtd4 5@0
> >> root@k2e-evm:~# ./nandflipbits /dev/mtd4 6@0
> >> root@k2e-evm:~# ./nandflipbits /dev/mtd4 7@0
> >> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> >>
> >> ECC failed: 1
> >> ECC corrected: 18
> >> Number of bad blocks: 0
> >> Number of bbt blocks: 4
> >> Block size 131072, page size 2048, OOB size 64
> >> Dumping data starting at 0x00000000 and ending at 0x00000800...
> >> ECC: 4 corrected bitflip(s) at offset 0x00000000
> >> root@k2e-evm:~# hexdump /tmp/dump
> >> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
> >> *
> >> 0000800
> > Hm, that's weird. You should get an ECC failure since the ECC strength
> > is only 4 bits per 512 bytes and you have flipped 8 bits.
> >
> >> One thing that confuses me is if I repeatedly call nanddump
> >> I continue to get the "ECC: 4 corrected bitflips" message
> >> and the "ECC corrected" count increases by 4 each time. If
> >> these bits are being corrected which is apparent from
> >> looking at the output of nanddump shouldn't sequential calls
> >> indicate that no bitflips needed to be corrected since it
> >> was corrected previously?
> > Nope, they're corrected on the fly and only in RAM, so each time you
> > read the page, you'll have to fix the bitflips until you erase and
> > rewrite the faulty block.
> >
> >
>
> Hi Boris,
>
> Here is the entire output that should answer your questions.
>
> In the log I am running the following commands:
> flash_erase /dev/mtd4 0 0
> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> hexdump /tmp/dump
> ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> hexdump /tmp/dump
>
> Output on mainline kernel without bitflip correction patches:
> http://pastebin.com/MgBVxALR
>
> Output on mainline kernel with bitflip correction patches:
> http://pastebin.com/NdKv0NhV
>
> For some reason I'm only getting 1 bit corrected when
> using the bitflip correction patches. Comparing my earlier logs
> with these, the only difference I see is that "ECC failed"
> increases while "ECC corrected" doesn't change.
>
That's what I was expecting: your ECC engine only fixes
4 bits per 512 bytes, which is why the erased-page bitflip correction
fails when more than 4 bits are flipped in a given 512-byte block.
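Here is roughly how the generic erased-page handling works. When a chunk fails ECC, it counts how many bits read back as 0. If that count is within the ECC strength, the chunk is treated as an erased page with bitflips. The read buffer is then rewritten to 0xff and the flips are reported as corrected. Otherwise the read stays uncorrectable. The sketch below is a simplified model in the spirit of the nand_check_erased_ecc_chunk() helper from that patchset. The real helper also covers extra OOB bytes and counts bits more efficiently, so this should not be read as the kernel code.

```python
import errno


def count_zero_bits(buf: bytes) -> int:
    """Bits that read as 0; on an erased (all-0xff) chunk each one is a bitflip."""
    return sum(8 - bin(b).count("1") for b in buf)


def check_erased_chunk(data: bytearray, ecc: bytearray, threshold: int) -> int:
    """Return the number of bitflips fixed, or -EBADMSG if the chunk is not
    an erased chunk with at most `threshold` flipped bits."""
    bitflips = count_zero_bits(data)
    if bitflips > threshold:
        return -errno.EBADMSG
    bitflips += count_zero_bits(ecc)
    if bitflips > threshold:
        return -errno.EBADMSG
    # Looks like an erased chunk: hand back clean 0xff data (RAM copy only).
    data[:] = b"\xff" * len(data)
    ecc[:] = b"\xff" * len(ecc)
    return bitflips
```

Set threshold=4, the ecc4bit strength. Four flipped bits in one 512-byte chunk then come back as 0xff with 4 bitflips reported. A fifth flip yields -EBADMSG instead. That matches the "ECC failed" counter increasing in the 5-bit test.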
Now try to flip only 4 bits instead of 5:
./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46
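The bit field is a bit index within the byte at the given offset. That makes it easy to work out, before running the tool, how many flips land in each 512-byte ECC step. The helper below is a hypothetical sketch, not part of mtd-utils. It assumes the bit@offset[:bit@offset...] syntax described in this thread, decimal offsets, and that a "flip" toggles the bit. It also assumes the 2048-byte page and 512-byte step geometry that nanddump reports on this board.

```python
from typing import List, Tuple

PAGE_SIZE = 2048     # from nanddump: "page size 2048"
ECC_STEP_SIZE = 512  # ecc4bit: up to 4 bits corrected per 512-byte step


def parse_flips(spec: str) -> List[Tuple[int, int]]:
    """Parse "bit@offset[:bit@offset...]" (decimal offsets assumed)."""
    flips = []
    for item in spec.split(":"):
        bit_str, _, off_str = item.partition("@")
        bit, offset = int(bit_str), int(off_str)
        if not 0 <= bit <= 7:  # the bit index is within a single byte
            raise ValueError(f"invalid bit description: {item}")
        flips.append((bit, offset))
    return flips


def flipped_page(spec: str, page_size: int = PAGE_SIZE) -> bytearray:
    """Start from an erased page (all 0xff) and toggle each requested bit."""
    page = bytearray(b"\xff" * page_size)
    for bit, offset in parse_flips(spec):
        page[offset] ^= 1 << bit
    return page


def zeros_per_step(page: bytes, step: int = ECC_STEP_SIZE) -> List[int]:
    """Count bits that read as 0 in each ECC step of the page."""
    return [sum(8 - bin(b).count("1") for b in page[i:i + step])
            for i in range(0, len(page), step)]
```

For Boris's two commands this gives [5, 0, 0, 0] and [4, 0, 0, 0]. Five flips in the first step exceed the 4-bit strength, while four do not. It also explains the earlier hexdump. Flipping bit 1 at offset 0 turns byte 0 into 0xfd. By default hexdump prints 16-bit words in host byte order, so a little-endian host shows that word as "fffd".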
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: Testing generic empty page bit flips recovery
2015-12-30 17:53 ` Boris Brezillon
@ 2015-12-30 18:07 ` Franklin S Cooper Jr.
2015-12-30 19:43 ` Boris Brezillon
0 siblings, 1 reply; 13+ messages in thread
From: Franklin S Cooper Jr. @ 2015-12-30 18:07 UTC (permalink / raw)
To: Boris Brezillon; +Cc: computersforpeace, linux-mtd
On 12/30/2015 11:53 AM, Boris Brezillon wrote:
> On Wed, 30 Dec 2015 11:45:38 -0600
> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
>
>>
>> On 12/30/2015 10:59 AM, Boris Brezillon wrote:
>>> On Wed, 30 Dec 2015 10:40:49 -0600
>>> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
>>>
>>>> On 12/30/2015 10:02 AM, Boris Brezillon wrote:
>>>>> On Wed, 30 Dec 2015 09:33:52 -0600
>>>>> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
>>>>>
>>>>>> Hi Boris,
>>>>>>
>>>>>> Thanks for the quick reply. I built mtd-utils with your
>>>>>> patch and ran the suggested commands on a 4.1 based kernel
>>>>>> without your kernel patchset and I didn't see your expected
>>>>>> output. The 4.1 based kernel hasn't had any changes to
>>>>>> davinci_nand or nand subsystem that would address this
>>>>>> bitflip error.
>>>>>>
>>>>>> I'm currently going to attempt to run the same test on the
>>>>>> latest mainline.
>>>>>>
>>>>>> Here is the output I received when I ran your suggested
>>>>>> commands on the 4.1 based kernel:
>>>>>> root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
>>>>>> Erasing 128 Kibyte @ 0 -- 100 % complete
>>>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
>>>>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
>>>>>> ECC failed: 0
>>>>>> ECC corrected: 0
>>>>>> Number of bad blocks: 0
>>>>>> Number of bbt blocks: 4
>>>>>> Block size 131072, page size 2048, OOB size 64
>>>>>> root@k2e-evm:~# hexdump /tmp/dump
>>>>>> 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
>>>>>> 0000010 ffff ffff ffff ffff ffff ffff ffff ffff
>>>>>> *
>>>>>> 0000800
>>>>>>
>>>>>> Any thoughts on why I'm not seeing the expected error?
>>>>>>
>>>>> Oh, actually this behavior is explained in the commit message:
>>>>>
>>>>> "Currently empty page bit flips are not corrected and report 0 errors."
>>>>>
>>>>> Which explains why you're seeing the bitflip in the dump, but nothing
>>>>> reported by the MTD layer.
>>>>>
>>>>> After applying my patch, the bitflip should simply disappear. You can
>>>>> then try to generate more bitflips than the engine can actually fix
>>>>> (nandflipbits /dev/mtd4 1@0:5@0:49@0:98@0:132@0) and check that MTD
>>>>> reports an uncorrectable error.
>>>> I verified that I am indeed using ecc4bit mode.
>>>>
>>>> I attempted to run the series of nandflipbits commands you
>>>> suggested, but I get an "invalid bit description" error from the
>>>> utility. For some reason I can only use the nandflipbits
>>>> utility for bits 1-7. Anything higher and I get the "Invalid
>>>> bit description" error.
>>> Indeed. I developed that tool a long time ago and didn't remember that
>>> the bit field is encoding the bit offset within a byte. This command
>>> should work.
>>>
>>> nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
>>>
>>>> On the latest master commit I ran nandflipbits for bits 1-7
>>>> at address 0. However, I still didn't receive any error from
>>>> nanddump, although I do see the flipped bits in the hexdump
>>>> of /tmp/dump.
>>> How many of them do you see?
>>>
>>>> I then applied your patchset on top of the latest mainline
>>>> and ran nandflipbits for bits 1-7 at address 0.
>>>> I get the below output which seems to be correct.
>>>>
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 2@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 3@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 4@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 5@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 6@0
>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 7@0
>>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>>>>
>>>> ECC failed: 1
>>>> ECC corrected: 18
>>>> Number of bad blocks: 0
>>>> Number of bbt blocks: 4
>>>> Block size 131072, page size 2048, OOB size 64
>>>> Dumping data starting at 0x00000000 and ending at 0x00000800...
>>>> ECC: 4 corrected bitflip(s) at offset 0x00000000
>>>> root@k2e-evm:~# hexdump /tmp/dump
>>>> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
>>>> *
>>>> 0000800
>>> Hm, that's weird. You should get an ECC failure since the ECC strength
>>> is only 4 bits per 512 bytes and you have flipped 8 bits.
>>>
>>>> One thing that confuses me is if I repeatedly call nanddump
>>>> I continue to get the "ECC: 4 corrected bitflips" message
>>>> and the "ECC corrected" count increases by 4 each time. If
>>>> these bits are being corrected which is apparent from
>>>> looking at the output of nanddump shouldn't sequential calls
>>>> indicate that no bitflips needed to be corrected since it
>>>> was corrected previously?
>>> Nope, they're corrected on the fly and only in RAM, so each time you
>>> read the page, you'll have to fix the bitflips until you erase and
>>> rewrite the faulty block.
>>>
>>>
>> Hi Boris,
>>
>> Here is the entire output that should answer your questions.
>>
>> In the log I am running the following commands:
>> flash_erase /dev/mtd4 0 0
>> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>> hexdump /tmp/dump
>> ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
>> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
>> hexdump /tmp/dump
>>
>> Output on mainline kernel without bitflip correction patches:
>> http://pastebin.com/MgBVxALR
>>
>> Output on mainline kernel with bitflip correction patches:
>> http://pastebin.com/NdKv0NhV
>>
>> For some reason I'm only getting 1 bit corrected when
>> using the bitflip correction patches. Comparing my earlier logs
>> with these, the only difference I see is that "ECC failed"
>> increases while "ECC corrected" doesn't change.
>>
> That's what I was expecting: your ECC engine only fixes
> 4 bits per 512 bytes, which is why the erased-page bitflip correction
> fails when more than 4 bits are flipped in a given 512-byte block.
>
> Now try to flip only 4 bits instead of 5:
>
> ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46
Here is the output:
root@k2e-evm:~/# ./flash_erase /dev/mtd4 0 1
root@k2e-evm:~/# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
ECC failed: 5
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 4
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00000800...
root@k2e-evm:~/# hexdump /tmp/dump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000800
root@k2e-evm:~/# ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46
root@k2e-evm:~/# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
ECC failed: 5
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 4
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00000800...
ECC: 4 corrected bitflip(s) at offset 0x00000000
root@k2e-evm:~/# hexdump /tmp/dump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000800
Running nanddump again shows that 4 bits were corrected.
So it seems like things are working as expected.
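The "ECC corrected: 0" header next to the "4 corrected bitflip(s)" line fits how nanddump appears to work. It reads the cumulative counters with the ECCGETSTATS ioctl once, before dumping, and prints them. It then compares fresh snapshots taken around each page read to print the per-page lines. That would also explain why "ECC failed: 5" carries over from the earlier runs, and why the next nanddump run shows the 4 corrections in its header. Below is a minimal sketch of the per-page comparison. The names are illustrative, and the wording of the uncorrectable message is an assumption.

```python
from typing import List, NamedTuple


class EccStats(NamedTuple):
    """Subset of the MTD ECC counters; both are cumulative since boot."""
    corrected: int
    failed: int


def page_messages(before: EccStats, after: EccStats, offset: int) -> List[str]:
    """Per-page report derived from counter snapshots taken around one read."""
    msgs = []
    if after.failed != before.failed:
        msgs.append(f"ECC: {after.failed - before.failed} uncorrectable "
                    f"bitflip(s) at offset 0x{offset:08x}")
    if after.corrected != before.corrected:
        msgs.append(f"ECC: {after.corrected - before.corrected} corrected "
                    f"bitflip(s) at offset 0x{offset:08x}")
    return msgs
```

Because only differences between snapshots are reported per page, a cumulative header and a per-page line can legitimately disagree within a single run.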
It seems like patches 2-5 from your patchset weren't pulled
in because you and Brian wanted more testing on other
platforms. If you're going to submit a rev 4, please feel free
to CC me so I can test the patches and add a Tested-by.
If not, feel free to add a Tested-by to your current rev 3
patchset, or if you can bounce those emails my way I can add
it myself, whichever approach you prefer.
Thank you for your help and let me know if there is any
further test you would like me to run.
* Re: Testing generic empty page bit flips recovery
2015-12-30 18:07 ` Franklin S Cooper Jr.
@ 2015-12-30 19:43 ` Boris Brezillon
0 siblings, 0 replies; 13+ messages in thread
From: Boris Brezillon @ 2015-12-30 19:43 UTC (permalink / raw)
To: Franklin S Cooper Jr.; +Cc: computersforpeace, linux-mtd
On Wed, 30 Dec 2015 12:07:44 -0600
"Franklin S Cooper Jr." <fcooper@ti.com> wrote:
>
>
> On 12/30/2015 11:53 AM, Boris Brezillon wrote:
> > On Wed, 30 Dec 2015 11:45:38 -0600
> > "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> >
> >>
> >> On 12/30/2015 10:59 AM, Boris Brezillon wrote:
> >>> On Wed, 30 Dec 2015 10:40:49 -0600
> >>> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> >>>
> >>>> On 12/30/2015 10:02 AM, Boris Brezillon wrote:
> >>>>> On Wed, 30 Dec 2015 09:33:52 -0600
> >>>>> "Franklin S Cooper Jr." <fcooper@ti.com> wrote:
> >>>>>
> >>>>>> Hi Boris,
> >>>>>>
> >>>>>> Thanks for the quick reply. I built mtd-utils with your
> >>>>>> patch and ran the suggested commands on a 4.1 based kernel
> >>>>>> without your kernel patchset and I didn't see your expected
> >>>>>> output. The 4.1 based kernel hasn't had any changes to
> >>>>>> davinci_nand or nand subsystem that would address this
> >>>>>> bitflip error.
> >>>>>>
> >>>>>> I'm currently going to attempt to run the same test on the
> >>>>>> latest mainline.
> >>>>>>
> >>>>>> Here is the output I received when I ran your suggested
> >>>>>> commands on the 4.1 based kernel:
> >>>>>> root@k2e-evm:~# ./flash_erase /dev/mtd4 4096 1
> >>>>>> Erasing 128 Kibyte @ 0 -- 100 % complete
> >>>>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@4096
> >>>>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 4096 -l 2048 /dev/mtd4
> >>>>>> ECC failed: 0
> >>>>>> ECC corrected: 0
> >>>>>> Number of bad blocks: 0
> >>>>>> Number of bbt blocks: 4
> >>>>>> Block size 131072, page size 2048, OOB size 64
> >>>>>> root@k2e-evm:~# hexdump /tmp/dump
> >>>>>> 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
> >>>>>> 0000010 ffff ffff ffff ffff ffff ffff ffff ffff
> >>>>>> *
> >>>>>> 0000800
> >>>>>>
> >>>>>> Any thoughts on why I'm not seeing the expected error?
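The first hexdump line already shows the injected flip: `1@4096` flips bit 1 of the first byte of the dumped page, turning 0xff into 0xfd, and hexdump's default format prints 16-bit words in host byte order, so on a little-endian CPU the bytes `fd ff` appear as `fffd`. This Python sketch reproduces that line; the helper names are hypothetical and the little-endian host is an assumption:

```python
import struct

PAGE_SIZE = 2048  # page size reported by nanddump above


def flip_bit(page: bytearray, bit: int, offset: int) -> None:
    """XOR one bit (bit 0 = LSB) of the byte at `offset`, like a bit@offset flip."""
    page[offset] ^= 1 << bit


def hexdump_line(data: bytes, addr: int) -> str:
    """Format 16 bytes as hexdump's default output does on a little-endian host."""
    words = struct.unpack("<8H", data[addr:addr + 16])  # eight 16-bit LE words
    return "%07x " % addr + " ".join("%04x" % w for w in words)


page = bytearray(b"\xff" * PAGE_SIZE)  # freshly erased page
flip_bit(page, 1, 0)                   # the 1@4096 flip, relative to the dumped page
print(hexdump_line(page, 0))
# 0000000 fffd ffff ffff ffff ffff ffff ffff ffff
```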
> >>>>>>
> >>>>> Oh, actually this behavior is explained in the commit message:
> >>>>>
> >>>>> "Currently empty page bit flips are not corrected and report 0 errors."
> >>>>>
> >>>>> Which explains why you're seeing the bitflip in the dump, but nothing
> >>>>> reported by the MTD layer.
> >>>>>
> >>>>> After applying my patch, the bitflip should simply disappear. You can
> >>>>> then try to generate more bitflips than the engine can actually fix
> >>>>> (nandflipbits /dev/mtd4 1@0:5@0:49@0:98@0:132@0) and check that MTD
> >>>>> reports an uncorrectable error.
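The behaviour described here can be pictured with a small model: a chunk that reads back as (nearly) all 0xff is treated as an erased chunk with bitflips. The 0 bits in its data and ECC bytes are counted; if there are no more than the ECC strength, they are reported as corrected and the caller gets 0xff bytes, otherwise the read is reported as uncorrectable. The Python below is a hypothetical illustration of that idea, loosely modelled on the kernel's `nand_check_erased_ecc_chunk()` helper but not a copy of it; `check_erased_chunk`, `erased_step_with_flips` and the 10-byte ECC area are assumptions of the sketch.

```python
import errno


def count_zero_bits(buf: bytes) -> int:
    """Bits that read as 0; in an erased chunk each of them is a bitflip."""
    return sum(8 - bin(b).count("1") for b in buf)


def check_erased_chunk(data: bytearray, ecc: bytes, strength: int) -> int:
    """Return the number of bitflips fixed, or -EBADMSG if the chunk is not
    an erased chunk with at most `strength` bitflips."""
    flips = count_zero_bits(data) + count_zero_bits(ecc)
    if flips > strength:
        return -errno.EBADMSG        # uncorrectable: would count as "ECC failed"
    data[:] = b"\xff" * len(data)    # fixed in the RAM buffer only
    return flips                     # would count as "ECC corrected"


def erased_step_with_flips(flips):
    """Build one erased 512-byte ECC step with the given (bit, offset) flips."""
    step = bytearray(b"\xff" * 512)
    for bit, off in flips:
        step[off] ^= 1 << bit
    return step


ecc_area = b"\xff" * 10  # hypothetical ECC bytes for one step
print(check_erased_chunk(erased_step_with_flips([(1, 0)]), ecc_area, 4))  # 1
five = [(1, 0), (5, 0), (7, 30), (3, 46), (5, 47)]
print(check_erased_chunk(erased_step_with_flips(five), ecc_area, 4))      # -EBADMSG
```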
> >>>> I verified that I am indeed using ecc4bit mode.
> >>>>
> >>>> I attempted to run the series of nandflipbits commands as you
> >>>> suggested, but I get an "Invalid bit description" error from the
> >>>> utility. For some reason nandflipbits only accepts bits 1-7;
> >>>> anything higher triggers that error.
> >>> Indeed. I developed that tool a long time ago and didn't remember that
> >>> the bit field encodes the bit offset within a byte. This command
> >>> should work.
> >>>
> >>> nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
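As explained above, each descriptor is `bit@offset`, where `bit` is a bit position inside one byte (so only 0 to 7 make sense) and `offset` is a byte offset; several descriptors are joined with `:`. A hypothetical Python parser for that syntax, which rejects bit positions above 7 the way the tool did, could look like this. It is not the nandflipbits source, and accepting `0x`-prefixed offsets is an assumption:

```python
def parse_flips(desc: str) -> list[tuple[int, int]]:
    """Parse "bit@offset[:bit@offset...]" into (bit, offset) pairs."""
    flips = []
    for item in desc.split(":"):
        bit_str, sep, off_str = item.partition("@")
        if not sep:
            raise ValueError(f"Invalid bit description: {item!r}")
        bit = int(bit_str)
        offset = int(off_str, 0)  # decimal or 0x-prefixed (assumption)
        if not 0 <= bit <= 7 or offset < 0:  # bit is a position inside one byte
            raise ValueError(f"Invalid bit description: {item!r}")
        flips.append((bit, offset))
    return flips


print(parse_flips("1@0:5@0:7@30:3@46:5@47"))
# [(1, 0), (5, 0), (7, 30), (3, 46), (5, 47)]
```

Calling `parse_flips("49@0")` raises `ValueError`, mirroring the "Invalid bit description" error reported earlier in the thread.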
> >>>
> >>>> On the latest master commit I ran nandflipbits for bits 1-7
> >>>> at address 0. However, I still didn't receive any error from
> >>>> nanddump, although I do see the flipped bits in the hexdump
> >>>> /tmp/dump output.
> >>> How many of them do you see?
> >>>
> >>>> I then applied your patchset on top of the latest mainline
> >>>> and ran nandflipbits for bits 1-7 at address 0.
> >>>> I get the output below, which seems to be correct.
> >>>>
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 1@0
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 2@0
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 3@0
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 4@0
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 5@0
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 6@0
> >>>> root@k2e-evm:~# ./nandflipbits /dev/mtd4 7@0
> >>>> root@k2e-evm:~# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> >>>>
> >>>> ECC failed: 1
> >>>> ECC corrected: 18
> >>>> Number of bad blocks: 0
> >>>> Number of bbt blocks: 4
> >>>> Block size 131072, page size 2048, OOB size 64
> >>>> Dumping data starting at 0x00000000 and ending at 0x00000800...
> >>>> ECC: 4 corrected bitflip(s) at offset 0x00000000
> >>>> root@k2e-evm:~# hexdump /tmp/dump
> >>>> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
> >>>> *
> >>>> 0000800
> >>> Hm, that's weird. You should get an ECC failure since the ECC strength
> >>> is only 4 bits per 512 bytes and you have flipped 7 bits.
> >>>
> >>>> One thing that confuses me is that if I repeatedly call nanddump,
> >>>> I continue to get the "ECC: 4 corrected bitflips" message
> >>>> and the "ECC corrected" count increases by 4 each time. If
> >>>> these bits are being corrected, which is apparent from
> >>>> looking at the output of nanddump, shouldn't subsequent calls
> >>>> indicate that no bitflips needed to be corrected, since they
> >>>> were corrected previously?
> >>> Nope, they're corrected on the fly and only in RAM, so each time you
> >>> read the page, you'll have to fix the bitflips until you erase and
> >>> rewrite the faulty block.
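A toy model of that behaviour: the flipped bits stay in flash, and each read hands back a corrected copy and adds the same number of flips to a cumulative counter, so every read reports 4 corrected bitflips while the total grows by 4. The class and its fields are hypothetical, the model assumes the flips stay within the ECC strength, and deriving the per-read message from a before/after difference of the counter is only an assumption about how nanddump produces its `ECC: N corrected bitflip(s)` line.

```python
def zero_bits(buf: bytes) -> int:
    """Bits that read as 0 (each one is a bitflip in an erased page)."""
    return sum(8 - bin(b).count("1") for b in buf)


class ToyErasedPage:
    """Hypothetical model: flash keeps the flipped bits, reads return a fixed copy."""

    def __init__(self, raw: bytes):
        self.raw = bytes(raw)      # flash contents; reads never modify them
        self.corrected_total = 0   # cumulative, like the "ECC corrected" counter

    def read(self) -> bytes:
        self.corrected_total += zero_bits(self.raw)  # flips are fixed again on every read
        return b"\xff" * len(self.raw)               # corrected copy in RAM only


raw = bytearray(b"\xff" * 2048)
for bit, off in [(1, 0), (5, 0), (7, 30), (3, 46)]:
    raw[off] ^= 1 << bit

page = ToyErasedPage(raw)
for _ in range(3):
    before = page.corrected_total
    data = page.read()
    print(f"ECC: {page.corrected_total - before} corrected bitflip(s)")
```

Only erasing and rewriting the block would clear the flips from `raw`, which is what makes the count stop growing.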
> >>>
> >>>
> >> Hi Boris,
> >>
> >> Here is the entire output that should answer your questions.
> >>
> >> In the log I am running the following commands:
> >> flash_erase /dev/mtd4 0 0
> >> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> >> hexdump /tmp/dump
> >> ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46:5@47
> >> ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> >> hexdump /tmp/dump
> >>
> >> Output on mainline kernel without bitflip correction patches:
> >> http://pastebin.com/MgBVxALR
> >>
> >> Output on mainline kernel with bitflip correction patches:
> >> http://pastebin.com/NdKv0NhV
> >>
> >> For some reason I'm only getting 1 bit corrected when
> >> using the bitflip correction patches. Comparing my earlier logs
> >> with the current ones, the only difference I see is that "ECC
> >> failed" is increasing but "ECC corrected" isn't changing.
> >>
> > That's what I was expecting: your ECC engine can only fix
> > 4 bits per 512 bytes, which is why the erased-page bitflip correction
> > fails when you have more than 4 bits flipped in a given 512-byte block.
> >
> > Now try to flip only 4 bits instead of 5:
> >
> > ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46
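With a 2048-byte page and 4 bits of correction per 512-byte step, what matters is how many flips land in the same step. A short Python sketch (step size and strength are taken from this thread; the helper is hypothetical, ignores flips in the OOB/ECC bytes and assumes distinct (bit, offset) pairs) shows that all five flips of `1@0:5@0:7@30:3@46:5@47` fall into step 0, one more than the engine can handle, while the four-flip variant stays within the limit:

```python
from collections import Counter

ECC_STEP = 512     # bytes covered by one ECC step
ECC_STRENGTH = 4   # correctable bitflips per step (ecc4bit mode)


def step_verdicts(flips):
    """Map (bit, offset) flips to ECC steps and say whether each step is correctable."""
    per_step = Counter(offset // ECC_STEP for _bit, offset in flips)
    return {step: ("correctable" if n <= ECC_STRENGTH else "uncorrectable")
            for step, n in sorted(per_step.items())}


five = [(1, 0), (5, 0), (7, 30), (3, 46), (5, 47)]
print(step_verdicts(five))      # {0: 'uncorrectable'}
print(step_verdicts(five[:4]))  # {0: 'correctable'}
```

Fed the seven `N@0` flips used earlier in the thread, the helper also reports step 0 as uncorrectable, which is the result Boris expected there.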
>
> Here is the output:
> root@k2e-evm:~/# ./flash_erase /dev/mtd4 0 1
> root@k2e-evm:~/# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> ECC failed: 5
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 4
> Block size 131072, page size 2048, OOB size 64
> Dumping data starting at 0x00000000 and ending at 0x00000800...
> root@k2e-evm:~/# hexdump /tmp/dump
> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
> *
> 0000800
> root@k2e-evm:~/# ./nandflipbits /dev/mtd4 1@0:5@0:7@30:3@46
> root@k2e-evm:~/# ./nanddump -f /tmp/dump -s 0 -l 2048 /dev/mtd4
> ECC failed: 5
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 4
> Block size 131072, page size 2048, OOB size 64
> Dumping data starting at 0x00000000 and ending at 0x00000800...
> ECC: 4 corrected bitflip(s) at offset 0x00000000
> root@k2e-evm:~/# hexdump /tmp/dump
> 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
> *
> 0000800
>
> Running nanddump again shows that 4 bits were corrected.
> So it seems like things are working as expected.
>
> It seems like patches 2-5 from your patchset weren't pulled
> in because you and Brian wanted more testing on other
> platforms. If you're going to submit a rev 4, please feel free
> to CC me so I can test the patches and add a Tested-by.
Just sent a v4. Feel free to test it and add your
Tested-by/Acked-by/Reviewed-by.
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
Thread overview: 13+ messages
2015-12-30 14:10 Testing generic empty page bit flips recovery Franklin S Cooper Jr.
2015-12-30 14:40 ` Boris Brezillon
2015-12-30 15:33 ` Franklin S Cooper Jr.
2015-12-30 15:55 ` Boris Brezillon
2015-12-30 16:02 ` Boris Brezillon
2015-12-30 16:40 ` Franklin S Cooper Jr.
2015-12-30 16:52 ` Steve deRosier
2015-12-30 17:02 ` Franklin S Cooper Jr.
2015-12-30 16:59 ` Boris Brezillon
2015-12-30 17:45 ` Franklin S Cooper Jr.
2015-12-30 17:53 ` Boris Brezillon
2015-12-30 18:07 ` Franklin S Cooper Jr.
2015-12-30 19:43 ` Boris Brezillon