public inbox for linux-mtd@lists.infradead.org
* [question]MTD:unstable bit issues?
@ 2012-10-24  5:14 hejianet
  2012-10-24  8:13 ` Thomas.Betker
  2012-11-12 15:37 ` Artem Bityutskiy
  0 siblings, 2 replies; 8+ messages in thread
From: hejianet @ 2012-10-24  5:14 UTC
  To: linux-mtd

My hardware: vcs452 nandflash
My kernel: 2.6.16 (I know it is too old :( but this problem does not seem
to be related to the kernel version)

Test steps:
1. Copy a 100M file to the NAND flash partition from a remote client.
2. Use fsync to flush memory to flash.
3. Use md5 to check whether the copied file differs from the original
one.
4. Repeat the power-cut test 5 times; on each boot, the md5 is checked
again.
5. Go back to step 1.
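
Steps 1 to 3 can be sketched as follows. This is a minimal illustration in
Python, not the script used in this test. The names copy_and_sync, md5_of and
copy_matches are hypothetical, and the sketch assumes the destination path
lies on the flash partition under test.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_sync(src, dst, chunk_size=1 << 20):
    """Copy src to dst and fsync dst before returning (steps 1 and 2)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            fout.write(chunk)
        fout.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the data to the device


def copy_matches(src, dst):
    """Step 3: True if the copy has the same MD5 as the original."""
    return md5_of(src) == md5_of(dst)
```

Whether the data is really on flash once os.fsync() returns depends on the
file system and driver underneath, which is what the power-cut test probes.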

After a whole night of testing (100 power cuts), step 4 failed.
1) Some bytes changed:
-----------------------------------------------------
-original-file D0 40 E8 3E 00 00 BA 1A D0 40 0A 00 43 00 00 00
+file-in-flash D0 40 0F 41 00 00 BA 1A D0 40 0A 00 43 00 00 00
----------------------------------------------------- 
2) A 4k-byte hole of all zeros in the file on flash:
-----------------------------------------------------
-original-file DD AF C8 EE 57 EB 84 CE A1 7F B1 38 E7 22 51 2F
-    94 AE 77 5B 48 EA 61 8C 09 BB C5 74 6E B1 87 D2 
-    A9 32 8D 2A E7 7F D2 C3 DB 1A 92 E7 66 7C B8 4E 
......
     
+file-in-flash 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
+    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
......
-----------------------------------------------------
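
One way to judge whether the first difference looks like isolated bit flips
is to count how many bits differ between the two dumps. The sketch below
(Python; bit_differences is a hypothetical helper name) does that for the
16 bytes shown under 1).

```python
def bit_differences(hex_a, hex_b):
    """Return (differing_bytes, differing_bits) for two hex dumps of equal length."""
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("dumps must have the same length")
    diff_bytes = sum(1 for x, y in zip(a, b) if x != y)
    diff_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff_bytes, diff_bits


original = "D0 40 E8 3E 00 00 BA 1A D0 40 0A 00 43 00 00 00"
in_flash = "D0 40 0F 41 00 00 BA 1A D0 40 0A 00 43 00 00 00"
print(bit_differences(original, in_flash))  # (2, 13)
```

For these two lines, 13 bits differ across 2 adjacent bytes, which is more
than a single flipped bit.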

I searched Google and found that it might be related to the "unstable bits
issue" (bit-flip) in NAND.
I wonder whether bit-flips could be the cause of such a 4k all-zero
hole?
Thanks for any suggestions.

B.R.
Jia

* Re: [question]MTD:unstable bit issues?
  2012-10-24  5:14 [question]MTD:unstable bit issues? hejianet
@ 2012-10-24  8:13 ` Thomas.Betker
  2012-10-24  8:52   ` hejianet
  2012-10-24  9:53   ` Jamie Lokier
  2012-11-12 15:37 ` Artem Bityutskiy
  1 sibling, 2 replies; 8+ messages in thread
From: Thomas.Betker @ 2012-10-24  8:13 UTC
  To: hejianet; +Cc: linux-mtd

Hello Jia:

> ----------------------------------------------------- 
> 2)a 4k bytes hole with all 0 in the flash file.
> -----------------------------------------------------
> -original-file DD AF C8 EE 57 EB 84 CE A1 7F B1 38 E7 22 51 2F
> -    94 AE 77 5B 48 EA 61 8C 09 BB C5 74 6E B1 87 D2 
> -    A9 32 8D 2A E7 7F D2 C3 DB 1A 92 E7 66 7C B8 4E 
> ......
> 
> +file-in-flash 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> +    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> +    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> ......
> -----------------------------------------------------

We have also seen this "4k zeros" issue for some time. I never found out 
what was happening because the issue was suddenly no longer reproducible. 
:-(

In our case, though, we didn't have NAND flash, but JFFS2 with serial NOR 
flash. So I would guess that this is not a NAND problem.

Which file system do you run? Is it JFFS2?

Best regards,
Thomas

* Re: [question]MTD:unstable bit issues?
  2012-10-24  8:13 ` Thomas.Betker
@ 2012-10-24  8:52   ` hejianet
  2012-10-24  9:01     ` Thomas.Betker
  2012-10-24  9:53   ` Jamie Lokier
  1 sibling, 1 reply; 8+ messages in thread
From: hejianet @ 2012-10-24  8:52 UTC
  To: linux-mtd, Thomas.Betker

Hi Thomas,
Yes, we use JFFS2.
How did you solve your bug? :)
On 2012-10-24 16:13, Thomas.Betker@rohde-schwarz.com wrote:
> Hello Jia:
>
>> ----------------------------------------------------- 
>> 2)a 4k bytes hole with all 0 in the flash file.
>> -----------------------------------------------------
>> -original-file DD AF C8 EE 57 EB 84 CE A1 7F B1 38 E7 22 51 2F
>> -    94 AE 77 5B 48 EA 61 8C 09 BB C5 74 6E B1 87 D2 
>> -    A9 32 8D 2A E7 7F D2 C3 DB 1A 92 E7 66 7C B8 4E 
>> ......
>>
>> +file-in-flash 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> +    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> +    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>> ......
>> -----------------------------------------------------
> We have also seen this "4k zeros" issue for some time. I never found out 
> what was happening because the issue was suddenly no longer reproducible. 
> :-(
>
> In our case, though, we didn't have NAND flash, but JFFS2 with serial NOR 
> flash. So I would guess that this is not a NAND problem.
>
> Which file system do you run? Is it JFFS2?
>
> Best regards,
> Thomas

* Re: [question]MTD:unstable bit issues?
  2012-10-24  8:52   ` hejianet
@ 2012-10-24  9:01     ` Thomas.Betker
  2012-10-24  9:21       ` hejianet
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas.Betker @ 2012-10-24  9:01 UTC
  To: hejianet; +Cc: linux-mtd

Hello Jia:

> Yes, we use jffs2.
> How did you solve your bug? :)

> >> ----------------------------------------------------- 
> >> 2)a 4k bytes hole with all 0 in the flash file.
> >> -----------------------------------------------------

Unfortunately, we didn't solve it. Some day, it was simply gone; I have no 
idea what had changed. I couldn't reproduce it anymore, so I couldn't do 
any further analysis.

Anyway, most likely this is not a NAND issue or MTD issue, but a JFFS2 
issue. And I would certainly want to hear about it if you manage to find 
and fix it ...

Best regards,
Thomas

* Re: [question]MTD:unstable bit issues?
  2012-10-24  9:01     ` Thomas.Betker
@ 2012-10-24  9:21       ` hejianet
  2012-11-12 15:39         ` Artem Bityutskiy
  0 siblings, 1 reply; 8+ messages in thread
From: hejianet @ 2012-10-24  9:21 UTC
  To: linux-mtd

Hi Thomas,
So far we have no evidence that it is a JFFS2 issue.
We only observed a UBI error message (not 100% sure it caused this hole):

Oct 15 20:09:10 (none) kernel: UBI error: ubi_io_write error -5 while writing 2048 bytes to PEB 1663:20480, written 0 bytes

Besides, it only appeared in the power-cut stress test, so I suspect it is related
to the "unstable bits issue". From what I found on Google, this issue appears in both
NAND and NOR flash, but more frequently in NAND.

On 2012-10-24 17:01, Thomas.Betker@rohde-schwarz.com wrote:
> Hello Jia:
>
>> Yes, we use jffs2.
>> How did you solve your bug? :)
>>>> ----------------------------------------------------- 
>>>> 2)a 4k bytes hole with all 0 in the flash file.
>>>> -----------------------------------------------------
> Unfortunately, we didn't solve it. Some day, it was simply gone; I have no 
> idea what had changed. I couldn't reproduce it anymore, so I couldn't do 
> any further analysis.
>
> Anyway, most likely this is not a NAND issue or MTD issue, but a JFFS2 
> issue. And I would certainly want to hear about it if you manage to find 
> and fix it ...
>
> Best regards,
> Thomas

* Re: [question]MTD:unstable bit issues?
  2012-10-24  8:13 ` Thomas.Betker
  2012-10-24  8:52   ` hejianet
@ 2012-10-24  9:53   ` Jamie Lokier
  1 sibling, 0 replies; 8+ messages in thread
From: Jamie Lokier @ 2012-10-24  9:53 UTC
  To: Thomas.Betker; +Cc: linux-mtd, hejianet

Thomas.Betker@rohde-schwarz.com wrote:
> Hello Jia:
> We have also seen this "4k zeros" issue for some time. I never found out 
> what was happening because the issue was suddenly no longer reproducible. 
> :-(
> 
> In our case, though, we didn't have NAND flash, but JFFS2 with serial NOR 
> flash. So I would guess that this is not a NAND problem.

I've seen it a number of times with JFFS2 with parallel NOR, on
ancient kernels (2.4.26-uc0).  Typically the symptom would be an
executable crashing, and a 4k hole would be causing that.

We always put it down to a faulty board with poor signal and/or timings
to the NOR, as it "seemed" to happen on particular boards more, and
simply removed those boards from service.  However testing wasn't very
systematic.

If JFFS2 sees a corrupt block (detected by CRC), it simply discards
that block from the file data as if it's never been written, making a
hole.  An I/O error would be much nicer than corrupt data, but a hole
is what we get.
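
Because a discarded block reads back as zeros, such holes can be located by
comparing the copy with the original block by block. A minimal sketch in
Python, assuming a 4096-byte block size to match the holes reported in this
thread and assuming the holes are block-aligned; find_zero_holes is a
hypothetical helper, not part of any JFFS2 tooling.

```python
def find_zero_holes(original, copy, block_size=4096):
    """Return offsets of blocks that are all zero in copy but not in original."""
    holes = []
    zero_block = bytes(block_size)
    for offset in range(0, min(len(original), len(copy)), block_size):
        orig_block = original[offset:offset + block_size]
        copy_block = copy[offset:offset + block_size]
        # A hole is a block that changed and now contains only zero bytes.
        if copy_block != orig_block and copy_block == zero_block[:len(copy_block)]:
            holes.append(offset)
    return holes
```

Both arguments are the file contents as bytes, for example read with
open(path, "rb").read(). Blocks that differ without being all zero, such as
the changed bytes in the first symptom, are deliberately not reported.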

It might have been caused by faulty boards but in a different way:
Some of our boards crashed every few weeks because the manufacturer's
tolerances between CPU and DRAM were too tight.  After a very long
time in the field (years), we learned to underclock those slightly.
Maybe, occasional DRAM corruption was also causing occasional NOR
corruption as a side effect (e.g. writing/reading data that didn't
match the CRC during JFFS2 GC), leading to 4k holes in previously good
files (bad but understandable); or maybe occasional crashes caused it
(not acceptable and shouldn't happen).  But I am not sure the two
issues were correlated anyway.

-- Jamie

* Re: [question]MTD:unstable bit issues?
  2012-10-24  5:14 [question]MTD:unstable bit issues? hejianet
  2012-10-24  8:13 ` Thomas.Betker
@ 2012-11-12 15:37 ` Artem Bityutskiy
  1 sibling, 0 replies; 8+ messages in thread
From: Artem Bityutskiy @ 2012-11-12 15:37 UTC
  To: hejianet; +Cc: linux-mtd

On Wed, 2012-10-24 at 13:14 +0800, hejianet wrote:
> I search google and find it might be relevant to "unstable bits
> issue"(bit-flip) in nand.
> I wonder is the bit-flip the possible reason to cause such a 4k all-zero
> hole?
> Thanks for any suggestion.

I do not think it is related to unstable bits. It may be related to
JFFS2 having a bug and not flushing the write-buffer on fsync, so fsync
does not work. Check that.

-- 
Best Regards,
Artem Bityutskiy


* Re: [question]MTD:unstable bit issues?
  2012-10-24  9:21       ` hejianet
@ 2012-11-12 15:39         ` Artem Bityutskiy
  0 siblings, 0 replies; 8+ messages in thread
From: Artem Bityutskiy @ 2012-11-12 15:39 UTC
  To: hejianet; +Cc: linux-mtd

On Wed, 2012-10-24 at 17:21 +0800, hejianet wrote:
> Hi Thomas,
> So far we have no evidence that it is a JFFS2 issue.

> We only observed a UBI error message (not 100% sure it caused this hole):
> 
> Oct 15 20:09:10 (none) kernel: UBI error: ubi_io_write error -5 while writing 2048 bytes to PEB 1663:20480, written 0 bytes
> 
> Besides, it only appeared in the power-cut stress test, so I suspect it is related
> to the "unstable bits issue". From what I found on Google, this issue appears in both
> NAND and NOR flash, but more frequently in NAND.

The unstable bits issue does not give you tons of zeroes. I really think
that this is either a JFFS2 bug (fsync does not work) or a bug in your
test scripts: you power-cut before fsync, or you do fsync, then write
more to this file, then power-cut.
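
The ordering that matters for the test script can be sketched as follows:
data written after the last fsync is not guaranteed to survive, so the script
should signal that power may be cut only after fsync has returned, and should
not write to the file again afterwards. This is a Python sketch under those
assumptions; write_durably and the on_durable callback are hypothetical names.

```python
import os


def write_durably(path, data, on_durable):
    """Write data, fsync it, and only then report that power may be cut."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # returns once the kernel reports the data flushed
    finally:
        os.close(fd)
    on_durable(path)                     # e.g. trigger the power cut from here, not earlier
```

If os.fsync() raises, on_durable is never called, so a failed flush is not
mistaken for a safe point to cut power.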

-- 
Best Regards,
Artem Bityutskiy
