public inbox for linux-mtd@lists.infradead.org
* Eraseblocks torture: OneNAND results
@ 2006-12-07 14:30 Artem Bityutskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Artem Bityutskiy @ 2006-12-07 14:30 UTC (permalink / raw)
  To: Kyungmin Park; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 2851 bytes --]

Hello Kyungmin,

We have a test board with a KFN2G16Q2M 256MiB OneNAND. We decided to
write a torture test which hammers a few eraseblocks, just to see what
happens.

We wrote a small test program which basically erases several eraseblocks
in a cycle till it gets an error. The test program is attached
(torture.git.tar.bz2), and it is also available at
git://git.infradead.org/~dedekind/torture.git.

By default the program does the following:

1. Erases an eraseblock. Reads it back and makes sure it contains only
0xFF bytes.
2. Writes a 0x55/0xAA pattern. On NAND we store 0x55 in one page, 0xAA
in the next, and so on. On each subsequent erase cycle we swap the 0x55
and 0xAA bytes.
3. Reads the eraseblock back and makes sure we read the same data.

And so on till an error occurs. Of course we check return codes.
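
The default cycle described above can be sketched in userspace C roughly
as follows. This is only an illustration, not the code from torture.git:
the fake_eb structure, the tiny page and eraseblock sizes, and the
wear_limit failure model are all assumptions made up for the example. The
real test goes through mtd->erase(), mtd->write() and mtd->read().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SIZE    16u  /* illustrative; real pages are much larger */
#define SK_PAGES_PER_EB 4u
#define SK_EB_SIZE      (SK_PAGE_SIZE * SK_PAGES_PER_EB)

/* Hypothetical in-memory stand-in for one eraseblock. */
struct fake_eb {
	uint8_t  data[SK_EB_SIZE];
	unsigned erase_count;
	unsigned wear_limit; /* erase count at which writes go bad; 0 = never */
};

static void fake_erase(struct fake_eb *eb)
{
	memset(eb->data, 0xFF, SK_EB_SIZE);
	eb->erase_count++;
}

static void fake_write(struct fake_eb *eb, const uint8_t *buf)
{
	memcpy(eb->data, buf, SK_EB_SIZE);
	/* Made-up wear model: the high nibble of byte 0 reads back as 0. */
	if (eb->wear_limit && eb->erase_count >= eb->wear_limit)
		eb->data[0] &= 0x0F;
}

/* One cycle: 0 on success, -1 if the erase verify fails,
 * -2 if the data read back differs from what was written. */
static int torture_cycle(struct fake_eb *eb, unsigned cycle)
{
	uint8_t pattern[SK_EB_SIZE];
	size_t i;

	assert(eb != NULL);

	/* Step 1: erase, then make sure only 0xFF bytes are left. */
	fake_erase(eb);
	for (i = 0; i < SK_EB_SIZE; i++)
		if (eb->data[i] != 0xFF)
			return -1;

	/* Step 2: 0x55 in one page, 0xAA in the next, swapped every cycle. */
	for (i = 0; i < SK_EB_SIZE; i++) {
		unsigned page = (unsigned)(i / SK_PAGE_SIZE);

		pattern[i] = ((page + cycle) & 1) ? 0xAA : 0x55;
	}
	fake_write(eb, pattern);

	/* Step 3: read back and compare. */
	return memcmp(eb->data, pattern, SK_EB_SIZE) ? -2 : 0;
}

/* Runs cycles until the first failure; returns how many cycles passed. */
static unsigned torture(struct fake_eb *eb, unsigned max_cycles)
{
	unsigned c;

	for (c = 0; c < max_cycles; c++)
		if (torture_cycle(eb, c))
			break;
	return c;
}
```

With wear_limit set to 5, the fifth erase is the first one followed by a
corrupted write, so torture() reports 4 passed cycles.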

We ran this test simply because we are curious how our OneNAND setup
behaves when eraseblocks wear out.

We got a rather strange result. After several million erase cycles we
start reading incorrect data back. Sometimes there are one-bit errors,
sometimes many-byte errors. We do not get any error code from
mtd->read(), and we do not see single-bit errors being corrected. The
mtd->write() and mtd->erase() functions do not return any error either.

Kyungmin, have you done any tests like this? I invite you to try our
test too.

Other people may also try to wear out a few eraseblocks on their devices
and see what happens. Then, for example, mount JFFS2 and see what it
says/does.

But please beware: the test may damage your system, so run it only if
you know exactly what you are doing. The authors are not responsible for
any damage caused by this test.

Below are some examples of test failures.

----------------------------------------------------------------
[54592.767700] EB torture: Page 0 has 4 bytes/16 bits failing verify, starting at offset 0x1c
[54592.776214] Offset           Read                   Written
[54592.781860] 0x018:  aa aa aa aa 00 00 00 00  ***   aa aa aa aa aa aa aa aa
[54592.789123] 0x020:  aa aa aa aa aa aa aa aa        aa aa aa aa aa aa aa aa
----------------------------------------------------------------
[90073.926055] EB torture: Page 0 has 20 bytes/160 bits failing verify, starting at offset 0xc
[90073.934661] Offset           Read                   Written
[90073.940490] 0x008:  55 55 55 55 aa aa aa aa  ***   55 55 55 55 55 55 55 55
[90073.947784] 0x010:  aa aa aa aa aa aa aa aa  ***   55 55 55 55 55 55 55 55
[90073.954895] 0x018:  aa aa aa aa aa aa aa aa  ***   55 55 55 55 55 55 55 55
[90073.962158] 0x020:  55 55 55 55 55 55 55 55        55 55 55 55 55 55 55 55
----------------------------------------------------------------

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)

[-- Attachment #2: torture.git.tar.bz2 --]
[-- Type: application/x-bzip-compressed-tar, Size: 13108 bytes --]

* Eraseblocks torture: OneNAND results
@ 2006-12-08  2:00 Kyungmin Park
  2006-12-08  6:19 ` Artem Bityutskiy
  0 siblings, 1 reply; 16+ messages in thread
From: Kyungmin Park @ 2006-12-08  2:00 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: linux-mtd

Hi Artem,

Okay, I will also try the attached program over this weekend.

However, I have a question.
There is a strange pattern in the log.
A change from 0xaa (0b10101010) to 0x55 (0b01010101) should not be possible in any case, since a bit cannot change from 0 to 1 without an erase, although it can change from 1 to 0.
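
The argument above can be checked mechanically against a dump. The sketch
below is only an illustration (the function names are made up for it): it
separates mismatches that need only 1 -> 0 changes from mismatches that
would need a 0 -> 1 change, which programming alone should not produce.
Note that going between 0x55 and 0xaa requires 0 -> 1 changes in either
direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits read as 1 although 0 was written: would need a 0 -> 1 change. */
static uint8_t bits_set_unexpectedly(uint8_t written, uint8_t read_back)
{
	return (uint8_t)(read_back & ~written);
}

/* Bits read as 0 although 1 was written: a 1 -> 0 change. */
static uint8_t bits_cleared_unexpectedly(uint8_t written, uint8_t read_back)
{
	return (uint8_t)(written & ~read_back);
}

/* Counts bytes whose mismatch cannot be explained by 1 -> 0 changes only. */
static size_t count_impossible_bytes(const uint8_t *written,
				     const uint8_t *read_back, size_t len)
{
	size_t i, n = 0;

	assert(written != NULL && read_back != NULL);
	for (i = 0; i < len; i++)
		if (bits_set_unexpectedly(written[i], read_back[i]))
			n++;
	return n;
}
```

Applied to the two log excerpts: the first failure (0xaa written, 0x00
read) contains only 1 -> 0 changes, while in the second one (0x55 written,
0xaa read) every failing byte is flagged by count_impossible_bytes().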

Anyway, I will also ask the hardware team to check this problem.

Also, could you send me the chip dump data? Since it takes a long time to wear out a block, I would like to analyze the worn-out chip data first.

If you have any issues or updates, please let me know.

Thank you,
Kyungmin Park

* Eraseblocks torture: OneNAND results
@ 2006-12-08  7:42 Kyungmin Park
  2006-12-08  8:08 ` Artem Bityutskiy
  0 siblings, 1 reply; 16+ messages in thread
From: Kyungmin Park @ 2006-12-08  7:42 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: linux-mtd

Hi Artem,

> 2. There is a "check" module option which is enabled by default. It
> slows the test down considerably. So I recommend to disable checking at
> first, run the test for, say 4 million erase cycles, then re-run it with
> checking enabled. So that you first screw up the eraseblocks, then you
> start checking data. There is a handy "cycles_count" option.
>
> 3. By default the test tortures 32 eraseblocks. You may configure this
> via a module parameter. Just glance inside of the torture.c.

Yes, I have already modified the source for my environment and also added a check for initial bad blocks.

in tort_init()

       while (1) {
                int i;

                for(i = eb; i < eb + ebcnt; i++) {
                        err = ebtest(i);
                        /* Skip initial bad block */
                        if (err == -EFAULT)
                                continue;
                        if (err)
                                break;
                }

in ebtest()

        err = mtd->erase(mtd, &ei);
        if (unlikely(err)) {
                printk(PRINT_PREF "error %d while erasing EB %d\n", err, ebnum);
                /* Initial bad block case */
                if (err == -EIO)
                        err = -EFAULT;
                return err;
        }

> P.S.: Test git: git://git.infradead.org/~dedekind/torture.git

I have already downloaded it.

I will send the results after the weekend test.


Thank you,
Kyungmin Park

* Eraseblocks torture: OneNAND results
@ 2006-12-11  8:31 Kyungmin Park
  2006-12-13 13:46 ` Artem Bityutskiy
  0 siblings, 1 reply; 16+ messages in thread
From: Kyungmin Park @ 2006-12-11  8:31 UTC (permalink / raw)
  To: linux-mtd

Hi,

So far there is nothing special to report.

I am currently using the latest torture test file.

I think I need more time to wear out a block.

Thank you,
Kyungmin Park

* Eraseblocks torture: OneNAND results
@ 2006-12-15  5:02 Kyungmin Park
  2006-12-15  7:54 ` Enrico Migliore
  2006-12-21 15:30 ` Jarkko Lavinen
  0 siblings, 2 replies; 16+ messages in thread
From: Kyungmin Park @ 2006-12-15  5:02 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: linux-mtd

Hi, Artem

> FYI: now I see that the tortured eraseblocks do not contain all 0xFFs
> after erase which is strange - the driver must have returned an error.
> But mtd->erase is totally silent about this. Most probably it is a bug
> in the OneNAND driver.
>
> May you please take a look at onenand_wait() from
> drivers/mtd/onenand/onenand_base.c in mtd-2.6.git. I see the following
> code there:
>
> -----------------------------------------------------------------------
> ctrl = this->read_word(this->base + ONENAND_REG_CTRL_STATUS);
> 
> if (ctrl & ONENAND_CTRL_ERROR) {
>     /* It maybe occur at initial bad block */
>     DEBUG(MTD_DEBUG_LEVEL0, "onenand_wait: controller error = 0x%04x\n",
> ctrl);
>     /* Clear other interrupt bits for preventing ECC error */
>     interrupt &= ONENAND_INT_MASTER;
> }

> AFAIU, this is exactly the place when we should catch erase errors. But
> what we do - we only change local 'interrupt' variable and later return
> 0. So we do not report about errors. This looks suspiciously. May you
> comment on this?

Yes, you're right. onenand_wait() has a bug: it doesn't report any error. It's my fault.
It also doesn't check the locked-block error.

The patch below fixes the onenand_wait() bug. (This is a temporary one; I will also try to fix other things.)

Please test it.
(You may already have some parts of it; please ignore those.)

In my opinion, when a block wears out, failures occur in the following order:
First, a 2-bit ECC read error occurs. (bit error)
Second, a write fails. (page error)
Finally, an erase fails. (block error)

Thank you,
Kyungmin Park

P.S. In my target environment, it still doesn't report any error. I'm also surprised at how good the OneNAND's erase endurance is.
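
For readers who do not want to walk through the diff, the read-path
accounting in the patch below can be sketched in plain userspace C like
this. It is a simplified illustration, not driver code: the SKETCH_ECC_*
bits, struct sketch_ecc_stats and read_pages() are hypothetical stand-ins
for the ONENAND_ECC_* masks, mtd->ecc_stats and onenand_read(). The idea is
to snapshot the counters before the read loop and derive the return value
from how much they moved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; defined here only if errno.h lacks it */
#endif

/* Hypothetical stand-ins for the per-page ECC status bits. */
#define SKETCH_ECC_1BIT 0x1u /* error found and corrected */
#define SKETCH_ECC_2BIT 0x2u /* uncorrectable error */

struct sketch_ecc_stats {
	unsigned corrected;
	unsigned failed;
};

/* Per-page accounting, as the patched wait path does for reads. */
static void account_page(struct sketch_ecc_stats *st, unsigned ecc)
{
	if (ecc & SKETCH_ECC_2BIT)
		st->failed++;
	else if (ecc & SKETCH_ECC_1BIT)
		st->corrected++;
}

/* "Reads" npages pages whose ECC status is given in ecc_per_page. */
static int read_pages(struct sketch_ecc_stats *st,
		      const unsigned *ecc_per_page, size_t npages)
{
	struct sketch_ecc_stats before;
	size_t i;

	assert(st != NULL && (ecc_per_page != NULL || npages == 0));

	before = *st; /* snapshot, so older errors are not reported again */
	for (i = 0; i < npages; i++)
		account_page(st, ecc_per_page[i]);

	if (st->failed != before.failed)
		return -EBADMSG; /* at least one uncorrectable page */
	return st->corrected != before.corrected ? -EUCLEAN : 0;
}
```

Comparing against the snapshot rather than the absolute counters means a
later read of clean pages returns 0 even though the counters stay non-zero.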


--

Index: drivers/mtd/onenand/onenand_base.c
===================================================================
RCS file: /cvsroot/linux-2.6.18-omap/drivers/mtd/onenand/onenand_base.c,v
retrieving revision 1.2
diff -u -p -r1.2 onenand_base.c
--- drivers/mtd/onenand/onenand_base.c	12 Oct 2006 06:59:27 -0000	1.2
+++ drivers/mtd/onenand/onenand_base.c	15 Dec 2006 04:36:02 -0000
@@ -316,22 +316,20 @@ static int onenand_wait(struct mtd_info 
 	ctrl = this->read_word(this->base + ONENAND_REG_CTRL_STATUS);
 
 	if (ctrl & ONENAND_CTRL_ERROR) {
-		/* It maybe occur at initial bad block */
 		DEBUG(MTD_DEBUG_LEVEL0, "onenand_wait: controller error = 0x%04x\n", ctrl);
-		/* Clear other interrupt bits for preventing ECC error */
-		interrupt &= ONENAND_INT_MASTER;
-	}
-
-	if (ctrl & ONENAND_CTRL_LOCK) {
-		DEBUG(MTD_DEBUG_LEVEL0, "onenand_wait: it's locked error = 0x%04x\n", ctrl);
-		return -EACCES;
+		if (ctrl & ONENAND_CTRL_LOCK)
+			DEBUG(MTD_DEBUG_LEVEL0, "onenand_erase: Device is write protected!!!\n");
+		return ctrl;
 	}
 
 	if (interrupt & ONENAND_INT_READ) {
 		ecc = this->read_word(this->base + ONENAND_REG_ECC_STATUS);
-		if (ecc & ONENAND_ECC_2BIT_ALL) {
+		if (ecc) {
 			DEBUG(MTD_DEBUG_LEVEL0, "onenand_wait: ECC error = 0x%04x\n", ecc);
-			return -EBADMSG;
+			if (ecc & ONENAND_ECC_2BIT_ALL)
+				mtd->ecc_stats.failed++;
+			else if (ecc & ONENAND_ECC_1BIT_ALL)
+				mtd->ecc_stats.corrected++;
 		}
 	}
 
@@ -608,6 +606,7 @@ static int onenand_read(struct mtd_info 
 	size_t *retlen, u_char *buf)
 {
 	struct onenand_chip *this = mtd->priv;
+	struct mtd_ecc_stats stats;
 	int read = 0, column;
 	int thislen;
 	int ret = 0;
@@ -626,6 +625,7 @@ static int onenand_read(struct mtd_info 
 
 	/* TODO handling oob */
 
+	stats = mtd->ecc_stats;
 	while (read < len) {
 		thislen = min_t(int, mtd->writesize, len - read);
 
@@ -643,16 +643,16 @@ static int onenand_read(struct mtd_info 
 
 		this->read_bufferram(mtd, ONENAND_DATARAM, buf, column, thislen);
 
-		read += thislen;
-
-		if (read == len)
-			break;
-
 		if (ret) {
 			DEBUG(MTD_DEBUG_LEVEL0, "onenand_read: read failed = %d\n", ret);
 			goto out;
 		}
 
+		read += thislen;
+
+		if (read == len)
+			break;
+
 		from += thislen;
 		buf += thislen;
 	}
@@ -667,7 +667,10 @@ out:
 	 * retlen == desired len and result == -EBADMSG
 	 */
 	*retlen = read;
-	return ret;
+	if (mtd->ecc_stats.failed - stats.failed)
+		return -EBADMSG;
+
+	return mtd->ecc_stats.corrected - stats.corrected ? -EUCLEAN : 0;
 }
 
 /**
@@ -716,15 +719,16 @@ int onenand_do_read_oob(struct mtd_info 
 
 		this->read_bufferram(mtd, ONENAND_SPARERAM, buf, column, thislen);
 
+		if (ret) {
+			DEBUG(MTD_DEBUG_LEVEL0, "onenand_read_oob: read failed = 0x%x\n", ret);
+			goto out;
+		}
+
 		read += thislen;
 
 		if (read == len)
 			break;
 
-		if (ret) {
-			DEBUG(MTD_DEBUG_LEVEL0, "onenand_read_oob: read failed = %d\n", ret);
-			goto out;
-		}
 
 		buf += thislen;
 
@@ -1083,10 +1087,7 @@ static int onenand_erase(struct mtd_info
 		ret = this->wait(mtd, FL_ERASING);
 		/* Check, if it is write protected */
 		if (ret) {
-			if (ret == -EPERM)
-				DEBUG(MTD_DEBUG_LEVEL0, "onenand_erase: Device is write protected!!!\n");
-			else
-				DEBUG(MTD_DEBUG_LEVEL0, "onenand_erase: Failed erase, block %d\n", (unsigned) (addr >> this->erase_shift));
+			DEBUG(MTD_DEBUG_LEVEL0, "onenand_erase: Failed erase, block %d\n", (unsigned) (addr >> this->erase_shift));
 			instr->state = MTD_ERASE_FAILED;
 			instr->fail_addr = addr;
 			goto erase_exit;
Index: drivers/mtd/onenand/onenand_bbt.c
===================================================================
RCS file: /cvsroot/linux-2.6.18-omap/drivers/mtd/onenand/onenand_bbt.c,v
retrieving revision 1.1.1.1
diff -u -p -r1.1.1.1 onenand_bbt.c
--- drivers/mtd/onenand/onenand_bbt.c	12 Oct 2006 05:49:24 -0000	1.1.1.1
+++ drivers/mtd/onenand/onenand_bbt.c	15 Dec 2006 04:36:02 -0000
@@ -93,7 +93,8 @@ static int create_bbt(struct mtd_info *m
 			ret = onenand_do_read_oob(mtd, from + j * mtd->writesize + bd->offs,
 						  readlen, &retlen, &buf[0]);
 
-			if (ret)
+			/* Handle initial bad block */
+			if (ret && !(ret & ONENAND_CTRL_LOAD))
 				return ret;
 
 			if (check_short_pattern(&buf[j * scanlen], scanlen, mtd->writesize, bd)) {


* Eraseblocks torture: OneNAND results
@ 2006-12-22  7:58 Kyungmin Park
  2006-12-22  9:22 ` Artem Bityutskiy
  0 siblings, 1 reply; 16+ messages in thread
From: Kyungmin Park @ 2006-12-22  7:58 UTC (permalink / raw)
  To: linux-mtd

Hi Jarkko Lavinen,

> I have tried your patch on an Omap 2420 base test board with OneNand
> and it seems to work.

Oh, that's good news. In our environment it is hard to reproduce the problem.
I have erased a block more than 1,000k times (over 2 weeks), but it is still working.

> I had previously run an earlier version of torture test and had some
> worn out erase blocks available. When I try to erase them I often see
> a controller error occurring and it is caught and returned correctly.

I committed the patch to the OneNAND MTD git tree, and it will be merged into mtd.

> But I have also encountered cases where there is no erase error but yet
> erase verify fails. I then tried a retry after the failed verify and
> on the second read the erase block is blank, all FF, as it should.

As you know, if a block is worn out, we can't be sure of its behavior.

Actually, I don't have any idea. How should we handle this case?
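
One possible starting point, purely as a sketch and not something the
driver does: re-read a block whose erase verify failed and record how many
reads it took to come back blank, so that a block which only verifies on a
retry can be told apart from a stable one. Everything here (read_fn,
flaky_eb, flaky_read(), verify_erased()) is hypothetical, and what to do
with such a block is a policy question this sketch leaves open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read callback: fills buf, returns 0 on success. */
typedef int (*read_fn)(void *ctx, uint8_t *buf, size_t len);

static bool all_ff(const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return false;
	return true;
}

/*
 * Returns -1 on a read error, 0 if the block never read back blank,
 * or n >= 1 if it was blank on the n-th read. A result above 1 marks
 * the block as unstable: it only verified on a retry.
 */
static int verify_erased(read_fn rd, void *ctx, uint8_t *buf, size_t len,
			 unsigned max_reads)
{
	unsigned attempt;

	assert(rd != NULL && buf != NULL && max_reads > 0);
	for (attempt = 1; attempt <= max_reads; attempt++) {
		if (rd(ctx, buf, len) != 0)
			return -1;
		if (all_ff(buf, len))
			return (int)attempt;
	}
	return 0;
}

/* Made-up model of a worn block: the first few reads return one bad byte. */
struct flaky_eb {
	unsigned bad_reads_left;
};

static int flaky_read(void *ctx, uint8_t *buf, size_t len)
{
	struct flaky_eb *eb = ctx;

	memset(buf, 0xFF, len);
	if (eb->bad_reads_left > 0 && len > 0) {
		eb->bad_reads_left--;
		buf[0] = 0xFE;
	}
	return 0;
}
```

With flaky_read(), a block with one bad read left verifies on the second
attempt, which matches the behaviour described in the quoted report.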

Thank you,
Kyungmin Park

