public inbox for linux-mtd@lists.infradead.org
* OneNAND: Rate of write errors
@ 2007-02-22  0:21 Julianne C.
  2007-02-22  9:28 ` Adrian Hunter
  0 siblings, 1 reply; 5+ messages in thread
From: Julianne C. @ 2007-02-22  0:21 UTC (permalink / raw)
  To: linux-mtd

We are still struggling to understand and manage the OneNAND part on a
LogicPD PXA270 board.  We are using the mtd development snapshot build
of 2-15-07 for the fs and device layers.  Our requirements lead us to
use JFFS2 as the file system.

What we are seeing is that when we write to a file system that has been
freshly erased and mounted using the command:
>mount -t jffs2 /dev/mtdblockx /mnt
and then perform some operation like tar or rsync to place files in
the new fs, we see about 5 to 8 write errors per MB, of the form:

onenand_write: verify failed -74
Write of 2663 bytes at 0x007a6e14 failed. returned -74, retlen 0
Not marking the space at 0x007a6e14 as dirty because the flash driver
returned retlen zero

In further testing, we replaced the memcmp call in onenand_verify
with a loop that walks the buffer byte by byte and issues a printk
for each bad byte it detects.  Here is a sample of the bad bytes we see:

Cmp failed [1596]  eb  00
Cmp failed [1594]  e6  9f
Cmp failed [1954]  7b  4d
Cmp failed [1654]  ae  00
Cmp failed [1972]  82  00
Cmp failed [462]  d3  00
Cmp failed [972]  a7  26
Cmp failed [1242]  d8  8d
Cmp failed [54]  6e  a0
Cmp failed [824]  3a  56
Cmp failed [1360]  78  67
Cmp failed [1584]  82  00
Cmp failed [1376]  00  5a
Cmp failed [64]  3f  00
Cmp failed [444]  90  e5
Cmp failed [310]  94  2d
Cmp failed [1764]  7a  04
Cmp failed [1030]  f8  14
Cmp failed [68]  1e  72
Cmp failed [1910]  de  01
Cmp failed [780]  37  00
Cmp failed [1536]  76  00
Cmp failed [1064]  2c  00
Cmp failed [644]  58  00
Cmp failed [1428]  25  00
Cmp failed [440]  89  00
Cmp failed [1852]  6d  00

where the first byte is the expected buffer value, the second is
what is actually seen, and the value in brackets is the index into
the 2048-byte array being tested.

These values were accumulated over about 4 MB of writes to the fs.
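
A byte-by-byte comparison of the kind described above could look roughly
like the following.  This is a userspace sketch, not the actual
onenand_verify change: the name report_mismatches is hypothetical, and
printf stands in for printk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the real onenand_verify code): compare two
 * buffers byte by byte, print every mismatch, and return how many
 * mismatches were found.
 */
static size_t report_mismatches(const unsigned char *expected,
                                const unsigned char *actual,
                                size_t len)
{
    size_t i, bad = 0;

    assert(expected != NULL && actual != NULL);

    for (i = 0; i < len; i++) {
        if (expected[i] != actual[i]) {
            /* Same layout as the sample: [index]  expected  actual */
            printf("Cmp failed [%zu]  %02x  %02x\n",
                   i, expected[i], actual[i]);
            bad++;
        }
    }
    return bad;
}
```

A return value of zero plays the role that memcmp() == 0 played before;
unlike memcmp, the loop does not stop at the first differing byte, so
every bad offset in the page is reported.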

Is it common to see this many errors over that many page writes?
If not, are there adjustments that can be made to the device setup to
help reduce these errors?
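
For scale, the sample above lists 27 mismatched bytes over about 4 MB,
which works out to roughly 6.75 per MB and sits inside the 5 to 8 per MB
quoted earlier.  The sketch below does that arithmetic.  It assumes that
each listed byte came from a separate failed page verify and that "4 MB"
means 4 MiB of 2048-byte pages; neither is stated in the post, and the
function names are hypothetical.

```c
#include <assert.h>

/* Mismatches per megabyte written. */
static double mismatches_per_mb(unsigned int mismatches, double mb_written)
{
    assert(mb_written > 0.0);
    return mismatches / mb_written;
}

/*
 * Fraction of page writes affected, ASSUMING one mismatch per failed
 * page (an assumption, not something the post states).
 */
static double failed_page_fraction(unsigned int mismatches,
                                   unsigned long bytes_written,
                                   unsigned long page_size)
{
    assert(page_size > 0 && bytes_written >= page_size);
    return (double)mismatches / (double)(bytes_written / page_size);
}
```

Under those assumptions, failed_page_fraction(27, 4UL * 1024 * 1024, 2048)
comes to a little over 1% of page writes.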

* OneNAND: Rate of write errors
@ 2007-02-22 16:35 Julianne C.
  2007-02-23  8:04 ` Adrian Hunter
  2007-02-26  0:41 ` Kyungmin Park
  0 siblings, 2 replies; 5+ messages in thread
From: Julianne C. @ 2007-02-22 16:35 UTC (permalink / raw)
  To: linux-mtd

Further thought about the numerous write errors to the OneNAND part
led me back to the symptoms: when we see the -EBADMSG error return,
there is no corresponding fault reported in the ECC status register.
Consequently, we concluded that the bufferram may be getting corrupted
before the data is ever committed to the NAND array.

Hence, we rewrote the setup code in the onenand_write procedure as
follows:

        do {
                /* Stage the page in the DataRAM ... */
                this->write_bufferram(mtd, ONENAND_DATARAM, wbuf, 0,
                                      mtd->writesize);

                /* ... and read it back to confirm it arrived intact. */
                ret = onenand_do_check_bufferram(mtd, ONENAND_DATARAM, wbuf,
                                                 0, mtd->writesize);

                if (ret != 0) {
                        retrys++;
                        printk(KERN_WARNING
                               "onenandwrite: bad buffer ram, retrying (%d)\n",
                               retrys);
                }
        } while (ret != 0 && retrys < max_retrys);

        /* Give up on this page once every attempt has failed to verify. */
        if (ret != 0) {
                ret = -EBADMSG;
                break;  /* leaves the enclosing write loop (not shown) */
        }

We set max_retrys to three (we have seen a write need two attempts),
and this works every time.  No more errors are reported back to
JFFS2, and the file system mounts and unmounts cleanly.
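
The write-then-verify retry pattern can also be exercised outside the
kernel.  The following is a minimal userspace sketch, not the driver
code: struct fake_bufferram, fake_write and write_verified are
hypothetical names, and the corruption is simulated by flipping bits in
the first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES 2048

/* Simulated staging buffer; corrupts the next `faults` writes. */
struct fake_bufferram {
    unsigned char data[PAGE_SIZE_BYTES];
    int faults;   /* remaining writes that will be corrupted */
};

static void fake_write(struct fake_bufferram *ram,
                       const unsigned char *src, size_t len)
{
    memcpy(ram->data, src, len);
    if (ram->faults > 0 && len > 0) {
        ram->data[0] ^= 0xff;   /* flip bits so the read-back differs */
        ram->faults--;
    }
}

/*
 * Write src into the staging buffer and compare it against the source,
 * repeating until the copy matches or max_tries writes have been made.
 * Returns the number of writes used, or -1 if none verified.
 */
static int write_verified(struct fake_bufferram *ram,
                          const unsigned char *src, size_t len,
                          int max_tries)
{
    int tries;

    assert(len <= PAGE_SIZE_BYTES);

    for (tries = 1; tries <= max_tries; tries++) {
        fake_write(ram, src, len);
        if (memcmp(ram->data, src, len) == 0)
            return tries;
    }
    return -1;
}
```

With one simulated fault and max_tries of three, the second write
verifies, which mirrors the double attempts mentioned above.  A retry
loop like this hides the symptom from the file system but does not
explain why the buffer was corrupted in the first place.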

This does verify the suspicion that the buffer was corrupted before it
was committed.  Does anyone have any idea how or why the data in the
bufferram might be corrupted?

Julianne C.
