public inbox for linux-mtd@lists.infradead.org
* Bad Blocks On JFFS2/NAND
@ 2004-10-19 11:22 Simon Haynes
  2004-10-19 11:50 ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Simon Haynes @ 2004-10-19 11:22 UTC
  To: linux-mtd


I have experienced a problem in which a JFFS2 filesystem on NAND became full.
This is a root filesystem, and constant writes to a logfile filled it. On
investigation it was found that the NAND device now had hundreds of bad
blocks.

I started to investigate this and found that JFFS2 was printing messages like

Newly-erased block contained word 0x1985e002 at offset 0x020f7e00

which result in my mtd/jffs2 code marking the block bad. What I find
strange is that a subsequent scan lists the new bad block at a different
16k offset, in this case 0x020f0000, even though the device erasesize is 16k.
Is that because my device is 128Mb and JFFS2 is using this 'virtual erase
size' of 32k?

I have observed this now on several different NAND devices and it seems to be 
more prominent while performing small writes.

I am currently trying to work out if the erase is not completing, or this is 
the wrong block or something else. 

I was wondering if anyone has seen something similar

Cheers

Simon. 


* Re: Bad Blocks On JFFS2/NAND
  2004-10-19 11:22 Bad Blocks On JFFS2/NAND Simon Haynes
@ 2004-10-19 11:50 ` Thomas Gleixner
  2004-10-19 14:39   ` Simon Haynes
  2004-10-19 21:45   ` Michael Moedt
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-19 11:50 UTC
  To: simon; +Cc: linux-mtd

On Tue, 2004-10-19 at 13:22, Simon Haynes wrote:
> I have experienced a problem in which a JFFS2 filesystem on NAND became full. 
> This is a root file system and constant writes to a logfile filled the 
> filesystem. On investigation it was found that the NAND device now had 
> hundreds of bad blocks. 
> 
> I started to investigate this and found that JFFS2 was announcing 
> 
> Newly-erased block contained word 0x1985e002 at offset 0x020f7e00
> 
> Messages which result in my mtd/jffs2 code marking the block bad. What I find 
> strange is that a subsequent scan lists the new block at a different 16k 
> offset when the device erasesize is 16k, in this case 0x020f0000.
> Is that because my device is 128Mb and JFFS2 is using this 'virtual erase 
> size' of 32k?

Yes. The bad block code scans/marks physical blocks and JFFS2 operates
on virtual ones, if the device size is big enough.
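
To make the arithmetic concrete, here is a minimal sketch in C. It is not
taken from the MTD or JFFS2 sources; block_start is a hypothetical helper
and it assumes power-of-two block sizes. It rounds an offset down to the
start of its erase block, so the same offset can be checked against the
16k physical erasesize and the 32k virtual one:

```c
#include <stdint.h>

/*
 * Start offset of the erase block that contains `offset`.
 * Hypothetical helper for illustration; `erasesize` must be a power
 * of two, which holds for the 16k and 32k sizes discussed here.
 */
static uint32_t block_start(uint32_t offset, uint32_t erasesize)
{
    return offset & ~(erasesize - 1u);
}
```

For the offset 0x020f7e00 from the log message, block_start(0x020f7e00,
16 * 1024) gives 0x020f4000, while block_start(0x020f7e00, 32 * 1024)
gives 0x020f0000, which is the address reported by the later scan.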

> I have observed this now on several different NAND devices and it seems to be 
> more prominent while performing small writes.
>
> I am currently trying to work out if the erase is not completing, or this is 
> the wrong block or something else. 

Hmm, are you using Ready/Busy Pin or the timeout ?

tglx


* Re: Bad Blocks On JFFS2/NAND
  2004-10-19 11:50 ` Thomas Gleixner
@ 2004-10-19 14:39   ` Simon Haynes
  2004-10-19 15:02     ` Thomas Gleixner
  2004-10-19 21:45   ` Michael Moedt
  1 sibling, 1 reply; 9+ messages in thread
From: Simon Haynes @ 2004-10-19 14:39 UTC
  To: tglx; +Cc: linux-mtd

Thomas,

I am using the ready pin.



* Re: Bad Blocks On JFFS2/NAND
  2004-10-19 14:39   ` Simon Haynes
@ 2004-10-19 15:02     ` Thomas Gleixner
  2004-10-19 15:18       ` Simon Haynes
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-19 15:02 UTC
  To: simon; +Cc: linux-mtd

On Tue, 2004-10-19 at 16:39, Simon Haynes wrote:
> Thomas,
> 
> I am using the ready pin.

Hmm, the strange thing is that it does not complain about a failed
erase. It complains when checking whether the chip is really erased.
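
The post-erase check can be pictured with a small sketch in C. This is
not the JFFS2 code; first_unerased_word is a hypothetical name, and the
sketch only assumes that an erased NAND block reads back as all 0xFF
bytes. It scans a buffer read back from the block and reports the first
word that does not look erased:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first 32-bit word that is not 0xFFFFFFFF,
 * or -1 if the whole buffer looks erased. Hypothetical helper for
 * illustration only.
 */
static long first_unerased_word(const uint32_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (buf[i] != 0xFFFFFFFFu)
            return (long)i;
    }
    return -1;
}
```

A check of this kind can fail even when the erase command itself reported
success, because it depends on what the read path returns afterwards.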

Can you try the hack below, which invalidates the page cache
unconditionally on erase?

tglx

RCS file: /home/cvs/mtd/drivers/mtd/nand/nand_base.c,v
retrieving revision 1.121
diff -u -r1.121 nand_base.c
--- nand_base.c 6 Oct 2004 19:53:11 -0000       1.121
+++ nand_base.c 19 Oct 2004 15:09:35 -0000
@@ -2091,8 +2091,7 @@

                /* Invalidate the page cache, if we erase the block which contains
                   the current cached page */
-               if (page <= this->pagebuf && this->pagebuf < (page + pages_per_block))
-                       this->pagebuf = -1;
+               this->pagebuf = -1;

                this->erase_cmd (mtd, page & this->pagemask);
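
Why a stale page cache would produce this symptom can be shown with a toy
model in C. It is a simplified sketch, not the nand_base.c code: the
toy_* names, the 512-byte page size and the single-page cache layout are
assumptions made for illustration. Only the pagebuf field mirrors the
name used in the patch above.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE       512  /* assumed small-page NAND */
#define TOY_PAGES_PER_BLOCK 32   /* 32 * 512 bytes = 16k erase block */
#define TOY_NUM_PAGES       64

struct toy_nand {
    uint8_t flash[TOY_NUM_PAGES][TOY_PAGE_SIZE];
    uint8_t cache[TOY_PAGE_SIZE]; /* single-page read cache */
    int pagebuf;                  /* page held in cache, -1 if none */
};

static void toy_init(struct toy_nand *n)
{
    memset(n->flash, 0xff, sizeof n->flash);
    n->pagebuf = -1;
}

/* Program a page and leave its contents in the cache. */
static void toy_write(struct toy_nand *n, int page, uint8_t val)
{
    memset(n->flash[page], val, TOY_PAGE_SIZE);
    memcpy(n->cache, n->flash[page], TOY_PAGE_SIZE);
    n->pagebuf = page;
}

/* Read the first byte of a page, served from the cache on a hit. */
static uint8_t toy_read(struct toy_nand *n, int page)
{
    if (n->pagebuf != page) {
        memcpy(n->cache, n->flash[page], TOY_PAGE_SIZE);
        n->pagebuf = page;
    }
    return n->cache[0];
}

/* Erase the block starting at first_page; optionally drop the cache. */
static void toy_erase(struct toy_nand *n, int first_page, int invalidate)
{
    if (invalidate)
        n->pagebuf = -1;
    for (int p = first_page; p < first_page + TOY_PAGES_PER_BLOCK; p++)
        memset(n->flash[p], 0xff, TOY_PAGE_SIZE);
}
```

In this model, writing page 5 and then calling toy_erase(&n, 0, 0) leaves
the old data in the cache, so toy_read(&n, 5) still returns the written
value although the flash array is erased. With toy_erase(&n, 0, 1) the
next read goes to the flash array and sees 0xff.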


* Re: Bad Blocks On JFFS2/NAND
  2004-10-19 15:02     ` Thomas Gleixner
@ 2004-10-19 15:18       ` Simon Haynes
  0 siblings, 0 replies; 9+ messages in thread
From: Simon Haynes @ 2004-10-19 15:18 UTC
  To: tglx; +Cc: linux-mtd

I am using an old mtd layer which has nand.c and not nand_base.c.
I have hacked this accordingly and I will see if I can reproduce the
problem.

I did try to update my kernel, but patchin.sh failed and I had to hack some
header files to get a kernel. This kernel crashed in a couple of different
places when writing to my SSFDC partition. I guess things have changed here
again?

Cheers

Simon.



* Re: Bad Blocks On JFFS2/NAND
  2004-10-19 11:50 ` Thomas Gleixner
  2004-10-19 14:39   ` Simon Haynes
@ 2004-10-19 21:45   ` Michael Moedt
  2004-10-19 21:55     ` Thomas Gleixner
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Moedt @ 2004-10-19 21:45 UTC
  To: tglx, simon; +Cc: linux-mtd


--- Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 2004-10-19 at 13:22, Simon Haynes wrote:
> > I have experienced a problem in which a JFFS2 filesystem on NAND
> > became full. 
> > This is a root file system and constant writes to a logfile
> > filled the filesystem. On investigation it was found that the
> > NAND device now had hundreds of bad blocks.
> > 
> > I started to investigate this and found that JFFS2 was announcing
> > 
> > Newly-erased block contained word 0x1985e002 at offset 0x020f7e00
> > 
> > Messages which result in my mtd/jffs2 code marking the block bad.
> > What I find strange is that a subsequent scan lists the new block
> > at a different 16k offset when the device erasesize is 16k, in
> > this case 0x020f0000.
> > Is that because my device is 128Mb and JFFS2 is using this
> > 'virtual erase size' of 32k?
> 
> Yes. The bad block code scans/marks physical blocks and JFFS2
> operates
> on virtual ones, if the device size is big enough.
> 
> > I have observed this now on several different NAND devices and it
> > seems to be more prominent while performing small writes.
> >
> > I am currently trying to work out if the erase is not completing,
> > or this is the wrong block or something else. 
> 
> Hmm, are you using Ready/Busy Pin or the timeout ?
> 
> tglx
> 

Hi guys.  This topic has given me a little bit of concern.  Could you
try to answer a few questions for me?

1. Do you know what usually causes the "Newly-erased block contained
word ..." error? Is it caused by an interrupted (or otherwise failed)
erase? Would a power failure cause this?

2. Would this cause good blocks to be incorrectly [and permanently]
marked as bad?


I think I may have seen something similar on my system.  I'm
considering writing a test to see if this is a problem for me, but
I'd like to learn more about this also.

Thanks,
Mike


* Re: Bad Blocks On JFFS2/NAND
  2004-10-19 21:45   ` Michael Moedt
@ 2004-10-19 21:55     ` Thomas Gleixner
  2004-10-20  9:40       ` Simon Haynes
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-19 21:55 UTC
  To: Michael Moedt; +Cc: linux-mtd, simon

On Tue, 2004-10-19 at 23:45, Michael Moedt wrote:
> 1. Do you know what usually causes the "Newly-erased block contained
> word ..." error? Is it caused by an interrupted (or otherwise failed)
> erase? Would a power failure cause this?

It's caused by a failed erase. If the chip interface is correct, it is
usually a bad page/block. If not (e.g. wrong timing), the driver might
return too early from an erase.
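
The timing case can be sketched as follows. This is a simplified model in
C, not the MTD driver: toy_chip, toy_wait_ready and the simulated
hardware helpers are hypothetical names, and the poll interval of one
microsecond is an arbitrary assumption. It contrasts polling a Ready/Busy
line with waiting a fixed delay, where the fixed delay path returns
whether or not the chip has actually finished.

```c
#include <stddef.h>

struct toy_chip {
    int  (*dev_ready)(void *ctx);      /* nonzero when ready; NULL if no R/B pin */
    void (*delay_us)(void *ctx, unsigned us);
    unsigned chip_delay_us;            /* fixed wait used without a R/B pin */
    unsigned max_polls;                /* give up after this many polls */
    void *ctx;
};

/* Returns 0 once the chip is considered ready, -1 if polling gave up. */
static int toy_wait_ready(const struct toy_chip *chip)
{
    if (chip->dev_ready == NULL) {
        /* Timeout mode: we only assume the chip is done by now. */
        chip->delay_us(chip->ctx, chip->chip_delay_us);
        return 0;
    }
    for (unsigned i = 0; i < chip->max_polls; i++) {
        if (chip->dev_ready(chip->ctx))
            return 0;
        chip->delay_us(chip->ctx, 1);
    }
    return -1;
}

/* Simulated hardware: reports busy for `busy_polls` more checks. */
struct toy_sim {
    unsigned busy_polls;
    unsigned waited_us;
};

static int toy_sim_ready(void *ctx)
{
    struct toy_sim *s = ctx;
    if (s->busy_polls == 0)
        return 1;
    s->busy_polls--;
    return 0;
}

static void toy_sim_delay(void *ctx, unsigned us)
{
    struct toy_sim *s = ctx;
    s->waited_us += us;
}
```

If the fixed delay is shorter than the real erase time, the timeout path
reports ready while the simulated chip is still busy, which is one way a
driver could return too early from an erase.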

> 2. Would this cause good blocks to be incorrectly [and permanently]
> marked as bad?

Only if it fails 5 times after mount. The counter is reset with each
mount.
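
A minimal sketch of that policy in C, under stated assumptions: the
counter is modelled per block, a block is marked bad when its count
reaches the limit, and all names (toy_fs, toy_mount, toy_erase_failed)
are hypothetical rather than JFFS2 identifiers. The limit of 5 comes from
the reply above; the exact comparison used by the real code is not shown
in this thread.

```c
#include <stdbool.h>

#define TOY_NUM_BLOCKS 8

struct toy_fs {
    unsigned fail_count[TOY_NUM_BLOCKS]; /* failures since last mount */
    bool     bad[TOY_NUM_BLOCKS];        /* persistent bad-block marks */
    unsigned max_failures;               /* 5 per the thread */
};

/* Mount: failure counters start from zero, bad marks persist. */
static void toy_mount(struct toy_fs *fs, unsigned max_failures)
{
    for (int i = 0; i < TOY_NUM_BLOCKS; i++)
        fs->fail_count[i] = 0;
    fs->max_failures = max_failures;
}

/* Record a failed erase check; returns true if the block is now bad. */
static bool toy_erase_failed(struct toy_fs *fs, int block)
{
    if (++fs->fail_count[block] >= fs->max_failures)
        fs->bad[block] = true;
    return fs->bad[block];
}
```

With a limit of 5, four failures followed by a remount leave the block
usable, because the count starts again from zero. A limit of 1 would model
code that marks a block bad on the first failure.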

> 
> I think I may have seen something similar on my system.  I'm
> considering writing a test to see if this is a problem for me, but
> I'd like to learn more about this also.

Which version of the mtd/nand code?

Some time ago there was a problem with the page cache not being
invalidated on erase, but it's fixed.

tglx


* Re: Bad Blocks On JFFS2/NAND
  2004-10-19 21:55     ` Thomas Gleixner
@ 2004-10-20  9:40       ` Simon Haynes
  2004-10-20  9:53         ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Simon Haynes @ 2004-10-20  9:40 UTC
  To: tglx, Michael Moedt; +Cc: linux-mtd


This 5 error business applies only if you have relatively new code. Old code
marks a block bad, incorrectly and permanently, after 1 failure. I am testing
Thomas's suggestion that this is a page cache problem by always invalidating
the cached page on erase. No errors as yet, but I will keep you posted.

Cheers

Simon.



* Re: Bad Blocks On JFFS2/NAND
  2004-10-20  9:40       ` Simon Haynes
@ 2004-10-20  9:53         ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-20  9:53 UTC
  To: simon; +Cc: linux-mtd

On Wed, 2004-10-20 at 11:40, Simon Haynes wrote:
> This 5 error business is only if you have relatively new code. Old code 
> generates an incorrect and permanent bad block after 1 failure. I am testing 
> the suggestion by Thomas that this is a page cache problem by always marking 
> an erase page as invalid. No errors as yet but I will keep you posted.

A CVS history search should reveal the changes to the error counter and
the page cache fixes, so you can verify against your particular
version.

tglx

