* Bad Blocks On JFFS2/NAND
@ 2004-10-19 11:22 Simon Haynes
2004-10-19 11:50 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: Simon Haynes @ 2004-10-19 11:22 UTC (permalink / raw)
To: linux-mtd
I have experienced a problem in which a JFFS2 filesystem on NAND became full.
This is a root file system and constant writes to a logfile filled the
filesystem. On investigation it was found that the NAND device now had
hundreds of bad blocks.
I started to investigate this and found that JFFS2 was announcing
Newly-erased block contained word 0x1985e002 at offset 0x020f7e00
messages, which result in my mtd/jffs2 code marking the block bad. What I find
strange is that a subsequent scan lists the new bad block at a different 16k
offset, in this case 0x020f0000, even though the device erasesize is 16k.
Is that because my device is 128MB and JFFS2 is using this 'virtual erase
size' of 32k?
I have observed this now on several different NAND devices and it seems to be
more prominent while performing small writes.
I am currently trying to work out if the erase is not completing, or this is
the wrong block or something else.
I was wondering if anyone has seen something similar.
Cheers
Simon.
* Re: Bad Blocks On JFFS2/NAND
2004-10-19 11:22 Bad Blocks On JFFS2/NAND Simon Haynes
@ 2004-10-19 11:50 ` Thomas Gleixner
2004-10-19 14:39 ` Simon Haynes
2004-10-19 21:45 ` Michael Moedt
0 siblings, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-19 11:50 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-10-19 at 13:22, Simon Haynes wrote:
> I have experienced a problem in which a JFFS2 filesystem on NAND became full.
> This is a root file system and constant writes to a logfile filled the
> filesystem. On investigation it was found that the NAND device now had
> hundreds of bad blocks.
>
> I started to investigate this and found that JFFS2 was announcing
>
> Newly-erased block contained word 0x1985e002 at offset 0x020f7e00
>
> messages, which result in my mtd/jffs2 code marking the block bad. What I find
> strange is that a subsequent scan lists the new bad block at a different 16k
> offset, in this case 0x020f0000, even though the device erasesize is 16k.
> Is that because my device is 128MB and JFFS2 is using this 'virtual erase
> size' of 32k?
Yes. The bad block code scans/marks physical blocks and JFFS2 operates
on virtual ones, if the device size is big enough.
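To make the physical versus virtual block arithmetic concrete, here is a minimal C sketch. It assumes power-of-two erase sizes and uses the 16k and 32k figures quoted in this thread; the helper names `block_base` and `show_block_bases` are made up for illustration and are not part of the mtd or JFFS2 code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Round a flash offset down to the start of the erase block containing it.
 * block_size must be a non-zero power of two (an assumption of this sketch). */
static uint32_t block_base(uint32_t offset, uint32_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return offset & ~(block_size - 1);
}

/* Print where one offset falls in the physical and in the virtual view. */
static void show_block_bases(uint32_t offset)
{
    const uint32_t phys_erasesize = 16 * 1024; /* device erasesize from the thread */
    const uint32_t virt_erasesize = 32 * 1024; /* JFFS2 virtual erase size from the thread */

    printf("offset         0x%08x\n", (unsigned)offset);
    printf("physical block 0x%08x\n", (unsigned)block_base(offset, phys_erasesize));
    printf("virtual block  0x%08x\n", (unsigned)block_base(offset, virt_erasesize));
}
```

Calling `show_block_bases(0x020f7e00)` gives 0x020f4000 for the 16k physical block and 0x020f0000 for the 32k virtual block, so the address reported by the later scan is the start of the virtual block rather than the physical block that holds the offending word.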
> I have observed this now on several different NAND devices and it seems to be
> more prominent while performing small writes.
>
> I am currently trying to work out if the erase is not completing, or this is
> the wrong block or something else.
Hmm, are you using the Ready/Busy pin or the timeout?
tglx
* Re: Bad Blocks On JFFS2/NAND
2004-10-19 11:50 ` Thomas Gleixner
@ 2004-10-19 14:39 ` Simon Haynes
2004-10-19 15:02 ` Thomas Gleixner
2004-10-19 21:45 ` Michael Moedt
1 sibling, 1 reply; 9+ messages in thread
From: Simon Haynes @ 2004-10-19 14:39 UTC (permalink / raw)
To: tglx; +Cc: linux-mtd
Thomas,
I am using the ready pin.
* Re: Bad Blocks On JFFS2/NAND
2004-10-19 14:39 ` Simon Haynes
@ 2004-10-19 15:02 ` Thomas Gleixner
2004-10-19 15:18 ` Simon Haynes
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-19 15:02 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Tue, 2004-10-19 at 16:39, Simon Haynes wrote:
> Thomas,
>
> I am using the ready pin.
Hmm, the strange thing is that it does not complain about a failed
erase. It complains when checking whether the chip is really erased.
Can you try the hack below, which invalidates the page cache
unconditionally on erase ?
tglx
RCS file: /home/cvs/mtd/drivers/mtd/nand/nand_base.c,v
retrieving revision 1.121
diff -u -r1.121 nand_base.c
--- nand_base.c 6 Oct 2004 19:53:11 -0000 1.121
+++ nand_base.c 19 Oct 2004 15:09:35 -0000
@@ -2091,8 +2091,7 @@
 	/* Invalidate the page cache, if we erase the block which contains
 	   the current cached page */
-	if (page <= this->pagebuf && this->pagebuf < (page + pages_per_block))
-		this->pagebuf = -1;
+	this->pagebuf = -1;
 	this->erase_cmd (mtd, page & this->pagemask);
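As an illustration of why a stale page cache would produce exactly this symptom, here is a small self-contained C model. It is only a sketch under stated assumptions: 512-byte pages and 32 pages per 16k block, a single cached page tracked by a `pagebuf` index as in the patch, and hypothetical `toy_*` names that do not exist in the mtd code. It contrasts "no invalidation on erase" with "always invalidate" and does not try to reproduce the range check removed by the patch.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE        512  /* assumed small-page NAND */
#define TOY_PAGES_PER_BLOCK  32   /* 32 * 512 bytes = 16k erase block */

struct toy_nand {
    uint8_t flash[TOY_PAGES_PER_BLOCK][TOY_PAGE_SIZE]; /* simulated array of one block */
    uint8_t cache[TOY_PAGE_SIZE];                      /* single cached page */
    int     pagebuf;                                   /* cached page number, -1 if none */
};

static void toy_init(struct toy_nand *n)
{
    memset(n->flash, 0xff, sizeof n->flash);
    n->pagebuf = -1;
}

/* Writing a page also leaves it in the one-page cache. */
static void toy_write(struct toy_nand *n, int page, const uint8_t *data)
{
    memcpy(n->flash[page], data, TOY_PAGE_SIZE);
    memcpy(n->cache, data, TOY_PAGE_SIZE);
    n->pagebuf = page;
}

/* Reads are served from the cache when the page number matches. */
static void toy_read(struct toy_nand *n, int page, uint8_t *out)
{
    if (n->pagebuf == page)
        memcpy(out, n->cache, TOY_PAGE_SIZE);
    else
        memcpy(out, n->flash[page], TOY_PAGE_SIZE);
}

/* Erase the whole block; optionally drop the cached page as well. */
static void toy_erase(struct toy_nand *n, int invalidate)
{
    for (int p = 0; p < TOY_PAGES_PER_BLOCK; p++)
        memset(n->flash[p], 0xff, TOY_PAGE_SIZE);
    if (invalidate)
        n->pagebuf = -1;
}

/* Returns 1 if a read after erase still sees the old data, 0 otherwise. */
static int stale_after_erase(int invalidate)
{
    static struct toy_nand n;
    uint8_t node[TOY_PAGE_SIZE], buf[TOY_PAGE_SIZE];

    memset(node, 0xff, sizeof node);
    node[0] = 0x19; node[1] = 0x85; node[2] = 0xe0; node[3] = 0x02;

    toy_init(&n);
    toy_write(&n, TOY_PAGES_PER_BLOCK - 1, node); /* last page written is cached */
    toy_erase(&n, invalidate);
    toy_read(&n, TOY_PAGES_PER_BLOCK - 1, buf);
    return buf[0] != 0xff;
}
```

In this model `stale_after_erase(0)` reports old data after a successful erase, while `stale_after_erase(1)` reads back 0xff, which is the behaviour the unconditional `this->pagebuf = -1` is meant to force.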
* Re: Bad Blocks On JFFS2/NAND
2004-10-19 15:02 ` Thomas Gleixner
@ 2004-10-19 15:18 ` Simon Haynes
0 siblings, 0 replies; 9+ messages in thread
From: Simon Haynes @ 2004-10-19 15:18 UTC (permalink / raw)
To: tglx; +Cc: linux-mtd
I am using an old mtd layer which has nand.c and not nand_base.c.
I have hacked this accordingly and I will see if I can produce the same
problem.
I did try to update my kernel but patchin.sh failed and I had to hack some
header files to get a kernel. This kernel crashed in a couple of different
places when writing to my SSFDC partition. I guess things have changed here
again ?
Cheers
Simon.
* Re: Bad Blocks On JFFS2/NAND
2004-10-19 11:50 ` Thomas Gleixner
2004-10-19 14:39 ` Simon Haynes
@ 2004-10-19 21:45 ` Michael Moedt
2004-10-19 21:55 ` Thomas Gleixner
1 sibling, 1 reply; 9+ messages in thread
From: Michael Moedt @ 2004-10-19 21:45 UTC (permalink / raw)
To: tglx, simon; +Cc: linux-mtd
--- Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 2004-10-19 at 13:22, Simon Haynes wrote:
> > I have experienced a problem in which a JFFS2 filesystem on NAND
> > became full.
> > This is a root file system and constant writes to a logfile
> > filled the filesystem. On investigation it was found that the
> > NAND device now had hundreds of bad blocks.
> >
> > I started to investigate this and found that JFFS2 was announcing
> >
> > Newly-erased block contained word 0x1985e002 at offset 0x020f7e00
> >
> > messages, which result in my mtd/jffs2 code marking the block bad.
> > What I find strange is that a subsequent scan lists the new bad block
> > at a different 16k offset, in this case 0x020f0000, even though the
> > device erasesize is 16k.
> > Is that because my device is 128MB and JFFS2 is using this
> > 'virtual erase size' of 32k?
>
> Yes. The bad block code scans/marks physical blocks and JFFS2
> operates
> on virtual ones, if the device size is big enough.
>
> > I have observed this now on several different NAND devices and it
> > seems to be more prominent while performing small writes.
> >
> > I am currently trying to work out if the erase is not completing,
> > or this is the wrong block or something else.
>
> Hmm, are you using Ready/Busy Pin or the timeout ?
>
> tglx
>
Hi guys. This topic has given me a little bit of concern. Could you
try to answer a few questions for me?
1. Do you know what usually causes the "Newly-erased block contained
word "... error?
Is it caused by an interrupted (or otherwise failed) erase? Would
a power failure cause this?
2. Would this cause good blocks to be incorrectly [and permanently]
marked as bad?
I think I may have seen something similar on my system. I'm
considering writing a test to see if this is a problem for me, but
I'd like to learn more about this also.
Thanks,
Mike
* Re: Bad Blocks On JFFS2/NAND
2004-10-19 21:45 ` Michael Moedt
@ 2004-10-19 21:55 ` Thomas Gleixner
2004-10-20 9:40 ` Simon Haynes
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-19 21:55 UTC (permalink / raw)
To: Michael Moedt; +Cc: linux-mtd, simon
On Tue, 2004-10-19 at 23:45, Michael Moedt wrote:
> 1. Do you know what usually causes the "Newly-erased block contained
> word "... error?
> Is it caused by an interrupted (or otherwise failed) erase? Would
> a power failure cause this?
It's caused by a failed erase. If the chip interface is correct then it
is usually a problem of a bad page/block. If not (e.g. wrong timing) it
might return too early from an erase.
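The check that produces the message can be pictured as a scan for any word that is not all ones after the erase. The following C sketch is an assumption-laden illustration, not the JFFS2 implementation: the function names are hypothetical, and the magic test assumes the word layout printed in this thread (0x1985, the JFFS2 magic bitmask, in the upper 16 bits; on a little-endian host the two halves may appear swapped).

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first 32-bit word that is not 0xFFFFFFFF,
 * or -1 if the buffer reads back as fully erased. */
static long first_dirty_word(const uint32_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (buf[i] != 0xFFFFFFFFu)
            return (long)i;
    }
    return -1;
}

/* Does the word look like the start of a JFFS2 node header?
 * Assumes the magic 0x1985 sits in the upper half, as in the
 * 0x1985e002 value reported in the thread. */
static int looks_like_jffs2_node(uint32_t word)
{
    return (word >> 16) == 0x1985u;
}
```

A word such as 0x1985e002 passing the second test suggests the "erased" block still returns an old node header (0xe002 is the inode node type), which fits either an erase that did not really happen or old data being served from somewhere other than the flash array.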
> 2. Would this cause good blocks to be incorrectly [and permanently]
> marked as bad?
Only if it fails 5 times after mount. The counter is reset with each
mount.
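The rule described here can be sketched as a per-block failure counter that lives only in RAM. This is a hedged model, not the actual JFFS2 code: the threshold of 5 is taken from the sentence above, and the `toy_fs` structure and function names are invented for the example.

```c
#include <string.h>

#define TOY_NUM_BLOCKS          8
#define TOY_MAX_ERASE_FAILURES  5   /* threshold quoted in this thread */

struct toy_fs {
    unsigned char bad_count[TOY_NUM_BLOCKS];  /* RAM only, lost on remount */
    unsigned char marked_bad[TOY_NUM_BLOCKS]; /* stands in for the on-flash bad block mark */
};

/* Mounting resets the failure counters but not the persistent bad marks. */
static void toy_mount(struct toy_fs *fs)
{
    memset(fs->bad_count, 0, sizeof fs->bad_count);
}

/* Record one failed erase check; returns 1 once the block is marked bad. */
static int toy_erase_failed(struct toy_fs *fs, int blk)
{
    if (++fs->bad_count[blk] >= TOY_MAX_ERASE_FAILURES)
        fs->marked_bad[blk] = 1;
    return fs->marked_bad[blk];
}
```

With this model, four failures followed by a remount and four more failures leave the block usable; only a fifth failure within a single mount marks it bad.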
>
> I think I may have seen something similar on my system. I'm
> considering writing a test to see if this is a problem for me, but
> I'd like to learn more about this also.
Which version of the mtd/nand code ?
Some time ago there was a problem with the page cache not being
invalidated on erase, but it's fixed.
tglx
* Re: Bad Blocks On JFFS2/NAND
2004-10-19 21:55 ` Thomas Gleixner
@ 2004-10-20 9:40 ` Simon Haynes
2004-10-20 9:53 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: Simon Haynes @ 2004-10-20 9:40 UTC (permalink / raw)
To: tglx, Michael Moedt; +Cc: linux-mtd
This 5 error business is only if you have relatively new code. Old code
generates an incorrect and permanent bad block after 1 failure. I am testing
the suggestion by Thomas that this is a page cache problem by always
invalidating the cached page on erase. No errors as yet but I will keep you posted.
Cheers
Simon.
* Re: Bad Blocks On JFFS2/NAND
2004-10-20 9:40 ` Simon Haynes
@ 2004-10-20 9:53 ` Thomas Gleixner
0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2004-10-20 9:53 UTC (permalink / raw)
To: simon; +Cc: linux-mtd
On Wed, 2004-10-20 at 11:40, Simon Haynes wrote:
> This 5 error business is only if you have relatively new code. Old code
> generates an incorrect and permanent bad block after 1 failure. I am testing
> the suggestion by Thomas that this is a page cache problem by always marking
> an erase page as invalid. No errors as yet but I will keep you posted.
A CVS history search should reveal the changes to the error counter
and the page cache fixes, so you can verify against your particular
version.
tglx
Thread overview: 9+ messages
2004-10-19 11:22 Bad Blocks On JFFS2/NAND Simon Haynes
2004-10-19 11:50 ` Thomas Gleixner
2004-10-19 14:39 ` Simon Haynes
2004-10-19 15:02 ` Thomas Gleixner
2004-10-19 15:18 ` Simon Haynes
2004-10-19 21:45 ` Michael Moedt
2004-10-19 21:55 ` Thomas Gleixner
2004-10-20 9:40 ` Simon Haynes
2004-10-20 9:53 ` Thomas Gleixner