public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed
* JFFS2 & NAND failure
@ 2004-11-17 17:15 Estelle HAMMACHE
  2004-11-18 16:27 ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: Estelle HAMMACHE @ 2004-11-17 17:15 UTC (permalink / raw)
  To: David Woodhouse, linux-mtd

Hello,

here are a few corrections (I think) regarding the management
of write or erase error with NAND flash.
 - bad_count was not initialised
 - during wbuf flushing, if the previously written node filled
   wbuf exactly, "buf" may be used instead of "wbuf"
   in jffs2_wbuf_recover (access to null pointer)
 - there is no refiling of nextblock if a write error occurs
   during writev_ecc (direct write, not using wbuf), 
   so the next write may occur on the failed block
 - if a write error occurs, but part of the data was written,
   an obsolete raw node ref is added but nextblock has changed.

Additionally I have a question about the retry cases in
jffs2_write_dnode and jffs2_write_dirent.
If the write error occurs during an API call (not GC),
jffs2_reserve_space is called again to find space to rewrite
the node. However since we refiled the bad block, the free space
was reduced. Is it not possible then that jffs2_reserve_space
will need to gc to find some space ?
In the case of a deletion dirent, the previous dirent may be
GCed so its version number is increased. When the deletion
dirent is finally written in the retry, its version number
is older than the GCed node, so it is (quietly) obsoleted.
In the case of a dnode, I think there may be a deadlock
if a dnode belonging to the same file is GCed, since the
GC will attempt to lock f->sem.
I actually saw the dirent case in my tests but since I wrote
some nodes manually I am not sure this is a valid problem in
ordinary JFFS2 processing ?
Could the solution be to call jffs2_reserve_space_gc instead
of jffs2_reserve_space ?

BR
Estelle





diff -auNr jffs2/wbuf.c myjffs2/wbuf.c
--- jffs2/wbuf.c	2004-11-17 00:00:14.000000000 +0100
+++ myjffs2/wbuf.c	2004-11-17 17:36:14.556445000 +0100
@@ -263,7 +263,7 @@
 			kfree(buf);
 		return;
 	}
-	if (end-start >= c->wbuf_pagesize) {
+	if (end-start > c->wbuf_pagesize) {
 		/* Need to do another write immediately. This, btw,
 		 means that we'll be writing from 'buf' and not from
 		 the wbuf. Since if we're writing from the wbuf there
@@ -744,9 +744,42 @@
 		
 		if (ret < 0 || wbuf_retlen != PAGE_DIV(totlen)) {
 			/* At this point we have no problem,
-			   c->wbuf is empty. 
+			   c->wbuf is empty. However refile nextblock to avoid
+                           writing again to same address.
 			*/
 			*retlen = donelen;
+			struct jffs2_eraseblock *jeb;
+			spin_lock(&c->erase_completion_lock);
+                  
+			jeb = &c->blocks[outvec_to / c->sector_size];
+			D1(printk("About to refile bad block at %08x\n", jeb->offset));
+
+			D2(jffs2_dump_block_lists(c));
+			/* File the existing block on the bad_used_list.... */
+			if (c->nextblock == jeb)
+				c->nextblock = NULL;
+			else /* Not sure this should ever happen... need more coffee */
+				list_del(&jeb->list);
+			if (jeb->first_node) {
+				D1(printk("Refiling block at %08x to bad_used_list\n", jeb->offset));
+				list_add(&jeb->list, &c->bad_used_list);
+			} else {
+				D1(printk("Refiling block at %08x to erase_pending_list\n", jeb->offset));
+				list_add(&jeb->list, &c->erase_pending_list);
+				c->nr_erasing_blocks++;
+				jffs2_erase_pending_trigger(c);
+			}
+			D2(jffs2_dump_block_lists(c));
+
+			/* Adjust its size counts accordingly */
+			c->wasted_size += jeb->free_size;
+			c->free_size -= jeb->free_size;
+			jeb->wasted_size += jeb->free_size;
+			jeb->free_size = 0;
+
+			ACCT_SANITY_CHECK(c,jeb);
+			D1(ACCT_PARANOIA_CHECK(jeb));
+			spin_unlock(&c->erase_completion_lock);
 			return ret;
 		}
 		

diff -auNr jffs2/nodemgmt.c myjffs2/nodemgmt.c
--- jffs2/nodemgmt.c	2004-11-17 00:00:14.000000000 +0100
+++ myjffs2/nodemgmt.c	2004-11-17 17:40:40.130248000 +0100
@@ -308,7 +308,7 @@
 
 	D1(printk(KERN_DEBUG "jffs2_add_physical_node_ref(): Node at 0x%x(%d), size 0x%x\n", ref_offset(new), ref_flags(new),
len));
 #if 1
-	if (jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size - jeb->free_size)) {
+	if ((!ref_obsolete(new)) && (jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size -
jeb->free_size))) {
 		printk(KERN_WARNING "argh. node added in wrong place\n");
 		jffs2_free_raw_node_ref(new);
 		return -EINVAL;



diff -auNr jffs2/build.c myjffs2/build.c
--- jffs2/build.c	2004-11-17 00:00:13.000000000 +0100
+++ myjffs2/build.c	2004-11-17 17:29:20.384186000 +0100
@@ -310,6 +310,7 @@
 		c->blocks[i].used_size = 0;
 		c->blocks[i].first_node = NULL;
 		c->blocks[i].last_node = NULL;
+		c->blocks[i].bad_count = 0;
 	}
 
 	init_MUTEX(&c->alloc_sem);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-17 17:15 JFFS2 & NAND failure Estelle HAMMACHE
@ 2004-11-18 16:27 ` David Woodhouse
  2004-11-18 17:54   ` Estelle HAMMACHE
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2004-11-18 16:27 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

On Wed, 2004-11-17 at 18:15 +0100, Estelle HAMMACHE wrote:
> Hello,
> 
> here are a few corrections (I think) regarding the management
> of write or erase error with NAND flash.
>  - bad_count was not initialised

Ok.

>  - during wbuf flushing, if the previously written node filled
>    wbuf exactly, "buf" may be used instead of "wbuf"
>    in jffs2_wbuf_recover (access to null pointer)


Hmmm. But now with your change we can end up with a completely full wbuf
which we don't actually flush; we leave it in memory. We should write
that immediately, surely?

>  - there is no refiling of nextblock if a write error occurs
>    during writev_ecc (direct write, not using wbuf), 
>    so the next write may occur on the failed block

OK. Priorities on printk though please.

>  - if a write error occurs, but part of the data was written,
>    an obsolete raw node ref is added but nextblock has changed.

Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref
in the _new_ nextblock, surely?

> Additionally I have a question about the retry cases in
> jffs2_write_dnode and jffs2_write_dirent.
> If the write error occurs during an API call (not GC),
> jffs2_reserve_space is called again to find space to rewrite
> the node. However since we refiled the bad block, the free space
> was reduced. Is it not possible then that jffs2_reserve_space
> will need to gc to find some space ?

It is possible, yes.

> In the case of a deletion dirent, the previous dirent may be
> GCed so its version number is increased. When the deletion
> dirent is finally written in the retry, its version number
> is older than the GCed node, so it is (quietly) obsoleted.

Hmm, true. We should check f->highest_version after we reallocate space,
and update ri->version or rd->version accordingly.

> In the case of a dnode, I think there may be a deadlock
> if a dnode belonging to the same file is GCed, since the
> GC will attempt to lock f->sem.

No, because we unlock f->sem before calling jffs2_reserve_space.

> I actually saw the dirent case in my tests but since I wrote
> some nodes manually I am not sure this is a valid problem in
> ordinary JFFS2 processing ?

I think you're right -- the version number is indeed a real problem.

> Could the solution be to call jffs2_reserve_space_gc instead
> of jffs2_reserve_space ?

No, because if the write fails and then we're short of space, we want to
return -ENOSPC. We don't want to keep eating into the space qwhich is
reserved for GC.

Thanks for the report and the patches. Would you like to send me a SSH
public key so that I can give you an account and you can commit
directly?

-- 
dwmw2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-18 16:27 ` David Woodhouse
@ 2004-11-18 17:54   ` Estelle HAMMACHE
  2004-11-19 13:17     ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: Estelle HAMMACHE @ 2004-11-18 17:54 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> >  - during wbuf flushing, if the previously written node filled
> >    wbuf exactly, "buf" may be used instead of "wbuf"
> >    in jffs2_wbuf_recover (access to null pointer)
> 
> Hmmm. But now with your change we can end up with a completely full wbuf
> which we don't actually flush; we leave it in memory. We should write
> that immediately, surely?

Not necessarily I think. If we write it immediately and the write fails,
we have a fatal error. But if we leave it be, the next call to 
jffs2_flash_writev will flush the buffer and we get 1 more try. 
Unless there is something I don't understand, there is no harm in 
leaving the wbuf full.

> >  - there is no refiling of nextblock if a write error occurs
> >    during writev_ecc (direct write, not using wbuf),
> >    so the next write may occur on the failed block
> 
> OK. Priorities on printk though please.

OK. (was actually a copy/paste so I'll check the wbuf recover too
and maybe make a function for the refiling).

> >  - if a write error occurs, but part of the data was written,
> >    an obsolete raw node ref is added but nextblock has changed.
> 
> Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref
> in the _new_ nextblock, surely?

No, it is the "Mark the space as dirtied" case in jffs2_write_dirent and
jffs2_write_dnode. I think this happens only if the write error occurs
on mtd->writev_ecc and part of the data was successfully written by
jffs2_flush_wbuf or writev_ecc previously so jffs2_flash_writev says 
some data was written. In this case, when jffs2_write_dirent/dnode
adds this obsolete raw node ref for the dirty space, nextblock was 
modified during the refiling and / or recovery.


> > Additionally I have a question about the retry cases in
> > jffs2_write_dnode and jffs2_write_dirent.
> > If the write error occurs during an API call (not GC),
> > jffs2_reserve_space is called again to find space to rewrite
> > the node. However since we refiled the bad block, the free space
> > was reduced. Is it not possible then that jffs2_reserve_space
> > will need to gc to find some space ?
> 
> It is possible, yes.
> 
> > In the case of a deletion dirent, the previous dirent may be
> > GCed so its version number is increased. When the deletion
> > dirent is finally written in the retry, its version number
> > is older than the GCed node, so it is (quietly) obsoleted.
> 
> Hmm, true. We should check f->highest_version after we reallocate space,
> and update ri->version or rd->version accordingly.

This was my first idea too but I found it ugly to tamper with
structures which are clearly the responsibility of the caller.
However this is a recovery case so... maybe it is necessary.

> > In the case of a dnode, I think there may be a deadlock
> > if a dnode belonging to the same file is GCed, since the
> > GC will attempt to lock f->sem.
> 
> No, because we unlock f->sem before calling jffs2_reserve_space.

Right, I got confused.

> > I actually saw the dirent case in my tests but since I wrote
> > some nodes manually I am not sure this is a valid problem in
> > ordinary JFFS2 processing ?
> 
> I think you're right -- the version number is indeed a real problem.
> 
> > Could the solution be to call jffs2_reserve_space_gc instead
> > of jffs2_reserve_space ?
> 
> No, because if the write fails and then we're short of space, we want to
> return -ENOSPC. We don't want to keep eating into the space qwhich is
> reserved for GC.

If we use jffs2_reserve_space_gc both in jffs2_wbuf_recover and the
dnode/dirent retry cases I believe we will write at most 2*4KB = 8KB 
this way ? Then further API calls will do the GC. Is this unacceptable ?

> Thanks for the report and the patches. Would you like to send me a SSH
> public key so that I can give you an account and you can commit
> directly?

We have a firewall here so I don't think CVS will work. I will ask.
Like many people I rely on snapshots. BTW if JFFS2 and JFFS3 are 
maintained in parallel do we get 2 sets of snapshots ? or is there
another way to get both versions ?
Anyway I'll send a corrected patch to the list when decision is made
about the node version problem - I guess the version check/update ?

Bye
Estelle

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-18 17:54   ` Estelle HAMMACHE
@ 2004-11-19 13:17     ` David Woodhouse
  2004-11-19 16:22       ` Estelle HAMMACHE
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2004-11-19 13:17 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

On Thu, 2004-11-18 at 18:54 +0100, Estelle HAMMACHE wrote:
> David Woodhouse wrote:
> > >  - during wbuf flushing, if the previously written node filled
> > >    wbuf exactly, "buf" may be used instead of "wbuf"
> > >    in jffs2_wbuf_recover (access to null pointer)
> > 
> > Hmmm. But now with your change we can end up with a completely full wbuf
> > which we don't actually flush; we leave it in memory. We should write
> > that immediately, surely?
> 
> Not necessarily I think. If we write it immediately and the write fails,
> we have a fatal error. But if we leave it be, the next call to 
> jffs2_flash_writev will flush the buffer and we get 1 more try. 
> Unless there is something I don't understand, there is no harm in 
> leaving the wbuf full.

Generally we should flush the wbuf as soon as we can. And if there's
going to be an error when we retry, surely we want that to happen
_immediately_ so that the _current_ write() call returns an error,
rather than leaving it in the wbuf and then losing it later?
 
> > OK. Priorities on printk though please.
> 
> OK. (was actually a copy/paste so I'll check the wbuf recover too
> and maybe make a function for the refiling).

Hm. Those are all my fault then -- but do as I say, not as I do :)

> > >  - if a write error occurs, but part of the data was written,
> > >    an obsolete raw node ref is added but nextblock has changed.
> > 
> > Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref
> > in the _new_ nextblock, surely?
> 
> No, it is the "Mark the space as dirtied" case in jffs2_write_dirent and
> jffs2_write_dnode. I think this happens only if the write error occurs
> on mtd->writev_ecc and part of the data was successfully written by
> jffs2_flush_wbuf or writev_ecc previously so jffs2_flash_writev says 
> some data was written. In this case, when jffs2_write_dirent/dnode
> adds this obsolete raw node ref for the dirty space, nextblock was 
> modified during the refiling and / or recovery.

OK... you mean the case where there was already a node in the wbuf and
the wbuf flush failed, so we rewrote that node to a new block, along
with the start of the new node? Then the write of the _rest_ of the new
node failed, and we return with 'retlen' set to the amount of the new
node that fitted into the wbuf and we actually written?

I think that ought to be OK though because we only refiled enough
raw_node_refs to cover the _previous_ nodes?

> > Hmm, true. We should check f->highest_version after we reallocate space,
> > and update ri->version or rd->version accordingly.
> 
> This was my first idea too but I found it ugly to tamper with
> structures which are clearly the responsibility of the caller.
> However this is a recovery case so... maybe it is necessary.

Well we can always turn it around and make them always the
responsibility of the callee instead -- set them _always_ in
jffs2_write_{dirent,dnode}?

> If we use jffs2_reserve_space_gc both in jffs2_wbuf_recover and the
> dnode/dirent retry cases I believe we will write at most 2*4KB = 8KB 
> this way ? Then further API calls will do the GC. Is this unacceptable ?

I also want to increase the maximum size of data node... I really don't
like letting anything else eat into our reserved space unless we really
need to. Fixing up the version number isn't so hard.

> We have a firewall here so I don't think CVS will work. I will ask.

A firewall surely shouldn't prevent _outgoing_ connections to port 22?
A web proxy which supports CONNECT requests may also allow you to make
connections to port 22 -- ssh can use that if it's available.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-19 13:17     ` David Woodhouse
@ 2004-11-19 16:22       ` Estelle HAMMACHE
  2004-11-20 18:57         ` David Woodhouse
  2004-11-20 19:19         ` David Woodhouse
  0 siblings, 2 replies; 15+ messages in thread
From: Estelle HAMMACHE @ 2004-11-19 16:22 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

David Woodhouse wrote:
> 
> On Thu, 2004-11-18 at 18:54 +0100, Estelle HAMMACHE wrote:
> > David Woodhouse wrote:
> > > >  - during wbuf flushing, if the previously written node filled
> > > >    wbuf exactly, "buf" may be used instead of "wbuf"
> > > >    in jffs2_wbuf_recover (access to null pointer)
> > >
> > > Hmmm. But now with your change we can end up with a completely full wbuf
> > > which we don't actually flush; we leave it in memory. We should write
> > > that immediately, surely?
> >
> > Not necessarily I think. If we write it immediately and the write fails,
> > we have a fatal error. But if we leave it be, the next call to
> > jffs2_flash_writev will flush the buffer and we get 1 more try.
> > Unless there is something I don't understand, there is no harm in
> > leaving the wbuf full.
> 
> Generally we should flush the wbuf as soon as we can. And if there's
> going to be an error when we retry, surely we want that to happen
> _immediately_ so that the _current_ write() call returns an error,
> rather than leaving it in the wbuf and then losing it later?

Whether jffs2_wbuf_recover succeeds or not, __jffs2_flush_wbuf always
returns an error so that jffs2_flash_writev returns immediately, and
the caller knows it needs to reallocate space to try again. Then, 
since wbuf is full, it will be flushed before the new (retry) node 
can be written.
The node which was in wbuf was outstanding anyway. It is not part
of the data from the current call to jffs2_flash_writev.
What I meant is that if we write wbuf in jffs2_write_recover, like we
write buf, 1 write error means we never write the node, and we lose data.
Whereas if we leave the full wbuf, the node still has 2 regular write
attempts (flush and recover) before some data is lost. I can see
no difference between this "full wbuf" case and the usual 
jffs2_wbuf_recover where wbuf is left partly filled ?

I feel that writing wbuf immediately adds code without necessity
(or maybe I'm just too lazy to test it).
On the other hand, maybe jffs2_wbuf_recover should be tweaked to
resist to failure to write buf ? in this case writing a filled-up 
wbuf immediately too would be logical. How many blocks can we
afford to refile to the bad block lists in a single call to 
jffs2_flash_writev ?


> > > >  - if a write error occurs, but part of the data was written,
> > > >    an obsolete raw node ref is added but nextblock has changed.
> > >
> > > Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref
> > > in the _new_ nextblock, surely?
> >
> > No, it is the "Mark the space as dirtied" case in jffs2_write_dirent and
> > jffs2_write_dnode. I think this happens only if the write error occurs
> > on mtd->writev_ecc and part of the data was successfully written by
> > jffs2_flush_wbuf or writev_ecc previously so jffs2_flash_writev says
> > some data was written. In this case, when jffs2_write_dirent/dnode
> > adds this obsolete raw node ref for the dirty space, nextblock was
> > modified during the refiling and / or recovery.
> 
> OK... you mean the case where there was already a node in the wbuf and
> the wbuf flush failed, so we rewrote that node to a new block, along
> with the start of the new node? Then the write of the _rest_ of the new
> node failed, and we return with 'retlen' set to the amount of the new
> node that fitted into the wbuf and we actually written?

I should have mentionned that the problem in 
jffs2_add_physical_node_ref cannot happen with the original code - 
it only happens with the modification of jffs2_flash_writev 
to refile nextblock on mtd->writev_ecc failure.
With the original code, nextblock it not refiled so there is no
problem in jffs2_add_physical_node_ref, but the retry occurs on the
same page which already failed in mtd->writev_ecc.

Here is an example of the obsolete node/refiled nextblock pb:
- page size 512 bytes
- start condition: wbuf contains 500 bytes
jffs2_write_dnode call, total node size 700 bytes
-->jffs2_flash_writev
------>fill up wbuf
------>donelen = 12
------>jffs2_flush_wbuf (succeeds)
------>c->mtd->writev_ecc to write 1 page - fails
------>nextblock is refiled (new code!)
------>*retlen = donelen   (= 12)
-->add obsolete frag for dirtied space 
------> free raw node & return early because c->nextblock == 0
-->probable segfault on (redundant?) jffs2_mark_node_obsolete
-->retry.

Maybe I ought to test nextblock == 0 in addition to the
obsolete flag to make the exclusion case more precise in
jffs2_add_physical_node_ref ?
Or is the "dirty" node really necessary ? what happens if we
don't list it in the raw node refs ?


> > > Hmm, true. We should check f->highest_version after we reallocate space,
> > > and update ri->version or rd->version accordingly.
> >
> > This was my first idea too but I found it ugly to tamper with
> > structures which are clearly the responsibility of the caller.
> > However this is a recovery case so... maybe it is necessary.
> 
> Well we can always turn it around and make them always the
> responsibility of the callee instead -- set them _always_ in
> jffs2_write_{dirent,dnode}?

Ok, let's do that.

bye
Estelle

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-19 16:22       ` Estelle HAMMACHE
@ 2004-11-20 18:57         ` David Woodhouse
  2004-11-20 19:19         ` David Woodhouse
  1 sibling, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2004-11-20 18:57 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

On Fri, 2004-11-19 at 17:22 +0100, Estelle HAMMACHE wrote:
> I feel that writing wbuf immediately adds code without necessity
> (or maybe I'm just too lazy to test it).

The code is there already; it's just a case of making it use wbuf
instead of buf. That shouldn't be so hard.

> Maybe I ought to test nextblock == 0 in addition to the
> obsolete flag to make the exclusion case more precise in
> jffs2_add_physical_node_ref ?

Ah, OK. Yes, that might be appropriate.

> Or is the "dirty" node really necessary ? what happens if we
> don't list it in the raw node refs ?

We'll try writing to that same offset again, and will fail.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-19 16:22       ` Estelle HAMMACHE
  2004-11-20 18:57         ` David Woodhouse
@ 2004-11-20 19:19         ` David Woodhouse
  2004-11-20 22:13           ` David Woodhouse
  1 sibling, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2004-11-20 19:19 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

On Fri, 2004-11-19 at 17:22 +0100, Estelle HAMMACHE wrote:
> 
> Here is an example of the obsolete node/refiled nextblock pb:
> - page size 512 bytes
> - start condition: wbuf contains 500 bytes
> jffs2_write_dnode call, total node size 700 bytes
> -->jffs2_flash_writev
> ------>fill up wbuf
> ------>donelen = 12
> ------>jffs2_flush_wbuf (succeeds)
> ------>c->mtd->writev_ecc to write 1 page - fails
> ------>nextblock is refiled (new code!)
> ------>*retlen = donelen   (= 12)
> -->add obsolete frag for dirtied space 
> ------> free raw node & return early because c->nextblock == 0
> -->probable segfault on (redundant?) jffs2_mark_node_obsolete
> -->retry.

This is the problem, surely? If we've refiled the nextblock, we should
return retlen == 0.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-20 19:19         ` David Woodhouse
@ 2004-11-20 22:13           ` David Woodhouse
  2004-12-09 14:57             ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2004-11-20 22:13 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

On Sat, 2004-11-20 at 19:19 +0000, David Woodhouse wrote:
> This is the problem, surely? If we've refiled the nextblock, we should
> return retlen == 0.

Can you try this patch against current CVS? I've already split the
refiling off into a separate function we can call, and I've also fixed
it to write from the wbuf if it's full.

Index: wbuf.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/wbuf.c,v
retrieving revision 1.81
diff -u -r1.81 wbuf.c
--- wbuf.c	20 Nov 2004 10:44:07 -0000	1.81
+++ wbuf.c	20 Nov 2004 22:06:25 -0000
@@ -264,12 +268,12 @@
 		return;
 	}
 	if (end-start >= c->wbuf_pagesize) {
-		/* Need to do another write immediately. This, btw,
-		 means that we'll be writing from 'buf' and not from
-		 the wbuf. Since if we're writing from the wbuf there
-		 won't be more than a wbuf full of data, now will
-		 there? :) */
-
+		/* Need to do another write immediately, but it's possible
+		   that this is just because the wbuf itself is completely
+		   full, and there's nothing earlier read back from the 
+		   flash. Hence 'buf' isn't necessarily what we're writing 
+		   from. */
+		unsigned char *rewrite_buf = buf?:c->wbuf;
 		uint32_t towrite = (end-start) - ((end-start)%c->wbuf_pagesize);
 
 		D1(printk(KERN_DEBUG "Write 0x%x bytes at 0x%08x in wbuf recover\n",
@@ -287,14 +291,15 @@
 #endif
 		if (jffs2_cleanmarker_oob(c))
 			ret = c->mtd->write_ecc(c->mtd, ofs, towrite, &retlen,
-						buf, NULL, c->oobinfo);
+						rewrite_buf, NULL, c->oobinfo);
 		else
-			ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, buf);
+			ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, rewrite_buf);
 
 		if (ret || retlen != towrite) {
 			/* Argh. We tried. Really we did. */
 			printk(KERN_CRIT "Recovery of wbuf failed due to a second write error\n");
-			kfree(buf);
+			if (buf)
+				kfree(buf);
 
 			if (retlen) {
 				struct jffs2_raw_node_ref *raw2;
@@ -316,10 +321,10 @@
 
 		c->wbuf_len = (end - start) - towrite;
 		c->wbuf_ofs = ofs + towrite;
-		memcpy(c->wbuf, buf + towrite, c->wbuf_len);
+		memmove(c->wbuf, buf + towrite, c->wbuf_len);
 		/* Don't muck about with c->wbuf_inodes. False positives are harmless. */
-
-		kfree(buf);
+		if (buf)
+			kfree(buf);
 	} else {
 		/* OK, now we're left with the dregs in whichever buffer we're using */
 		if (buf) {
@@ -757,9 +762,19 @@
 		
 		if (ret < 0 || wbuf_retlen != PAGE_DIV(totlen)) {
 			/* At this point we have no problem,
-			   c->wbuf is empty. 
+			   c->wbuf is empty. However refile nextblock to avoid
+                           writing again to same address.
 			*/
+			struct jffs2_eraseblock *jeb;
+
 			*retlen = donelen;
+			spin_lock(&c->erase_completion_lock);
+                  
+			jeb = &c->blocks[outvec_to / c->sector_size];
+			jffs2_refile_block(c, jeb);
+
+			*retlen = 0;
+			spin_unlock(&c->erase_completion_lock);
 			goto exit;
 		}
 		

-- 
dwmw2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-11-20 22:13           ` David Woodhouse
@ 2004-12-09 14:57             ` David Woodhouse
  2004-12-09 17:06               ` Estelle HAMMACHE
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2004-12-09 14:57 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

On Sat, 2004-11-20 at 22:13 +0000, David Woodhouse wrote:
> On Sat, 2004-11-20 at 19:19 +0000, David Woodhouse wrote:
> > This is the problem, surely? If we've refiled the nextblock, we should
> > return retlen == 0.
> 
> Can you try this patch against current CVS? I've already split the
> refiling off into a separate function we can call, and I've also fixed
> it to write from the wbuf if it's full.

Any progress? If it's working and you agree it's sufficient then we
should commit it.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-12-09 14:57             ` David Woodhouse
@ 2004-12-09 17:06               ` Estelle HAMMACHE
  2004-12-15 12:33                 ` Estelle HAMMACHE
  0 siblings, 1 reply; 15+ messages in thread
From: Estelle HAMMACHE @ 2004-12-09 17:06 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd

Sorry I've been very busy lately and couldn't find time to report
back regarding this issue. There were only minor problems with
your patch.
Here is the patch that should settle the NAND failure problems
I've stumbled upon lately.
- block refiling in jffs2_flash_writev
- retry on flush wbuf (probably useful when umounting)
- retry in gc (might make some compilers complain ?)
- version thing in the write.c retry cases
  (I didn't make jffs2_write_dnode and jffs2_write_dirent update
   the version in all cases instead of the caller, because of the
   hole GC case where the version should remain the same as it
   was...)

- the modifications in jffs2_add_physical_node_ref are probably
  not necessary anymore ? not sure. The wasted_size thing is an
  old patch to avoid getting blocks with large amounts of wasted
  space in the clean_list.

(still no cvs access)
bye
Estelle

David Woodhouse wrote:
> 
> On Sat, 2004-11-20 at 22:13 +0000, David Woodhouse wrote:
> > On Sat, 2004-11-20 at 19:19 +0000, David Woodhouse wrote:
> > > This is the problem, surely? If we've refiled the nextblock, we should
> > > return retlen == 0.
> >
> > Can you try this patch against current CVS? I've already split the
> > refiling off into a separate function we can call, and I've also fixed
> > it to write from the wbuf if it's full.
> 
> Any progress? If it's working and you agree it's sufficient then we
> should commit it.
> 
> --
> dwmw2


diff -auNr jffs2/gc.c myjffs2/gc.c
--- jffs2/gc.c	2004-11-17 00:00:14.000000000 +0100
+++ myjffs2/gc.c	2004-12-09 16:59:23.880059000 +0100
@@ -602,7 +602,7 @@
 			printk(KERN_NOTICE "Not marking the space at 0x%08x as dirty because the flash driver returned retlen zero\n", nraw->flash_offset);
                         jffs2_free_raw_node_ref(nraw);
 		}
-		if (!retried && (nraw == jffs2_alloc_raw_node_ref())) {
+		if (!retried && (nraw = jffs2_alloc_raw_node_ref())) {
 			/* Try to reallocate space and retry */
 			uint32_t dummy;
 			struct jffs2_eraseblock *jeb = &c->blocks[phys_ofs / c->sector_size];
@@ -628,7 +628,6 @@
 			jffs2_free_raw_node_ref(nraw);
 		}
 
-		jffs2_free_raw_node_ref(nraw);
 		if (!ret)
 			ret = -EIO;
 		goto out_node;
  
  
  
  
  
diff -auNr jffs2/nodemgmt.c myjffs2/nodemgmt.c
--- jffs2/nodemgmt.c	2004-11-26 15:08:31.000000000 +0100
+++ myjffs2/nodemgmt.c	2004-12-09 17:32:23.921567000 +0100
@@ -308,7 +308,10 @@
 
 	D1(printk(KERN_DEBUG "jffs2_add_physical_node_ref(): Node at 0x%x(%d), size 0x%x\n", ref_offset(new), ref_flags(new), len));
 #if 1
-	if (jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size - jeb->free_size)) {
+	/* we could get some obsolete nodes after nextblock was refiled
+	   in wbuf.c */
+	if (  (c->nextblock || !ref_obsolete(new))
+	    &&(jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size - jeb->free_size))) {
 		printk(KERN_WARNING "argh. node added in wrong place\n");
 		jffs2_free_raw_node_ref(new);
 		return -EINVAL;
@@ -332,7 +335,7 @@
 		c->used_size += len;
 	}
 
-	if (!jeb->free_size && !jeb->dirty_size) {
+	if (!jeb->free_size && !jeb->dirty_size && !jeb->wasted_size) {
 		/* If it lives on the dirty_list, jffs2_reserve_space will put it there */
 		D1(printk(KERN_DEBUG "Adding full erase block at 0x%08x to clean_list (free 0x%08x, dirty 0x%08x, used 0x%08x\n",
 			  jeb->offset, jeb->free_size, jeb->dirty_size, jeb->used_size));
  
  
  
  
  
diff -auNr jffs2/wbuf.c myjffs2/wbuf.c
--- jffs2/wbuf.c	2004-11-26 15:08:31.000000000 +0100
+++ myjffs2/wbuf.c	2004-12-09 17:14:45.966959000 +0100
@@ -130,7 +130,10 @@
 	}
 }
 
-static void jffs2_block_refile(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb)
+#define REFILE_NOTEMPTY 0
+#define REFILE_ANYWAY   1
+
+static void jffs2_block_refile(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb, int allow_empty)
 {
 	D1(printk("About to refile bad block at %08x\n", jeb->offset));
 
@@ -144,7 +147,8 @@
 		D1(printk("Refiling block at %08x to bad_used_list\n", jeb->offset));
 		list_add(&jeb->list, &c->bad_used_list);
 	} else {
-		BUG();
+		if (allow_empty == REFILE_NOTEMPTY)
+			BUG();
 		/* It has to have had some nodes or we couldn't be here */
 		D1(printk("Refiling block at %08x to erase_pending_list\n", jeb->offset));
 		list_add(&jeb->list, &c->erase_pending_list);
@@ -179,7 +183,7 @@
 
 	jeb = &c->blocks[c->wbuf_ofs / c->sector_size];
 
-	jffs2_block_refile(c, jeb);
+	jffs2_block_refile(c, jeb, REFILE_NOTEMPTY);
 
 	/* Find the first node to be recovered, by skipping over every
 	   node which ends before the wbuf starts, or which is obsolete. */
@@ -269,12 +273,12 @@
 		return;
 	}
 	if (end-start >= c->wbuf_pagesize) {
-		/* Need to do another write immediately. This, btw,
-		 means that we'll be writing from 'buf' and not from
-		 the wbuf. Since if we're writing from the wbuf there
-		 won't be more than a wbuf full of data, now will
-		 there? :) */
-
+		/* Need to do another write immediately, but it's possible
+		that this is just because the wbuf itself is completely
+		full, and there's nothing earlier read back from the 
+		flash. Hence 'buf' isn't necessarily what we're writing 
+		from. */
+		unsigned char *rewrite_buf = buf?:c->wbuf;
 		uint32_t towrite = (end-start) - ((end-start)%c->wbuf_pagesize);
 
 		D1(printk(KERN_DEBUG "Write 0x%x bytes at 0x%08x in wbuf recover\n",
@@ -292,14 +296,15 @@
 #endif
 		if (jffs2_cleanmarker_oob(c))
 			ret = c->mtd->write_ecc(c->mtd, ofs, towrite, &retlen,
-						buf, NULL, c->oobinfo);
+						rewrite_buf, NULL, c->oobinfo);
 		else
-			ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, buf);
+			ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, rewrite_buf);
 
 		if (ret || retlen != towrite) {
 			/* Argh. We tried. Really we did. */
 			printk(KERN_CRIT "Recovery of wbuf failed due to a second write error\n");
-			kfree(buf);
+			if (buf)
+				kfree(buf);
 
 			if (retlen) {
 				struct jffs2_raw_node_ref *raw2;
@@ -321,10 +326,10 @@
 
 		c->wbuf_len = (end - start) - towrite;
 		c->wbuf_ofs = ofs + towrite;
-		memcpy(c->wbuf, buf + towrite, c->wbuf_len);
+		memmove(c->wbuf, rewrite_buf + towrite, c->wbuf_len);
 		/* Don't muck about with c->wbuf_inodes. False positives are harmless. */
-
-		kfree(buf);
+		if (buf)
+			kfree(buf);
 	} else {
 		/* OK, now we're left with the dregs in whichever buffer we're using */
 		if (buf) {
@@ -547,6 +552,12 @@
 		D1(printk(KERN_DEBUG "jffs2_flush_wbuf_gc() padding. Not finished checking\n"));
 		down_write(&c->wbuf_sem);
 		ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING);
+		/* retry flushing wbuf in case jffs2_wbuf_recover
+		   left some data in the wbuf */
+		if (ret)
+		{
+			ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING);
+		}
 		up_write(&c->wbuf_sem);
 	} else while (old_wbuf_len &&
 		      old_wbuf_ofs == c->wbuf_ofs) {
@@ -561,6 +572,12 @@
 			down(&c->alloc_sem);
 			down_write(&c->wbuf_sem);
 			ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING);
+			/* retry flushing wbuf in case jffs2_wbuf_recover
+			   left some data in the wbuf */
+			if (ret)
+			{
+				ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING);
+			}
 			up_write(&c->wbuf_sem);
 			break;
 		}
@@ -580,6 +597,9 @@
 
 	down_write(&c->wbuf_sem);
 	ret = __jffs2_flush_wbuf(c, PAD_NOACCOUNT);
+	/* retry - maybe wbuf recover left some data in wbuf. */
+	if (ret)
+		ret = __jffs2_flush_wbuf(c, PAD_NOACCOUNT);
 	up_write(&c->wbuf_sem);
 
 	return ret;
@@ -762,9 +782,18 @@
 		
 		if (ret < 0 || wbuf_retlen != PAGE_DIV(totlen)) {
 			/* At this point we have no problem,
-			   c->wbuf is empty. 
+			   c->wbuf is empty. However refile nextblock to avoid
+			   writing again to same address.
 			*/
-			*retlen = donelen;
+			struct jffs2_eraseblock *jeb;
+
+			spin_lock(&c->erase_completion_lock);
+
+			jeb = &c->blocks[outvec_to / c->sector_size];
+			jffs2_block_refile(c, jeb, REFILE_ANYWAY);
+
+			*retlen = 0;
+			spin_unlock(&c->erase_completion_lock);
 			goto exit;
 		}
 		
  
  
  
  
  
diff -auNr jffs2/write.c myjffs2/write.c
--- jffs2/write.c	2004-11-17 00:00:14.000000000 +0100
+++ myjffs2/write.c	2004-12-09 17:20:32.438719000 +0100
@@ -136,6 +136,21 @@
 	raw->__totlen = PAD(sizeof(*ri)+datalen);
 	raw->next_phys = NULL;
 
+	if ((alloc_mode!=ALLOC_GC) && (je32_to_cpu(ri->version) < f->highest_version))
+	{
+		if (! retried)
+		{
+			BUG();
+		}
+		else
+		{
+			D1(printk(KERN_DEBUG "jffs2_write_dnode : dnode_version %d,  highest version %d -> updating dnode\n", 
+					     je32_to_cpu(ri->version), f->highest_version));
+			ri->version = cpu_to_je32(++f->highest_version);
+			ri->node_crc = cpu_to_je32(crc32(0, ri, sizeof(*ri)-8));
+		}
+	}
+
 	ret = jffs2_flash_writev(c, vecs, cnt, flash_ofs, &retlen,
 				 (alloc_mode==ALLOC_GC)?0:f->inocache->ino);
 
@@ -280,6 +295,22 @@
 	raw->__totlen = PAD(sizeof(*rd)+namelen);
 	raw->next_phys = NULL;
 
+	if ((alloc_mode!=ALLOC_GC) && (je32_to_cpu(rd->version) < f->highest_version))
+	{
+		if (! retried)
+		{
+			BUG();
+		}
+		else
+		{
+			D1(printk(KERN_DEBUG "jffs2_write_dirent : dirent_version %d,  highest version %d -> updating dirent\n", 
+					     je32_to_cpu(rd->version), f->highest_version));
+			rd->version = cpu_to_je32(++f->highest_version);
+			fd->version = je32_to_cpu(rd->version);
+			rd->node_crc = cpu_to_je32(crc32(0, rd, sizeof(*rd)-8));
+		}
+	}
+
 	ret = jffs2_flash_writev(c, vecs, 2, flash_ofs, &retlen,
 				 (alloc_mode==ALLOC_GC)?0:je32_to_cpu(rd->pino));
 	if (ret || (retlen != sizeof(*rd) + namelen)) {

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-12-09 17:06               ` Estelle HAMMACHE
@ 2004-12-15 12:33                 ` Estelle HAMMACHE
  2005-02-02 16:21                   ` Estelle HAMMACHE
  2005-04-04 12:58                   ` Artem B. Bityuckiy
  0 siblings, 2 replies; 15+ messages in thread
From: Estelle HAMMACHE @ 2004-12-15 12:33 UTC (permalink / raw)
  To: linux-mtd

Hi everyone,

it seems there is a problem with jffs2_wbuf_recover and
the wbuf_sem...

jffs2_flash_writev
** down_write(&c->wbuf_sem);  !!!
** __jffs2_flush_wbuf
**** jffs2_wbuf_recover
******  jffs2_block_refile
********  nextblock = NULL;
******  jffs2_reserve_space_gc
********  jffs2_do_reserve_space
**********  jffs2_erase_pending_blocks
************  jffs2_mark_erased_block
**************  jffs2_flash_read
****************  down_read(&c->wbuf_sem); !!!

I believe that when checking a newly erased block, the
wbuf should not be used anyway, so this is probably easy to
correct. Is it ok to create a jffs2_flash_read_nobuf function
and call it only from jffs2_mark_erased_block ?

Also, still looking at the same recovery scheme, I believe
the following could happen if there are no free or erasing
blocks (but is it really possible ???)

jffs2_flash_writev
** __jffs2_flush_wbuf
**** jffs2_wbuf_recover
******  jffs2_block_refile
********  nextblock = NULL;
******  jffs2_reserve_space_gc
********  jffs2_do_reserve_space
********** jffs2_flush_wbuf_pad
************  __jffs2_flush_wbuf

bye
Estelle

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-12-15 12:33                 ` Estelle HAMMACHE
@ 2005-02-02 16:21                   ` Estelle HAMMACHE
  2005-04-04 12:58                   ` Artem B. Bityuckiy
  1 sibling, 0 replies; 15+ messages in thread
From: Estelle HAMMACHE @ 2005-02-02 16:21 UTC (permalink / raw)
  To: linux-mtd

Estelle HAMMACHE wrote:
> 
> Hi everyone,
> 
> it seems there is a problem with jffs2_wbuf_recover and
> the wbuf_sem...
> 
> jffs2_flash_writev
> ** down_write(&c->wbuf_sem);  !!!
> ** __jffs2_flush_wbuf
> **** jffs2_wbuf_recover
> ******  jffs2_block_refile
> ********  nextblock = NULL;
> ******  jffs2_reserve_space_gc
> ********  jffs2_do_reserve_space
> **********  jffs2_erase_pending_blocks
> ************  jffs2_mark_erased_block
> **************  jffs2_flash_read
> ****************  down_read(&c->wbuf_sem); !!!
> 

After some thinking I wrote a smallish patch to correct 
this part of the problem (edited below to show the full
function). If there are no objections I will commit it
this week-end. The wbuf semaphore is locked only after
the first checks, I don't believe this can cause trouble
because the wbuf is not freed once it is allocated
and the later checks are enough to prevent copying
a wrong wbuf contents.

The other case I mentionned (no erasing block so
jffs2_do_reserve_space tries to flush the wbuf)
seems impossible - we would not have allowed writing
in this case.

bye
Estelle


 int jffs2_flash_read(struct jffs2_sb_info *c, loff_t ofs, size_t len, size_t *retlen, u_char *buf)
 {
 	loff_t	orbf = 0, owbf = 0, lwbf = 0;
 	int	ret;
 
 	/* Read flash */
 	if (!jffs2_can_mark_obsolete(c)) {
-		down_read(&c->wbuf_sem);
 
 		if (jffs2_cleanmarker_oob(c))
 			ret = c->mtd->read_ecc(c->mtd, ofs, len, retlen, buf, NULL, c->oobinfo);
 		else
 			ret = c->mtd->read(c->mtd, ofs, len, retlen, buf);
 
 		if ( (ret == -EBADMSG) && (*retlen == len) ) {
 			printk(KERN_WARNING "mtd->read(0x%zx bytes from 0x%llx) returned ECC error\n",
 			       len, ofs);
 			/* 
 			 * We have the raw data without ECC correction in the buffer, maybe 
 			 * we are lucky and all data or parts are correct. We check the node.
 			 * If data are corrupted node check will sort it out.
 			 * We keep this block, it will fail on write or erase and the we
 			 * mark it bad. Or should we do that now? But we should give him a chance.
 			 * Maybe we had a system crash or power loss before the ecc write or  
 			 * a erase was completed.
 			 * So we return success. :)
 			 */
 		 	ret = 0;
 		 }	
 	} else
 		return c->mtd->read(c->mtd, ofs, len, retlen, buf);
 
 	/* if no writebuffer available or write buffer empty, return */
 	if (!c->wbuf_pagesize || !c->wbuf_len)
-		goto exit;
+		return ret;
 
 	/* if we read in a different block, return */
 	if ( (ofs & ~(c->sector_size-1)) != (c->wbuf_ofs & ~(c->sector_size-1)) ) 
-		goto exit;
+		return ret;
+
+	/* Lock only if we have reason to believe wbuf contains relevant data,
+	   so that checking an erased block during wbuf recovery space allocation
+	   does not deadlock. */
+	down_read(&c->wbuf_sem);
 
 	if (ofs >= c->wbuf_ofs) {
 		owbf = (ofs - c->wbuf_ofs);	/* offset in write buffer */
 		if (owbf > c->wbuf_len)		/* is read beyond write buffer ? */
 			goto exit;
 		lwbf = c->wbuf_len - owbf;	/* number of bytes to copy */
 		if (lwbf > len)	
 			lwbf = len;
 	} else {	
 		orbf = (c->wbuf_ofs - ofs);	/* offset in read buffer */
 		if (orbf > len)			/* is write beyond write buffer ? */
 			goto exit;
 		lwbf = len - orbf; 		/* number of bytes to copy */
 		if (lwbf > c->wbuf_len)	
 			lwbf = c->wbuf_len;
 	}	
 	if (lwbf > 0)
 		memcpy(buf+orbf,c->wbuf+owbf,lwbf);
 
 exit:
 	up_read(&c->wbuf_sem);
 	return ret;
 }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2004-12-15 12:33                 ` Estelle HAMMACHE
  2005-02-02 16:21                   ` Estelle HAMMACHE
@ 2005-04-04 12:58                   ` Artem B. Bityuckiy
  2005-04-04 13:58                     ` Estelle HAMMACHE
  1 sibling, 1 reply; 15+ messages in thread
From: Artem B. Bityuckiy @ 2005-04-04 12:58 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1559 bytes --]

Estelle HAMMACHE wrote:
> Hi everyone,
> 
> it seems there is a problem with jffs2_wbuf_recover and
> the wbuf_sem...
> 
> jffs2_flash_writev
> ** down_write(&c->wbuf_sem);  !!!
> ** __jffs2_flush_wbuf
> **** jffs2_wbuf_recover
> ******  jffs2_block_refile
> ********  nextblock = NULL;
> ******  jffs2_reserve_space_gc
> ********  jffs2_do_reserve_space
> **********  jffs2_erase_pending_blocks
> ************  jffs2_mark_erased_block
> **************  jffs2_flash_read
> ****************  down_read(&c->wbuf_sem); !!!
> 
> I believe that when checking a newly erased block, the
> wbuf should not be used anyway, so this is probably easy to
> correct. Is it ok to create a jffs2_flash_read_nobuf function
> and call it only from jffs2_mark_erased_block ?
> 
Hello Estelle,

I've found that JFFS2 has wbuf problems on SMP again. The reason are the 
changes which were made to fix the deadlock you've spotted.
The wbuf_sem rwsem doesn't help if it is locked after you've read flash 
because by the time you copy data from wbuf it might have been chaneged. 
I.e:

/* A part of our data is in wbuf at this point */
mtd->read_ecc(...)
/* At this point another CPU fills wbuf and flushes it, so in contains 
the wrong data */
down_read(&c->wbuf_sem)
memcpy(buf, c->wbuf, len)
up_read(&c->wbuf_sem)

I'd prefer not to use jffs2_flash_read() in jffs2_mark_erased_block() 
but to directly read flash since it wbuf anyway must not correspond to a 
newly erased block.

Please, look at the attached patch.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

[-- Attachment #2: wbuf_sem-1.diff --]
[-- Type: text/plain, Size: 1817 bytes --]

diff -auNrp --exclude=CVS mtd/fs/jffs2/erase.c mtd-fixed/fs/jffs2/erase.c
--- mtd/fs/jffs2/erase.c	2005-04-04 16:37:02.099649680 +0400
+++ mtd-fixed/fs/jffs2/erase.c	2005-04-04 16:33:18.665344419 +0400
@@ -333,7 +333,11 @@ static void jffs2_mark_erased_block(stru
 
 			bad_offset = ofs;
 
-			ret = jffs2_flash_read(c, ofs, readlen, &retlen, ebuf);
+			if (!jffs2_is_writebuffered(c) || !jffs2_cleanmarker_oob(c))
+				ret = c->mtd->read(c->mtd, ofs, readlen, &retlen, ebuf);
+			else
+				ret = c->mtd->read_ecc(c->mtd, ofs, readlen, &retlen, ebuf, NULL, c->oobinfo);
+
 			if (ret) {
 				printk(KERN_WARNING "Read of newly-erased block at 0x%08x failed: %d. Putting on bad_list\n", ofs, ret);
 				goto bad;
diff -auNrp --exclude=CVS mtd/fs/jffs2/wbuf.c mtd-fixed/fs/jffs2/wbuf.c
--- mtd/fs/jffs2/wbuf.c	2005-04-04 16:37:09.133148264 +0400
+++ mtd-fixed/fs/jffs2/wbuf.c	2005-04-04 16:33:18.667343992 +0400
@@ -873,6 +873,7 @@ int jffs2_flash_read(struct jffs2_sb_inf
 		return c->mtd->read(c->mtd, ofs, len, retlen, buf);
 
 	/* Read flash */
+	down_read(&c->wbuf_sem);
 	if (jffs2_cleanmarker_oob(c))
 		ret = c->mtd->read_ecc(c->mtd, ofs, len, retlen, buf, NULL, c->oobinfo);
 	else
@@ -896,16 +897,11 @@ int jffs2_flash_read(struct jffs2_sb_inf
 
 	/* if no writebuffer available or write buffer empty, return */
 	if (!c->wbuf_pagesize || !c->wbuf_len)
-		return ret;;
+		goto exit;
 
 	/* if we read in a different block, return */
 	if (SECTOR_ADDR(ofs) != SECTOR_ADDR(c->wbuf_ofs))
-		return ret;
-
-	/* Lock only if we have reason to believe wbuf contains relevant data,
-	   so that checking an erased block during wbuf recovery space allocation
-	   does not deadlock. */
-	down_read(&c->wbuf_sem);
+		goto exit;
 
 	if (ofs >= c->wbuf_ofs) {
 		owbf = (ofs - c->wbuf_ofs);	/* offset in write buffer */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2005-04-04 12:58                   ` Artem B. Bityuckiy
@ 2005-04-04 13:58                     ` Estelle HAMMACHE
  2005-04-04 14:47                       ` Artem B. Bityuckiy
  0 siblings, 1 reply; 15+ messages in thread
From: Estelle HAMMACHE @ 2005-04-04 13:58 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: linux-mtd

Hi Artem,

I ran the deadlock test with your patch and did not
observe any deadlock nor any other problem.


> -                       ret = jffs2_flash_read(c, ofs, readlen, &retlen, ebuf);
> +                       if (!jffs2_is_writebuffered(c) || !jffs2_cleanmarker_oob(c))
> +                               ret = c->mtd->read(c->mtd, ofs, readlen, &retlen, ebuf);
> +                       else
> +                               ret = c->mtd->read_ecc(c->mtd, ofs, readlen, &retlen, ebuf, NULL, c->oobinfo);
> +
>                         if (ret) {
>                                 printk(KERN_WARNING "Read of newly-erased block at 0x%08x failed: %d. Putting on bad_list\n", ofs, ret);
>                                 goto bad;


However I don't use MTD drivers, and I have no ECC 
implementation so far, so I don't know whether 
read_ecc could return an error (maybe EBADMSG) when 
it finds that the page and OOB are all FF.
Maybe just calling mtd->read unconditionally is 
good here ?

bye
Estelle

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: JFFS2 & NAND failure
  2005-04-04 13:58                     ` Estelle HAMMACHE
@ 2005-04-04 14:47                       ` Artem B. Bityuckiy
  0 siblings, 0 replies; 15+ messages in thread
From: Artem B. Bityuckiy @ 2005-04-04 14:47 UTC (permalink / raw)
  To: Estelle HAMMACHE; +Cc: linux-mtd

On Mon, 2005-04-04 at 15:58 +0200, Estelle HAMMACHE wrote:
> However I don't use MTD drivers, and I have no ECC 
> implementation so far, so I don't know whether 
> read_ecc could return an error (maybe EBADMSG) when 
> it finds that the page and OOB are all FF.
It should be fine.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2005-04-04 14:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-17 17:15 JFFS2 & NAND failure Estelle HAMMACHE
2004-11-18 16:27 ` David Woodhouse
2004-11-18 17:54   ` Estelle HAMMACHE
2004-11-19 13:17     ` David Woodhouse
2004-11-19 16:22       ` Estelle HAMMACHE
2004-11-20 18:57         ` David Woodhouse
2004-11-20 19:19         ` David Woodhouse
2004-11-20 22:13           ` David Woodhouse
2004-12-09 14:57             ` David Woodhouse
2004-12-09 17:06               ` Estelle HAMMACHE
2004-12-15 12:33                 ` Estelle HAMMACHE
2005-02-02 16:21                   ` Estelle HAMMACHE
2005-04-04 12:58                   ` Artem B. Bityuckiy
2005-04-04 13:58                     ` Estelle HAMMACHE
2005-04-04 14:47                       ` Artem B. Bityuckiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox