* JFFS2 & NAND failure
@ 2004-11-17 17:15 Estelle HAMMACHE
2004-11-18 16:27 ` David Woodhouse
0 siblings, 1 reply; 15+ messages in thread
From: Estelle HAMMACHE @ 2004-11-17 17:15 UTC (permalink / raw)
To: David Woodhouse, linux-mtd
Hello,
here are a few corrections (I think) regarding the management
of write or erase error with NAND flash.
- bad_count was not initialised
- during wbuf flushing, if the previously written node filled
wbuf exactly, "buf" may be used instead of "wbuf"
in jffs2_wbuf_recover (access to null pointer)
- there is no refiling of nextblock if a write error occurs
during writev_ecc (direct write, not using wbuf),
so the next write may occur on the failed block
- if a write error occurs, but part of the data was written,
an obsolete raw node ref is added but nextblock has changed.
Additionally I have a question about the retry cases in
jffs2_write_dnode and jffs2_write_dirent.
If the write error occurs during an API call (not GC),
jffs2_reserve_space is called again to find space to rewrite
the node. However since we refiled the bad block, the free space
was reduced. Is it not possible then that jffs2_reserve_space
will need to gc to find some space ?
In the case of a deletion dirent, the previous dirent may be
GCed so its version number is increased. When the deletion
dirent is finally written in the retry, its version number
is older than the GCed node, so it is (quietly) obsoleted.
In the case of a dnode, I think there may be a deadlock
if a dnode belonging to the same file is GCed, since the
GC will attempt to lock f->sem.
I actually saw the dirent case in my tests but since I wrote
some nodes manually I am not sure this is a valid problem in
ordinary JFFS2 processing ?
Could the solution be to call jffs2_reserve_space_gc instead
of jffs2_reserve_space ?
BR
Estelle
diff -auNr jffs2/wbuf.c myjffs2/wbuf.c
--- jffs2/wbuf.c 2004-11-17 00:00:14.000000000 +0100
+++ myjffs2/wbuf.c 2004-11-17 17:36:14.556445000 +0100
@@ -263,7 +263,7 @@
kfree(buf);
return;
}
- if (end-start >= c->wbuf_pagesize) {
+ if (end-start > c->wbuf_pagesize) {
/* Need to do another write immediately. This, btw,
means that we'll be writing from 'buf' and not from
the wbuf. Since if we're writing from the wbuf there
@@ -744,9 +744,42 @@
if (ret < 0 || wbuf_retlen != PAGE_DIV(totlen)) {
/* At this point we have no problem,
- c->wbuf is empty.
+ c->wbuf is empty. However refile nextblock to avoid
+ writing again to same address.
*/
*retlen = donelen;
+ struct jffs2_eraseblock *jeb;
+ spin_lock(&c->erase_completion_lock);
+
+ jeb = &c->blocks[outvec_to / c->sector_size];
+ D1(printk("About to refile bad block at %08x\n", jeb->offset));
+
+ D2(jffs2_dump_block_lists(c));
+ /* File the existing block on the bad_used_list.... */
+ if (c->nextblock == jeb)
+ c->nextblock = NULL;
+ else /* Not sure this should ever happen... need more coffee */
+ list_del(&jeb->list);
+ if (jeb->first_node) {
+ D1(printk("Refiling block at %08x to bad_used_list\n", jeb->offset));
+ list_add(&jeb->list, &c->bad_used_list);
+ } else {
+ D1(printk("Refiling block at %08x to erase_pending_list\n", jeb->offset));
+ list_add(&jeb->list, &c->erase_pending_list);
+ c->nr_erasing_blocks++;
+ jffs2_erase_pending_trigger(c);
+ }
+ D2(jffs2_dump_block_lists(c));
+
+ /* Adjust its size counts accordingly */
+ c->wasted_size += jeb->free_size;
+ c->free_size -= jeb->free_size;
+ jeb->wasted_size += jeb->free_size;
+ jeb->free_size = 0;
+
+ ACCT_SANITY_CHECK(c,jeb);
+ D1(ACCT_PARANOIA_CHECK(jeb));
+ spin_unlock(&c->erase_completion_lock);
return ret;
}
diff -auNr jffs2/nodemgmt.c myjffs2/nodemgmt.c
--- jffs2/nodemgmt.c 2004-11-17 00:00:14.000000000 +0100
+++ myjffs2/nodemgmt.c 2004-11-17 17:40:40.130248000 +0100
@@ -308,7 +308,7 @@
D1(printk(KERN_DEBUG "jffs2_add_physical_node_ref(): Node at 0x%x(%d), size 0x%x\n", ref_offset(new), ref_flags(new),
len));
#if 1
- if (jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size - jeb->free_size)) {
+ if ((!ref_obsolete(new)) && (jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size -
jeb->free_size))) {
printk(KERN_WARNING "argh. node added in wrong place\n");
jffs2_free_raw_node_ref(new);
return -EINVAL;
diff -auNr jffs2/build.c myjffs2/build.c
--- jffs2/build.c 2004-11-17 00:00:13.000000000 +0100
+++ myjffs2/build.c 2004-11-17 17:29:20.384186000 +0100
@@ -310,6 +310,7 @@
c->blocks[i].used_size = 0;
c->blocks[i].first_node = NULL;
c->blocks[i].last_node = NULL;
+ c->blocks[i].bad_count = 0;
}
init_MUTEX(&c->alloc_sem);
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: JFFS2 & NAND failure 2004-11-17 17:15 JFFS2 & NAND failure Estelle HAMMACHE @ 2004-11-18 16:27 ` David Woodhouse 2004-11-18 17:54 ` Estelle HAMMACHE 0 siblings, 1 reply; 15+ messages in thread From: David Woodhouse @ 2004-11-18 16:27 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd On Wed, 2004-11-17 at 18:15 +0100, Estelle HAMMACHE wrote: > Hello, > > here are a few corrections (I think) regarding the management > of write or erase error with NAND flash. > - bad_count was not initialised Ok. > - during wbuf flushing, if the previously written node filled > wbuf exactly, "buf" may be used instead of "wbuf" > in jffs2_wbuf_recover (access to null pointer) Hmmm. But now with your change we can end up with a completely full wbuf which we don't actually flush; we leave it in memory. We should write that immediately, surely? > - there is no refiling of nextblock if a write error occurs > during writev_ecc (direct write, not using wbuf), > so the next write may occur on the failed block OK. Priorities on printk though please. > - if a write error occurs, but part of the data was written, > an obsolete raw node ref is added but nextblock has changed. Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref in the _new_ nextblock, surely? > Additionally I have a question about the retry cases in > jffs2_write_dnode and jffs2_write_dirent. > If the write error occurs during an API call (not GC), > jffs2_reserve_space is called again to find space to rewrite > the node. However since we refiled the bad block, the free space > was reduced. Is it not possible then that jffs2_reserve_space > will need to gc to find some space ? It is possible, yes. > In the case of a deletion dirent, the previous dirent may be > GCed so its version number is increased. When the deletion > dirent is finally written in the retry, its version number > is older than the GCed node, so it is (quietly) obsoleted. Hmm, true. We should check f->highest_version after we reallocate space, and update ri->version or rd->version accordingly. > In the case of a dnode, I think there may be a deadlock > if a dnode belonging to the same file is GCed, since the > GC will attempt to lock f->sem. No, because we unlock f->sem before calling jffs2_reserve_space. > I actually saw the dirent case in my tests but since I wrote > some nodes manually I am not sure this is a valid problem in > ordinary JFFS2 processing ? I think you're right -- the version number is indeed a real problem. > Could the solution be to call jffs2_reserve_space_gc instead > of jffs2_reserve_space ? No, because if the write fails and then we're short of space, we want to return -ENOSPC. We don't want to keep eating into the space qwhich is reserved for GC. Thanks for the report and the patches. Would you like to send me a SSH public key so that I can give you an account and you can commit directly? -- dwmw2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-11-18 16:27 ` David Woodhouse @ 2004-11-18 17:54 ` Estelle HAMMACHE 2004-11-19 13:17 ` David Woodhouse 0 siblings, 1 reply; 15+ messages in thread From: Estelle HAMMACHE @ 2004-11-18 17:54 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd David Woodhouse wrote: > > - during wbuf flushing, if the previously written node filled > > wbuf exactly, "buf" may be used instead of "wbuf" > > in jffs2_wbuf_recover (access to null pointer) > > Hmmm. But now with your change we can end up with a completely full wbuf > which we don't actually flush; we leave it in memory. We should write > that immediately, surely? Not necessarily I think. If we write it immediately and the write fails, we have a fatal error. But if we leave it be, the next call to jffs2_flash_writev will flush the buffer and we get 1 more try. Unless there is something I don't understand, there is no harm in leaving the wbuf full. > > - there is no refiling of nextblock if a write error occurs > > during writev_ecc (direct write, not using wbuf), > > so the next write may occur on the failed block > > OK. Priorities on printk though please. OK. (was actually a copy/paste so I'll check the wbuf recover too and maybe make a function for the refiling). > > - if a write error occurs, but part of the data was written, > > an obsolete raw node ref is added but nextblock has changed. > > Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref > in the _new_ nextblock, surely? No, it is the "Mark the space as dirtied" case in jffs2_write_dirent and jffs2_write_dnode. I think this happens only if the write error occurs on mtd->writev_ecc and part of the data was successfully written by jffs2_flush_wbuf or writev_ecc previously so jffs2_flash_writev says some data was written. In this case, when jffs2_write_dirent/dnode adds this obsolete raw node ref for the dirty space, nextblock was modified during the refiling and / or recovery. > > Additionally I have a question about the retry cases in > > jffs2_write_dnode and jffs2_write_dirent. > > If the write error occurs during an API call (not GC), > > jffs2_reserve_space is called again to find space to rewrite > > the node. However since we refiled the bad block, the free space > > was reduced. Is it not possible then that jffs2_reserve_space > > will need to gc to find some space ? > > It is possible, yes. > > > In the case of a deletion dirent, the previous dirent may be > > GCed so its version number is increased. When the deletion > > dirent is finally written in the retry, its version number > > is older than the GCed node, so it is (quietly) obsoleted. > > Hmm, true. We should check f->highest_version after we reallocate space, > and update ri->version or rd->version accordingly. This was my first idea too but I found it ugly to tamper with structures which are clearly the responsibility of the caller. However this is a recovery case so... maybe it is necessary. > > In the case of a dnode, I think there may be a deadlock > > if a dnode belonging to the same file is GCed, since the > > GC will attempt to lock f->sem. > > No, because we unlock f->sem before calling jffs2_reserve_space. Right, I got confused. > > I actually saw the dirent case in my tests but since I wrote > > some nodes manually I am not sure this is a valid problem in > > ordinary JFFS2 processing ? > > I think you're right -- the version number is indeed a real problem. > > > Could the solution be to call jffs2_reserve_space_gc instead > > of jffs2_reserve_space ? > > No, because if the write fails and then we're short of space, we want to > return -ENOSPC. We don't want to keep eating into the space qwhich is > reserved for GC. If we use jffs2_reserve_space_gc both in jffs2_wbuf_recover and the dnode/dirent retry cases I believe we will write at most 2*4KB = 8KB this way ? Then further API calls will do the GC. Is this unacceptable ? > Thanks for the report and the patches. Would you like to send me a SSH > public key so that I can give you an account and you can commit > directly? We have a firewall here so I don't think CVS will work. I will ask. Like many people I rely on snapshots. BTW if JFFS2 and JFFS3 are maintained in parallel do we get 2 sets of snapshots ? or is there another way to get both versions ? Anyway I'll send a corrected patch to the list when decision is made about the node version problem - I guess the version check/update ? Bye Estelle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-11-18 17:54 ` Estelle HAMMACHE @ 2004-11-19 13:17 ` David Woodhouse 2004-11-19 16:22 ` Estelle HAMMACHE 0 siblings, 1 reply; 15+ messages in thread From: David Woodhouse @ 2004-11-19 13:17 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd On Thu, 2004-11-18 at 18:54 +0100, Estelle HAMMACHE wrote: > David Woodhouse wrote: > > > - during wbuf flushing, if the previously written node filled > > > wbuf exactly, "buf" may be used instead of "wbuf" > > > in jffs2_wbuf_recover (access to null pointer) > > > > Hmmm. But now with your change we can end up with a completely full wbuf > > which we don't actually flush; we leave it in memory. We should write > > that immediately, surely? > > Not necessarily I think. If we write it immediately and the write fails, > we have a fatal error. But if we leave it be, the next call to > jffs2_flash_writev will flush the buffer and we get 1 more try. > Unless there is something I don't understand, there is no harm in > leaving the wbuf full. Generally we should flush the wbuf as soon as we can. And if there's going to be an error when we retry, surely we want that to happen _immediately_ so that the _current_ write() call returns an error, rather than leaving it in the wbuf and then losing it later? > > OK. Priorities on printk though please. > > OK. (was actually a copy/paste so I'll check the wbuf recover too > and maybe make a function for the refiling). Hm. Those are all my fault then -- but do as I say, not as I do :) > > > - if a write error occurs, but part of the data was written, > > > an obsolete raw node ref is added but nextblock has changed. > > > > Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref > > in the _new_ nextblock, surely? > > No, it is the "Mark the space as dirtied" case in jffs2_write_dirent and > jffs2_write_dnode. I think this happens only if the write error occurs > on mtd->writev_ecc and part of the data was successfully written by > jffs2_flush_wbuf or writev_ecc previously so jffs2_flash_writev says > some data was written. In this case, when jffs2_write_dirent/dnode > adds this obsolete raw node ref for the dirty space, nextblock was > modified during the refiling and / or recovery. OK... you mean the case where there was already a node in the wbuf and the wbuf flush failed, so we rewrote that node to a new block, along with the start of the new node? Then the write of the _rest_ of the new node failed, and we return with 'retlen' set to the amount of the new node that fitted into the wbuf and we actually written? I think that ought to be OK though because we only refiled enough raw_node_refs to cover the _previous_ nodes? > > Hmm, true. We should check f->highest_version after we reallocate space, > > and update ri->version or rd->version accordingly. > > This was my first idea too but I found it ugly to tamper with > structures which are clearly the responsibility of the caller. > However this is a recovery case so... maybe it is necessary. Well we can always turn it around and make them always the responsibility of the callee instead -- set them _always_ in jffs2_write_{dirent,dnode}? > If we use jffs2_reserve_space_gc both in jffs2_wbuf_recover and the > dnode/dirent retry cases I believe we will write at most 2*4KB = 8KB > this way ? Then further API calls will do the GC. Is this unacceptable ? I also want to increase the maximum size of data node... I really don't like letting anything else eat into our reserved space unless we really need to. Fixing up the version number isn't so hard. > We have a firewall here so I don't think CVS will work. I will ask. A firewall surely shouldn't prevent _outgoing_ connections to port 22? A web proxy which supports CONNECT requests may also allow you to make connections to port 22 -- ssh can use that if it's available. -- dwmw2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-11-19 13:17 ` David Woodhouse @ 2004-11-19 16:22 ` Estelle HAMMACHE 2004-11-20 18:57 ` David Woodhouse 2004-11-20 19:19 ` David Woodhouse 0 siblings, 2 replies; 15+ messages in thread From: Estelle HAMMACHE @ 2004-11-19 16:22 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd David Woodhouse wrote: > > On Thu, 2004-11-18 at 18:54 +0100, Estelle HAMMACHE wrote: > > David Woodhouse wrote: > > > > - during wbuf flushing, if the previously written node filled > > > > wbuf exactly, "buf" may be used instead of "wbuf" > > > > in jffs2_wbuf_recover (access to null pointer) > > > > > > Hmmm. But now with your change we can end up with a completely full wbuf > > > which we don't actually flush; we leave it in memory. We should write > > > that immediately, surely? > > > > Not necessarily I think. If we write it immediately and the write fails, > > we have a fatal error. But if we leave it be, the next call to > > jffs2_flash_writev will flush the buffer and we get 1 more try. > > Unless there is something I don't understand, there is no harm in > > leaving the wbuf full. > > Generally we should flush the wbuf as soon as we can. And if there's > going to be an error when we retry, surely we want that to happen > _immediately_ so that the _current_ write() call returns an error, > rather than leaving it in the wbuf and then losing it later? Whether jffs2_wbuf_recover succeeds or not, __jffs2_flush_wbuf always returns an error so that jffs2_flash_writev returns immediately, and the caller knows it needs to reallocate space to try again. Then, since wbuf is full, it will be flushed before the new (retry) node can be written. The node which was in wbuf was outstanding anyway. It is not part of the data from the current call to jffs2_flash_writev. What I meant is that if we write wbuf in jffs2_write_recover, like we write buf, 1 write error means we never write the node, and we lose data. Whereas if we leave the full wbuf, the node still has 2 regular write attempts (flush and recover) before some data is lost. I can see no difference between this "full wbuf" case and the usual jffs2_wbuf_recover where wbuf is left partly filled ? I feel that writing wbuf immediately adds code without necessity (or maybe I'm just too lazy to test it). On the other hand, maybe jffs2_wbuf_recover should be tweaked to resist to failure to write buf ? in this case writing a filled-up wbuf immediately too would be logical. How many blocks can we afford to refile to the bad block lists in a single call to jffs2_flash_writev ? > > > > - if a write error occurs, but part of the data was written, > > > > an obsolete raw node ref is added but nextblock has changed. > > > > > > Where? You mean in jffs2_wbuf_recover()? That's an obsolete raw node ref > > > in the _new_ nextblock, surely? > > > > No, it is the "Mark the space as dirtied" case in jffs2_write_dirent and > > jffs2_write_dnode. I think this happens only if the write error occurs > > on mtd->writev_ecc and part of the data was successfully written by > > jffs2_flush_wbuf or writev_ecc previously so jffs2_flash_writev says > > some data was written. In this case, when jffs2_write_dirent/dnode > > adds this obsolete raw node ref for the dirty space, nextblock was > > modified during the refiling and / or recovery. > > OK... you mean the case where there was already a node in the wbuf and > the wbuf flush failed, so we rewrote that node to a new block, along > with the start of the new node? Then the write of the _rest_ of the new > node failed, and we return with 'retlen' set to the amount of the new > node that fitted into the wbuf and we actually written? I should have mentionned that the problem in jffs2_add_physical_node_ref cannot happen with the original code - it only happens with the modification of jffs2_flash_writev to refile nextblock on mtd->writev_ecc failure. With the original code, nextblock it not refiled so there is no problem in jffs2_add_physical_node_ref, but the retry occurs on the same page which already failed in mtd->writev_ecc. Here is an example of the obsolete node/refiled nextblock pb: - page size 512 bytes - start condition: wbuf contains 500 bytes jffs2_write_dnode call, total node size 700 bytes -->jffs2_flash_writev ------>fill up wbuf ------>donelen = 12 ------>jffs2_flush_wbuf (succeeds) ------>c->mtd->writev_ecc to write 1 page - fails ------>nextblock is refiled (new code!) ------>*retlen = donelen (= 12) -->add obsolete frag for dirtied space ------> free raw node & return early because c->nextblock == 0 -->probable segfault on (redundant?) jffs2_mark_node_obsolete -->retry. Maybe I ought to test nextblock == 0 in addition to the obsolete flag to make the exclusion case more precise in jffs2_add_physical_node_ref ? Or is the "dirty" node really necessary ? what happens if we don't list it in the raw node refs ? > > > Hmm, true. We should check f->highest_version after we reallocate space, > > > and update ri->version or rd->version accordingly. > > > > This was my first idea too but I found it ugly to tamper with > > structures which are clearly the responsibility of the caller. > > However this is a recovery case so... maybe it is necessary. > > Well we can always turn it around and make them always the > responsibility of the callee instead -- set them _always_ in > jffs2_write_{dirent,dnode}? Ok, let's do that. bye Estelle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-11-19 16:22 ` Estelle HAMMACHE @ 2004-11-20 18:57 ` David Woodhouse 2004-11-20 19:19 ` David Woodhouse 1 sibling, 0 replies; 15+ messages in thread From: David Woodhouse @ 2004-11-20 18:57 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd On Fri, 2004-11-19 at 17:22 +0100, Estelle HAMMACHE wrote: > I feel that writing wbuf immediately adds code without necessity > (or maybe I'm just too lazy to test it). The code is there already; it's just a case of making it use wbuf instead of buf. That shouldn't be so hard. > Maybe I ought to test nextblock == 0 in addition to the > obsolete flag to make the exclusion case more precise in > jffs2_add_physical_node_ref ? Ah, OK. Yes, that might be appropriate. > Or is the "dirty" node really necessary ? what happens if we > don't list it in the raw node refs ? We'll try writing to that same offset again, and will fail. -- dwmw2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-11-19 16:22 ` Estelle HAMMACHE 2004-11-20 18:57 ` David Woodhouse @ 2004-11-20 19:19 ` David Woodhouse 2004-11-20 22:13 ` David Woodhouse 1 sibling, 1 reply; 15+ messages in thread From: David Woodhouse @ 2004-11-20 19:19 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd On Fri, 2004-11-19 at 17:22 +0100, Estelle HAMMACHE wrote: > > Here is an example of the obsolete node/refiled nextblock pb: > - page size 512 bytes > - start condition: wbuf contains 500 bytes > jffs2_write_dnode call, total node size 700 bytes > -->jffs2_flash_writev > ------>fill up wbuf > ------>donelen = 12 > ------>jffs2_flush_wbuf (succeeds) > ------>c->mtd->writev_ecc to write 1 page - fails > ------>nextblock is refiled (new code!) > ------>*retlen = donelen (= 12) > -->add obsolete frag for dirtied space > ------> free raw node & return early because c->nextblock == 0 > -->probable segfault on (redundant?) jffs2_mark_node_obsolete > -->retry. This is the problem, surely? If we've refiled the nextblock, we should return retlen == 0. -- dwmw2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-11-20 19:19 ` David Woodhouse @ 2004-11-20 22:13 ` David Woodhouse 2004-12-09 14:57 ` David Woodhouse 0 siblings, 1 reply; 15+ messages in thread From: David Woodhouse @ 2004-11-20 22:13 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd On Sat, 2004-11-20 at 19:19 +0000, David Woodhouse wrote: > This is the problem, surely? If we've refiled the nextblock, we should > return retlen == 0. Can you try this patch against current CVS? I've already split the refiling off into a separate function we can call, and I've also fixed it to write from the wbuf if it's full. Index: wbuf.c =================================================================== RCS file: /home/cvs/mtd/fs/jffs2/wbuf.c,v retrieving revision 1.81 diff -u -r1.81 wbuf.c --- wbuf.c 20 Nov 2004 10:44:07 -0000 1.81 +++ wbuf.c 20 Nov 2004 22:06:25 -0000 @@ -264,12 +268,12 @@ return; } if (end-start >= c->wbuf_pagesize) { - /* Need to do another write immediately. This, btw, - means that we'll be writing from 'buf' and not from - the wbuf. Since if we're writing from the wbuf there - won't be more than a wbuf full of data, now will - there? :) */ - + /* Need to do another write immediately, but it's possible + that this is just because the wbuf itself is completely + full, and there's nothing earlier read back from the + flash. Hence 'buf' isn't necessarily what we're writing + from. */ + unsigned char *rewrite_buf = buf?:c->wbuf; uint32_t towrite = (end-start) - ((end-start)%c->wbuf_pagesize); D1(printk(KERN_DEBUG "Write 0x%x bytes at 0x%08x in wbuf recover\n", @@ -287,14 +291,15 @@ #endif if (jffs2_cleanmarker_oob(c)) ret = c->mtd->write_ecc(c->mtd, ofs, towrite, &retlen, - buf, NULL, c->oobinfo); + rewrite_buf, NULL, c->oobinfo); else - ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, buf); + ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, rewrite_buf); if (ret || retlen != towrite) { /* Argh. We tried. Really we did. */ printk(KERN_CRIT "Recovery of wbuf failed due to a second write error\n"); - kfree(buf); + if (buf) + kfree(buf); if (retlen) { struct jffs2_raw_node_ref *raw2; @@ -316,10 +321,10 @@ c->wbuf_len = (end - start) - towrite; c->wbuf_ofs = ofs + towrite; - memcpy(c->wbuf, buf + towrite, c->wbuf_len); + memmove(c->wbuf, buf + towrite, c->wbuf_len); /* Don't muck about with c->wbuf_inodes. False positives are harmless. */ - - kfree(buf); + if (buf) + kfree(buf); } else { /* OK, now we're left with the dregs in whichever buffer we're using */ if (buf) { @@ -757,9 +762,19 @@ if (ret < 0 || wbuf_retlen != PAGE_DIV(totlen)) { /* At this point we have no problem, - c->wbuf is empty. + c->wbuf is empty. However refile nextblock to avoid + writing again to same address. */ + struct jffs2_eraseblock *jeb; + *retlen = donelen; + spin_lock(&c->erase_completion_lock); + + jeb = &c->blocks[outvec_to / c->sector_size]; + jffs2_refile_block(c, jeb); + + *retlen = 0; + spin_unlock(&c->erase_completion_lock); goto exit; } -- dwmw2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-11-20 22:13 ` David Woodhouse @ 2004-12-09 14:57 ` David Woodhouse 2004-12-09 17:06 ` Estelle HAMMACHE 0 siblings, 1 reply; 15+ messages in thread From: David Woodhouse @ 2004-12-09 14:57 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd On Sat, 2004-11-20 at 22:13 +0000, David Woodhouse wrote: > On Sat, 2004-11-20 at 19:19 +0000, David Woodhouse wrote: > > This is the problem, surely? If we've refiled the nextblock, we should > > return retlen == 0. > > Can you try this patch against current CVS? I've already split the > refiling off into a separate function we can call, and I've also fixed > it to write from the wbuf if it's full. Any progress? If it's working and you agree it's sufficient then we should commit it. -- dwmw2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-12-09 14:57 ` David Woodhouse @ 2004-12-09 17:06 ` Estelle HAMMACHE 2004-12-15 12:33 ` Estelle HAMMACHE 0 siblings, 1 reply; 15+ messages in thread From: Estelle HAMMACHE @ 2004-12-09 17:06 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd Sorry I've been very busy lately and couldn't find time to report back regarding this issue. There were only minor problems with your patch. Here is the patch that should settle the NAND failure problems I've stumbled upon lately. - block refiling in jffs2_flash_writev - retry on flush wbuf (probably useful when umounting) - retry in gc (might make some compilers complain ?) - version thing in the write.c retry cases (I didn't make jffs2_write_dnode and jffs2_write_dirent update the version in all cases instead of the caller, because of the hole GC case where the version should remain the same as it was...) - the modifications in jffs2_add_physical_node_ref are probably not necessary anymore ? not sure. The wasted_size thing is an old patch to avoid getting blocks with large amounts of wasted space in the clean_list. (still no cvs access) bye Estelle David Woodhouse wrote: > > On Sat, 2004-11-20 at 22:13 +0000, David Woodhouse wrote: > > On Sat, 2004-11-20 at 19:19 +0000, David Woodhouse wrote: > > > This is the problem, surely? If we've refiled the nextblock, we should > > > return retlen == 0. > > > > Can you try this patch against current CVS? I've already split the > > refiling off into a separate function we can call, and I've also fixed > > it to write from the wbuf if it's full. > > Any progress? If it's working and you agree it's sufficient then we > should commit it. > > -- > dwmw2 diff -auNr jffs2/gc.c myjffs2/gc.c --- jffs2/gc.c 2004-11-17 00:00:14.000000000 +0100 +++ myjffs2/gc.c 2004-12-09 16:59:23.880059000 +0100 @@ -602,7 +602,7 @@ printk(KERN_NOTICE "Not marking the space at 0x%08x as dirty because the flash driver returned retlen zero\n", nraw->flash_offset); jffs2_free_raw_node_ref(nraw); } - if (!retried && (nraw == jffs2_alloc_raw_node_ref())) { + if (!retried && (nraw = jffs2_alloc_raw_node_ref())) { /* Try to reallocate space and retry */ uint32_t dummy; struct jffs2_eraseblock *jeb = &c->blocks[phys_ofs / c->sector_size]; @@ -628,7 +628,6 @@ jffs2_free_raw_node_ref(nraw); } - jffs2_free_raw_node_ref(nraw); if (!ret) ret = -EIO; goto out_node; diff -auNr jffs2/nodemgmt.c myjffs2/nodemgmt.c --- jffs2/nodemgmt.c 2004-11-26 15:08:31.000000000 +0100 +++ myjffs2/nodemgmt.c 2004-12-09 17:32:23.921567000 +0100 @@ -308,7 +308,10 @@ D1(printk(KERN_DEBUG "jffs2_add_physical_node_ref(): Node at 0x%x(%d), size 0x%x\n", ref_offset(new), ref_flags(new), len)); #if 1 - if (jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size - jeb->free_size)) { + /* we could get some obsolete nodes after nextblock was refiled + in wbuf.c */ + if ( (c->nextblock || !ref_obsolete(new)) + &&(jeb != c->nextblock || (ref_offset(new)) != jeb->offset + (c->sector_size - jeb->free_size))) { printk(KERN_WARNING "argh. node added in wrong place\n"); jffs2_free_raw_node_ref(new); return -EINVAL; @@ -332,7 +335,7 @@ c->used_size += len; } - if (!jeb->free_size && !jeb->dirty_size) { + if (!jeb->free_size && !jeb->dirty_size && !jeb->wasted_size) { /* If it lives on the dirty_list, jffs2_reserve_space will put it there */ D1(printk(KERN_DEBUG "Adding full erase block at 0x%08x to clean_list (free 0x%08x, dirty 0x%08x, used 0x%08x\n", jeb->offset, jeb->free_size, jeb->dirty_size, jeb->used_size)); diff -auNr jffs2/wbuf.c myjffs2/wbuf.c --- jffs2/wbuf.c 2004-11-26 15:08:31.000000000 +0100 +++ myjffs2/wbuf.c 2004-12-09 17:14:45.966959000 +0100 @@ -130,7 +130,10 @@ } } -static void jffs2_block_refile(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb) +#define REFILE_NOTEMPTY 0 +#define REFILE_ANYWAY 1 + +static void jffs2_block_refile(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb, int allow_empty) { D1(printk("About to refile bad block at %08x\n", jeb->offset)); @@ -144,7 +147,8 @@ D1(printk("Refiling block at %08x to bad_used_list\n", jeb->offset)); list_add(&jeb->list, &c->bad_used_list); } else { - BUG(); + if (allow_empty == REFILE_NOTEMPTY) + BUG(); /* It has to have had some nodes or we couldn't be here */ D1(printk("Refiling block at %08x to erase_pending_list\n", jeb->offset)); list_add(&jeb->list, &c->erase_pending_list); @@ -179,7 +183,7 @@ jeb = &c->blocks[c->wbuf_ofs / c->sector_size]; - jffs2_block_refile(c, jeb); + jffs2_block_refile(c, jeb, REFILE_NOTEMPTY); /* Find the first node to be recovered, by skipping over every node which ends before the wbuf starts, or which is obsolete. */ @@ -269,12 +273,12 @@ return; } if (end-start >= c->wbuf_pagesize) { - /* Need to do another write immediately. This, btw, - means that we'll be writing from 'buf' and not from - the wbuf. Since if we're writing from the wbuf there - won't be more than a wbuf full of data, now will - there? :) */ - + /* Need to do another write immediately, but it's possible + that this is just because the wbuf itself is completely + full, and there's nothing earlier read back from the + flash. Hence 'buf' isn't necessarily what we're writing + from. */ + unsigned char *rewrite_buf = buf?:c->wbuf; uint32_t towrite = (end-start) - ((end-start)%c->wbuf_pagesize); D1(printk(KERN_DEBUG "Write 0x%x bytes at 0x%08x in wbuf recover\n", @@ -292,14 +296,15 @@ #endif if (jffs2_cleanmarker_oob(c)) ret = c->mtd->write_ecc(c->mtd, ofs, towrite, &retlen, - buf, NULL, c->oobinfo); + rewrite_buf, NULL, c->oobinfo); else - ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, buf); + ret = c->mtd->write(c->mtd, ofs, towrite, &retlen, rewrite_buf); if (ret || retlen != towrite) { /* Argh. We tried. Really we did. */ printk(KERN_CRIT "Recovery of wbuf failed due to a second write error\n"); - kfree(buf); + if (buf) + kfree(buf); if (retlen) { struct jffs2_raw_node_ref *raw2; @@ -321,10 +326,10 @@ c->wbuf_len = (end - start) - towrite; c->wbuf_ofs = ofs + towrite; - memcpy(c->wbuf, buf + towrite, c->wbuf_len); + memmove(c->wbuf, rewrite_buf + towrite, c->wbuf_len); /* Don't muck about with c->wbuf_inodes. False positives are harmless. */ - - kfree(buf); + if (buf) + kfree(buf); } else { /* OK, now we're left with the dregs in whichever buffer we're using */ if (buf) { @@ -547,6 +552,12 @@ D1(printk(KERN_DEBUG "jffs2_flush_wbuf_gc() padding. Not finished checking\n")); down_write(&c->wbuf_sem); ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING); + /* retry flushing wbuf in case jffs2_wbuf_recover + left some data in the wbuf */ + if (ret) + { + ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING); + } up_write(&c->wbuf_sem); } else while (old_wbuf_len && old_wbuf_ofs == c->wbuf_ofs) { @@ -561,6 +572,12 @@ down(&c->alloc_sem); down_write(&c->wbuf_sem); ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING); + /* retry flushing wbuf in case jffs2_wbuf_recover + left some data in the wbuf */ + if (ret) + { + ret = __jffs2_flush_wbuf(c, PAD_ACCOUNTING); + } up_write(&c->wbuf_sem); break; } @@ -580,6 +597,9 @@ down_write(&c->wbuf_sem); ret = __jffs2_flush_wbuf(c, PAD_NOACCOUNT); + /* retry - maybe wbuf recover left some data in wbuf. */ + if (ret) + ret = __jffs2_flush_wbuf(c, PAD_NOACCOUNT); up_write(&c->wbuf_sem); return ret; @@ -762,9 +782,18 @@ if (ret < 0 || wbuf_retlen != PAGE_DIV(totlen)) { /* At this point we have no problem, - c->wbuf is empty. + c->wbuf is empty. However refile nextblock to avoid + writing again to same address. */ - *retlen = donelen; + struct jffs2_eraseblock *jeb; + + spin_lock(&c->erase_completion_lock); + + jeb = &c->blocks[outvec_to / c->sector_size]; + jffs2_block_refile(c, jeb, REFILE_ANYWAY); + + *retlen = 0; + spin_unlock(&c->erase_completion_lock); goto exit; } diff -auNr jffs2/write.c myjffs2/write.c --- jffs2/write.c 2004-11-17 00:00:14.000000000 +0100 +++ myjffs2/write.c 2004-12-09 17:20:32.438719000 +0100 @@ -136,6 +136,21 @@ raw->__totlen = PAD(sizeof(*ri)+datalen); raw->next_phys = NULL; + if ((alloc_mode!=ALLOC_GC) && (je32_to_cpu(ri->version) < f->highest_version)) + { + if (! retried) + { + BUG(); + } + else + { + D1(printk(KERN_DEBUG "jffs2_write_dnode : dnode_version %d, highest version %d -> updating dnode\n", + je32_to_cpu(ri->version), f->highest_version)); + ri->version = cpu_to_je32(++f->highest_version); + ri->node_crc = cpu_to_je32(crc32(0, ri, sizeof(*ri)-8)); + } + } + ret = jffs2_flash_writev(c, vecs, cnt, flash_ofs, &retlen, (alloc_mode==ALLOC_GC)?0:f->inocache->ino); @@ -280,6 +295,22 @@ raw->__totlen = PAD(sizeof(*rd)+namelen); raw->next_phys = NULL; + if ((alloc_mode!=ALLOC_GC) && (je32_to_cpu(rd->version) < f->highest_version)) + { + if (! retried) + { + BUG(); + } + else + { + D1(printk(KERN_DEBUG "jffs2_write_dirent : dirent_version %d, highest version %d -> updating dirent\n", + je32_to_cpu(rd->version), f->highest_version)); + rd->version = cpu_to_je32(++f->highest_version); + fd->version = je32_to_cpu(rd->version); + rd->node_crc = cpu_to_je32(crc32(0, rd, sizeof(*rd)-8)); + } + } + ret = jffs2_flash_writev(c, vecs, 2, flash_ofs, &retlen, (alloc_mode==ALLOC_GC)?0:je32_to_cpu(rd->pino)); if (ret || (retlen != sizeof(*rd) + namelen)) { ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-12-09 17:06 ` Estelle HAMMACHE @ 2004-12-15 12:33 ` Estelle HAMMACHE 2005-02-02 16:21 ` Estelle HAMMACHE 2005-04-04 12:58 ` Artem B. Bityuckiy 0 siblings, 2 replies; 15+ messages in thread From: Estelle HAMMACHE @ 2004-12-15 12:33 UTC (permalink / raw) To: linux-mtd Hi everyone, it seems there is a problem with jffs2_wbuf_recover and the wbuf_sem... jffs2_flash_writev ** down_write(&c->wbuf_sem); !!! ** __jffs2_flush_wbuf **** jffs2_wbuf_recover ****** jffs2_block_refile ******** nextblock = NULL; ****** jffs2_reserve_space_gc ******** jffs2_do_reserve_space ********** jffs2_erase_pending_blocks ************ jffs2_mark_erased_block ************** jffs2_flash_read **************** down_read(&c->wbuf_sem); !!! I believe that when checking a newly erased block, the wbuf should not be used anyway, so this is probably easy to correct. Is it ok to create a jffs2_flash_read_nobuf function and call it only from jffs2_mark_erased_block ? Also, still looking at the same recovery scheme, I believe the following could happen if there are no free or erasing blocks (but is it really possible ???) jffs2_flash_writev ** __jffs2_flush_wbuf **** jffs2_wbuf_recover ****** jffs2_block_refile ******** nextblock = NULL; ****** jffs2_reserve_space_gc ******** jffs2_do_reserve_space ********** jffs2_flush_wbuf_pad ************ __jffs2_flush_wbuf bye Estelle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-12-15 12:33 ` Estelle HAMMACHE @ 2005-02-02 16:21 ` Estelle HAMMACHE 2005-04-04 12:58 ` Artem B. Bityuckiy 1 sibling, 0 replies; 15+ messages in thread From: Estelle HAMMACHE @ 2005-02-02 16:21 UTC (permalink / raw) To: linux-mtd Estelle HAMMACHE wrote: > > Hi everyone, > > it seems there is a problem with jffs2_wbuf_recover and > the wbuf_sem... > > jffs2_flash_writev > ** down_write(&c->wbuf_sem); !!! > ** __jffs2_flush_wbuf > **** jffs2_wbuf_recover > ****** jffs2_block_refile > ******** nextblock = NULL; > ****** jffs2_reserve_space_gc > ******** jffs2_do_reserve_space > ********** jffs2_erase_pending_blocks > ************ jffs2_mark_erased_block > ************** jffs2_flash_read > **************** down_read(&c->wbuf_sem); !!! > After some thinking I wrote a smallish patch to correct this part of the problem (edited below to show the full function). If there are no objections I will commit it this week-end. The wbuf semaphore is locked only after the first checks, I don't believe this can cause trouble because the wbuf is not freed once it is allocated and the later checks are enough to prevent copying a wrong wbuf contents. The other case I mentionned (no erasing block so jffs2_do_reserve_space tries to flush the wbuf) seems impossible - we would not have allowed writing in this case. bye Estelle int jffs2_flash_read(struct jffs2_sb_info *c, loff_t ofs, size_t len, size_t *retlen, u_char *buf) { loff_t orbf = 0, owbf = 0, lwbf = 0; int ret; /* Read flash */ if (!jffs2_can_mark_obsolete(c)) { - down_read(&c->wbuf_sem); if (jffs2_cleanmarker_oob(c)) ret = c->mtd->read_ecc(c->mtd, ofs, len, retlen, buf, NULL, c->oobinfo); else ret = c->mtd->read(c->mtd, ofs, len, retlen, buf); if ( (ret == -EBADMSG) && (*retlen == len) ) { printk(KERN_WARNING "mtd->read(0x%zx bytes from 0x%llx) returned ECC error\n", len, ofs); /* * We have the raw data without ECC correction in the buffer, maybe * we are lucky and all data or parts are correct. We check the node. * If data are corrupted node check will sort it out. * We keep this block, it will fail on write or erase and the we * mark it bad. Or should we do that now? But we should give him a chance. * Maybe we had a system crash or power loss before the ecc write or * a erase was completed. * So we return success. :) */ ret = 0; } } else return c->mtd->read(c->mtd, ofs, len, retlen, buf); /* if no writebuffer available or write buffer empty, return */ if (!c->wbuf_pagesize || !c->wbuf_len) - goto exit; + return ret; /* if we read in a different block, return */ if ( (ofs & ~(c->sector_size-1)) != (c->wbuf_ofs & ~(c->sector_size-1)) ) - goto exit; + return ret; + + /* Lock only if we have reason to believe wbuf contains relevant data, + so that checking an erased block during wbuf recovery space allocation + does not deadlock. */ + down_read(&c->wbuf_sem); if (ofs >= c->wbuf_ofs) { owbf = (ofs - c->wbuf_ofs); /* offset in write buffer */ if (owbf > c->wbuf_len) /* is read beyond write buffer ? */ goto exit; lwbf = c->wbuf_len - owbf; /* number of bytes to copy */ if (lwbf > len) lwbf = len; } else { orbf = (c->wbuf_ofs - ofs); /* offset in read buffer */ if (orbf > len) /* is write beyond write buffer ? */ goto exit; lwbf = len - orbf; /* number of bytes to copy */ if (lwbf > c->wbuf_len) lwbf = c->wbuf_len; } if (lwbf > 0) memcpy(buf+orbf,c->wbuf+owbf,lwbf); exit: up_read(&c->wbuf_sem); return ret; } ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2004-12-15 12:33 ` Estelle HAMMACHE 2005-02-02 16:21 ` Estelle HAMMACHE @ 2005-04-04 12:58 ` Artem B. Bityuckiy 2005-04-04 13:58 ` Estelle HAMMACHE 1 sibling, 1 reply; 15+ messages in thread From: Artem B. Bityuckiy @ 2005-04-04 12:58 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd [-- Attachment #1: Type: text/plain, Size: 1559 bytes --] Estelle HAMMACHE wrote: > Hi everyone, > > it seems there is a problem with jffs2_wbuf_recover and > the wbuf_sem... > > jffs2_flash_writev > ** down_write(&c->wbuf_sem); !!! > ** __jffs2_flush_wbuf > **** jffs2_wbuf_recover > ****** jffs2_block_refile > ******** nextblock = NULL; > ****** jffs2_reserve_space_gc > ******** jffs2_do_reserve_space > ********** jffs2_erase_pending_blocks > ************ jffs2_mark_erased_block > ************** jffs2_flash_read > **************** down_read(&c->wbuf_sem); !!! > > I believe that when checking a newly erased block, the > wbuf should not be used anyway, so this is probably easy to > correct. Is it ok to create a jffs2_flash_read_nobuf function > and call it only from jffs2_mark_erased_block ? > Hello Estelle, I've found that JFFS2 has wbuf problems on SMP again. The reason are the changes which were made to fix the deadlock you've spotted. The wbuf_sem rwsem doesn't help if it is locked after you've read flash because by the time you copy data from wbuf it might have been chaneged. I.e: /* A part of our data is in wbuf at this point */ mtd->read_ecc(...) /* At this point another CPU fills wbuf and flushes it, so in contains the wrong data */ down_read(&c->wbuf_sem) memcpy(buf, c->wbuf, len) up_read(&c->wbuf_sem) I'd prefer not to use jffs2_flash_read() in jffs2_mark_erased_block() but to directly read flash since it wbuf anyway must not correspond to a newly erased block. Please, look at the attached patch. -- Best Regards, Artem B. Bityuckiy, St.-Petersburg, Russia. [-- Attachment #2: wbuf_sem-1.diff --] [-- Type: text/plain, Size: 1817 bytes --] diff -auNrp --exclude=CVS mtd/fs/jffs2/erase.c mtd-fixed/fs/jffs2/erase.c --- mtd/fs/jffs2/erase.c 2005-04-04 16:37:02.099649680 +0400 +++ mtd-fixed/fs/jffs2/erase.c 2005-04-04 16:33:18.665344419 +0400 @@ -333,7 +333,11 @@ static void jffs2_mark_erased_block(stru bad_offset = ofs; - ret = jffs2_flash_read(c, ofs, readlen, &retlen, ebuf); + if (!jffs2_is_writebuffered(c) || !jffs2_cleanmarker_oob(c)) + ret = c->mtd->read(c->mtd, ofs, readlen, &retlen, ebuf); + else + ret = c->mtd->read_ecc(c->mtd, ofs, readlen, &retlen, ebuf, NULL, c->oobinfo); + if (ret) { printk(KERN_WARNING "Read of newly-erased block at 0x%08x failed: %d. Putting on bad_list\n", ofs, ret); goto bad; diff -auNrp --exclude=CVS mtd/fs/jffs2/wbuf.c mtd-fixed/fs/jffs2/wbuf.c --- mtd/fs/jffs2/wbuf.c 2005-04-04 16:37:09.133148264 +0400 +++ mtd-fixed/fs/jffs2/wbuf.c 2005-04-04 16:33:18.667343992 +0400 @@ -873,6 +873,7 @@ int jffs2_flash_read(struct jffs2_sb_inf return c->mtd->read(c->mtd, ofs, len, retlen, buf); /* Read flash */ + down_read(&c->wbuf_sem); if (jffs2_cleanmarker_oob(c)) ret = c->mtd->read_ecc(c->mtd, ofs, len, retlen, buf, NULL, c->oobinfo); else @@ -896,16 +897,11 @@ int jffs2_flash_read(struct jffs2_sb_inf /* if no writebuffer available or write buffer empty, return */ if (!c->wbuf_pagesize || !c->wbuf_len) - return ret;; + goto exit; /* if we read in a different block, return */ if (SECTOR_ADDR(ofs) != SECTOR_ADDR(c->wbuf_ofs)) - return ret; - - /* Lock only if we have reason to believe wbuf contains relevant data, - so that checking an erased block during wbuf recovery space allocation - does not deadlock. */ - down_read(&c->wbuf_sem); + goto exit; if (ofs >= c->wbuf_ofs) { owbf = (ofs - c->wbuf_ofs); /* offset in write buffer */ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2005-04-04 12:58 ` Artem B. Bityuckiy @ 2005-04-04 13:58 ` Estelle HAMMACHE 2005-04-04 14:47 ` Artem B. Bityuckiy 0 siblings, 1 reply; 15+ messages in thread From: Estelle HAMMACHE @ 2005-04-04 13:58 UTC (permalink / raw) To: Artem B. Bityuckiy; +Cc: linux-mtd Hi Artem, I ran the deadlock test with your patch and did not observe any deadlock nor any other problem. > - ret = jffs2_flash_read(c, ofs, readlen, &retlen, ebuf); > + if (!jffs2_is_writebuffered(c) || !jffs2_cleanmarker_oob(c)) > + ret = c->mtd->read(c->mtd, ofs, readlen, &retlen, ebuf); > + else > + ret = c->mtd->read_ecc(c->mtd, ofs, readlen, &retlen, ebuf, NULL, c->oobinfo); > + > if (ret) { > printk(KERN_WARNING "Read of newly-erased block at 0x%08x failed: %d. Putting on bad_list\n", ofs, ret); > goto bad; However I don't use MTD drivers, and I have no ECC implementation so far, so I don't know whether read_ecc could return an error (maybe EBADMSG) when it finds that the page and OOB are all FF. Maybe just calling mtd->read unconditionally is good here ? bye Estelle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: JFFS2 & NAND failure 2005-04-04 13:58 ` Estelle HAMMACHE @ 2005-04-04 14:47 ` Artem B. Bityuckiy 0 siblings, 0 replies; 15+ messages in thread From: Artem B. Bityuckiy @ 2005-04-04 14:47 UTC (permalink / raw) To: Estelle HAMMACHE; +Cc: linux-mtd On Mon, 2005-04-04 at 15:58 +0200, Estelle HAMMACHE wrote: > However I don't use MTD drivers, and I have no ECC > implementation so far, so I don't know whether > read_ecc could return an error (maybe EBADMSG) when > it finds that the page and OOB are all FF. It should be fine. -- Best Regards, Artem B. Bityuckiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2005-04-04 14:48 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-11-17 17:15 JFFS2 & NAND failure Estelle HAMMACHE 2004-11-18 16:27 ` David Woodhouse 2004-11-18 17:54 ` Estelle HAMMACHE 2004-11-19 13:17 ` David Woodhouse 2004-11-19 16:22 ` Estelle HAMMACHE 2004-11-20 18:57 ` David Woodhouse 2004-11-20 19:19 ` David Woodhouse 2004-11-20 22:13 ` David Woodhouse 2004-12-09 14:57 ` David Woodhouse 2004-12-09 17:06 ` Estelle HAMMACHE 2004-12-15 12:33 ` Estelle HAMMACHE 2005-02-02 16:21 ` Estelle HAMMACHE 2005-04-04 12:58 ` Artem B. Bityuckiy 2005-04-04 13:58 ` Estelle HAMMACHE 2005-04-04 14:47 ` Artem B. Bityuckiy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox