public inbox for linux-mtd@lists.infradead.org
* cfi_cmd0001.c broken w.r.t Erase Suspend
@ 2006-11-12 16:28 Joakim Tjernlund
From: Joakim Tjernlund @ 2006-11-12 16:28 UTC
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

Hi List

I found some problems that I think are related to the flash driver
(cfi_cmdset_0001.c). Somehow the flash is in Erase Suspend when
do_erase_oneblock has returned from INVAL_CACHE_AND_WAIT.
I have tried lots of "fixes", but the only one that works is to detect
the suspend in do_erase_oneblock, issue a new resume and go back to
INVAL_CACHE_AND_WAIT; see the attached patch.
I am using Intel P30 flash and 2.6.19-rc3 on powerpc.

How to reproduce:
Apply the patch to the kernel.

Boot to userspace and do:
cp -a /bin /root/tt1
cp -a /bin /root/tt2
cp -a /bin /root/tt3
cp -a /bin /root/tt4
cp -a /bin /root/tt5

Fill up the rest of the JFFS2 FS by using dd to create a huge file
until the FS is full.
Make sure syslog and klogd are running and are logging to a file on the
JFFS2 FS.

Now do:
1)
rm -rf /root/tt5
cp -a /root/tt1 /root/tt5

Wait a few seconds to a minute, then go back to 1).

A few other observations:

It is very hard to trigger an erase; it seems the GC daemon uses erasing
as its last resort. Why not erase blocks as soon as they are ready to be
erased?

I have seen cp consume 100% CPU several times.

I also noticed GC sleeping while cp was running and making no progress.
I had to kick GC with kill -HUP <gc pid> to make it run and make room for
the cp.

The above observations make me think there is a bug or two in JFFS2 as
well.

 Jocke

[-- Attachment #2: cfi_cmd0001.patch --]
[-- Type: text/x-patch, Size: 2370 bytes --]

diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c
index 2fd5cef..6f2d2d9 100644
--- a/drivers/mtd/chips/cfi_cmdset_0001.c
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -751,6 +751,8 @@ static int get_chip(struct map_info *map
 		map_write(map, CMD(0x70), adr);
 		chip->oldstate = FL_ERASING;
 		chip->state = FL_ERASE_SUSPENDING;
+		if (chip->erase_suspended)
+		   printk("chip already suspended\n");
 		chip->erase_suspended = 1;
 		for (;;) {
 			status = map_read(map, adr);
@@ -775,6 +777,9 @@ static int get_chip(struct map_info *map
 			/* Nobody will touch it while it's in state FL_ERASE_SUSPENDING.
 			   So we can just loop here. */
 		}
+		if (chip->erase_suspended != 1)
+		   printk("SUSPEND: chip not suspended\n");
+
 		chip->state = FL_STATUS;
 		return 0;
 
@@ -854,10 +859,15 @@ static void put_chip(struct map_info *ma
 		   sending the 0x70 (Read Status) command to an erasing
 		   chip and expecting it to be ignored, that's what we
 		   do. */
+		if (chip->erase_suspended != 1)
+		   printk("RESUME: chip not suspended\n");
+		chip->erase_suspended = 0;
 		map_write(map, CMD(0xd0), adr);
 		map_write(map, CMD(0x70), adr);
 		chip->oldstate = FL_READY;
 		chip->state = FL_ERASING;
+		if (chip->erase_suspended)
+		   printk("RESUME: chip suspended\n");
 		break;
 
 	case FL_XIP_WHILE_ERASING:
@@ -1718,7 +1728,7 @@ static int __xipram do_erase_oneblock(st
 	map_write(map, CMD(0xD0), adr);
 	chip->state = FL_ERASING;
 	chip->erase_suspended = 0;
-
+ again:
 	ret = INVAL_CACHE_AND_WAIT(map, chip, adr,
 				   adr, len,
 				   chip->erase_time);
@@ -1734,6 +1744,15 @@ static int __xipram do_erase_oneblock(st
 	map_write(map, CMD(0x70), adr);
 	chip->state = FL_STATUS;
 	status = map_read(map, adr);
+#if 1
+	if (map_word_bitsset(map, status, CMD(0x40))) {
+		printk("resume again:%lx\n", MERGESTATUS(status));
+		chip->state = FL_ERASING;
+		map_write(map, CMD(0x50), adr); /* reset status */
+		map_write(map, CMD(0xd0), adr); /* resume */
+		goto again;
+	}
+#endif
 
 	/* check for errors */
 	if (map_word_bitsset(map, status, CMD(0x3a))) {
@@ -1770,6 +1789,7 @@ static int __xipram do_erase_oneblock(st
 	xip_enable(map, chip, adr);
  out:	put_chip(map, chip, adr);
 	spin_unlock(chip->mutex);
+	printk("ERASE DONE: chip:%p adr:%lx, status 0x%lx\n", chip, adr, MERGESTATUS(status));
 	return ret;
 }
 


* Re: cfi_cmd0001.c broken w.r.t Erase Suspend solved, patch included
@ 2006-11-13 12:02 Joakim Tjernlund
From: Joakim Tjernlund @ 2006-11-13 12:02 UTC
  Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 4536 bytes --]

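Found the problem; the attached patch fixes the bug. I think this bug has
been there for quite some time. The patch turns the
"if (chip->state != chip_state)" check in
inval_cache_and_wait_for_operation() into a "while", so a waiter that
wakes up while the operation is still suspended goes back to sleep
instead of polling the status register.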

While chasing down this bug I got severe FS corruption caused by a single
write failure. One write failure should not corrupt the FS. I managed to
save this trace from the console (I don't have ksymoops handy, sorry; I
don't think it's needed). This problem is beyond my skills and I hope
someone else can have a look.

Trace follows:

TMCU Flash Map Info: Chip not ready after erase suspended: status = 0x0
Write of 116 bytes at 0x05ea06f8 failed. returned -5, retlen 0
Not marking the space at 0x05ea06f8 as dirty because the flash driver returned retlen zero
SUSPEND1: chip already suspended
resume again:c0, suspend flag:1
ERASE DONE: chip:c7f102e0 adr:6320000, status 0x80
resume again:c0, suspend flag:1
ERASE DONE: chip:c7f102e0 adr:6240000, status 0x80
resume again:c0, suspend flag:1
ERASE DONE: chip:c7f102e0 adr:6220000, status 0x80
Totlen for ref at c7a5b824 (0x05ea06f8-0x05ea076c) miscalculated as 0x0 instead of 74
next c7a5b830 (0x05ea06f8-0x05ea076c)
jeb->wasted_size 0, dirty_size 0, used_size c, free_size 1f894
Badness in __jffs2_ref_totlen at fs/jffs2/nodelist.c:1219
Call Trace:
[C07F5980] [C0008464]  (unreliable)
[C07F59B0] [C000DEB0]
[C07F59F0] [C000F548]
--- Exception: 700[C07F5AD0] [C00CD924]
[C07F5AF0] [C00D8B3C]
[C07F5B10] [C00D8CEC]
[C07F5B60] [C00D2544]
[C07F5BE0] [C00D5194]
[C07F5C10] [C00D7110]
[C07F5C30] [C00D78BC]
[C07F5C70] [C00D7AFC]
[C07F5D00] [C005B9C8]
[C07F5D20] [C005BA64]
[C07F5D40] [C0072B68]
[C07F5EC0] [C00731F4]
[C07F5EF0] [C01F2880]
[C07F5F10] [C01F29DC]
[C07F5F60] [C01F2E90]
[C07F5F80] [C0003B10]
[C07F5FF0] [C000FDF8]
JFFS2 error: (1) jffs2_link_node_ref: Adding new ref c7a5b830 at (0x05ea06f8-0x05ea076c) not immediately after previous (0x05ea06f8-0x05ea076c)
kernel BUG in jffs2_link_node_ref at fs/jffs2/nodelist.c:1098!
Oops: Exception in kernel mode, sig: 5 [#1]


NIP: C00CD95C LR: C00CD95C CTR: C00F6204
REGS: c07f5a20 TRAP: 0700   Not tainted  (2.6.19-rc3-g61b18413-dirty)
MSR: 00029032 <EE,ME,IR,DR>  CR: 24022022  XER: 00000000
TASK = c07fbb40[1] 'swapper' THREAD: c07f4000
GPR00: C00CD95C C07F5AD0 C07FBB40 00000093 00001A07 FFFFFFFF C010F934 C0215A0C
GPR08: C0210000 C0220000 00000000 C0210000 24022022 10078B2C 0FFFCD00 C03B4398
GPR16: 00000000 00000000 0000000A 05E9FFFF C7A5C735 00000006 00000000 0001CD28
GPR24: C7A5C720 C0233610 C03999C4 C03C5E48 00000074 C075D200 C03999C4 C7A5B830
Call Trace:
[C07F5AD0] [C00CD95C]  (unreliable)
[C07F5AF0] [C00D8B3C]
[C07F5B10] [C00D8CEC]
[C07F5B60] [C00D2544]
[C07F5BE0] [C00D5194]
[C07F5C10] [C00D7110]
[C07F5C30] [C00D78BC]
[C07F5C70] [C00D7AFC]
[C07F5D00] [C005B9C8]
[C07F5D20] [C005BA64]
[C07F5D40] [C0072B68]
[C07F5EC0] [C00731F4]
[C07F5EF0] [C01F2880]
[C07F5F10] [C01F29DC]
[C07F5F60] [C01F2E90]
[C07F5F80] [C0003B10]
[C07F5FF0] [C000FDF8]
Instruction dump:
7fe6fb78 80ff0004 38a55034 812b0004 54e7003a 808200a8 5529003a 7d07e214
7d491a14 3c60c01e 386394d4 4bf4e45d <0fe00000> 4bffff38 801d005c 7c00e214
 <0>Kernel panic - not syncing: Attempted to kill init!
 <0>Rebooting in 180 seconds..


[-- Attachment #2: cfi_cmd0001.patch2 --]
[-- Type: text/plain, Size: 507 bytes --]

diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c
index 2fd5cef..6c7cf95 100644
--- a/drivers/mtd/chips/cfi_cmdset_0001.c
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -1105,7 +1105,7 @@ static int inval_cache_and_wait_for_oper
 		}
 		spin_lock(chip->mutex);
 
-		if (chip->state != chip_state) {
+		while (chip->state != chip_state) {
 			/* Someone's suspended the operation: sleep */
 			DECLARE_WAITQUEUE(wait, current);
 			set_current_state(TASK_UNINTERRUPTIBLE);


* Re: cfi_cmd0001.c broken w.r.t Erase Suspend solved, patch included
@ 2006-11-16 14:46 Artem Bityutskiy
From: Artem Bityutskiy @ 2006-11-16 14:46 UTC
  To: Joakim Tjernlund; +Cc: linux-mtd

On Mon, 2006-11-13 at 13:02 +0100, Joakim Tjernlund wrote:
> JFFS2 error: (1) jffs2_link_node_ref: Adding new ref c7a5b830 at
> (0x05ea06f8-0x05ea076c) not immediately after previous
> (0x05ea06f8-0x05ea076c)
> kernel BUG in jffs2_link_node_ref at fs/jffs2/nodelist.c:1098!
> Oops: Exception in kernel mode, sig: 5 [#1]

It is difficult to help without being able to reproduce this. I would
suggest adding prints to jffs2_link_node_ref() to figure out what's
going on.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)
