[PATCH] mtd: cfi: Wait for Block Erase operation to finish

public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed

* [PATCH] mtd: cfi: Wait for Block Erase operation to finish
@ 2012-02-28 15:32 Paul Parsons
  2012-02-28 21:25 ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-02-28 15:32 UTC (permalink / raw)
  To: linux-mtd; +Cc: dwmw2, philipp.zabel

If an Erase Suspend Command (0xb0) is issued while a Block Erase operation
(0x20, 0xd0) is in progress, Status Register bit 6 (SR.6) will be set to 1.
After an Erase Resume Command (0xd0) is issued, SR.6 will be set back to 0.

Unfortunately when inval_cache_and_wait_for_operation() is called to wait for
a Block Erase operation to finish, it never checks that SR.6 = 0 before
returning. Consequently it can (and does) return while a Block Erase operation
is still suspended, resulting in random breakage.

This patch ensures that when inval_cache_and_wait_for_operation() is called to
wait for a Block Erase operation to finish (chip->state == FL_ERASING), it does
not return until both SR.7 = 1 (as before) and SR.6 = 0.

Signed-off-by: Paul Parsons <lost.distance@yahoo.com>
---

I found this after switching my HP iPAQ hx4700 from using jffs2 to ubifs. It
then consistently reported "block erase error: (bad VPP)" errors. Intriguingly,
the bootloader never reporting erase or programming failures; it turned out that
the bootloader never issued Erase Suspend Commands.

Should SR.6 = 0 also be checked after an Erase Suspend Command (0xb0) is issued?
This should preclude an unnecessary (and perhaps unpredictable) subsequent Erase
Resume Command (0xd0).

--- clean-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c	2012-02-25 20:18:16.000000000 +0000
+++ linux-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c	2012-02-28 03:13:30.521537614 +0000
@@ -1211,6 +1211,7 @@ static int inval_cache_and_wait_for_oper
 {
 	struct cfi_private *cfi = map->fldrv_priv;
 	map_word status, status_OK = CMD(0x80);
+	map_word status_76 = (chip->state == FL_ERASING) ? CMD(0xc0) : CMD(0x80);
 	int chip_state = chip->state;
 	unsigned int timeo, sleep_time, reset_timeo;

@@ -1239,7 +1240,7 @@ static int inval_cache_and_wait_for_oper
 		}

 		status = map_read(map, cmd_adr);
-		if (map_word_andequal(map, status, status_OK, status_OK))
+		if (map_word_andequal(map, status, status_76, status_OK))
 			break;

 		if (chip->erase_suspended && chip_state == FL_ERASING)  {

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-28 15:32 [PATCH] mtd: cfi: Wait for Block Erase operation to finish Paul Parsons
@ 2012-02-28 21:25 ` Joakim Tjernlund
  2012-02-29  0:23   ` Paul Parsons
  0 siblings, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2012-02-28 21:25 UTC (permalink / raw)
  To: Paul Parsons; +Cc: linux-mtd, dwmw2, philipp.zabel

>
> If an Erase Suspend Command (0xb0) is issued while a Block Erase operation
> (0x20, 0xd0) is in progress, Status Register bit 6 (SR.6) will be set to 1.
> After an Erase Resume Command (0xd0) is issued, SR.6 will be set back to 0.
>
> Unfortunately when inval_cache_and_wait_for_operation() is called to wait for
> a Block Erase operation to finish, it never checks that SR.6 = 0 before
> returning. Consequently it can (and does) return while a Block Erase operation
> is still suspended, resulting in random breakage.
>
> This patch ensures that when inval_cache_and_wait_for_operation() is called to
> wait for a Block Erase operation to finish (chip->state == FL_ERASING), it does
> not return until both SR.7 = 1 (as before) and SR.6 = 0.

Interesting but I do wonder, should not the
  if (chip->state != chip_state)
test just before the status test prevent this in the first place?

>
> Signed-off-by: Paul Parsons <lost.distance@yahoo.com>
> ---
>
> I found this after switching my HP iPAQ hx4700 from using jffs2 to ubifs. It
> then consistently reported "block erase error: (bad VPP)" errors. Intriguingly,
> the bootloader never reporting erase or programming failures; it turned out that
> the bootloader never issued Erase Suspend Commands.
>
> Should SR.6 = 0 also be checked after an Erase Suspend Command (0xb0) is issued?
> This should preclude an unnecessary (and perhaps unpredictable) subsequent Erase
> Resume Command (0xd0).
>
> --- clean-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c   2012-02-25 20:18:16.000000000 +0000
> +++ linux-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c   2012-02-28 03:13:30.521537614 +0000
> @@ -1211,6 +1211,7 @@ static int inval_cache_and_wait_for_oper
>  {
>     struct cfi_private *cfi = map->fldrv_priv;
>     map_word status, status_OK = CMD(0x80);
> +   map_word status_76 = (chip->state == FL_ERASING) ? CMD(0xc0) : CMD(0x80);

hmm, I wonder if you can get by without the chip->state test? Just always set it to 0xc0?

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-28 21:25 ` Joakim Tjernlund
@ 2012-02-29  0:23   ` Paul Parsons
  2012-02-29  8:58     ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-02-29  0:23 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel

> > If an Erase Suspend Command (0xb0) is issued while a
> Block Erase operation
> > (0x20, 0xd0) is in progress, Status Register bit 6
> (SR.6) will be set to 1.
> > After an Erase Resume Command (0xd0) is issued, SR.6
> will be set back to 0.
> >
> > Unfortunately when inval_cache_and_wait_for_operation()
> is called to wait for
> > a Block Erase operation to finish, it never checks that
> SR.6 = 0 before
> > returning. Consequently it can (and does) return while
> a Block Erase operation
> > is still suspended, resulting in random breakage.
> >
> > This patch ensures that when
> inval_cache_and_wait_for_operation() is called to
> > wait for a Block Erase operation to finish
> (chip->state == FL_ERASING), it does
> > not return until both SR.7 = 1 (as before) and SR.6 =
> 0.
> 
> Interesting but I do wonder, should not the
>   if (chip->state != chip_state)
> test just before the status test prevent this in the first
> place?

I'm not sure I understand. The (chip->state != chip_state) test forces
inval_cache_and_wait_for_operation() to stop polling and sleep, pending
the operation which suspended it. After it wakes it resumes polling. The
test doesn't prevent inval_cache_and_wait_for_operation() returning
prematurely, i.e. before the Block Erase has finished.

An important point which I suspect has been overlooked is that both SR.7
and SR.6 are set when a Block Erase is suspended. Thus testing SR.7 only
is not sufficient to determine that a Block Erase has completed.

> > +   map_word status_76 = (chip->state
> == FL_ERASING) ? CMD(0xc0) : CMD(0x80);
> 
> hmm, I wonder if you can get by without the chip->state
> test? Just always set it to 0xc0?

No, unfortunately not. For example a write buffer sequence will finish
whether or not a Block Erase is currently suspended; thus it must test
SR.7 only, not SR.6.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-29  0:23   ` Paul Parsons
@ 2012-02-29  8:58     ` Joakim Tjernlund
  2012-02-29 17:22       ` Paul Parsons
  0 siblings, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2012-02-29  8:58 UTC (permalink / raw)
  To: Paul Parsons, lost.distance; +Cc: linux-mtd, dwmw2, philipp.zabel

Paul Parsons <lost.distance@yahoo.com> wrote on 2012/02/29 01:23:21:
> From: Paul Parsons <lost.distance@yahoo.com>
>
> > > If an Erase Suspend Command (0xb0) is issued while a
> > Block Erase operation
> > > (0x20, 0xd0) is in progress, Status Register bit 6
> > (SR.6) will be set to 1.
> > > After an Erase Resume Command (0xd0) is issued, SR.6
> > will be set back to 0.
> > >
> > > Unfortunately when inval_cache_and_wait_for_operation()
> > is called to wait for
> > > a Block Erase operation to finish, it never checks that
> > SR.6 = 0 before
> > > returning. Consequently it can (and does) return while
> > a Block Erase operation
> > > is still suspended, resulting in random breakage.
> > >
> > > This patch ensures that when
> > inval_cache_and_wait_for_operation() is called to
> > > wait for a Block Erase operation to finish
> > (chip->state == FL_ERASING), it does
> > > not return until both SR.7 = 1 (as before) and SR.6 =
> > 0.
> >
> > Interesting but I do wonder, should not the
> >   if (chip->state != chip_state)
> > test just before the status test prevent this in the first
> > place?
>
> I'm not sure I understand. The (chip->state != chip_state) test forces
> inval_cache_and_wait_for_operation() to stop polling and sleep, pending
> the operation which suspended it. After it wakes it resumes polling. The
> test doesn't prevent inval_cache_and_wait_for_operation() returning
> prematurely, i.e. before the Block Erase has finished.

Been to long since I been in this code I guess, but I still don't get it.

If erase is suspended, chip->state is changed which will keep the function in
the for(;;) loop. Once erase has been resumed again chip->state will change back.
It seems to me that chip->state and SR.6 are mutally exclusive? I can not see how
you can get to
	status = map_read(map, cmd_adr);
	if (map_word_andequal(map, status, status_OK, status_OK))
if the the erase has been suspended.

>
> An important point which I suspect has been overlooked is that both SR.7
> and SR.6 are set when a Block Erase is suspended. Thus testing SR.7 only
> is not sufficient to determine that a Block Erase has completed.
>
> > > +   map_word status_76 = (chip->state
> > == FL_ERASING) ? CMD(0xc0) : CMD(0x80);
> >
> > hmm, I wonder if you can get by without the chip->state
> > test? Just always set it to 0xc0?
>
> No, unfortunately not. For example a write buffer sequence will finish
> whether or not a Block Erase is currently suspended; thus it must test
> SR.7 only, not SR.6.

I see, thanks.

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-29  8:58     ` Joakim Tjernlund
@ 2012-02-29 17:22       ` Paul Parsons
  2012-02-29 18:03         ` Artem Bityutskiy
  2012-02-29 23:35         ` Joakim Tjernlund
  0 siblings, 2 replies; 26+ messages in thread
From: Paul Parsons @ 2012-02-29 17:22 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel

--- On Wed, 29/2/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> If erase is suspended, chip->state is changed which will
> keep the function in
> the for(;;) loop. Once erase has been resumed again
> chip->state will change back.
> It seems to me that chip->state and SR.6 are mutally
> exclusive? I can not see how
> you can get to
>     status = map_read(map, cmd_adr);
>     if (map_word_andequal(map, status,
> status_OK, status_OK))
> if the the erase has been suspended.

OK, I understand. Let me demonstrate.

I Added the following debug to the original linux-3.3-rc5
drivers/mtd/chips/cfi_cmdset_0001.c file (i.e. without my patch):

--------

797a798
> printk("%d: CRDY: about to Erase Suspend adr=0x%08x\n", current->pid, adr);
958a960
> printk("%d: PUTC: enter\n", current->pid);
974a977
> printk("%d: PUTC: about to wake_up (a)\n", current->pid);
988a992
> printk("%d: PUTC: about to wake_up (b)\n", current->pid);
1005a1010
> printk("%d: PUTC: about to Erase Resume adr=0x%08x\n", current->pid, adr);
1025a1031
> printk("%d: PUTC: about to wake_up (c)\n", current->pid);
1216a1223
> printk("%d: WAIT: call-state=%d, cmd_adr=0x%08x\n", current->pid, chip_state, cmd_adr);
1234a1242
> printk("%d: WAIT: about to suspend\n", current->pid);
1235a1244
> printk("%d: WAIT: resumed\n", current->pid);
1241a1251
> printk("%d: WAIT: status=0x%08x\n", current->pid, status.x[0]);
1257a1268
> printk("%d: WAIT: return -ETIME\n", current->pid);
1262a1274
> printk("%d: WAIT: about to sleep: sleep_time=%d\n", current->pid, sleep_time);
1281a1294
> printk("%d: WAIT: return 0\n", current->pid);
1891a1905
> printk("%d: ERAS: adr=0x%08x\n", current->pid, adr);

--------

I rebooted my hx4700, saved the dmesg output, and edited and annotated
what I feel to be the important messages:

--------

[    2.224667] 30: WAIT: status=0x00a000a0
[    2.224693] 30: WAIT: return 0

PID 30 has just failed FL_ERASING of block 0x00440000 for the 3rd time:
erase has failed (SR.5 = 1).

[    2.224719] block erase failed at 0x00440000: status 0xa000a0. Retrying...

Confirmation of erase fail from elsewhere.

[    2.224779] 30: ERAS: adr=0x00440000
[    2.224798] 30: WAIT: call-state=4, cmd_adr=0x00440000
[    2.224814] 30: WAIT: status=0x00000000
[    2.224830] 30: WAIT: about to sleep: sleep_time=512000

PID 30 is now attempting FL_ERASING of block 0x00440000 for the 4th time.

[    2.492129] 76: CRDY: about to Erase Suspend adr=0x0060f748
[    2.492216] 76: PUTC: enter
[    2.492233] 76: PUTC: about to Erase Resume adr=0x0060f748

PID 76 has issued Erase Suspend and Erase Resume (the address is irrelevant).
PID 30 is still sleeping at this point.

[    2.785235] 30: WAIT: status=0x00e800e8
[    2.785250] 30: WAIT: return 0

PID 30 has woken up to find:
bad F-VPP (SR.3 = 1),
erase has failed (SR.5 = 1),
Erase Suspend is in effect (SR.6 = 1).

[    2.785276] physmap-flash: block erase error: (bad VPP)

Confirmation of bad VPP from elsewhere.

[    2.863384] 30: ERAS: adr=0x00440000
[    2.863412] 30: WAIT: call-state=4, cmd_adr=0x00440000
[    2.863431] 30: WAIT: status=0x00c000c0
[    2.863444] 30: WAIT: return 0

PID 30 is now attempting FL_ERASING of block 0x00440000 for the 5th time:
Erase Suspend is in effect (SR.6 = 1),
but no errors, so the erase seemingly succeeded.

[    2.863543] 30: WAIT: call-state=8, cmd_adr=0x00440000
[    2.863562] 30: WAIT: status=0x00c000c0
[    2.863574] 30: WAIT: return 0

PID 30 is now attempting FL_WRITING_TO_BUFFER:
Erase Suspend is in effect (SR.6 = 1),
but no errors, so the FL_WRITING_TO_BUFFER seemingly succeeded.

[    2.863599] 30: WAIT: call-state=7, cmd_adr=0x00440000
[    2.863615] 30: WAIT: status=0x00d000d0
[    2.863628] 30: WAIT: return 0

PID 30 is now attempting FL_WRITING to block 0x00440000:
Erase Suspend is in effect (SR.6 = 1),
program has failed (SR.3 = 1).

[    2.863649] physmap-flash: buffer write error (status 0xd000d0)

Confirmation of write error from elsewhere.

[    2.863703] UBI error: ubi_io_write: error -22 while writing 64 bytes to PEB 262:0, written 0 bytes
[    2.863728] UBI error: erase_worker: failed to erase PEB 262, error -22
[    2.863750] UBI warning: ubi_ro_mode: switch to read-only mode
[    2.863770] UBI error: do_work: work failed with error code -22
[    2.863791] UBI error: ubi_thread: ubi_bgt0d: work failed with error code -22

Bye bye UBI.

--------

First we note that the WAIT (inval_cache_and_wait_for_operation) function
never suspends itself; i.e. it never prints "WAIT: about to suspend".

The message at:
[    2.863431] 30: WAIT: status=0x00c000c0
shows that WAIT found status_OK (SR.7) set and Erase Suspend in effect.

Now things get complicated.

I repeated with exercise with my patch applied. This time the WAIT
function looped forever trying to erase the same block (0x00440000)
because the status never changed from 0x00e800e8 (rather like 2.785235
above, but with my patch now applied the function cannot return).

Disheartened, I thought my patch had only succeeded in livelocking the
function because of a hardware error.

However, remembering our version of the Heisenberg uncertainty principle -
that adding debug sometimes changes the behaviour - I stripped out almost
all of the debug and repeated the exercise. This time block 0x00440000
erased successfully without errors.

Where do we go from here?

The above debug demonstrates that inval_cache_and_wait_for_operation()
can return while Erase Suspend is in effect. My patch prevents that, and
fixes UBI for me.

I am open to the suggestion that my hx4700 exhibits random hardware
failures. This would explain the results. But I have a nagging doubt about
that because my bootloader has never reported erase errors (apart from once
when I broke its erase block size table). As I mentioned previously, the
bootloader never suspends erase operations.

I am open to the suggestion that my patch merely fixes a symptom rather
than an underlying cause.

So perhaps the MTD driver contains a race condition. The stripping out of
debug to fix the erase error tends to support that explanation.

I don't know what else to say.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-29 17:22       ` Paul Parsons
@ 2012-02-29 18:03         ` Artem Bityutskiy
  2012-02-29 22:26           ` Paul Parsons
  2012-02-29 23:35         ` Joakim Tjernlund
  1 sibling, 1 reply; 26+ messages in thread
From: Artem Bityutskiy @ 2012-02-29 18:03 UTC (permalink / raw)
  To: Paul Parsons; +Cc: linux-mtd, dwmw2, Joakim Tjernlund, philipp.zabel

[-- Attachment #1: Type: text/plain, Size: 92 bytes --]

Did you try to validate your setup with mtd tests?

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-29 18:03         ` Artem Bityutskiy
@ 2012-02-29 22:26           ` Paul Parsons
  2012-03-09 10:39             ` Artem Bityutskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-02-29 22:26 UTC (permalink / raw)
  To: dedekind1; +Cc: dwmw2, linux-mtd, Joakim Tjernlund, philipp.zabel

--- On Wed, 29/2/12, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> Did you try to validate your setup
> with mtd tests?

No I did not.

I've just quickly tried the readtest on all mtd partitions and they seemed
to pass. A torturetest on a small unused partition reported "retried 3
times, still errors, give-up", but a stresstest on the same partition passed.

Was there a specific test and set of options I should try?
The hx4700 has NOR flash (Intel StrataFlash) so it looks like at least 4
of the tests - pagetest, oobtest, subpagetest, nandecctest - are not
appropriate.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-29 17:22       ` Paul Parsons
  2012-02-29 18:03         ` Artem Bityutskiy
@ 2012-02-29 23:35         ` Joakim Tjernlund
  2012-03-01 14:57           ` Paul Parsons
  1 sibling, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2012-02-29 23:35 UTC (permalink / raw)
  To: Paul Parsons; +Cc: linux-mtd, dwmw2, philipp.zabel

Paul Parsons <lost.distance@yahoo.com> wrote on 2012/02/29 18:22:39:
>
> --- On Wed, 29/2/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > If erase is suspended, chip->state is changed which will
> > keep the function in
> > the for(;;) loop. Once erase has been resumed again
> > chip->state will change back.
> > It seems to me that chip->state and SR.6 are mutally
> > exclusive? I can not see how
> > you can get to
> >     status = map_read(map, cmd_adr);
> >     if (map_word_andequal(map, status,
> > status_OK, status_OK))
> > if the the erase has been suspended.
>
> OK, I understand. Let me demonstrate.
>
> I Added the following debug to the original linux-3.3-rc5
> drivers/mtd/chips/cfi_cmdset_0001.c file (i.e. without my patch):

This looks like a erase resume problem, cfi driver issues a resume command but the
chip fails or is slow to respond. I remember someone having a similar problem
some time ago. Don't think we got to bottom of it though, look for subject:
  Numonyx NOR and chip->mutex bug?

Try adding extra read status cmds around resume, check if SR.6 is cleared too.
Enable all driver debug code and look for the first sign of trouble.

That said, there seems to be something fishy going on with UBI as it is
erasing the same failing block over and over again.
Also, it would probably be good to add test of SR.6 in some places, just
to make sure the flash does what the driver wants it to do.

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-29 23:35         ` Joakim Tjernlund
@ 2012-03-01 14:57           ` Paul Parsons
  2012-03-01 15:59             ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-03-01 14:57 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel

--- On Wed, 29/2/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> This looks like a erase resume problem, cfi driver issues a
> resume command but the
> chip fails or is slow to respond. I remember someone having
> a similar problem
> some time ago. Don't think we got to bottom of it though,
> look for subject:
>   Numonyx NOR and chip->mutex bug?

Yes, that Numonyx NOR problem seems to be the same as mine.

> Try adding extra read status cmds around resume, check if
> SR.6 is cleared too.

OK, I think I've made some progress here.

The status transitions around the Erase Resume are as follows:

[   58.702774] 108: PUTC: 0: status=0x00c000c0 // Before CMD(0xd0)
[   58.702792] 108: PUTC: 1: status=0x00400040 // After CMD(0xd0),CMD(0x70)
[   58.702808] 108: PUTC: 2: status=0x00000000 // + cfi_udelay(1)

So SR.7 transitions from 1 to 0 after CMD(0xd0), and some time later SR.6
transitions from 1 to 0. So the effects of Erase Resume are not immediate.
Thus it seemed a good idea to wait for these bits to settle:

diff -upr clean-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c linux-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c
--- clean-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c	2012-02-25 20:18:16.000000000 +0000
+++ linux-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c	2012-03-01 13:44:49.529371920 +0000
@@ -956,6 +956,7 @@ static int get_chip(struct map_info *map
 static void put_chip(struct map_info *map, struct flchip *chip, unsigned long adr)
 {
 	struct cfi_private *cfi = map->fldrv_priv;
+	map_word status, status_40 = CMD(0x40), status_00 = CMD(0x00);

 	if (chip->priv) {
 		struct flchip_shared *shared = chip->priv;
@@ -1006,6 +1007,15 @@ static void put_chip(struct map_info *ma
 		map_write(map, CMD(0xd0), adr);
 		map_write(map, CMD(0x70), adr);
 		chip->oldstate = FL_READY;
+		chip->state = FL_ERASE_RESUMING;
+		for (;;) {
+			status = map_read(map, adr);
+			if (map_word_andequal(map, status, status_40, status_00))
+			        break;
+			mutex_unlock(&chip->mutex);
+			cfi_udelay(1);
+			mutex_lock(&chip->mutex);
+		}
 		chip->state = FL_ERASING;
 		break;

diff -upr clean-3.3-rc5/include/linux/mtd/flashchip.h linux-3.3-rc5/include/linux/mtd/flashchip.h
--- clean-3.3-rc5/include/linux/mtd/flashchip.h	2012-02-25 20:18:16.000000000 +0000
+++ linux-3.3-rc5/include/linux/mtd/flashchip.h	2012-03-01 13:41:48.288232532 +0000
@@ -36,6 +36,7 @@ typedef enum {
 	FL_ERASING,
 	FL_ERASE_SUSPENDING,
 	FL_ERASE_SUSPENDED,
+	FL_ERASE_RESUMING,
 	FL_WRITING,
 	FL_WRITING_TO_BUFFER,
 	FL_OTP_WRITE,

This patch adds a wait after Erase Resume, just like the existing one after
Erase Suspend (but without the timeout). I added the FL_ERASE_RESUMING
state to perform the same function as the FL_ERASE_SUSPENDING state: to
stop anyone else touching it.

When I tried this patch in place of my previous one I found it too fixed
my problem. That's not a surprise because it does the same thing, it waits
for SR.6 to clear, but at an earlier point. Unlike my previous patch,
this patch leaves SR.7 and SR.6 in a consistent state before put_chip()
returns (and indeed before the call to wake_up(&chip->wq) at the bottom of
put_chip()).

What I don't know is why not waiting for SR.7 and SR.6 to settle after
Erase Resume can then break UBI, in particular provoking the
"block erase error: (bad VPP)" messages I was seeing earlier.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-01 14:57           ` Paul Parsons
@ 2012-03-01 15:59             ` Joakim Tjernlund
  0 siblings, 0 replies; 26+ messages in thread
From: Joakim Tjernlund @ 2012-03-01 15:59 UTC (permalink / raw)
  To: Paul Parsons; +Cc: linux-mtd, dwmw2, philipp.zabel

Paul Parsons <lost.distance@yahoo.com> wrote on 2012/03/01 15:57:14:
>
> --- On Wed, 29/2/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > This looks like a erase resume problem, cfi driver issues a
> > resume command but the
> > chip fails or is slow to respond. I remember someone having
> > a similar problem
> > some time ago. Don't think we got to bottom of it though,
> > look for subject:
> >   Numonyx NOR and chip->mutex bug?
>
> Yes, that Numonyx NOR problem seems to be the same as mine.
>
> > Try adding extra read status cmds around resume, check if
> > SR.6 is cleared too.
>
> OK, I think I've made some progress here.
>
> The status transitions around the Erase Resume are as follows:
>
> [   58.702774] 108: PUTC: 0: status=0x00c000c0 // Before CMD(0xd0)
> [   58.702792] 108: PUTC: 1: status=0x00400040 // After CMD(0xd0),CMD(0x70)
> [   58.702808] 108: PUTC: 2: status=0x00000000 // + cfi_udelay(1)

hmm, this is not ideal. The status bits are only valid if SR.7 is set if I remember correctly?
Maybe SR.6 is an exception?
So reading SR.6 and waiting for it to clear may not work on other chips?
Is there some resume to erase time in the specs?


>
> So SR.7 transitions from 1 to 0 after CMD(0xd0), and some time later SR.6
> transitions from 1 to 0. So the effects of Erase Resume are not immediate.
> Thus it seemed a good idea to wait for these bits to settle:
>
> diff -upr clean-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c linux-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c
> --- clean-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c   2012-02-25 20:18:16.000000000 +0000
> +++ linux-3.3-rc5/drivers/mtd/chips/cfi_cmdset_0001.c   2012-03-01 13:44:49.529371920 +0000
> @@ -956,6 +956,7 @@ static int get_chip(struct map_info *map
>  static void put_chip(struct map_info *map, struct flchip *chip, unsigned long adr)
>  {
>     struct cfi_private *cfi = map->fldrv_priv;
> +   map_word status, status_40 = CMD(0x40), status_00 = CMD(0x00);
>
>     if (chip->priv) {
>        struct flchip_shared *shared = chip->priv;
> @@ -1006,6 +1007,15 @@ static void put_chip(struct map_info *ma
>        map_write(map, CMD(0xd0), adr);
>        map_write(map, CMD(0x70), adr);
>        chip->oldstate = FL_READY;
> +      chip->state = FL_ERASE_RESUMING;
> +      for (;;) {
> +         status = map_read(map, adr);
> +         if (map_word_andequal(map, status, status_40, status_00))
> +                 break;
> +         mutex_unlock(&chip->mutex);
> +         cfi_udelay(1);
> +         mutex_lock(&chip->mutex);
> +      }
>        chip->state = FL_ERASING;
>        break;
>
> diff -upr clean-3.3-rc5/include/linux/mtd/flashchip.h linux-3.3-rc5/include/linux/mtd/flashchip.h
> --- clean-3.3-rc5/include/linux/mtd/flashchip.h   2012-02-25 20:18:16.000000000 +0000
> +++ linux-3.3-rc5/include/linux/mtd/flashchip.h   2012-03-01 13:41:48.288232532 +0000
> @@ -36,6 +36,7 @@ typedef enum {
>     FL_ERASING,
>     FL_ERASE_SUSPENDING,
>     FL_ERASE_SUSPENDED,
> +   FL_ERASE_RESUMING,
>     FL_WRITING,
>     FL_WRITING_TO_BUFFER,
>     FL_OTP_WRITE,
>
> This patch adds a wait after Erase Resume, just like the existing one after
> Erase Suspend (but without the timeout). I added the FL_ERASE_RESUMING
> state to perform the same function as the FL_ERASE_SUSPENDING state: to
> stop anyone else touching it.
>
> When I tried this patch in place of my previous one I found it too fixed
> my problem. That's not a surprise because it does the same thing, it waits
> for SR.6 to clear, but at an earlier point. Unlike my previous patch,
> this patch leaves SR.7 and SR.6 in a consistent state before put_chip()
> returns (and indeed before the call to wake_up(&chip->wq) at the bottom of
> put_chip()).
>
> What I don't know is why not waiting for SR.7 and SR.6 to settle after
> Erase Resume can then break UBI, in particular provoking the
> "block erase error: (bad VPP)" messages I was seeing earlier.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
@ 2012-03-01 17:22 Paul Parsons
  2012-03-01 17:38 ` Paul Parsons
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-03-01 17:22 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel

--- On Thu, 1/3/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > OK, I think I've made some progress here.
> >
> > The status transitions around the Erase Resume are as
> follows:
> >
> > [   58.702774] 108: PUTC: 0:
> status=0x00c000c0 // Before CMD(0xd0)
> > [   58.702792] 108: PUTC: 1:
> status=0x00400040 // After CMD(0xd0),CMD(0x70)
> > [   58.702808] 108: PUTC: 2:
> status=0x00000000 // + cfi_udelay(1)
> 
> hmm, this is not ideal. The status bits are only valid if
> SR.7 is set if I remember correctly?
> Maybe SR.6 is an exception?
> So reading SR.6 and waiting for it to clear may not work on
> other chips?
> Is there some resume to erase time in the specs?

Yes the spec I have (Intel StrataFlash, 253854-003) says SR[6:1] are valid
only if SR.7=1. It also says (section 13.4) "The Erase Resume command
instructs the corresponding segment to continue erasing, and automatically
clears status register bits SR[7:6]". The flowchart makes no mention of
needing to poll the status register after issuing an Erase Resume command.
The spec makes no mention of Erase Resume time.

It seems to me that waiting for SR[7:6]=00 instead of just SR.6=0 would:
1. Have the same outcome.
2. Be strictly within spec; SR[7:6] have been cleared therefore the Erase
Resume command has been accepted.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-01 17:22 Paul Parsons
@ 2012-03-01 17:38 ` Paul Parsons
  2012-03-01 17:53   ` Paul Parsons
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-03-01 17:38 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel

--- On Thu, 1/3/12, Paul Parsons <lost.distance@yahoo.com> wrote:
> It seems to me that waiting for SR[7:6]=00 instead of just
> SR.6=0 would:
> 1. Have the same outcome.
> 2. Be strictly within spec; SR[7:6] have been cleared
> therefore the Erase
> Resume command has been accepted.

Actually on second thoughts maybe not: if the erase has actually completed
when we issue an Erase Resume command then SR.7 should read back as 1 and we
end up looping forever. hmm...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-01 17:38 ` Paul Parsons
@ 2012-03-01 17:53   ` Paul Parsons
  2012-03-01 18:03     ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-03-01 17:53 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel

--- On Thu, 1/3/12, Paul Parsons <lost.distance@yahoo.com> wrote:
> From: Paul Parsons <lost.distance@yahoo.com>
> Subject: Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
> To: "Joakim Tjernlund" <joakim.tjernlund@transmode.se>
> Cc: linux-mtd@lists.infradead.org, dwmw2@infradead.org, philipp.zabel@gmail.com
> Date: Thursday, 1 March, 2012, 17:38
> --- On Thu, 1/3/12, Paul Parsons
> <lost.distance@yahoo.com>
> wrote:
> > It seems to me that waiting for SR[7:6]=00 instead of
> just
> > SR.6=0 would:
> > 1. Have the same outcome.
> > 2. Be strictly within spec; SR[7:6] have been cleared
> > therefore the Erase
> > Resume command has been accepted.
> 
> Actually on second thoughts maybe not: if the erase has
> actually completed
> when we issue an Erase Resume command then SR.7 should read
> back as 1 and we
> end up looping forever. hmm...

OK, how about this:

1. Erase Suspend now checks SR.6, exactly as it *should* according to the
spec flowchart.

2. If SR.6=0 then the erase has already completed, and there is no need to set chip->erase_suspended or subsequently issue an Erase Resume command.

3. If an Erase Resume command is needed then we know that the erase has
not completed, therefore it should be safe to wait for SR[7:6]=00.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-01 17:53   ` Paul Parsons
@ 2012-03-01 18:03     ` Joakim Tjernlund
  2012-03-01 18:50       ` Paul Parsons
  0 siblings, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2012-03-01 18:03 UTC (permalink / raw)
  To: Paul Parsons; +Cc: linux-mtd, dwmw2, philipp.zabel

Paul Parsons <lost.distance@yahoo.com> wrote on 2012/03/01 18:53:54:
>
> --- On Thu, 1/3/12, Paul Parsons <lost.distance@yahoo.com> wrote:
> > From: Paul Parsons <lost.distance@yahoo.com>
> > Subject: Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
> > To: "Joakim Tjernlund" <joakim.tjernlund@transmode.se>
> > Cc: linux-mtd@lists.infradead.org, dwmw2@infradead.org, philipp.zabel@gmail.com
> > Date: Thursday, 1 March, 2012, 17:38
> > --- On Thu, 1/3/12, Paul Parsons
> > <lost.distance@yahoo.com>
> > wrote:
> > > It seems to me that waiting for SR[7:6]=00 instead of
> > just
> > > SR.6=0 would:
> > > 1. Have the same outcome.
> > > 2. Be strictly within spec; SR[7:6] have been cleared
> > > therefore the Erase
> > > Resume command has been accepted.
> >
> > Actually on second thoughts maybe not: if the erase has
> > actually completed
> > when we issue an Erase Resume command then SR.7 should read
> > back as 1 and we
> > end up looping forever. hmm...
>
> OK, how about this:
>
> 1. Erase Suspend now checks SR.6, exactly as it *should* according to the
> spec flowchart.
>
> 2. If SR.6=0 then the erase has already completed, and there is no need to set chip->erase_suspended or subsequently issue an Erase Resume command.

Wont work when you have interleaved chips, there one chip may complete and the other not.
It would be an optimization though, if all chips complete then skip suspend state.

>
> 3. If an Erase Resume command is needed then we know that the erase has
> not completed, therefore it should be safe to wait for SR[7:6]=00.

Just wait for SR.6 to clear might work.

Perhaps adding 1 or 2 extra throw away status reads to create a small delay instead?

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-01 18:03     ` Joakim Tjernlund
@ 2012-03-01 18:50       ` Paul Parsons
  2012-03-02 12:39         ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-03-01 18:50 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linux-mtd, dwmw2, philipp.zabel

--- On Thu, 1/3/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > > > It seems to me that waiting for SR[7:6]=00
> instead of
> > > just
> > > > SR.6=0 would:
> > > > 1. Have the same outcome.
> > > > 2. Be strictly within spec; SR[7:6] have been
> cleared
> > > > therefore the Erase
> > > > Resume command has been accepted.
> > >
> > > Actually on second thoughts maybe not: if the
> erase has
> > > actually completed
> > > when we issue an Erase Resume command then SR.7
> should read
> > > back as 1 and we
> > > end up looping forever. hmm...
> >
> > OK, how about this:
> >
> > 1. Erase Suspend now checks SR.6, exactly as it
> *should* according to the
> > spec flowchart.
> >
> > 2. If SR.6=0 then the erase has already completed, and
> there is no need to set chip->erase_suspended or
> subsequently issue an Erase Resume command.
> 
> Wont work when you have interleaved chips, there one chip
> may complete and the other not.

OK, I see, I think: one chip might have finished erasing by the time we
issue an Erase Suspend, but the other might not. Yikes, they don't make it
easy for us.

> It would be an optimization though, if all chips complete
> then skip suspend state.

And it might just preclude other problems elsewhere. Issuing an unnecessary
Erase Resume does seem risky, given that the command code - 0xd0 - is so
overloaded:
1. 3rd cycle of Buffered Program.
2. 2nd cycle of Buffered Enhanced Factory Program.
3. 2nd cycle of Block Erase.
4. 1st cycle of Program Resume.
5. 1st cycle of Erase Resume.
6. 2nd cycle of Unlock Block.
That said, I presume the state machine already prevents any ambiguity.

> > 3. If an Erase Resume command is needed then we know
> that the erase has
> > not completed, therefore it should be safe to wait for
> SR[7:6]=00.
> 
> Just wait for SR.6 to clear might work.

That is what my 2nd patch does. Should I just submit that as V2?

> Perhaps adding 1 or 2 extra throw away status reads to
> create a small delay instead?

How much delay is enough for which platform and which chips? I would
prefer just waiting for SR.6 to clear if that works.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-01 18:50       ` Paul Parsons
@ 2012-03-02 12:39         ` Joakim Tjernlund
  2012-03-02 14:06           ` Paul Parsons
  0 siblings, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2012-03-02 12:39 UTC (permalink / raw)
  To: Paul Parsons; +Cc: linux-mtd, dwmw2, philipp.zabel

Paul Parsons <lost.distance@yahoo.com> wrote on 2012/03/01 19:50:29:
>
> --- On Thu, 1/3/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > > > > It seems to me that waiting for SR[7:6]=00
> > instead of
> > > > just
> > > > > SR.6=0 would:
> > > > > 1. Have the same outcome.
> > > > > 2. Be strictly within spec; SR[7:6] have been
> > cleared
> > > > > therefore the Erase
> > > > > Resume command has been accepted.
> > > >
> > > > Actually on second thoughts maybe not: if the
> > erase has
> > > > actually completed
> > > > when we issue an Erase Resume command then SR.7
> > should read
> > > > back as 1 and we
> > > > end up looping forever. hmm...
> > >
> > > OK, how about this:
> > >
> > > 1. Erase Suspend now checks SR.6, exactly as it
> > *should* according to the
> > > spec flowchart.
> > >
> > > 2. If SR.6=0 then the erase has already completed, and
> > there is no need to set chip->erase_suspended or
> > subsequently issue an Erase Resume command.
> >
> > Wont work when you have interleaved chips, there one chip
> > may complete and the other not.
>
> OK, I see, I think: one chip might have finished erasing by the time we
> issue an Erase Suspend, but the other might not. Yikes, they don't make it
> easy for us.
>
> > It would be an optimization though, if all chips complete
> > then skip suspend state.
>
> And it might just preclude other problems elsewhere. Issuing an unnecessary
> Erase Resume does seem risky, given that the command code - 0xd0 - is so
> overloaded:
> 1. 3rd cycle of Buffered Program.
> 2. 2nd cycle of Buffered Enhanced Factory Program.
> 3. 2nd cycle of Block Erase.
> 4. 1st cycle of Program Resume.
> 5. 1st cycle of Erase Resume.
> 6. 2nd cycle of Unlock Block.
> That said, I presume the state machine already prevents any ambiguity.
>
> > > 3. If an Erase Resume command is needed then we know
> > that the erase has
> > > not completed, therefore it should be safe to wait for
> > SR[7:6]=00.
> >
> > Just wait for SR.6 to clear might work.
>
> That is what my 2nd patch does. Should I just submit that as V2?

Not sure yet, you would have add tmo support though and check
were to change state/oldstate as the loop drop the lock.

>
> > Perhaps adding 1 or 2 extra throw away status reads to
> > create a small delay instead?
>
> How much delay is enough for which platform and which chips? I would
> prefer just waiting for SR.6 to clear if that works.

The problem is very rare, the other case I mentioned is the only one I know of.
I seem to recall that in that case adding a throwaway status read cured his problem.
You could try, just so we known if it works and mail the other guy too, he might have
some more info.

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-02 12:39         ` Joakim Tjernlund
@ 2012-03-02 14:06           ` Paul Parsons
  2012-03-02 14:30             ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Paul Parsons @ 2012-03-02 14:06 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel

--- On Fri, 2/3/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > > > 2. If SR.6=0 then the erase has already
> completed, and
> > > there is no need to set chip->erase_suspended
> or
> > > subsequently issue an Erase Resume command.
> > >
> > > Wont work when you have interleaved chips, there
> one chip
> > > may complete and the other not.
> >
> > OK, I see, I think: one chip might have finished
> erasing by the time we
> > issue an Erase Suspend, but the other might not. Yikes,
> they don't make it
> > easy for us.

Thinking about this more, it shouldn't matter if not all interleaved chips
have completed their erase. We should initiate Erase Resume to *all* chips
if we find that *any* chips have suspended. The worse that happens is that
we issue unnecessary resumes to those chips which have already completed.
Since this is what the current MTD driver already does it shouldn't be a
problem.

> > > Just wait for SR.6 to clear might work.
> >
> > That is what my 2nd patch does. Should I just submit
> that as V2?
> 
> Not sure yet, you would have add tmo support though and
> check
> were to change state/oldstate as the loop drop the lock.

OK.

> > > Perhaps adding 1 or 2 extra throw away status
> reads to
> > > create a small delay instead?
> >
> > How much delay is enough for which platform and which
> chips? I would
> > prefer just waiting for SR.6 to clear if that works.
> 
> The problem is very rare, the other case I mentioned is the
> only one I know of.
> I seem to recall that in that case adding a throwaway status
> read cured his problem.
> You could try, just so we known if it works and mail the
> other guy too, he might have
> some more info.

The problem only became a problem for me after I switched from JFFS2 to
UBIFS. I found that UBIFS would make file systems read-only. Either JFFS2
was more forgiving of CFI errors, or UBIFS exercised CFI in new ways that
provoked the errors. As others switch from JFFS2 to UBIFS they too might
find the same problem.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-02 14:06           ` Paul Parsons
@ 2012-03-02 14:30             ` Joakim Tjernlund
  2012-03-02 15:15               ` Paul Parsons
  2012-03-09 10:45               ` Artem Bityutskiy
  0 siblings, 2 replies; 26+ messages in thread
From: Joakim Tjernlund @ 2012-03-02 14:30 UTC (permalink / raw)
  To: Paul Parsons; +Cc: linux-mtd, dwmw2, philipp.zabel

Paul Parsons <lost.distance@yahoo.com> wrote on 2012/03/02 15:06:33:
>
> --- On Fri, 2/3/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > > > > 2. If SR.6=0 then the erase has already
> > completed, and
> > > > there is no need to set chip->erase_suspended
> > or
> > > > subsequently issue an Erase Resume command.
> > > >
> > > > Wont work when you have interleaved chips, there
> > one chip
> > > > may complete and the other not.
> > >
> > > OK, I see, I think: one chip might have finished
> > erasing by the time we
> > > issue an Erase Suspend, but the other might not. Yikes,
> > they don't make it
> > > easy for us.
>
> Thinking about this more, it shouldn't matter if not all interleaved chips
> have completed their erase. We should initiate Erase Resume to *all* chips
> if we find that *any* chips have suspended. The worse that happens is that
> we issue unnecessary resumes to those chips which have already completed.
> Since this is what the current MTD driver already does it shouldn't be a
> problem.

Absolutely, this happens already today. What is missing it if all chips are finished,
then you don't have to resume at all. That could save some erase time.

>
> > > > Just wait for SR.6 to clear might work.
> > >
> > > That is what my 2nd patch does. Should I just submit
> > that as V2?
> >
> > Not sure yet, you would have add tmo support though and
> > check
> > were to change state/oldstate as the loop drop the lock.
>
> OK.
>
> > > > Perhaps adding 1 or 2 extra throw away status
> > reads to
> > > > create a small delay instead?
> > >
> > > How much delay is enough for which platform and which
> > chips? I would
> > > prefer just waiting for SR.6 to clear if that works.
> >
> > The problem is very rare, the other case I mentioned is the
> > only one I know of.
> > I seem to recall that in that case adding a throwaway status
> > read cured his problem.
> > You could try, just so we known if it works and mail the
> > other guy too, he might have
> > some more info.
>
> The problem only became a problem for me after I switched from JFFS2 to
> UBIFS. I found that UBIFS would make file systems read-only. Either JFFS2
> was more forgiving of CFI errors, or UBIFS exercised CFI in new ways that
> provoked the errors. As others switch from JFFS2 to UBIFS they too might
> find the same problem.

yes, UBI probably has some different use. One seems that it keeps erasing the
same block over and over again if the erase fails. JFFS2 doesn't do that since
you can't trust a block that is failing erase(how can you be sure that write will work?)
So, IMHO, UBIFS should reconsider this policy.

On that note, have you noticed any difference between JFFS2 and UBIFS?
Is performance comparable?

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-02 14:30             ` Joakim Tjernlund
@ 2012-03-02 15:15               ` Paul Parsons
  2012-03-09 10:45               ` Artem Bityutskiy
  1 sibling, 0 replies; 26+ messages in thread
From: Paul Parsons @ 2012-03-02 15:15 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linux-mtd, dwmw2, philipp.zabel

--- On Fri, 2/3/12, Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:
> > The problem only became a problem for me after I
> switched from JFFS2 to
> > UBIFS. I found that UBIFS would make file systems
> read-only. Either JFFS2
> > was more forgiving of CFI errors, or UBIFS exercised
> CFI in new ways that
> > provoked the errors. As others switch from JFFS2 to
> UBIFS they too might
> > find the same problem.
> 
> yes, UBI probably has some different use. One seems that it
> keeps erasing the
> same block over and over again if the erase fails. JFFS2
> doesn't do that since
> you can't trust a block that is failing erase(how can you be
> sure that write will work?)
> So, IMHO, UBIFS should reconsider this policy.
> 
> On that note, have you noticed any difference between JFFS2
> and UBIFS?
> Is performance comparable?

I haven't spent any time considering performance since I am preoccupied
with the errors. However my perception is that UBIFS boots slower but is
quicker for normal file writes and perhaps reads.

One functional difference which I have tested is the mmap(2) MAP_SHARED
flag. This was not supported in JFFS2 and consequently commands such as
localedef would simply fail. On UBIFS localedef works normally.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-02-29 22:26           ` Paul Parsons
@ 2012-03-09 10:39             ` Artem Bityutskiy
  0 siblings, 0 replies; 26+ messages in thread
From: Artem Bityutskiy @ 2012-03-09 10:39 UTC (permalink / raw)
  To: Paul Parsons; +Cc: dwmw2, linux-mtd, Joakim Tjernlund, philipp.zabel

On Wed, 2012-02-29 at 22:26 +0000, Paul Parsons wrote:
> --- On Wed, 29/2/12, Artem Bityutskiy <dedekind1@gmail.com> wrote:
> > Did you try to validate your setup
> > with mtd tests?
> 
> No I did not.
> 
> I've just quickly tried the readtest on all mtd partitions and they seemed
> to pass. A torturetest on a small unused partition reported "retried 3
> times, still errors, give-up", but a stresstest on the same partition passed.
> 
> Was there a specific test and set of options I should try?
> The hx4700 has NOR flash (Intel StrataFlash) so it looks like at least 4
> of the tests - pagetest, oobtest, subpagetest, nandecctest - are not
> appropriate.

In the past all tests either worked on nor or refused to load. I thought
stresstest should work as well. I do not have any specific test in mind,
threw out standard suggestion I give to people :-)

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-02 14:30             ` Joakim Tjernlund
  2012-03-02 15:15               ` Paul Parsons
@ 2012-03-09 10:45               ` Artem Bityutskiy
  2012-03-13  8:27                 ` Joakim Tjernlund
  1 sibling, 1 reply; 26+ messages in thread
From: Artem Bityutskiy @ 2012-03-09 10:45 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dwmw2, linux-mtd, philipp.zabel, Paul Parsons

On Fri, 2012-03-02 at 15:30 +0100, Joakim Tjernlund wrote:
> yes, UBI probably has some different use. One seems that it keeps
> erasing the
> same block over and over again if the erase fails. JFFS2 doesn't do
> that since
> you can't trust a block that is failing erase(how can you be sure that
> write will work?)
> So, IMHO, UBIFS should reconsider this policy. 

The thread is very long and I would like save my time by not going
through it. But if you could start a new thread and formulate the
problem with UBI, I could think about it a bit. I am open to change
policies if it is not too much work.  Otherwise I am open to give hints
and suggestions and accept patches which change the policy. :-)

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-09 10:45               ` Artem Bityutskiy
@ 2012-03-13  8:27                 ` Joakim Tjernlund
  2012-03-13 12:48                   ` Artem Bityutskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2012-03-13  8:27 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, dwmw2, philipp.zabel, Paul Parsons


>
> On Fri, 2012-03-02 at 15:30 +0100, Joakim Tjernlund wrote:
> > yes, UBI probably has some different use. One seems that it keeps
> > erasing the
> > same block over and over again if the erase fails. JFFS2 doesn't do
> > that since
> > you can't trust a block that is failing erase(how can you be sure that
> > write will work?)
> > So, IMHO, UBIFS should reconsider this policy.
>
> The thread is very long and I would like save my time by not going
> through it. But if you could start a new thread and formulate the
> problem with UBI, I could think about it a bit. I am open to change
> policies if it is not too much work.  Otherwise I am open to give hints
> and suggestions and accept patches which change the policy. :-)

I just noted from one of the logs Paul sent, it LOOKED like UBI tried
to erase the same sector several times if erase failed, gave me the impression
that UBI would try erasing a failing sector over and over again.
I think Paul will have to provide details if this is the case or not.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-13  8:27                 ` Joakim Tjernlund
@ 2012-03-13 12:48                   ` Artem Bityutskiy
  2012-03-13 12:55                     ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Artem Bityutskiy @ 2012-03-13 12:48 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linux-mtd, dwmw2, philipp.zabel, Paul Parsons

[-- Attachment #1: Type: text/plain, Size: 1314 bytes --]

On Tue, 2012-03-13 at 09:27 +0100, Joakim Tjernlund wrote:
> >
> > On Fri, 2012-03-02 at 15:30 +0100, Joakim Tjernlund wrote:
> > > yes, UBI probably has some different use. One seems that it keeps
> > > erasing the
> > > same block over and over again if the erase fails. JFFS2 doesn't do
> > > that since
> > > you can't trust a block that is failing erase(how can you be sure that
> > > write will work?)
> > > So, IMHO, UBIFS should reconsider this policy.
> >
> > The thread is very long and I would like save my time by not going
> > through it. But if you could start a new thread and formulate the
> > problem with UBI, I could think about it a bit. I am open to change
> > policies if it is not too much work.  Otherwise I am open to give hints
> > and suggestions and accept patches which change the policy. :-)
> 
> I just noted from one of the logs Paul sent, it LOOKED like UBI tried
> to erase the same sector several times if erase failed, gave me the impression
> that UBI would try erasing a failing sector over and over again.
> I think Paul will have to provide details if this is the case or not.

Correct, UBI does retry erasing several times, see function
'do_sync_erase()' in drivers/mtd/ubi/io.c. Why would this be a problem?

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-13 12:48                   ` Artem Bityutskiy
@ 2012-03-13 12:55                     ` Joakim Tjernlund
  2012-03-13 13:09                       ` Artem Bityutskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Joakim Tjernlund @ 2012-03-13 12:55 UTC (permalink / raw)
  To: dedekind1
  Cc: linux-mtd-bounces, linux-mtd, dwmw2, philipp.zabel, Paul Parsons

>
> On Tue, 2012-03-13 at 09:27 +0100, Joakim Tjernlund wrote:
> > >
> > > On Fri, 2012-03-02 at 15:30 +0100, Joakim Tjernlund wrote:
> > > > yes, UBI probably has some different use. One seems that it keeps
> > > > erasing the
> > > > same block over and over again if the erase fails. JFFS2 doesn't do
> > > > that since
> > > > you can't trust a block that is failing erase(how can you be sure that
> > > > write will work?)
> > > > So, IMHO, UBIFS should reconsider this policy.
> > >
> > > The thread is very long and I would like save my time by not going
> > > through it. But if you could start a new thread and formulate the
> > > problem with UBI, I could think about it a bit. I am open to change
> > > policies if it is not too much work.  Otherwise I am open to give hints
> > > and suggestions and accept patches which change the policy. :-)
> >
> > I just noted from one of the logs Paul sent, it LOOKED like UBI tried
> > to erase the same sector several times if erase failed, gave me the impression
> > that UBI would try erasing a failing sector over and over again.
> > I think Paul will have to provide details if this is the case or not.
>
> Correct, UBI does retry erasing several times, see function
> 'do_sync_erase()' in drivers/mtd/ubi/io.c. Why would this be a problem?

How can you trust a sector that fails to erase several times and then suddenly works ?

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-13 12:55                     ` Joakim Tjernlund
@ 2012-03-13 13:09                       ` Artem Bityutskiy
  2012-03-13 13:33                         ` Joakim Tjernlund
  0 siblings, 1 reply; 26+ messages in thread
From: Artem Bityutskiy @ 2012-03-13 13:09 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: linux-mtd-bounces, linux-mtd, dwmw2, philipp.zabel, Paul Parsons

[-- Attachment #1: Type: text/plain, Size: 1843 bytes --]

On Tue, 2012-03-13 at 13:55 +0100, Joakim Tjernlund wrote:
> >
> > On Tue, 2012-03-13 at 09:27 +0100, Joakim Tjernlund wrote:
> > > >
> > > > On Fri, 2012-03-02 at 15:30 +0100, Joakim Tjernlund wrote:
> > > > > yes, UBI probably has some different use. One seems that it keeps
> > > > > erasing the
> > > > > same block over and over again if the erase fails. JFFS2 doesn't do
> > > > > that since
> > > > > you can't trust a block that is failing erase(how can you be sure that
> > > > > write will work?)
> > > > > So, IMHO, UBIFS should reconsider this policy.
> > > >
> > > > The thread is very long and I would like save my time by not going
> > > > through it. But if you could start a new thread and formulate the
> > > > problem with UBI, I could think about it a bit. I am open to change
> > > > policies if it is not too much work.  Otherwise I am open to give hints
> > > > and suggestions and accept patches which change the policy. :-)
> > >
> > > I just noted from one of the logs Paul sent, it LOOKED like UBI tried
> > > to erase the same sector several times if erase failed, gave me the impression
> > > that UBI would try erasing a failing sector over and over again.
> > > I think Paul will have to provide details if this is the case or not.
> >
> > Correct, UBI does retry erasing several times, see function
> > 'do_sync_erase()' in drivers/mtd/ubi/io.c. Why would this be a problem?
> 
> How can you trust a sector that fails to erase several times and then suddenly works ?

It could be a transient error, some random noise in the line, some
electrons did not want to move by an accident, something like this :-)

But I guess it does not make much sense for erase and write. For read it
does make sense to re-try. I am happy to accept a patch.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] mtd: cfi: Wait for Block Erase operation to finish
  2012-03-13 13:09                       ` Artem Bityutskiy
@ 2012-03-13 13:33                         ` Joakim Tjernlund
  0 siblings, 0 replies; 26+ messages in thread
From: Joakim Tjernlund @ 2012-03-13 13:33 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd, dwmw2, philipp.zabel, Paul Parsons

Artem Bityutskiy <dedekind1@gmail.com> wrote on 2012/03/13 14:09:09:
>
> On Tue, 2012-03-13 at 13:55 +0100, Joakim Tjernlund wrote:
> > >
> > > On Tue, 2012-03-13 at 09:27 +0100, Joakim Tjernlund wrote:
> > > > >
> > > > > On Fri, 2012-03-02 at 15:30 +0100, Joakim Tjernlund wrote:
> > > > > > yes, UBI probably has some different use. One seems that it keeps
> > > > > > erasing the
> > > > > > same block over and over again if the erase fails. JFFS2 doesn't do
> > > > > > that since
> > > > > > you can't trust a block that is failing erase(how can you be sure that
> > > > > > write will work?)
> > > > > > So, IMHO, UBIFS should reconsider this policy.
> > > > >
> > > > > The thread is very long and I would like save my time by not going
> > > > > through it. But if you could start a new thread and formulate the
> > > > > problem with UBI, I could think about it a bit. I am open to change
> > > > > policies if it is not too much work.  Otherwise I am open to give hints
> > > > > and suggestions and accept patches which change the policy. :-)
> > > >
> > > > I just noted from one of the logs Paul sent, it LOOKED like UBI tried
> > > > to erase the same sector several times if erase failed, gave me the impression
> > > > that UBI would try erasing a failing sector over and over again.
> > > > I think Paul will have to provide details if this is the case or not.
> > >
> > > Correct, UBI does retry erasing several times, see function
> > > 'do_sync_erase()' in drivers/mtd/ubi/io.c. Why would this be a problem?
> >
> > How can you trust a sector that fails to erase several times and then suddenly works ?
>
> It could be a transient error, some random noise in the line, some
> electrons did not want to move by an accident, something like this :-)
>
> But I guess it does not make much sense for erase and write. For read it
> does make sense to re-try. I am happy to accept a patch.

I am not using UBIFS(yet anyway) so don't wait for me.

 Jocke

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-03-13 13:33 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-28 15:32 [PATCH] mtd: cfi: Wait for Block Erase operation to finish Paul Parsons
2012-02-28 21:25 ` Joakim Tjernlund
2012-02-29  0:23   ` Paul Parsons
2012-02-29  8:58     ` Joakim Tjernlund
2012-02-29 17:22       ` Paul Parsons
2012-02-29 18:03         ` Artem Bityutskiy
2012-02-29 22:26           ` Paul Parsons
2012-03-09 10:39             ` Artem Bityutskiy
2012-02-29 23:35         ` Joakim Tjernlund
2012-03-01 14:57           ` Paul Parsons
2012-03-01 15:59             ` Joakim Tjernlund
  -- strict thread matches above, loose matches on Subject: below --
2012-03-01 17:22 Paul Parsons
2012-03-01 17:38 ` Paul Parsons
2012-03-01 17:53   ` Paul Parsons
2012-03-01 18:03     ` Joakim Tjernlund
2012-03-01 18:50       ` Paul Parsons
2012-03-02 12:39         ` Joakim Tjernlund
2012-03-02 14:06           ` Paul Parsons
2012-03-02 14:30             ` Joakim Tjernlund
2012-03-02 15:15               ` Paul Parsons
2012-03-09 10:45               ` Artem Bityutskiy
2012-03-13  8:27                 ` Joakim Tjernlund
2012-03-13 12:48                   ` Artem Bityutskiy
2012-03-13 12:55                     ` Joakim Tjernlund
2012-03-13 13:09                       ` Artem Bityutskiy
2012-03-13 13:33                         ` Joakim Tjernlund

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox