PATCH: solving a hang while waiting in FL

public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed

* PATCH: solving a hang while waiting in FL_STATUS
@ 2008-04-23  8:57 Abel Bernabeu
  2008-04-23 10:33 ` David Woodhouse
  0 siblings, 1 reply; 7+ messages in thread
From: Abel Bernabeu @ 2008-04-23  8:57 UTC (permalink / raw)
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 467 bytes --]

I am experimenting an infinite loop problem happening at
cfi_cmdset_0001.c:chip_ready.

Some kind of timeout while waiting  in FL_STATUS should be included.
Indeed, this timeout feature is not new at all.

The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after
some refactoring work easily perceptible).

The present patch restores the "lost functionality" and solves my
problem. Please consider the patch to be included in the mtd code.

Yours, Abel.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: chip_ready.patch --]
[-- Type: text/x-patch; name=chip_ready.patch, Size: 551 bytes --]

--- linux-2.6.25/drivers/mtd/chips/cfi_cmdset_0001.c	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25.hacked/drivers/mtd/chips/cfi_cmdset_0001.c	2008-04-23 10:27:12.000000000 +0200
@@ -712,6 +712,12 @@
 			if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS))
 				break;
 
+			if (time_after(jiffies, timeo)) {
+				printk(KERN_ERR "%s: Waiting for chip to be ready timed out. Status %lx\n",
+					map->name, status.x[0]);
+				return -EIO;
+			}
+
 			spin_unlock(chip->mutex);
 			cfi_udelay(1);
 			spin_lock(chip->mutex);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: solving a hang while waiting in FL_STATUS
  2008-04-23  8:57 PATCH: solving a hang while waiting in FL_STATUS Abel Bernabeu
@ 2008-04-23 10:33 ` David Woodhouse
  2008-04-23 13:27   ` Alexey Korolev
  0 siblings, 1 reply; 7+ messages in thread
From: David Woodhouse @ 2008-04-23 10:33 UTC (permalink / raw)
  To: Abel Bernabeu; +Cc: linux-mtd, Alexey Korolev

On Wed, 2008-04-23 at 10:57 +0200, Abel Bernabeu wrote:
> I am experimenting an infinite loop problem happening at
> cfi_cmdset_0001.c:chip_ready.
> 
> Some kind of timeout while waiting  in FL_STATUS should be included.
> Indeed, this timeout feature is not new at all.
> 
> The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after
> some refactoring work easily perceptible).

Hm. This was removed in commit 5a37cf19:
	Fix deadlock in Intel chip driver caused by get_chip recursion

Alexey, can you explain why you removed it? Abel's patch simply adds
this back where it was. That looks reasonable to me, so I'd like to know
if I've missed some reason why it shouldn't be there...

-- 
dwmw2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: solving a hang while waiting in FL_STATUS
  2008-04-23 10:33 ` David Woodhouse
@ 2008-04-23 13:27   ` Alexey Korolev
  2008-04-23 16:35     ` Abel Bernabeu
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Korolev @ 2008-04-23 13:27 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd, Abel Bernabeu

Hi David,  

> > I am experimenting an infinite loop problem happening at
> > cfi_cmdset_0001.c:chip_ready.
> > 
> > Some kind of timeout while waiting  in FL_STATUS should be included.
> > Indeed, this timeout feature is not new at all.
> > 
> > The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after
> > some refactoring work easily perceptible).
> 
> Hm. This was removed in commit 5a37cf19:
> 	Fix deadlock in Intel chip driver caused by get_chip recursion
> 
> Alexey, can you explain why you removed it? Abel's patch simply adds
> this back where it was. That looks reasonable to me, so I'd like to know
> if I've missed some reason why it shouldn't be there...
> 
>
Yes. I can explain. I removed it because just simple adding timeout
won't work. Please look at the code after patch applying (solving a hang while waiting in FL_STATUS). Time_after will never occur:
--------------
	case FL_STATUS:
		for (;;) {
			status = map_read(map, adr);
			if (map_word_andequal(map, status, status_OK, status_OK))
				break;

			/* At this point we're fine with write operations
			   in other partitions as they don't conflict. */
			if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS))
				break;

			if (time_after(jiffies, timeo)) {
				printk(KERN_ERR "%s: Waiting for chip to be ready timed out. Status %lx\n",
				       map->name, status.x[0]);
				return -EIO;
			}
			spin_unlock(chip->mutex);
			cfi_udelay(1);
			spin_lock(chip->mutex);
			/* Someone else might have been playing with it. */
			return -EAGAIN;
		}
--------------

I agree that watchdog could be quite useful here, but adding
it properly will lead to rather big changes. That is why Nicolas and me
agreed to kick it off. 

Thanks,
Alexey

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: solving a hang while waiting in FL_STATUS
  2008-04-23 13:27   ` Alexey Korolev
@ 2008-04-23 16:35     ` Abel Bernabeu
  2008-04-23 16:41       ` David Woodhouse
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Abel Bernabeu @ 2008-04-23 16:35 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: abel.bernabeu, linux-mtd, David Woodhouse

[-- Attachment #1: Type: text/plain, Size: 1619 bytes --]

2008/4/23, Alexey Korolev <akorolev@infradead.org>:
> Hi David,
>
>
>  > > I am experimenting an infinite loop problem happening at
>  > > cfi_cmdset_0001.c:chip_ready.
>  > >
>  > > Some kind of timeout while waiting  in FL_STATUS should be included.
>  > > Indeed, this timeout feature is not new at all.
>  > >
>  > > The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after
>  > > some refactoring work easily perceptible).
>  >
>  > Hm. This was removed in commit 5a37cf19:
>  >       Fix deadlock in Intel chip driver caused by get_chip recursion
>  >
>  > Alexey, can you explain why you removed it? Abel's patch simply adds
>  > this back where it was. That looks reasonable to me, so I'd like to know
>  > if I've missed some reason why it shouldn't be there...
>  >
>  >
>
> Yes. I can explain. I removed it because just simple adding timeout
>  won't work. Please look at the code after patch applying (solving a hang while waiting in FL_STATUS). Time_after will never occur:


Alexey, you are rigth... the patch did nothing, but the attached one
does the correct thing (discard the previous one).

Believe me, it is not a good idea to return -EAGAIN and let get_chip
to iterate forever (was my case).

I have added a timeout parameter to get_chip. It is a dirty solution
but I cannot redesign that piece of code since I did not wrote it and
I could introduce serious bugs.

Image you have a broken sector which will never respond... that could
hang the process while unlocking the sectors of a partition. In my
case the problem prevents our board to complete the kernel start-up.

Yours, Abel.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: chip_ready.patch --]
[-- Type: text/x-patch; name=chip_ready.patch, Size: 3323 bytes --]

--- linux-2.6.25/drivers/mtd/chips/cfi_cmdset_0001.c	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25.hacked/drivers/mtd/chips/cfi_cmdset_0001.c	2008-04-23 18:00:16.000000000 +0200
@@ -86,7 +86,7 @@
 static void cfi_intelext_unpoint (struct mtd_info *mtd, u_char *addr, loff_t from,
 			size_t len);
 
-static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode);
+static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode, unsigned long timeout);
 static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode);
 static void put_chip(struct map_info *map, struct flchip *chip, unsigned long adr);
 #include "fwh_lock.h"
@@ -691,34 +691,40 @@
 /*
  *  *********** CHIP ACCESS FUNCTIONS ***********
  */
-static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
+static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode, unsigned long timeo)
 {
 	DECLARE_WAITQUEUE(wait, current);
 	struct cfi_private *cfi = map->fldrv_priv;
 	map_word status, status_OK = CMD(0x80), status_PWS = CMD(0x01);
 	struct cfi_pri_intelext *cfip = cfi->cmdset_priv;
-	unsigned long timeo = jiffies + HZ;
 
 	switch (chip->state) {
 
 	case FL_STATUS:
-		for (;;) {
-			status = map_read(map, adr);
-			if (map_word_andequal(map, status, status_OK, status_OK))
-				break;
 
-			/* At this point we're fine with write operations
-			   in other partitions as they don't conflict. */
-			if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS))
-				break;
+		status = map_read(map, adr);
 
-			spin_unlock(chip->mutex);
-			cfi_udelay(1);
-			spin_lock(chip->mutex);
-			/* Someone else might have been playing with it. */
-			return -EAGAIN;
+		if (map_word_andequal(map, status, status_OK, status_OK))
+			return 0;
+
+		/* At this point we're fine with write operations
+		   in other partitions as they don't conflict. */
+		if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS))
+			return 0;
+
+		if (time_after(jiffies, timeo)) {
+			printk(KERN_ERR "%s: Waiting for chip to be ready timed out. Status %lx\n",
+				map->name, status.x[0]);
+			return -EIO;
 		}
-		/* Fall through */
+
+		spin_unlock(chip->mutex);
+		cfi_udelay(1);
+		spin_lock(chip->mutex);
+
+		/* Someone else might have been playing with it. */
+		return -EAGAIN;
+
 	case FL_READY:
 	case FL_CFI_QUERY:
 	case FL_JEDEC_QUERY:
@@ -801,8 +807,12 @@
 static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
 {
 	int ret;
+	unsigned int timeo;
+
 	DECLARE_WAITQUEUE(wait, current);
 
+	timeo = jiffies + HZ;
+
  retry:
 	if (chip->priv && (mode == FL_WRITING || mode == FL_ERASING
 			   || mode == FL_OTP_WRITE || mode == FL_SHUTDOWN)) {
@@ -843,7 +853,7 @@
 			if (!ret)
 				goto retry;
 			spin_unlock(chip->mutex);
-			ret = chip_ready(map, contender, contender->start, mode);
+			ret = chip_ready(map, contender, contender->start, mode, timeo);
 			spin_lock(chip->mutex);
 
 			if (ret == -EAGAIN) {
@@ -878,7 +888,7 @@
 			shared->erasing = chip;
 		spin_unlock(&shared->lock);
 	}
-	ret = chip_ready(map, chip, adr, mode);
+	ret = chip_ready(map, chip, adr, mode, timeo);
 	if (ret == -EAGAIN)
 		goto retry;
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: solving a hang while waiting in FL_STATUS
  2008-04-23 16:35     ` Abel Bernabeu
@ 2008-04-23 16:41       ` David Woodhouse
  2008-04-23 17:28       ` Alexey Korolev
  2008-04-23 18:57       ` Anders Grafström
  2 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2008-04-23 16:41 UTC (permalink / raw)
  To: abelbg; +Cc: abel.bernabeu, linux-mtd, Alexey Korolev

On Wed, 2008-04-23 at 18:35 +0200, Abel Bernabeu wrote:
> I have added a timeout parameter to get_chip. It is a dirty solution
> but I cannot redesign that piece of code since I did not wrote it and
> I could introduce serious bugs.

Wimp. That never stopped anyone else. And having written it doesn't
really help either :)

-- 
dwmw2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: solving a hang while waiting in FL_STATUS
  2008-04-23 16:35     ` Abel Bernabeu
  2008-04-23 16:41       ` David Woodhouse
@ 2008-04-23 17:28       ` Alexey Korolev
  2008-04-23 18:57       ` Anders Grafström
  2 siblings, 0 replies; 7+ messages in thread
From: Alexey Korolev @ 2008-04-23 17:28 UTC (permalink / raw)
  To: Abel Bernabeu, nico; +Cc: linux-mtd, abel.bernabeu, David Woodhouse

Hi Abel,

> >  > > I am experimenting an infinite loop problem happening at
> >  > > cfi_cmdset_0001.c:chip_ready.
> >  > >
> >  > > Some kind of timeout while waiting  in FL_STATUS should be included.
> >  > > Indeed, this timeout feature is not new at all.
> >  > >
> >  > > The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after
> >  > > some refactoring work easily perceptible).
> >  >
> >  > Hm. This was removed in commit 5a37cf19:
> >  >       Fix deadlock in Intel chip driver caused by get_chip recursion
> >  >
> >  > Alexey, can you explain why you removed it? Abel's patch simply adds
> >  > this back where it was. That looks reasonable to me, so I'd like to know
> >  > if I've missed some reason why it shouldn't be there...
> >  >
> >  >
> >
> > Yes. I can explain. I removed it because just simple adding timeout
> >  won't work. Please look at the code after patch applying (solving a hang while waiting in FL_STATUS). Time_after will never occur:
> 
> 
> Alexey, you are rigth... the patch did nothing, but the attached one
> does the correct thing (discard the previous one).
> 
> Believe me, it is not a good idea to return -EAGAIN and let get_chip
> to iterate forever (was my case).
> 
> I have added a timeout parameter to get_chip. It is a dirty solution
> but I cannot redesign that piece of code since I did not wrote it and
> I could introduce serious bugs.
> 
Exactly. Don't use this patch. Timeo is added but conditions when it is
neccesary to drop timeo are not defined. Get_chip may work for very long time if
suspend operations are enabled. As result you may easily face return -EIO by timeo
when it should not be!
> Image you have a broken sector which will never respond... that could
> hang the process while unlocking the sectors of a partition. In my
> case the problem prevents our board to complete the kernel start-up.
>
In fact this situation occurs in really bad case. Why you have broken
sector on NOR?!  
 
I may try to find a way to introduce watchdog timer here. Your
suggestions to fix it are also quite welcome!

Thanks,
Alexey

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PATCH: solving a hang while waiting in FL_STATUS
  2008-04-23 16:35     ` Abel Bernabeu
  2008-04-23 16:41       ` David Woodhouse
  2008-04-23 17:28       ` Alexey Korolev
@ 2008-04-23 18:57       ` Anders Grafström
  2 siblings, 0 replies; 7+ messages in thread
From: Anders Grafström @ 2008-04-23 18:57 UTC (permalink / raw)
  To: abelbg, linux-mtd; +Cc: David Woodhouse, abel.bernabeu, Alexey Korolev

Abel Bernabeu wrote:
> Image you have a broken sector which will never respond... that could
> hang the process while unlocking the sectors of a partition. In my
> case the problem prevents our board to complete the kernel start-up.

I have seen a case with a faulty Intel 28F128J3 that had a sector that
wouldn't erase. What happened was that the device kept trying to erase
the sector until the max specified erase time had passed then it set
the ready and error bits. The max erase time in this case was 25 seconds.
(early chip revision suffering from errata #2, I believe)
inval_cache_and_wait_for_operation() timed out already after 8 seconds
and set chip->state to FL_STATUS. It thus lost the actual state of the chip.
All operations within the next 17 seconds failed. The unit was impossble
to boot since it just kept panicing.

The error message I saw at that time (2.6.18) was
"Waiting for chip to be ready timed out. Status xxxx"
then JFFS2 got very unhappy.

A patched kernel with proper timeouts later booted this unit with just a
warning message about the bad sector. I'll post the patch shortly.

I've added this patch below as a safeguard. (Any thoughts on it?)
You could try it and see if it gets you out of the loop.

Anders

diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c
index e812df6..e8a880c 100644
--- a/drivers/mtd/chips/cfi_cmdset_0001.c
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -706,6 +706,7 @@ static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long

  	case FL_STATUS:
  		for (;;) {
+			map_write(map, CMD(0x70), adr);
  			status = map_read(map, adr);
  			if (map_word_andequal(map, status, status_OK, status_OK))
  				break;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-04-23 18:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-23  8:57 PATCH: solving a hang while waiting in FL_STATUS Abel Bernabeu
2008-04-23 10:33 ` David Woodhouse
2008-04-23 13:27   ` Alexey Korolev
2008-04-23 16:35     ` Abel Bernabeu
2008-04-23 16:41       ` David Woodhouse
2008-04-23 17:28       ` Alexey Korolev
2008-04-23 18:57       ` Anders Grafström

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox