* PATCH: solving a hang while waiting in FL_STATUS
@ 2008-04-23 8:57 Abel Bernabeu
2008-04-23 10:33 ` David Woodhouse
0 siblings, 1 reply; 7+ messages in thread
From: Abel Bernabeu @ 2008-04-23 8:57 UTC (permalink / raw)
To: linux-mtd
[-- Attachment #1: Type: text/plain, Size: 467 bytes --]
I am experimenting an infinite loop problem happening at
cfi_cmdset_0001.c:chip_ready.
Some kind of timeout while waiting in FL_STATUS should be included.
Indeed, this timeout feature is not new at all.
The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after
some refactoring work easily perceptible).
The present patch restores the "lost functionality" and solves my
problem. Please consider the patch to be included in the mtd code.
Yours, Abel.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: chip_ready.patch --]
[-- Type: text/x-patch; name=chip_ready.patch, Size: 551 bytes --]
--- linux-2.6.25/drivers/mtd/chips/cfi_cmdset_0001.c 2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25.hacked/drivers/mtd/chips/cfi_cmdset_0001.c 2008-04-23 10:27:12.000000000 +0200
@@ -712,6 +712,12 @@
if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS))
break;
+ if (time_after(jiffies, timeo)) {
+ printk(KERN_ERR "%s: Waiting for chip to be ready timed out. Status %lx\n",
+ map->name, status.x[0]);
+ return -EIO;
+ }
+
spin_unlock(chip->mutex);
cfi_udelay(1);
spin_lock(chip->mutex);
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: PATCH: solving a hang while waiting in FL_STATUS 2008-04-23 8:57 PATCH: solving a hang while waiting in FL_STATUS Abel Bernabeu @ 2008-04-23 10:33 ` David Woodhouse 2008-04-23 13:27 ` Alexey Korolev 0 siblings, 1 reply; 7+ messages in thread From: David Woodhouse @ 2008-04-23 10:33 UTC (permalink / raw) To: Abel Bernabeu; +Cc: linux-mtd, Alexey Korolev On Wed, 2008-04-23 at 10:57 +0200, Abel Bernabeu wrote: > I am experimenting an infinite loop problem happening at > cfi_cmdset_0001.c:chip_ready. > > Some kind of timeout while waiting in FL_STATUS should be included. > Indeed, this timeout feature is not new at all. > > The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after > some refactoring work easily perceptible). Hm. This was removed in commit 5a37cf19: Fix deadlock in Intel chip driver caused by get_chip recursion Alexey, can you explain why you removed it? Abel's patch simply adds this back where it was. That looks reasonable to me, so I'd like to know if I've missed some reason why it shouldn't be there... -- dwmw2 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PATCH: solving a hang while waiting in FL_STATUS 2008-04-23 10:33 ` David Woodhouse @ 2008-04-23 13:27 ` Alexey Korolev 2008-04-23 16:35 ` Abel Bernabeu 0 siblings, 1 reply; 7+ messages in thread From: Alexey Korolev @ 2008-04-23 13:27 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd, Abel Bernabeu Hi David, > > I am experimenting an infinite loop problem happening at > > cfi_cmdset_0001.c:chip_ready. > > > > Some kind of timeout while waiting in FL_STATUS should be included. > > Indeed, this timeout feature is not new at all. > > > > The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after > > some refactoring work easily perceptible). > > Hm. This was removed in commit 5a37cf19: > Fix deadlock in Intel chip driver caused by get_chip recursion > > Alexey, can you explain why you removed it? Abel's patch simply adds > this back where it was. That looks reasonable to me, so I'd like to know > if I've missed some reason why it shouldn't be there... > > Yes. I can explain. I removed it because just simple adding timeout won't work. Please look at the code after patch applying (solving a hang while waiting in FL_STATUS). Time_after will never occur: -------------- case FL_STATUS: for (;;) { status = map_read(map, adr); if (map_word_andequal(map, status, status_OK, status_OK)) break; /* At this point we're fine with write operations in other partitions as they don't conflict. */ if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS)) break; if (time_after(jiffies, timeo)) { printk(KERN_ERR "%s: Waiting for chip to be ready timed out. Status %lx\n", map->name, status.x[0]); return -EIO; } spin_unlock(chip->mutex); cfi_udelay(1); spin_lock(chip->mutex); /* Someone else might have been playing with it. */ return -EAGAIN; } -------------- I agree that watchdog could be quite useful here, but adding it properly will lead to rather big changes. That is why Nicolas and me agreed to kick it off. Thanks, Alexey ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PATCH: solving a hang while waiting in FL_STATUS 2008-04-23 13:27 ` Alexey Korolev @ 2008-04-23 16:35 ` Abel Bernabeu 2008-04-23 16:41 ` David Woodhouse ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Abel Bernabeu @ 2008-04-23 16:35 UTC (permalink / raw) To: Alexey Korolev; +Cc: abel.bernabeu, linux-mtd, David Woodhouse [-- Attachment #1: Type: text/plain, Size: 1619 bytes --] 2008/4/23, Alexey Korolev <akorolev@infradead.org>: > Hi David, > > > > > I am experimenting an infinite loop problem happening at > > > cfi_cmdset_0001.c:chip_ready. > > > > > > Some kind of timeout while waiting in FL_STATUS should be included. > > > Indeed, this timeout feature is not new at all. > > > > > > The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after > > > some refactoring work easily perceptible). > > > > Hm. This was removed in commit 5a37cf19: > > Fix deadlock in Intel chip driver caused by get_chip recursion > > > > Alexey, can you explain why you removed it? Abel's patch simply adds > > this back where it was. That looks reasonable to me, so I'd like to know > > if I've missed some reason why it shouldn't be there... > > > > > > Yes. I can explain. I removed it because just simple adding timeout > won't work. Please look at the code after patch applying (solving a hang while waiting in FL_STATUS). Time_after will never occur: Alexey, you are rigth... the patch did nothing, but the attached one does the correct thing (discard the previous one). Believe me, it is not a good idea to return -EAGAIN and let get_chip to iterate forever (was my case). I have added a timeout parameter to get_chip. It is a dirty solution but I cannot redesign that piece of code since I did not wrote it and I could introduce serious bugs. Image you have a broken sector which will never respond... that could hang the process while unlocking the sectors of a partition. In my case the problem prevents our board to complete the kernel start-up. Yours, Abel. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: chip_ready.patch --] [-- Type: text/x-patch; name=chip_ready.patch, Size: 3323 bytes --] --- linux-2.6.25/drivers/mtd/chips/cfi_cmdset_0001.c 2008-04-17 04:49:44.000000000 +0200 +++ linux-2.6.25.hacked/drivers/mtd/chips/cfi_cmdset_0001.c 2008-04-23 18:00:16.000000000 +0200 @@ -86,7 +86,7 @@ static void cfi_intelext_unpoint (struct mtd_info *mtd, u_char *addr, loff_t from, size_t len); -static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode); +static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode, unsigned long timeout); static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode); static void put_chip(struct map_info *map, struct flchip *chip, unsigned long adr); #include "fwh_lock.h" @@ -691,34 +691,40 @@ /* * *********** CHIP ACCESS FUNCTIONS *********** */ -static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode) +static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long adr, int mode, unsigned long timeo) { DECLARE_WAITQUEUE(wait, current); struct cfi_private *cfi = map->fldrv_priv; map_word status, status_OK = CMD(0x80), status_PWS = CMD(0x01); struct cfi_pri_intelext *cfip = cfi->cmdset_priv; - unsigned long timeo = jiffies + HZ; switch (chip->state) { case FL_STATUS: - for (;;) { - status = map_read(map, adr); - if (map_word_andequal(map, status, status_OK, status_OK)) - break; - /* At this point we're fine with write operations - in other partitions as they don't conflict. */ - if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS)) - break; + status = map_read(map, adr); - spin_unlock(chip->mutex); - cfi_udelay(1); - spin_lock(chip->mutex); - /* Someone else might have been playing with it. */ - return -EAGAIN; + if (map_word_andequal(map, status, status_OK, status_OK)) + return 0; + + /* At this point we're fine with write operations + in other partitions as they don't conflict. */ + if (chip->priv && map_word_andequal(map, status, status_PWS, status_PWS)) + return 0; + + if (time_after(jiffies, timeo)) { + printk(KERN_ERR "%s: Waiting for chip to be ready timed out. Status %lx\n", + map->name, status.x[0]); + return -EIO; } - /* Fall through */ + + spin_unlock(chip->mutex); + cfi_udelay(1); + spin_lock(chip->mutex); + + /* Someone else might have been playing with it. */ + return -EAGAIN; + case FL_READY: case FL_CFI_QUERY: case FL_JEDEC_QUERY: @@ -801,8 +807,12 @@ static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode) { int ret; + unsigned int timeo; + DECLARE_WAITQUEUE(wait, current); + timeo = jiffies + HZ; + retry: if (chip->priv && (mode == FL_WRITING || mode == FL_ERASING || mode == FL_OTP_WRITE || mode == FL_SHUTDOWN)) { @@ -843,7 +853,7 @@ if (!ret) goto retry; spin_unlock(chip->mutex); - ret = chip_ready(map, contender, contender->start, mode); + ret = chip_ready(map, contender, contender->start, mode, timeo); spin_lock(chip->mutex); if (ret == -EAGAIN) { @@ -878,7 +888,7 @@ shared->erasing = chip; spin_unlock(&shared->lock); } - ret = chip_ready(map, chip, adr, mode); + ret = chip_ready(map, chip, adr, mode, timeo); if (ret == -EAGAIN) goto retry; ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PATCH: solving a hang while waiting in FL_STATUS 2008-04-23 16:35 ` Abel Bernabeu @ 2008-04-23 16:41 ` David Woodhouse 2008-04-23 17:28 ` Alexey Korolev 2008-04-23 18:57 ` Anders Grafström 2 siblings, 0 replies; 7+ messages in thread From: David Woodhouse @ 2008-04-23 16:41 UTC (permalink / raw) To: abelbg; +Cc: abel.bernabeu, linux-mtd, Alexey Korolev On Wed, 2008-04-23 at 18:35 +0200, Abel Bernabeu wrote: > I have added a timeout parameter to get_chip. It is a dirty solution > but I cannot redesign that piece of code since I did not wrote it and > I could introduce serious bugs. Wimp. That never stopped anyone else. And having written it doesn't really help either :) -- dwmw2 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PATCH: solving a hang while waiting in FL_STATUS 2008-04-23 16:35 ` Abel Bernabeu 2008-04-23 16:41 ` David Woodhouse @ 2008-04-23 17:28 ` Alexey Korolev 2008-04-23 18:57 ` Anders Grafström 2 siblings, 0 replies; 7+ messages in thread From: Alexey Korolev @ 2008-04-23 17:28 UTC (permalink / raw) To: Abel Bernabeu, nico; +Cc: linux-mtd, abel.bernabeu, David Woodhouse Hi Abel, > > > > I am experimenting an infinite loop problem happening at > > > > cfi_cmdset_0001.c:chip_ready. > > > > > > > > Some kind of timeout while waiting in FL_STATUS should be included. > > > > Indeed, this timeout feature is not new at all. > > > > > > > > The feature was present in 2.6.23.17 but disappeared in 2.6.24 (after > > > > some refactoring work easily perceptible). > > > > > > Hm. This was removed in commit 5a37cf19: > > > Fix deadlock in Intel chip driver caused by get_chip recursion > > > > > > Alexey, can you explain why you removed it? Abel's patch simply adds > > > this back where it was. That looks reasonable to me, so I'd like to know > > > if I've missed some reason why it shouldn't be there... > > > > > > > > > > Yes. I can explain. I removed it because just simple adding timeout > > won't work. Please look at the code after patch applying (solving a hang while waiting in FL_STATUS). Time_after will never occur: > > > Alexey, you are rigth... the patch did nothing, but the attached one > does the correct thing (discard the previous one). > > Believe me, it is not a good idea to return -EAGAIN and let get_chip > to iterate forever (was my case). > > I have added a timeout parameter to get_chip. It is a dirty solution > but I cannot redesign that piece of code since I did not wrote it and > I could introduce serious bugs. > Exactly. Don't use this patch. Timeo is added but conditions when it is neccesary to drop timeo are not defined. Get_chip may work for very long time if suspend operations are enabled. As result you may easily face return -EIO by timeo when it should not be! > Image you have a broken sector which will never respond... that could > hang the process while unlocking the sectors of a partition. In my > case the problem prevents our board to complete the kernel start-up. > In fact this situation occurs in really bad case. Why you have broken sector on NOR?! I may try to find a way to introduce watchdog timer here. Your suggestions to fix it are also quite welcome! Thanks, Alexey ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PATCH: solving a hang while waiting in FL_STATUS 2008-04-23 16:35 ` Abel Bernabeu 2008-04-23 16:41 ` David Woodhouse 2008-04-23 17:28 ` Alexey Korolev @ 2008-04-23 18:57 ` Anders Grafström 2 siblings, 0 replies; 7+ messages in thread From: Anders Grafström @ 2008-04-23 18:57 UTC (permalink / raw) To: abelbg, linux-mtd; +Cc: David Woodhouse, abel.bernabeu, Alexey Korolev Abel Bernabeu wrote: > Image you have a broken sector which will never respond... that could > hang the process while unlocking the sectors of a partition. In my > case the problem prevents our board to complete the kernel start-up. I have seen a case with a faulty Intel 28F128J3 that had a sector that wouldn't erase. What happened was that the device kept trying to erase the sector until the max specified erase time had passed then it set the ready and error bits. The max erase time in this case was 25 seconds. (early chip revision suffering from errata #2, I believe) inval_cache_and_wait_for_operation() timed out already after 8 seconds and set chip->state to FL_STATUS. It thus lost the actual state of the chip. All operations within the next 17 seconds failed. The unit was impossble to boot since it just kept panicing. The error message I saw at that time (2.6.18) was "Waiting for chip to be ready timed out. Status xxxx" then JFFS2 got very unhappy. A patched kernel with proper timeouts later booted this unit with just a warning message about the bad sector. I'll post the patch shortly. I've added this patch below as a safeguard. (Any thoughts on it?) You could try it and see if it gets you out of the loop. Anders diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c index e812df6..e8a880c 100644 --- a/drivers/mtd/chips/cfi_cmdset_0001.c +++ b/drivers/mtd/chips/cfi_cmdset_0001.c @@ -706,6 +706,7 @@ static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long case FL_STATUS: for (;;) { + map_write(map, CMD(0x70), adr); status = map_read(map, adr); if (map_word_andequal(map, status, status_OK, status_OK)) break; ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-04-23 18:57 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-04-23 8:57 PATCH: solving a hang while waiting in FL_STATUS Abel Bernabeu 2008-04-23 10:33 ` David Woodhouse 2008-04-23 13:27 ` Alexey Korolev 2008-04-23 16:35 ` Abel Bernabeu 2008-04-23 16:41 ` David Woodhouse 2008-04-23 17:28 ` Alexey Korolev 2008-04-23 18:57 ` Anders Grafström
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox