* cfi_cmdset_0001.c: Excessive erase suspends @ 2008-04-17 20:38 Anders Grafström 2008-04-18 15:02 ` Alexey Korolev 0 siblings, 1 reply; 12+ messages in thread From: Anders Grafström @ 2008-04-17 20:38 UTC (permalink / raw) To: Linux-MTD Mailing List With recent kernels I've been seeing a lot of these "Newly-erased block contained word 0xffff0000 at offset 0x00180000" on a board using Intel 28F640J5 flash chips. It looks like the errors are caused by large amounts of erase suspends. Each erase gets suspended around 8500 times and in some extreme cases a lot more. The erase ends without any error bits set but it turns out that it has failed. It seems like some flash chips have a limit on the number of times that the erase can be suspended. I have not seen any information on the Intel chips but a Spansion AppNote says 5,980 times for some of their devices before running the risk of an erase fail. So I'm guessing that erase suspends need to be paced somehow? /Anders ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-17 20:38 cfi_cmdset_0001.c: Excessive erase suspends Anders Grafström @ 2008-04-18 15:02 ` Alexey Korolev 2008-04-18 16:35 ` Jamie Lokier 0 siblings, 1 reply; 12+ messages in thread From: Alexey Korolev @ 2008-04-18 15:02 UTC (permalink / raw) To: Anders Grafström; +Cc: Linux-MTD Mailing List Hi Anders, > With recent kernels I've been seeing a lot of these > "Newly-erased block contained word 0xffff0000 at offset 0x00180000" > on a board using Intel 28F640J5 flash chips. > > It looks like the errors are caused by large amounts of erase suspends. > Each erase gets suspended around 8500 times and in some extreme cases > a lot more. The erase ends without any error bits set but it turns out > that it has failed. > > It seems like some flash chips have a limit on the number of times that > the erase can be suspended. I have not seen any information on the Intel > chips but a Spansion AppNote says 5,980 times for some of their devices > before running the risk of an erase fail. > We saw the similar problem in our tests. As a possible solution I could suggest to disable erase suspend on write. Regarding limit of suspend/resume cycles: it is rather unclear how many cycles would be ok how many cycles would be not. Special investigations are required here. Thanks, Alexey ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-18 15:02 ` Alexey Korolev @ 2008-04-18 16:35 ` Jamie Lokier 2008-04-18 17:54 ` Jared Hulbert 0 siblings, 1 reply; 12+ messages in thread From: Jamie Lokier @ 2008-04-18 16:35 UTC (permalink / raw) To: Alexey Korolev; +Cc: Linux-MTD Mailing List, Anders Grafström Alexey Korolev wrote: > > "Newly-erased block contained word 0xffff0000 at offset 0x00180000" > > on a board using Intel 28F640J5 flash chips. > > > > It looks like the errors are caused by large amounts of erase suspends. > > Each erase gets suspended around 8500 times and in some extreme cases > > a lot more. The erase ends without any error bits set but it turns out > > that it has failed. > > > > It seems like some flash chips have a limit on the number of times that > > the erase can be suspended. I have not seen any information on the Intel > > chips but a Spansion AppNote says 5,980 times for some of their devices > > before running the risk of an erase fail. That's very interesting, thanks. > We saw the similar problem in our tests. As a possible solution I could > suggest to disable erase suspend on write. That's quite bad for write latency, though. Adding a suspend cycle counter, and disabling suspend on write when it reaches a certain number sounds better. > Regarding limit of suspend/resume cycles: it is rather unclear how > many cycles would be ok how many cycles would be not. Special > investigations are required here. That's interesting too. - Do other chip docs say how many cycles are acceptable? Is there a count we can assume is safe for all devices of this type - like say 100? - Does the time spent in erase suspend matter? E.g. if it was suspended for 1 minute due to lots of pending writes, restarted, and then suspended _again_ for 1 minute, etc. does that reduce the number of safe suspend-resume cycles due to the unstable partially-erased physical state? - Is it worth reading a block after erasing it, to verify that it's wiped - and mark blocks which have experienced >threshold suspend cycles as needing verification and re-erase, rather than meaning it's a bad block? ( Verification could be done lazily, on each part of the block just before writing. ) But is this good physically, or does too many suspends put the block into an unreliable state even if it does pass verification, so that it's important to limit the suspends rather than allow many and verify afterwards? -- Jamie ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-18 16:35 ` Jamie Lokier @ 2008-04-18 17:54 ` Jared Hulbert 2008-04-18 22:11 ` Anders Grafström 0 siblings, 1 reply; 12+ messages in thread From: Jared Hulbert @ 2008-04-18 17:54 UTC (permalink / raw) To: Alexey Korolev, Anders Grafström, Linux-MTD Mailing List Anders, Just to make sure we aren't overlooking an easier problem: 1) the original report is a JFFS2 error, not direct observation of the mtd partition. 2) Did you check that the MTD is configured to treat this flash with the correct bus width? I've been burned trying to figure out why a 16bit configuration was missing half the data, turned out I had it configured for 32bit MTD accesses. 3) I'd like to see that you can't use flash_eraseall and hexdump /dev/mtdX to see this behavior. Or maybe you could try to create a simple test that would suspend an erase 10K times and verify the erase looking for errors. Before we go off implement solutions lets verify that we understand the problem by implementing some kind a repeatable test condition to trigger this. However, if there is a problem with excessive suspending... > - Do other chip docs say how many cycles are acceptable? > Is there a count we can assume is safe for all devices of this > type - like say 100? A number like that would probably serve to punish many devices while failing to be safe for "all devices". I'm not a big fan of this approach. > - Does the time spent in erase suspend matter? E.g. if it was > suspended for 1 minute due to lots of pending writes, restarted, > and then suspended _again_ for 1 minute, etc. does that reduce the > number of safe suspend-resume cycles due to the unstable > partially-erased physical state? I'm pretty sure that any difficultly with suspends is more about the time between suspends rather than the time _in_ suspend. Rather than minimizing the time in suspend, maximize the time out of suspend. Think of the erase being suspended as a process being context switched. If you context switch to quickly all you do is the switching, no real work in the process. You keep that up to long, you _could_ have a problem. A solution might be an small delay before a write suspends the erase. > - Is it worth reading a block after erasing it, to verify that it's > wiped - and mark blocks which have experienced >threshold suspend > cycles as needing verification and re-erase, rather than meaning > it's a bad block? ( Verification could be done lazily, on each part > of the block just before writing. ) But is this good physically, > or does too many suspends put the block into an unreliable state > even if it does pass verification, so that it's important to limit > the suspends rather than allow many and verify afterwards? Similar to what you are thinking, I believe that even _if_ you can put a block in a funky state by excessive suspending the part would still read cleanly. I don't think this is worth much. Excessive suspending is much more likely to trigger a failed erase, with an error condition being reported by the chip. The MTD will return an error in that case so you don't really need to look for an error. It's kind of in your face. I'd be very surprised if can you can get a consistent behavior like what you describe just by suspending. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-18 17:54 ` Jared Hulbert @ 2008-04-18 22:11 ` Anders Grafström 2008-04-19 2:47 ` Jared Hulbert 2008-04-19 9:18 ` Joakim Tjernlund 0 siblings, 2 replies; 12+ messages in thread From: Anders Grafström @ 2008-04-18 22:11 UTC (permalink / raw) To: Jared Hulbert, Alexey Korolev, Linux-MTD Mailing List Jared Hulbert wrote: > 2) Did you check that the MTD is configured to treat this flash with > the correct bus width? I've been burned trying to figure out why a > 16bit configuration was missing half the data, turned out I had it > configured for 32bit MTD accesses. I think I got it right. > 3) I'd like to see that you can't use flash_eraseall and hexdump > /dev/mtdX to see this behavior. Or maybe you could try to create a > simple test that would suspend an erase 10K times and verify the erase See below. > A solution might be an small delay before a > write suspends the erase. udelay(100) right before the suspend command seems to work for me. It reduces the suspend count to about 1700 and no errors reported by JFFS2. I ran this routine in a standalone test program from u-boot: void test_flash_erase_suspend(void) { volatile unsigned int *flash = (unsigned int *)0x41040000; unsigned int i; unsigned int suspends = 0; unsigned int words = 10; /* Make sure there's something to erase */ for (i = 0; i < 0x10000; i++) { if (flash[i] == 0xffffffff) { flash[i] = 0x00400040; flash[i] = 0; while ((flash[i] & 0x00800080) != 0x00800080); } } /* Clear status */ flash[0] = 0x00500050; /* Erase */ flash[0] = 0x00200020; flash[0] = 0x00d000d0; while (1) { /* Suspend */ flash[0] = 0x00b000b0; suspends++; /* Short delay */ for (i = 0; i < 1000; i++); /* Resume */ flash[0] = 0x00d000d0; flash[0] = 0x00700070; /* Erase done ? */ if ((flash[i] & 0x00800080) == 0x00800080) { printf("\nStatus %08x\n", flash[i]); break; } /* Short delay */ for (i = 0; i < 3500; i++); } /* Read array */ flash[0] = 0x00ff00ff; /* Show the first 10 failed words */ for (i = 0; i < 0x10000; i++) { if (flash[i] != 0xffffffff) { printf("%08x\n", flash[i]); if (--words == 0) break; } } printf("%d suspends\n", suspends); } The delay loop doing 3500 loops seems to be the critical one. Most of the time I get something like this, which I assume means 'Operation abort'. Status 00a000a0 aa695555 695a5555 5a555555 a56a5555 59555555 595a5555 6aa65555 69a65555 69595555 95965555 5647 suspends But sometimes I get this: Status 00a00080 aa690000 696a0000 5a950000 a66a0000 59590000 595a0000 6a960000 69a60000 69590000 95960000 5625 suspends Here's a link to the document that I previously referred to: http://www.spansion.com/application_notes/erase_susp_appnote_00_a1_e.pdf /Anders ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-18 22:11 ` Anders Grafström @ 2008-04-19 2:47 ` Jared Hulbert 2008-04-19 9:18 ` Joakim Tjernlund 1 sibling, 0 replies; 12+ messages in thread From: Jared Hulbert @ 2008-04-19 2:47 UTC (permalink / raw) To: Anders Grafström; +Cc: Linux-MTD Mailing List, Alexey Korolev > > A solution might be an small delay before a > > write suspends the erase. > > > > udelay(100) right before the suspend command seems to work for me. > It reduces the suspend count to about 1700 and no errors reported > by JFFS2. good. Not sure how/if that sort of fix can be merged. So you didn't see this kind of problem with older kernels? > I ran this routine in a standalone test program from u-boot: > > void test_flash_erase_suspend(void) > { > } > > The delay loop doing 3500 loops seems to be the critical one. Interesting. I'm going to talk with some people... > Most of the time I get something like this, which I assume > means 'Operation abort'. > > Status 00a000a0 Yes, bit 5 is erase error. You've got to expect data to be bad under those circumstances. > Status 00a00080 > aa690000 you have 2 parts in parallel, right? So the chip on the high order word is expected to be garbage, erase error and all. But the lower word chip confuses me, that shouldn't happen. Alexey, what do you think? ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-18 22:11 ` Anders Grafström 2008-04-19 2:47 ` Jared Hulbert @ 2008-04-19 9:18 ` Joakim Tjernlund 2008-04-19 13:47 ` Anders Grafström 1 sibling, 1 reply; 12+ messages in thread From: Joakim Tjernlund @ 2008-04-19 9:18 UTC (permalink / raw) To: 'Anders Grafström', 'Jared Hulbert', 'Alexey Korolev', 'Linux-MTD Mailing List' > -----Original Message----- > From: linux-mtd-bounces@lists.infradead.org [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf > Of Anders Grafström > Sent: den 19 april 2008 00:11 > To: Jared Hulbert; Alexey Korolev; Linux-MTD Mailing List > Subject: Re: cfi_cmdset_0001.c: Excessive erase suspends > > Jared Hulbert wrote: > > 2) Did you check that the MTD is configured to treat this flash with > > the correct bus width? I've been burned trying to figure out why a > > 16bit configuration was missing half the data, turned out I had it > > configured for 32bit MTD accesses. > > I think I got it right. > > > 3) I'd like to see that you can't use flash_eraseall and hexdump > > /dev/mtdX to see this behavior. Or maybe you could try to create a > > simple test that would suspend an erase 10K times and verify the erase > > See below. > > > A solution might be an small delay before a > > write suspends the erase. > > udelay(100) right before the suspend command seems to work for me. > It reduces the suspend count to about 1700 and no errors reported > by JFFS2. > > > I ran this routine in a standalone test program from u-boot: > > void test_flash_erase_suspend(void) > { > volatile unsigned int *flash = (unsigned int *)0x41040000; > unsigned int i; > unsigned int suspends = 0; > unsigned int words = 10; > > /* Make sure there's something to erase */ > for (i = 0; i < 0x10000; i++) { > if (flash[i] == 0xffffffff) { > flash[i] = 0x00400040; > flash[i] = 0; > while ((flash[i] & 0x00800080) != 0x00800080); > } > } > > /* Clear status */ > flash[0] = 0x00500050; > > /* Erase */ > flash[0] = 0x00200020; > flash[0] = 0x00d000d0; > > while (1) { > /* Suspend */ > flash[0] = 0x00b000b0; > suspends++; > > /* Short delay */ > for (i = 0; i < 1000; i++); > Don't you need to check that the chip is suspended before you do any new operation? > /* Resume */ > flash[0] = 0x00d000d0; > flash[0] = 0x00700070; > > /* Erase done ? */ > if ((flash[i] & 0x00800080) == 0x00800080) { > printf("\nStatus %08x\n", flash[i]); > break; > } > > /* Short delay */ > for (i = 0; i < 3500; i++); > } > > /* Read array */ > flash[0] = 0x00ff00ff; > > /* Show the first 10 failed words */ > for (i = 0; i < 0x10000; i++) { > if (flash[i] != 0xffffffff) { > printf("%08x\n", flash[i]); > if (--words == 0) > break; > } > } > > printf("%d suspends\n", suspends); > } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-19 9:18 ` Joakim Tjernlund @ 2008-04-19 13:47 ` Anders Grafström 2008-04-19 17:01 ` Joakim Tjernlund 2008-04-24 14:34 ` Alexey Korolev 0 siblings, 2 replies; 12+ messages in thread From: Anders Grafström @ 2008-04-19 13:47 UTC (permalink / raw) To: Joakim Tjernlund, 'Jared Hulbert', 'Alexey Korolev', 'Linux-MTD Mailing List' Joakim Tjernlund wrote: > Don't you need to check that the chip is suspended before you do any > new operation? You're right. Here's the improved version; static void erase_suspend(unsigned int loops) { volatile unsigned int *flash = (unsigned int *)0x41040000; unsigned int i; unsigned int suspends = 0; unsigned int words = 2; unsigned int timeout; /* Make sure there's something to erase */ for (i = 0; i < 0x10000; i++) { if (flash[i] == 0xffffffff) { flash[i] = 0x00400040; flash[i] = 0; while ((flash[i] & 0x00800080) != 0x00800080); } } /* Clear status */ flash[0] = 0x00500050; /* Erase */ flash[0] = 0x00200020; flash[0] = 0x00d000d0; while (1) { /* Suspend */ flash[0] = 0x00b000b0; flash[0] = 0x00700070; timeout = 100000; while ((flash[0] & 0x00800080) != 0x00800080 && --timeout); if (!timeout) printf("suspend timeout\n"); suspends++; /* Short delay */ for (i = 0; i < 1000; i++); /* Resume */ flash[0] = 0x00d000d0; flash[0] = 0x00700070; /* Erase done ? */ if ((flash[0] & 0x00800080) == 0x00800080) { printf("\nstatus %08x\n", flash[0]); break; } /* Short delay */ for (i = 0; i < loops; i++); } /* Read array */ flash[0] = 0x00ff00ff; /* Show the first 2 failed words */ for (i = 0; i < 0x10000; i++) { if (flash[i] != 0xffffffff) { printf("%08x\n", flash[i]); if (--words == 0) break; } } printf("%d suspends\n", suspends); } void test_flash_erase_suspend(void) { unsigned int loops; for (loops = 6000; loops > 2000; loops -= 500) { printf("\n%d loops", loops); erase_suspend(loops); } } It produced this output at one occasion: 6000 loops status 00800080 ffff0000 ffff0000 4783 suspends 5500 loops status 00800080 5508 suspends 5000 loops status 008000a0 6394 suspends 4500 loops status 00a000a0 fffffeff fffffffe 5169 suspends 4000 loops status 00a000a0 fffffeff fffffeff 5378 suspends 3500 loops status 00a000a0 aaaaaaaa aaaaaaaa 5715 suspends 3000 loops status 00a000a0 55555555 55555555 6142 suspends 2500 loops status 00a000a0 00000000 00000000 6929 suspends It locks up when I drop it down to 2000 or less iterations in the loop between resume and suspend. I think I need to do more testing with properly timed delays. Is there anything else that I have missed? /Anders ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-19 13:47 ` Anders Grafström @ 2008-04-19 17:01 ` Joakim Tjernlund 2008-04-24 14:34 ` Alexey Korolev 1 sibling, 0 replies; 12+ messages in thread From: Joakim Tjernlund @ 2008-04-19 17:01 UTC (permalink / raw) To: 'Anders Grafström', 'Jared Hulbert', 'Alexey Korolev', 'Linux-MTD Mailing List' > -----Original Message----- > From: Anders Grafström [mailto:grfstrm@users.sourceforge.net] > Sent: den 19 april 2008 15:47 > To: Joakim Tjernlund; 'Jared Hulbert'; 'Alexey Korolev'; 'Linux-MTD Mailing List' > Subject: Re: cfi_cmdset_0001.c: Excessive erase suspends > > Joakim Tjernlund wrote: > > Don't you need to check that the chip is suspended before you do any > > new operation? > > You're right. > > Here's the improved version; > > static void erase_suspend(unsigned int loops) > { > volatile unsigned int *flash = (unsigned int *)0x41040000; > unsigned int i; > unsigned int suspends = 0; > unsigned int words = 2; > unsigned int timeout; > > /* Make sure there's something to erase */ > for (i = 0; i < 0x10000; i++) { > if (flash[i] == 0xffffffff) { > flash[i] = 0x00400040; > flash[i] = 0; > while ((flash[i] & 0x00800080) != 0x00800080); > } > } > > /* Clear status */ > flash[0] = 0x00500050; > > /* Erase */ > flash[0] = 0x00200020; > flash[0] = 0x00d000d0; > > while (1) { > /* Suspend */ > flash[0] = 0x00b000b0; > flash[0] = 0x00700070; > timeout = 100000; > while ((flash[0] & 0x00800080) != 0x00800080 && --timeout); > if (!timeout) > printf("suspend timeout\n"); > suspends++; > > /* Short delay */ > for (i = 0; i < 1000; i++); > > /* Resume */ > flash[0] = 0x00d000d0; > flash[0] = 0x00700070; > > /* Erase done ? */ > if ((flash[0] & 0x00800080) == 0x00800080) { > printf("\nstatus %08x\n", flash[0]); > break; > } > > /* Short delay */ > for (i = 0; i < loops; i++); > } > > /* Read array */ > flash[0] = 0x00ff00ff; > > /* Show the first 2 failed words */ > for (i = 0; i < 0x10000; i++) { > if (flash[i] != 0xffffffff) { > printf("%08x\n", flash[i]); > if (--words == 0) > break; > } > } > > printf("%d suspends\n", suspends); > } > > void test_flash_erase_suspend(void) > { > unsigned int loops; > > for (loops = 6000; loops > 2000; loops -= 500) { > printf("\n%d loops", loops); > erase_suspend(loops); > } > } > > > It produced this output at one occasion: > > 6000 loops > status 00800080 > ffff0000 > ffff0000 > 4783 suspends > > 5500 loops > status 00800080 > 5508 suspends > > 5000 loops > status 008000a0 > 6394 suspends > > 4500 loops > status 00a000a0 > fffffeff > fffffffe > 5169 suspends > > 4000 loops > status 00a000a0 > fffffeff > fffffeff > 5378 suspends > > 3500 loops > status 00a000a0 > aaaaaaaa > aaaaaaaa > 5715 suspends > > 3000 loops > status 00a000a0 > 55555555 > 55555555 > 6142 suspends > > 2500 loops > status 00a000a0 > 00000000 > 00000000 > 6929 suspends > > It locks up when I drop it down to 2000 or less iterations in the loop between resume and suspend. > I think I need to do more testing with properly timed delays. > > Is there anything else that I have missed? Perhaps, you should make "i" a volatile. That said I know that Intel had erase suspend bugs in their early strata flash, thats why there is a define in cmdset_0001 that disables them. Maybe you got buggy chips? oh, just remembered, strata chips needs to have a stable address bus before CS is pulled. We had add a small delay on CS in the memory controller, otherwise they would fail, especially when cold. > > /Anders > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-19 13:47 ` Anders Grafström 2008-04-19 17:01 ` Joakim Tjernlund @ 2008-04-24 14:34 ` Alexey Korolev 2008-04-24 21:02 ` Anders Grafström 1 sibling, 1 reply; 12+ messages in thread From: Alexey Korolev @ 2008-04-24 14:34 UTC (permalink / raw) To: Anders Grafström; +Cc: 'Linux-MTD Mailing List', Joakim Tjernlund Hi Anders, Could you please try this patch on your platform. It should solve the issue. As I said before we also have a case which reproduces the issue. After applying it the issue is not seen on test items which usually fails. (However to prove that it is Ok we need to execute whole bunch. It will be completed on next day) diff -aurp linux-2.6.24-a/drivers/mtd/chips/cfi_cmdset_0001.c linux-2.6.24-b/drivers/mtd/chips/cfi_cmdset_0001.c --- linux-2.6.24-a/drivers/mtd/chips/cfi_cmdset_0001.c 2008-02-11 08:51:11.000000000 +0300 +++ linux-2.6.24-b/drivers/mtd/chips/cfi_cmdset_0001.c 2008-04-24 18:13:00.000000000 +0400 @@ -874,6 +874,10 @@ static void put_chip(struct map_info *ma do. */ map_write(map, CMD(0xd0), adr); map_write(map, CMD(0x70), adr); + /* According to recomendations from h/w experts adding wait + of 100us just after resume solves the erase suspend issues + seen on some NOR parts. */ + udelay(100); chip->oldstate = FL_READY; chip->state = FL_ERASING; break; Thanks, Alexey ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-24 14:34 ` Alexey Korolev @ 2008-04-24 21:02 ` Anders Grafström 2008-04-25 9:59 ` Alexey Korolev 0 siblings, 1 reply; 12+ messages in thread From: Anders Grafström @ 2008-04-24 21:02 UTC (permalink / raw) To: Alexey Korolev; +Cc: 'Linux-MTD Mailing List', Joakim Tjernlund Alexey Korolev wrote: > Could you please try this patch on your platform. > It should solve the issue. > As I said before we also have a case which reproduces the issue. So I'm not the only one seeing it. That's 'good' to know. > After applying it the issue is not seen on test items which usually > fails. (However to prove that it is Ok > we need to execute whole bunch. It will be completed on next day) Works a lot better but it has failed one time so far with the patch applied. To answer Jared's question earlier, I've seen the problem with 2.6.18 as well but it was very rare, a couple of months in between sightings. With 2.6.23 it got frequent and even worse with 2.6.25. I'm overwriting a file, size about 6MB, to a file system that is filled to the brim. It triggers around 25-30 erases and around 10 of those usually fail. So the problem is probably not the number of suspends but more likely a latency issue? This patch used to work around the issue but it's not enough any more: diff --git a/fs/jffs2/erase.c b/fs/jffs2/erase.c index 5e2719c..be8fc87 100644 --- a/fs/jffs2/erase.c +++ b/fs/jffs2/erase.c @@ -18,6 +18,9 @@ #include <linux/pagemap.h> #include "nodelist.h" +/* max. erase failures before we mark a block bad */ +#define MAX_ERASE_FAILURES 2 + struct erase_priv_struct { struct jffs2_eraseblock *jeb; struct jffs2_sb_info *c; @@ -190,8 +193,23 @@ static void jffs2_erase_failed(struct jffs2_sb_info *c, struct jffs2_eraseblock mutex_unlock(&c->erase_free_sem); return; } + } else if (c->mtd->type != MTD_NANDFLASH) { + if( ++jeb->bad_count < MAX_ERASE_FAILURES) { + /* We'd like to give this block another try. */ + printk(KERN_ERR "Retrying erase at 0x%08x\n", jeb->offset); + mutex_lock(&c->erase_free_sem); + spin_lock(&c->erase_completion_lock); + list_move(&jeb->list, &c->erase_pending_list); + c->erasing_size -= c->sector_size; + c->dirty_size += c->sector_size; + jeb->dirty_size = c->sector_size; + spin_unlock(&c->erase_completion_lock); + mutex_unlock(&c->erase_free_sem); + return; + } } + printk(KERN_ERR "Failed erase at 0x%08x\n", jeb->offset); mutex_lock(&c->erase_free_sem); spin_lock(&c->erase_completion_lock); c->erasing_size -= c->sector_size; @@ -454,6 +472,7 @@ static void jffs2_mark_erased_block(struct jffs2_sb_info *c, struct jffs2_eraseb } /* Everything else got zeroed before the erase */ jeb->free_size = c->sector_size; + jeb->bad_count = 0; mutex_lock(&c->erase_free_sem); spin_lock(&c->erase_completion_lock); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: cfi_cmdset_0001.c: Excessive erase suspends 2008-04-24 21:02 ` Anders Grafström @ 2008-04-25 9:59 ` Alexey Korolev 0 siblings, 0 replies; 12+ messages in thread From: Alexey Korolev @ 2008-04-25 9:59 UTC (permalink / raw) To: Anders Grafström; +Cc: 'Linux-MTD Mailing List', Joakim Tjernlund Hi > > After applying it the issue is not seen on test items which usually > > fails. (However to prove that it is Ok > > we need to execute whole bunch. It will be completed on next day) > > Works a lot better but it has failed one time so far with the patch applied. > Could you please specify how it fails. I'm not sure that we are talking about the same issue. Also I need to see the data read just after broken erase. Could you please read the same data several times just after broken erase and compare it? Will data in erase block be the same? Is you flash configuration interleaved? Thanks, Alexey ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-04-25 9:59 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-04-17 20:38 cfi_cmdset_0001.c: Excessive erase suspends Anders Grafström 2008-04-18 15:02 ` Alexey Korolev 2008-04-18 16:35 ` Jamie Lokier 2008-04-18 17:54 ` Jared Hulbert 2008-04-18 22:11 ` Anders Grafström 2008-04-19 2:47 ` Jared Hulbert 2008-04-19 9:18 ` Joakim Tjernlund 2008-04-19 13:47 ` Anders Grafström 2008-04-19 17:01 ` Joakim Tjernlund 2008-04-24 14:34 ` Alexey Korolev 2008-04-24 21:02 ` Anders Grafström 2008-04-25 9:59 ` Alexey Korolev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox