* state of support for "external ECC hardware"
@ 2012-10-29 20:42 Christopher Harvey
  2012-11-08 11:02 ` Gerlando Falauto
  2012-11-14 10:59 ` Angus CLARK
  0 siblings, 2 replies; 26+ messages in thread
From: Christopher Harvey @ 2012-10-29 20:42 UTC (permalink / raw)
  To: linux-mtd

I know of at least one Micron NAND chip that has the ability to handle
ECC completely on the NAND chip itself. All the host has to do is send
data and the OOB section is updated automatically. The automatic ECC
hardware can be enabled and disabled with the "Set Feature" command
(0xEF), and bit flips are reported via get status after page reads. I
don't see support for this in 2.6.37, and a quick check in the logs
doesn't show anything new for these chips in the latest version of the
kernel. Any idea floating around on this list? Are these chips going
to be the future for NAND and does Linux care about them?

thanks,
Chris

^ permalink raw reply [flat|nested] 26+ messages in thread
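The "Set Feature" sequence described above can be sketched roughly as follows. Only the 0xEF opcode comes from the post. The feature address (0x90), the enable bit (0x08) and the `nand_bus` callbacks are assumptions made for illustration and must be checked against the datasheet of the actual part. The recording bus at the end exists only to show which bytes would go out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAND_CMD_SET_FEATURES   0xEF  /* opcode named in the post */
#define FEATURE_ADDR_ARRAY_MODE 0x90  /* ASSUMPTION: feature address for on-die ECC */
#define FEATURE_ON_DIE_ECC_EN   0x08  /* ASSUMPTION: enable bit in parameter byte P1 */

/* Hypothetical low-level bus hooks; a real driver would drive CLE/ALE here. */
struct nand_bus {
	void (*write_cmd)(void *ctx, uint8_t cmd);
	void (*write_addr)(void *ctx, uint8_t addr);
	void (*write_data)(void *ctx, uint8_t data);
	void *ctx;
};

/* Issue SET FEATURES: command, one address cycle, four parameter bytes. */
static void nand_set_on_die_ecc(const struct nand_bus *bus, int enable)
{
	uint8_t params[4] = { 0, 0, 0, 0 };
	size_t i;

	assert(bus && bus->write_cmd && bus->write_addr && bus->write_data);
	if (enable)
		params[0] = FEATURE_ON_DIE_ECC_EN;

	bus->write_cmd(bus->ctx, NAND_CMD_SET_FEATURES);
	bus->write_addr(bus->ctx, FEATURE_ADDR_ARRAY_MODE);
	for (i = 0; i < sizeof(params); i++)
		bus->write_data(bus->ctx, params[i]);
	/* A real driver must now wait until the chip reports ready again. */
}

/* Recording bus, used only to show the byte sequence that is sent. */
struct trace {
	uint8_t bytes[8];
	size_t len;
};

static void trace_put(void *ctx, uint8_t b)
{
	struct trace *t = ctx;

	if (t->len < sizeof(t->bytes))
		t->bytes[t->len++] = b;
}

static struct trace trace_on_die_ecc(int enable)
{
	struct trace t = { {0}, 0 };
	struct nand_bus bus = { trace_put, trace_put, trace_put, &t };

	nand_set_on_die_ecc(&bus, enable);
	return t;
}
```

Under these assumptions, enabling on-die ECC sends six bytes (EF 90 08 00 00 00), and disabling it sends the same sequence with the first parameter byte cleared.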
* Re: state of support for "external ECC hardware"
From: Gerlando Falauto @ 2012-11-08 11:02 UTC (permalink / raw)
  To: Christopher Harvey; +Cc: stefan.bigler, Ivan Djelic, Brunck, Holger, linux-mtd

Hi Chris,

good to hear we're not alone in this thinking... :-)
We're now facing the exact same issue, as some Micron NAND chips (most
likely the same one you're dealing with) can no longer live with the
simple 1-bit ECC implementation used by default (NAND_ECC_SOFT), I guess
because the chances of having multiple bitflips within the same page are
no longer negligible. So some 4-bit ECC mechanism must be used at the
very least.

Support for a software-based, multiple-bit-resilient ECC mechanism (BCH)
was posted (http://lwn.net/Articles/426856/) by Ivan Djelic (whom I took
the liberty to Cc:) and merged in March last year.
I haven't been able to track how the situation evolved, but apparently
you need to enable it not only in the kernel configuration but also in
your flash controller setup.
Micron gives an example of how to enable it on a sample NAND host
controller (S3C6410) in this TN (the rest of the code, mainly from the
above patch, would already be present in recent kernels):
http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2971_software_bch_ecc_on_linux.pdf

As for hardware-based (or on-die) ECC support, one of the application
notes from Micron (TN-29-56 Enabling On-Die ECC for OMAP3 on
Linux/Android OS,
http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2956_ondie_ecc_omap3_linux.pdf)
shows how to enable that (rather, it shows how to disable software ECC
altogether after enabling it on the chip). However, I haven't been able
to find a code section where the information returned by the chip
("Rewrite recommended") is actually used to solicit scrubbing, neither
in the TN nor in the upstream Linux kernel. My next step would be to
give it a go and see what happens.

I'd love to hear some feedback, if anyone has had experience with this.
I know it's not been a long time since your post, but perhaps you've
heard something in the meantime?

I have one additional question though. Looking at the code, I got the
impression that decisions about ECC seem to be based on the flash
controller rather than on the flash chip itself.
I mean, I would think of having a default 1-bit NAND_ECC_SOFT
implementation; only when it is detected that the flash part either
supports HW ECC or requires multiple-bit ECC should the ECC mode get
switched to NAND_ECC_NONE or NAND_ECC_SOFT_BCH respectively.
No matter what the flash controller, I would say.

Ivan, do you think that makes any sense?

Thank you so much!
Gerlando
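The chip-driven selection policy proposed in this message could look roughly like the sketch below. It is only an illustration of the idea. The `chip_ecc_info` struct and `pick_ecc_mode()` are hypothetical and do not exist in the kernel, and the enum merely mirrors the NAND_ECC_NONE, NAND_ECC_SOFT and NAND_ECC_SOFT_BCH names used above.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the kernel mode names mentioned in the message. */
enum ecc_mode { ECC_NONE, ECC_SOFT, ECC_SOFT_BCH };

/* HYPOTHETICAL per-chip description, e.g. filled in from the chip ID. */
struct chip_ecc_info {
	bool has_on_die_ecc;     /* the chip corrects errors itself */
	unsigned required_bits;  /* correctable bits the datasheet asks for */
};

/* Decide the host ECC mode from the flash part, not from the controller. */
static enum ecc_mode pick_ecc_mode(const struct chip_ecc_info *chip)
{
	assert(chip);
	if (chip->has_on_die_ecc)
		return ECC_NONE;      /* host ECC off, the chip does the work */
	if (chip->required_bits > 1)
		return ECC_SOFT_BCH;  /* 1-bit Hamming is not enough */
	return ECC_SOFT;          /* default 1-bit software ECC */
}
```

Whether on-die ECC should take priority over software BCH when both are possible is a design choice the message leaves open; the sketch simply picks on-die ECC first.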
* Re: state of support for "external ECC hardware"
From: Christopher Harvey @ 2012-11-08 15:21 UTC (permalink / raw)
  To: Gerlando Falauto; +Cc: stefan.bigler, Ivan Djelic, Brunck, Holger, linux-mtd

On Thu, Nov 08, 2012 at 12:02:27PM +0100, Gerlando Falauto wrote:
> Hi Chris,
>
> good to hear we're not alone in this thinking... :-)
> We're now facing the exact same issue as some Micron NAND chips (most
> likely the same one you're dealing with) can no longer live with the
> default, simple 1-bit ECC implementation used by default
> (NAND_ECC_SOFT), I guess because chances of having multiple bitflips
> within the same page are no longer negligible. So some 4-bit ECC
> mechanism must be used at the very least.

We had BCH8 code running, but it wasn't enough. The main reason we
switched away from host-side ECC was that we were getting bitflips
within the ECC codeword data itself. Yes, it would have been possible
to add a 1-byte Hamming code to protect the main ECC data, but it was
just easier to say, "hey, Micron knows their hardware, so we'll trust
their algorithms", and enable the Micron ECC hardware. Although it
didn't require too much work to enable, it's all a total hack. I took
the code that runs the "ECC disabled mode", and sprinkled in some
extra init code and error checking code. It would be nice to add an
"external ECC mode" to support these chips explicitly.

> Support for software-based multiple-bit-resilient ECC mechanism (BCH)
> was posted (http://lwn.net/Articles/426856/) by Ivan Djelic (which I
> took liberty to Cc:) and merged in March last year.
> I haven't been able to track how the situation evolved, but apparently
> you need to enable it (in addition to within the kernel configuration),
> also within your flash controller setup.
> Micron gives an example of how to enable it on a sample NAND host
> controller S3C6410 in this TN (rest of the code, mainly from the above
> patch, would be already present in recent kernels):
> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2971_software_bch_ecc_on_linux.pdf

I haven't looked into the current software ECC algorithms in the
kernel. Do they protect against corrupted ECC data? As in, corruptions
in the out-of-band (OOB) area?

> As for hardware-based (or on-die) ECC support, one of the application
> notes from Micron (TN-29-56 Enabling On-Die ECC for OMAP3 on
> Linux/Android OS,
> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2956_ondie_ecc_omap3_linux.pdf)
> shows how to enable that (rather, it shows how to disable software ECC
> altogether after enabling it on the chip). However, I haven't been able
> to find a code section where the information returned by the chip
> ("Rewrite recommended") is actually used to solicit scrubbing... Neither
> on the TN, nor on the upstream linux kernel... My next step would be to
> give it a go and see what happens.

I got that working. If you're running in "ECC disabled mode", try
something like this:

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index a796dd7..68af8b0 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1069,7 +1069,16 @@ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip,
 			      uint8_t *buf, int page)
 {
 	chip->read_buf(mtd, buf, mtd->writesize);
-	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);
+	chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); /* (this data used in sw ecc) */
+
+	/* TODO: only do this CMD_STATUS if we have Micron NAND */
+	chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1);
+	if (chip->read_byte(mtd) & NAND_STATUS_REWRITE_RECOMMENDED) {
+		/* Micron NAND is telling us that this block may be going bad,
+		   tell Linux to move it */
+		mtd->ecc_stats.corrected++; /* (we don't actually know if it's just one correction, could be up to 4) */
+	}
+
 	return 0;
 }

Ran a trace on some manually inserted bitflips, and the block was moved.

Hope that helps.
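The status check in the patch can also be written as a small stand-alone decoder, shown below as a sketch. The patch does not define NAND_STATUS_REWRITE_RECOMMENDED, so the value 0x08 (status bit 3) is an assumption here. Treating the FAIL bit (0x01) after a read as "uncorrectable" is likewise an assumption that the patch itself does not make. The `ecc_stats` struct is a local stand-in for the counters in `mtd->ecc_stats`.

```c
#include <assert.h>
#include <stdint.h>

#define NAND_STATUS_FAIL                0x01 /* standard status bit 0 */
#define NAND_STATUS_REWRITE_RECOMMENDED 0x08 /* ASSUMPTION: bit 3 on the Micron part */

enum read_result { READ_CLEAN, READ_REWRITE, READ_UNCORRECTABLE };

/* Local stand-in for the corrected/failed counters kept by MTD. */
struct ecc_stats {
	unsigned corrected;
	unsigned failed;
};

/* Map the status byte read after a page read to an action and update stats. */
static enum read_result classify_read_status(uint8_t status,
					     struct ecc_stats *stats)
{
	assert(stats);
	if (status & NAND_STATUS_FAIL) {
		/* ASSUMPTION: FAIL after a read means the on-die ECC gave up. */
		stats->failed++;
		return READ_UNCORRECTABLE;
	}
	if (status & NAND_STATUS_REWRITE_RECOMMENDED) {
		/* Exact bitflip count is unknown; count it as one event. */
		stats->corrected++;
		return READ_REWRITE;
	}
	return READ_CLEAN;
}
```

Keeping the decoding separate from the bus access makes the policy testable without hardware; the driver would pass in whatever `chip->read_byte()` returns after NAND_CMD_STATUS.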
* Re: state of support for "external ECC hardware" 2012-11-08 15:21 ` Christopher Harvey @ 2012-11-08 16:32 ` Gerlando Falauto 2012-11-08 16:37 ` Gerlando Falauto ` (3 more replies) 2012-11-08 18:59 ` Ivan Djelic 1 sibling, 4 replies; 26+ messages in thread From: Gerlando Falauto @ 2012-11-08 16:32 UTC (permalink / raw) To: Christopher Harvey Cc: Bigler, Stefan, Ivan Djelic, Brunck, Holger, linux-mtd@lists.infradead.org Hi Chris, first of all thanks for answering this quick! On 11/08/2012 04:21 PM, Christopher Harvey wrote: > On Thu, Nov 08, 2012 at 12:02:27PM +0100, Gerlando Falauto wrote: >> Hi Chris, >> >> good to hear we're not alone in this thinking... :-) >> We're now facing the exact same issue as some Micron NAND chips (most >> likely the same one you're dealing with) can no longer live with the >> default, simple 1-bit ECC implementation used by default >> (NAND_ECC_SOFT), I guess because chances of having multiple bitflips >> within the same page are no longer negligible. So some 4-bit ECC >> mechanism must be used at the very least. > > We had BCH8 code running, but it wasn't enough. The main reason we > switched away from host side ECC was because we were getting bitflips > within the ECC codeword data itself. Wow... I mean, I figured it wouldn't be that easy to (purposedly) get bitflips in any area, I wonder what kind of test you managed to come up with in order to get bitflips within the ECC area itself. In my case it takes several hours (of continuous reads) to get a single bitflip within a 1Gb (128MB) flash. > Yes, it would have been possible > to add a 1 byte hamming code to protect the main ECC data, I'd have thought the algorithm would take care of that itself. Adding a further level of ECC seems a bit unnatural, at least to me. > but it was > just easier to say, "hey, Micron knows their hardware, so we'll trust > their algorithms", and enable the Micron ECC hardware. Although it > didn't require too much work to enable it's all a total hack. 
I took > the code that runs the "ECC disabled mode", and sprinkled in some > extra init code and error checking code. Would be nice to add an > "external ecc mode" to support these chips explicitly. That was sort of my point below. Would be nice to know whether there is some ongoing work for that matter. > >> Support for software-based multiple-bit-resilient ECC mechanism (BCH) >> was posted (http://lwn.net/Articles/426856/) by Ivan Djelic (which I >> took liberty to Cc:) and merged in March last year. >> I haven't been able to track how the situation evolved, but apparently >> you need to enable it (in addition to within the kernel configuration), >> also within your flash controller setup. >> Micron gives an example of how to enable it on a sample NAND host >> controller S3C6410 in this TN (rest of the code, mainly from the above >> patch, would be already present in recent kernels): >> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2971_software_bch_ecc_on_linux.pdf > > I haven't looked into current software ECC algorithms in the > kernel. Do the protect against corrupted ECC data? As in, corruptions > in the out of bounds area? I sort of assumed that BCH would take care of that, but I understand you are stating the opposite. >> As for hardware-based (or on-die) ECC support, one of the application >> notes from Micron (TN-29-56 Enabling On-Die ECC for OMAP3 on >> Linux/Android OS, >> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2956_ondie_ecc_omap3_linux.pdf) >> shows how to enable that (rather, it shows how to disable software ECC >> altogether after enabling it on the chip). However, I haven't been able >> to find a code section where the information returned by the chip >> ("Rewrite recommended") is actually used to solicit scrubbing... Neither >> on the TN, nor on the upstream linux kernel... My next step would be to >> give it a go and see what happens. 
> > I got that working, if you're running in "eec disabled mode", try something like this: > > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c > index a796dd7..68af8b0 100644 > --- a/drivers/mtd/nand/nand_base.c > +++ b/drivers/mtd/nand/nand_base.c > @@ -1069,7 +1069,16 @@ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip, > uint8_t *buf, int page) > { > chip->read_buf(mtd, buf, mtd->writesize); > - chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); > + chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); /* (this data used in sw ecc) */ > + > + /* TODO: only do this CMD_STATUS if we have Micron NAND */ > + chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1); > + if (chip->read_byte(mtd)& NAND_STATUS_REWRITE_RECOMMENDED) { > + /* Micron NAND is telling us that this block may be going bad, > + tell Linux to move it */ > + mtd->ecc_stats.corrected++; /* (we don't actually know if it's just one correction, could be up to 4) */ Right, datasheets and TNs don't even mention what the threshold actually is. They just say "Rewrite recommended". Perhaps you could get some feeling while running your tests? I mean, if you could get bitflips by using host-software ECC (within a reasonable time), and after enabling on-die ECC you couldn't anymore, it probably means HW ECC won't tell you about bitflips until they reach a number higher than 1. Am I right? [Did you ever ask Micron by any chance?] > + } > + > return 0; > } > I was going down the way pointed out by Micron in their TN, that is hacking into nand_read_page_hwecc(). But I like your approach more. > Ran a trace on some manually inserted bitflips, and the block was moved. Could you give some pointers on how to manually insert bitflips? nanddump/nandwrite from mtd-utils perhaps? > Hope that helps. Yep, it does help a great deal! Thanks a bunch! Gerlando > >> I'd love to hear some feedback, if anyone has had experience with this. 
>> I know it's not been a long time since your post, but perhaps you've >> heard something in the meantime? >> >> I have one additional question though. Looking at the code I got the >> impression that decisions upon ECC seem to be based on the flash >> controller rather than on the flash chip itself. >> I mean, I would think of having a default 1-bit NAND_ECC_SOFT >> implementation; only when it is detected that the flash part either >> supports HW ECC or requires multiple-bit ECC, should the ECC mode get >> switched to NAND_ECC_NONE or NAND_ECC_SOFT_BCH respectively. >> No matter what the flash controller, I would say. >> >> Ivan, do you think that makes any sense? >> >> Thank you so much! >> Gerlando >> >> On 10/29/2012 09:42 PM, Christopher Harvey wrote: >>> I know of at least one Micron NAND chip that has the ability to handle >>> ECC completely on the NAND chip itself. All the host has to do is send >>> data and the OOB section is updated automatically. The automatic ECC >>> hardware can be enabled and disabled with the "Set Feature" command, >>> (0xEF) and bit flips are reported via get status after page reads. I >>> don't see support for this in 2.6.37, and a quick check in the logs >>> doesn't show anything new for these chips in the latest version of the >>> kernel. Any idea floating around on this list? Are these chips going >>> to be the future for NAND and does Linux care about them? >>> >>> thanks, >>> Chris >>> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-08 16:32 ` Gerlando Falauto @ 2012-11-08 16:37 ` Gerlando Falauto 2012-11-08 17:03 ` Christopher Harvey 2012-11-08 17:02 ` Christopher Harvey ` (2 subsequent siblings) 3 siblings, 1 reply; 26+ messages in thread From: Gerlando Falauto @ 2012-11-08 16:37 UTC (permalink / raw) To: Christopher Harvey Cc: Bigler, Stefan, Ivan Djelic, Brunck, Holger, linux-mtd@lists.infradead.org Hi Chris, On 11/08/2012 05:32 PM, Gerlando Falauto wrote: > >> Ran a trace on some manually inserted bitflips, and the block was moved. > > Could you give some pointers on how to manually insert bitflips? > nanddump/nandwrite from mtd-utils perhaps? And BTW, wouldn't you also need to explicitly disable on-die ECC in order to force that, anyway? Thanks again! Gerlando ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-08 16:37 ` Gerlando Falauto @ 2012-11-08 17:03 ` Christopher Harvey 0 siblings, 0 replies; 26+ messages in thread From: Christopher Harvey @ 2012-11-08 17:03 UTC (permalink / raw) To: Gerlando Falauto Cc: Brunck, Holger, Ivan Djelic, Bigler, Stefan, linux-mtd@lists.infradead.org On Thu, Nov 08, 2012 at 05:37:17PM +0100, Gerlando Falauto wrote: > Hi Chris, > On 11/08/2012 05:32 PM, Gerlando Falauto wrote: > > > >> Ran a trace on some manually inserted bitflips, and the block was moved. > > > > Could you give some pointers on how to manually insert bitflips? > > nanddump/nandwrite from mtd-utils perhaps? > > And BTW, wouldn't you also need to explicitly disable on-die ECC in > order to force that, anyway? Depends on the version I think. IIRC, some are "enabled by default", others are "disabled by default". -C ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-08 16:32 ` Gerlando Falauto 2012-11-08 16:37 ` Gerlando Falauto @ 2012-11-08 17:02 ` Christopher Harvey 2012-11-08 19:07 ` Ivan Djelic 2012-11-09 8:46 ` Ricard Wanderlof 3 siblings, 0 replies; 26+ messages in thread From: Christopher Harvey @ 2012-11-08 17:02 UTC (permalink / raw) To: Gerlando Falauto Cc: Brunck, Holger, Ivan Djelic, Bigler, Stefan, linux-mtd@lists.infradead.org On Thu, Nov 08, 2012 at 05:32:27PM +0100, Gerlando Falauto wrote: > Hi Chris, > > first of all thanks for answering this quick! > > On 11/08/2012 04:21 PM, Christopher Harvey wrote: > > On Thu, Nov 08, 2012 at 12:02:27PM +0100, Gerlando Falauto wrote: > >> Hi Chris, > >> > >> good to hear we're not alone in this thinking... :-) > >> We're now facing the exact same issue as some Micron NAND chips (most > >> likely the same one you're dealing with) can no longer live with the > >> default, simple 1-bit ECC implementation used by default > >> (NAND_ECC_SOFT), I guess because chances of having multiple bitflips > >> within the same page are no longer negligible. So some 4-bit ECC > >> mechanism must be used at the very least. > > > > We had BCH8 code running, but it wasn't enough. The main reason we > > switched away from host side ECC was because we were getting bitflips > > within the ECC codeword data itself. > > Wow... I mean, I figured it wouldn't be that easy to (purposedly) get > bitflips in any area, I wonder what kind of test you managed to come up > with in order to get bitflips within the ECC area itself. > In my case it takes several hours (of continuous reads) to get a single > bitflip within a 1Gb (128MB) flash. I was surprised too. I was seeing about 30 bitflips per 512MB. Running at about 1/3 of max bus speed. No error codes on write. Micron never said that was abnormal for our chip. 
> > Yes, it would have been possible > > to add a 1 byte hamming code to protect the main ECC data, > > I'd have thought the algorithm would take care of that itself. Adding a > further level of ECC seems a bit unnatural, at least to me. I don't know the details of BCH, but apparently not. I asked Micron if the OOB area was safer to write to, and they said no. Can somebody on this list confirm this? > > but it was > > just easier to say, "hey, Micron knows their hardware, so we'll trust > > their algorithms", and enable the Micron ECC hardware. Although it > > didn't require too much work to enable it's all a total hack. I took > > the code that runs the "ECC disabled mode", and sprinkled in some > > extra init code and error checking code. Would be nice to add an > > "external ecc mode" to support these chips explicitly. > > That was sort of my point below. Would be nice to know whether there is > some ongoing work for that matter. > > > > >> Support for software-based multiple-bit-resilient ECC mechanism (BCH) > >> was posted (http://lwn.net/Articles/426856/) by Ivan Djelic (which I > >> took liberty to Cc:) and merged in March last year. > >> I haven't been able to track how the situation evolved, but apparently > >> you need to enable it (in addition to within the kernel configuration), > >> also within your flash controller setup. > >> Micron gives an example of how to enable it on a sample NAND host > >> controller S3C6410 in this TN (rest of the code, mainly from the above > >> patch, would be already present in recent kernels): > >> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2971_software_bch_ecc_on_linux.pdf > > > > I haven't looked into current software ECC algorithms in the > > kernel. Do the protect against corrupted ECC data? As in, corruptions > > in the out of bounds area? > > I sort of assumed that BCH would take care of that, but I understand you > are stating the opposite. 
> > >> As for hardware-based (or on-die) ECC support, one of the application > >> notes from Micron (TN-29-56 Enabling On-Die ECC for OMAP3 on > >> Linux/Android OS, > >> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2956_ondie_ecc_omap3_linux.pdf) > >> shows how to enable that (rather, it shows how to disable software ECC > >> altogether after enabling it on the chip). However, I haven't been able > >> to find a code section where the information returned by the chip > >> ("Rewrite recommended") is actually used to solicit scrubbing... Neither > >> on the TN, nor on the upstream linux kernel... My next step would be to > >> give it a go and see what happens. > > > > I got that working, if you're running in "eec disabled mode", try something like this: > > > > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c > > index a796dd7..68af8b0 100644 > > --- a/drivers/mtd/nand/nand_base.c > > +++ b/drivers/mtd/nand/nand_base.c > > @@ -1069,7 +1069,16 @@ static int nand_read_page_raw(struct mtd_info *mtd, struct nand_chip *chip, > > uint8_t *buf, int page) > > { > > chip->read_buf(mtd, buf, mtd->writesize); > > - chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); > > + chip->read_buf(mtd, chip->oob_poi, mtd->oobsize); /* (this data used in sw ecc) */ > > + > > + /* TODO: only do this CMD_STATUS if we have Micron NAND */ > > + chip->cmdfunc(mtd, NAND_CMD_STATUS, -1, -1); > > + if (chip->read_byte(mtd)& NAND_STATUS_REWRITE_RECOMMENDED) { > > + /* Micron NAND is telling us that this block may be going bad, > > + tell Linux to move it */ > > + mtd->ecc_stats.corrected++; /* (we don't actually know if it's just one correction, could be up to 4) */ > > Right, datasheets and TNs don't even mention what the threshold actually > is. They just say "Rewrite recommended". Perhaps you could get some > feeling while running your tests? 
> I mean, if you could get bitflips by
> using host-software ECC (within a reasonable time), and after enabling
> on-die ECC you couldn't anymore, it probably means HW ECC won't tell you
> about bitflips until they reach a number higher than 1. Am I right?
>
> [Did you ever ask Micron by any chance?]

Yeah, I asked, but I don't remember the answer. I tested with 3 bit flips
in a block and didn't see the rewrite recommended bit. 4 did the
trick. I didn't go any higher.

> > +	}
> > +
> >  	return 0;
> >  }
> >
>
> I was going down the way pointed out by Micron in their TN, that is
> hacking into nand_read_page_hwecc(). But I like your approach more.
>
> > Ran a trace on some manually inserted bitflips, and the block was moved.
>
> Could you give some pointers on how to manually insert bitflips?
> nanddump/nandwrite from mtd-utils perhaps?

I had 2 kernels in NAND, one that enabled Micron ECC and one that didn't.
I booted the board over NFS, then used nanddump and a hex editor to put
4 bit flips in a file of AAAAAAAAAA's (UBIFS). After doing a nandwrite
and making sure Micron didn't update its ECC data, I rebooted and
enabled Micron ECC. When doing 'cat the_aaaa_file', I was able to watch
the UBIFS debug statements say it moved one PEB to another. After
dumping the new PEB, I was able to see the original AAAA's and the new
ECC data. Also, reading the ECC stats said there was one bitflip
detected. Be sure to completely power cycle your NAND (not just a reset
signal), because the Micron ECC enabled bit is persistent.

> > Hope that helps.
>
> Yep, it does help a great deal! Thanks a bunch!
>
> Gerlando
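The hex-editor step of the procedure above amounts to flipping a few chosen bits in a raw dump and confirming how many bits differ before writing it back. A minimal sketch of those two operations on an in-memory buffer follows. The function names are made up for this example, and reading or writing the dump file (nanddump/nandwrite) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flip one bit in buf; bit_index counts from bit 0 of byte 0. */
static void flip_bit(uint8_t *buf, size_t len, size_t bit_index)
{
	assert(buf && bit_index / 8 < len);
	buf[bit_index / 8] ^= (uint8_t)(1u << (bit_index % 8));
}

/* Count the bits that differ between two buffers of equal length. */
static unsigned count_bitflips(const uint8_t *a, const uint8_t *b, size_t len)
{
	unsigned flips = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		uint8_t diff = a[i] ^ b[i];

		while (diff) {
			diff &= (uint8_t)(diff - 1); /* clear lowest set bit */
			flips++;
		}
	}
	return flips;
}
```

For example, starting from a buffer filled with 'A' and flipping four distinct bit positions yields a count of 4 against the untouched copy, which matches the number of flips used in the test described above.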
* Re: state of support for "external ECC hardware" 2012-11-08 16:32 ` Gerlando Falauto 2012-11-08 16:37 ` Gerlando Falauto 2012-11-08 17:02 ` Christopher Harvey @ 2012-11-08 19:07 ` Ivan Djelic 2012-11-09 8:46 ` Ricard Wanderlof 3 siblings, 0 replies; 26+ messages in thread From: Ivan Djelic @ 2012-11-08 19:07 UTC (permalink / raw) To: Gerlando Falauto Cc: Bigler, Stefan, Christopher Harvey, Brunck, Holger, linux-mtd@lists.infradead.org On Thu, Nov 08, 2012 at 04:32:27PM +0000, Gerlando Falauto wrote: (...) > Right, datasheets and TNs don't even mention what the threshold actually > is. They just say "Rewrite recommended". Perhaps you could get some > feeling while running your tests? I mean, if you could get bitflips by > using host-software ECC (within a reasonable time), and after enabling > on-die ECC you couldn't anymore, it probably means HW ECC won't tell you > about bitflips until they reach a number higher than 1. Am I right? > > [Did you ever ask Micron by any chance?] IIRC, Micron on-die ECC reports a "rewrite recommended" status when the number of bitflips has reached the internal error correction capability (4 in my case). In other words, a "rewrite recommended" means you should rewrite the block ASAP before an additional bitflip triggers an ECC failure. -- Ivan ^ permalink raw reply [flat|nested] 26+ messages in thread
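The behaviour described here, together with the earlier observation that 3 bitflips raised no flag while 4 did, suggests a simple decision rule. The sketch below is only a model of that rule under the assumption that the chip flags a page exactly when the bitflip count reaches its correction capability; the enum and function names are invented for the example.

```c
#include <assert.h>

enum page_action { PAGE_OK, PAGE_SCRUB, PAGE_FAILED };

/*
 * bitflips: errors present in the page
 * strength: number of bits the ECC can correct (4 in the case above)
 */
static enum page_action page_action_for(unsigned bitflips, unsigned strength)
{
	assert(strength > 0);
	if (bitflips > strength)
		return PAGE_FAILED;  /* beyond what the ECC can repair */
	if (bitflips == strength)
		return PAGE_SCRUB;   /* one more flip would be fatal: rewrite now */
	return PAGE_OK;          /* corrected silently, no flag raised */
}
```

A host that only sees the "rewrite recommended" bit cannot tell 0 from 3 bitflips, which is why a flagged page has to be treated as needing a rewrite right away.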
* Re: state of support for "external ECC hardware" 2012-11-08 16:32 ` Gerlando Falauto ` (2 preceding siblings ...) 2012-11-08 19:07 ` Ivan Djelic @ 2012-11-09 8:46 ` Ricard Wanderlof 2012-11-12 17:19 ` Gerlando Falauto 2012-11-20 11:13 ` Calvin Johnson 3 siblings, 2 replies; 26+ messages in thread From: Ricard Wanderlof @ 2012-11-09 8:46 UTC (permalink / raw) To: Gerlando Falauto Cc: Brunck, Holger, Christopher Harvey, Bigler, Stefan, Ivan Djelic, linux-mtd@lists.infradead.org On Thu, 8 Nov 2012, Gerlando Falauto wrote: >> We had BCH8 code running, but it wasn't enough. The main reason we >> switched away from host side ECC was because we were getting bitflips >> within the ECC codeword data itself. > > Wow... I mean, I figured it wouldn't be that easy to (purposedly) get > bitflips in any area, I wonder what kind of test you managed to come up > with in order to get bitflips within the ECC area itself. In my case it > takes several hours (of continuous reads) to get a single bitflip within > a 1Gb (128MB) flash. There are 1Gb flashes and 1Gb flashes. Depending on the technology used during manufacture (essentially the scale of the on-chip structures, usually specified as 'xxx nm technology') the bit error probabilities can vary. "Traditional" 1Gb flashes where the manufacturer recommends 1-bit ECC in practice very rarely exhibit bit flips. I have seen bit flips in the OOB area as well as the main area (there was a bug in nand_ecc.c many years ago which didn't handle this correctly which is how I discovered what was going on); indeed there's nothing different about the OOB area in terms of bit flips, it's just another area of (the same type of) flash. The probability for the whole OOB area is of course less than for the rest as it is smaller, but it is the same per bit if I understand it correctly. Some manufacturers (Micron for instance I believe) have started to deliver 1 Gb chips using a higher density technology where they specify a requirement for 4-bit ECC. 
These naturally exhibit a much higher bitflip rate. At any rate, the ECC algorithm itself should be able to take care of bit flips in the ECC codes. For the 1-bit algorithm in nand_ecc.c it does this by comparing the computed ECC with the actual ECC; if there's a difference of exactly one bit (rather than a more complex diff which after calculations points out the flipped bit in the main area), it is assumed that the bitflip is in the ECC area rather than the data. I don't know how BCH does this though. /Ricard -- Ricard Wolf Wanderlöf ricardw(at)axis.com Axis Communications AB, Lund, Sweden www.axis.com Phone +46 46 272 2016 Fax +46 46 13 61 30 ^ permalink raw reply [flat|nested] 26+ messages in thread
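The 1-bit decision rule described above can be sketched as follows. This is a simplified illustration and not the actual nand_ecc.c code: the names classify_ecc_diff, diff_for_data_flip and flipped_bit_address, as well as the bit layout (one even/odd parity pair per address bit, 11 pairs for 2048 data bits), are assumptions made for the example.

```python
PAIRS = 11                         # 2048 data bits -> 11 address bits, one parity pair each
PAIR_MASK = int("01" * PAIRS, 2)   # selects the low (even-parity) bit of every pair


def classify_ecc_diff(stored_ecc: int, computed_ecc: int) -> str:
    """Classify the XOR of the stored and the recomputed ECC."""
    diff = stored_ecc ^ computed_ecc
    if diff == 0:
        return "no_error"
    if bin(diff).count("1") == 1:
        # Exactly one differing bit: the flip is in the ECC bytes themselves.
        return "ecc_bitflip"
    # A single flipped data bit toggles exactly one parity bit in every pair.
    if diff >> (2 * PAIRS) == 0 and ((diff ^ (diff >> 1)) & PAIR_MASK) == PAIR_MASK:
        return "data_bitflip"
    return "uncorrectable"


def flipped_bit_address(diff: int) -> int:
    """Read the data-bit address from the odd-parity bit of each pair (assumed layout)."""
    return sum(((diff >> (2 * i + 1)) & 1) << i for i in range(PAIRS))


def diff_for_data_flip(addr: int) -> int:
    """Build the ECC difference that a flip of data bit `addr` would produce (assumed layout)."""
    diff = 0
    for i in range(PAIRS):
        diff |= 1 << (2 * i + ((addr >> i) & 1))
    return diff
```

With this layout a single data bitflip always changes 11 of the 22 parity bits, so it cannot be confused with a one-bit difference caused by a flip inside the stored ECC.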
* Re: state of support for "external ECC hardware" 2012-11-09 8:46 ` Ricard Wanderlof @ 2012-11-12 17:19 ` Gerlando Falauto 2012-11-12 17:35 ` Ivan Djelic 2012-11-20 11:13 ` Calvin Johnson 1 sibling, 1 reply; 26+ messages in thread From: Gerlando Falauto @ 2012-11-12 17:19 UTC (permalink / raw) To: Ricard Wanderlof Cc: Brunck, Holger, Christopher Harvey, Bigler, Stefan, Ivan Djelic, linux-mtd@lists.infradead.org Hi everyone, first of all I am very grateful for your feedback. Thanks a lot to all of you!! On 11/09/2012 09:46 AM, Ricard Wanderlof wrote: > Some manufacturers (Micron for instance I believe) have started to deliver > 1 Gb chips using a higher density technology where they specify a > requirement for 4-bit ECC. These naturally exhibit a much higher bitflip > rate. Would there be any reason *NOT* to use 4-bit ECC with parts which do not require it? Apart from performance, of course. I mean, we need to be as flexible as possible as far as hardware parts are concerned, as long as the basic requirements are met. So we would like to have a single kernel which can run on different flash parts, past, present, and (as far as we can predict) future. As pointed out within this thread, dynamic detection might be a bit tricky, so perhaps finding a common solution might be a good compromise. > At any rate, the ECC algorithm itself should be able to take care of bit > flips in the ECC codes. For the 1-bit algorithm in nand_ecc.c it does this > by comparing the computed ECC with the actual ECC; if there's a difference > of exactly one bit (rather than a more complex diff which after > calculations points out the flipped bit in the main area), it is assumed > that the bitflip is in the ECC area rather than the data. I don't know how > BCH does this though. Ivan, I came to understand (but I am not sure), that the implementation you provided (and currently mainlined) *DOES* handle this correctly. It was instead an old one which did not handle this properly. 
Is my understanding correct? Thank you again, Gerlando > > /Ricard ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-12 17:19 ` Gerlando Falauto @ 2012-11-12 17:35 ` Ivan Djelic 2012-11-12 17:39 ` Gerlando Falauto 0 siblings, 1 reply; 26+ messages in thread From: Ivan Djelic @ 2012-11-12 17:35 UTC (permalink / raw) To: Gerlando Falauto Cc: Bigler, Stefan, Christopher Harvey, Brunck, Holger, Ricard Wanderlof, linux-mtd@lists.infradead.org On Mon, Nov 12, 2012 at 05:19:57PM +0000, Gerlando Falauto wrote: (...) > > At any rate, the ECC algorithm itself should be able to take care of bit > > flips in the ECC codes. For the 1-bit algorithm in nand_ecc.c it does this > > by comparing the computed ECC with the actual ECC; if there's a difference > > of exactly one bit (rather than a more complex diff which after > > calculations points out the flipped bit in the main area), it is assumed > > that the bitflip is in the ECC area rather than the data. I don't know how > > BCH does this though. > > Ivan, I came to understand (but I am not sure), that the implementation > you provided (and currently mainlined) *DOES* handle this correctly. It > was instead an old one which did not handle this properly. Is my > understanding correct? Yes, you are correct. In BCH ECC there is no difference between data and ECC bytes: they are all part of a larger codeword on which error correction is performed. An old patch introducing BCH support in nand/omap2.c had a bug that was triggered when a bitflip was detected in the ECC bytes, but this has nothing to do with the way BCH algorithms work. BR, -- Ivan ^ permalink raw reply [flat|nested] 26+ messages in thread
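The point that data and ECC bits belong to one codeword can be illustrated with a much smaller code. The sketch below uses a (7,4) Hamming code as a single-error-correcting stand-in; BCH as used for NAND works on codewords thousands of bits long and corrects several errors, so this is only an analogy. The function names are made up for the example.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # 1-based codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4)     # 1-based codeword positions holding parity bits


def syndrome(codeword):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    syn = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            syn ^= pos
    return syn


def encode(data4):
    """Place 4 data bits, then choose the parity bits so the syndrome becomes 0."""
    code = [0] * 7
    for pos, bit in zip(DATA_POSITIONS, data4):
        code[pos - 1] = bit
    syn = syndrome(code)
    for p in PARITY_POSITIONS:
        code[p - 1] = 1 if syn & p else 0
    return code


def correct(codeword):
    """Fix at most one flipped bit anywhere in the codeword, parity bits included."""
    fixed = list(codeword)
    syn = syndrome(fixed)
    if syn:
        fixed[syn - 1] ^= 1      # the syndrome is the position of the flipped bit
    return fixed, syn


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    for pos in range(7):         # flip each bit in turn, data and parity alike
        damaged = list(word)
        damaged[pos] ^= 1
        repaired, syn = correct(damaged)
        print(pos + 1, syn, repaired == word)
```

The decoder never asks whether the damaged position held data or parity; it only locates the error within the codeword.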
* Re: state of support for "external ECC hardware" 2012-11-12 17:35 ` Ivan Djelic @ 2012-11-12 17:39 ` Gerlando Falauto 2012-11-12 18:52 ` Ivan Djelic 0 siblings, 1 reply; 26+ messages in thread From: Gerlando Falauto @ 2012-11-12 17:39 UTC (permalink / raw) To: Ivan Djelic Cc: Bigler, Stefan, Christopher Harvey, Brunck, Holger, Ricard Wanderlof, linux-mtd@lists.infradead.org Hi Ivan, wonderful, thanks a lot! If you also happen to have an opinion on using it for chips that only need 1-bit correction, I'd love to hear that... Thanks again! Gerlando On 11/12/2012 06:35 PM, Ivan Djelic wrote: > On Mon, Nov 12, 2012 at 05:19:57PM +0000, Gerlando Falauto wrote: > (...) >>> At any rate, the ECC algorithm itself should be able to take care of bit >>> flips in the ECC codes. For the 1-bit algorithm in nand_ecc.c it does this >>> by comparing the computed ECC with the actual ECC; if there's a difference >>> of exactly one bit (rather than a more complex diff which after >>> calculations points out the flipped bit in the main area), it is assumed >>> that the bitflip is in the ECC area rather than the data. I don't know how >>> BCH does this though. >> >> Ivan, I came to understand (but I am not sure), that the implementation >> you provided (and currently mainlined) *DOES* handle this correctly. It >> was instead an old one which did not handle this properly. Is my >> understanding correct? > > Yes you are correct. In BCH ECC, there is no difference between data and ecc bytes, they are > all part of larger codeword on which error correction is performed. > An old patch introducing BCH support in nand/omap2.c had a bug which was triggered when a bitflip > was detected in ecc bytes; but this has nothing to do with the way BCH algorithms work. > BR, > -- > Ivan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-12 17:39 ` Gerlando Falauto @ 2012-11-12 18:52 ` Ivan Djelic 2012-11-14 10:12 ` Gerlando Falauto 0 siblings, 1 reply; 26+ messages in thread From: Ivan Djelic @ 2012-11-12 18:52 UTC (permalink / raw) To: Gerlando Falauto Cc: Bigler, Stefan, Christopher Harvey, Brunck, Holger, Ricard Wanderlof, linux-mtd@lists.infradead.org On Mon, Nov 12, 2012 at 05:39:45PM +0000, Gerlando Falauto wrote: > Hi Ivan, > > wonderful, thanks a lot! > If you also happen to have an opionion to using it for chips only > needing 1-bit correction, I'd love to hear that... I would recommend using the strongest ECC your hardware can provide without hurting performance too much. This is what I do on my hardware (e.g. 8-bit correction on current 4-bit devices). I find it has 2 advantages: - increased reliability - seamless transition to newer devices with stronger ecc requirements The latter is important, because changing ECC strength can be painful: it means changing the OOB layout, impacting bootloader and kernel, thus breaking compatibility, etc. HTH, -- Ivan ^ permalink raw reply [flat|nested] 26+ messages in thread
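To see why a change of ECC strength changes the OOB layout, it helps to count parity bytes. The sketch below assumes a binary BCH code over GF(2^m) that needs at most m*t parity bits to correct t bit errors, with m the smallest value for which data plus parity fit into 2^m - 1 bits. The function name and the 2048+64 page geometry are illustrative assumptions, not something taken from the thread.

```python
import math


def bch_ecc_bytes(data_bytes: int, t: int) -> int:
    """Upper bound on ECC bytes per ECC step for t-bit BCH correction."""
    data_bits = data_bytes * 8
    m = 1
    # Smallest field order whose codeword length (2^m - 1) holds data + parity.
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return math.ceil(m * t / 8)


if __name__ == "__main__":
    page_size, oob_size, step = 2048, 64, 512   # assumed page geometry
    for t in (4, 8):
        per_step = bch_ecc_bytes(step, t)
        per_page = per_step * (page_size // step)
        print(f"t={t}: {per_step} ECC bytes per {step}-byte step, "
              f"{per_page} of {oob_size} OOB bytes per page")
```

Under these assumptions 4-bit correction needs 7 bytes per 512-byte step and 8-bit correction needs 13, so the ECC bytes occupy different OOB offsets and the bootloader and kernel must agree on which scheme is in use.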
* Re: state of support for "external ECC hardware" 2012-11-12 18:52 ` Ivan Djelic @ 2012-11-14 10:12 ` Gerlando Falauto 2012-11-14 13:24 ` Angus CLARK 0 siblings, 1 reply; 26+ messages in thread From: Gerlando Falauto @ 2012-11-14 10:12 UTC (permalink / raw) To: Ivan Djelic Cc: Bigler, Stefan, Christopher Harvey, Brunck, Holger, Ricard Wanderlof, linux-mtd@lists.infradead.org Hi Ivan, thanks once more. Speaking of compatibility, I was wondering: doesn't a NAND flash have *any* spare storage space at all, where software could store some information about the current OOB layout and/or ECC mechanism? Partition tables on hard drives for instance have a "partition type" byte which provides some hints about what to expect from the data within a partition. This would be especially useful for *future* compatibility (i.e. old software reading a NAND "formatted" with unknown mechanism could simply stop working, or force read-only mode disabling ECC altogether). Feasibility aside, would that make any sense? Thank you, Gerlando On 11/12/2012 07:52 PM, Ivan Djelic wrote: > On Mon, Nov 12, 2012 at 05:39:45PM +0000, Gerlando Falauto wrote: >> Hi Ivan, >> >> wonderful, thanks a lot! >> If you also happen to have an opionion to using it for chips only >> needing 1-bit correction, I'd love to hear that... > > I would recommend using the strongest ECC your hardware can provide without > hurting performance too much. This is what I do on my hardware (e.g. 8-bit > correction on current 4-bit devices). I find it has 2 advantages: > - increased reliability > - seamless transition to newer devices with stronger ecc requirements > The latter is important, because changing ECC strength can be painful: it > means changing the OOB layout, impacting bootloader and kernel, thus breaking > compatibility, etc. > HTH, > -- > Ivan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-14 10:12 ` Gerlando Falauto @ 2012-11-14 13:24 ` Angus CLARK 2012-11-14 14:48 ` Matthieu CASTET 2012-11-14 20:22 ` Ivan Djelic 0 siblings, 2 replies; 26+ messages in thread From: Angus CLARK @ 2012-11-14 13:24 UTC (permalink / raw) To: Gerlando Falauto Cc: Ricard Wanderlof, Brunck, Holger, linux-mtd@lists.infradead.org, Bigler, Stefan, Ivan Djelic, Christopher Harvey Hi Gerlando, On 11/14/2012 10:12 AM, Gerlando Falauto wrote: > Hi Ivan, > > thanks once more. > Speaking of compatibility, I was wondering: doesn't a NAND flash have > *any* spare storage space at all, where software could store some > information about the current OOB layout and/or ECC mechanism? > Partition tables on hard drives for instance have a "partition type" > byte which provides some hints about what to expect from the data within > a partition. > > This would be especially useful for *future* compatibility (i.e. old > software reading a NAND "formatted" with unknown mechanism could simply > stop working, or force read-only mode disabling ECC altogether). > > Feasibility aside, would that make any sense? > In general I am in favour of anything that facilitates the automatic probing of devices. However, I can see a number of complications in trying to implement what you suggest. Storing static information in a fixed location is never a good idea on NAND. A further issue relates to the very information you are trying to store. The data itself would need to be protected by ECC, but for it to be useful, you need to be able to retrieve it without knowing what ECC/layout was used when storing it. Perhaps, for this ECC/layout data, one could use a special dedicated S/W ECC scheme, strong enough for any device. Yet another layer of complexity though. With regard to "spare storage", I would probably suggest the ECC/layout data be added to the BBT area, assuming Flash-Resident BBTs are being used. 
My only doubt would be whether there is sufficient motivation to overcome some of the complexities and implement such a scheme... Cheers, Angus > Thank you, > Gerlando > > On 11/12/2012 07:52 PM, Ivan Djelic wrote: >> On Mon, Nov 12, 2012 at 05:39:45PM +0000, Gerlando Falauto wrote: >>> Hi Ivan, >>> >>> wonderful, thanks a lot! >>> If you also happen to have an opionion to using it for chips only >>> needing 1-bit correction, I'd love to hear that... >> >> I would recommend using the strongest ECC your hardware can provide >> without >> hurting performance too much. This is what I do on my hardware (e.g. >> 8-bit >> correction on current 4-bit devices). I find it has 2 advantages: >> - increased reliability >> - seamless transition to newer devices with stronger ecc requirements >> The latter is important, because changing ECC strength can be painful: it >> means changing the OOB layout, impacting bootloader and kernel, thus >> breaking >> compatibility, etc. >> HTH, >> -- >> Ivan > > > ______________________________________________________ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/ > -- ------------------------------------- Angus Clark ST Microelectronics (R&D) Ltd. 1000 Aztec West, Bristol, BS32 4SQ email: angus.clark@st.com tel: +44 (0) 1454 462389 st-tina: 065 2389 ------------------------------------- ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-14 13:24 ` Angus CLARK @ 2012-11-14 14:48 ` Matthieu CASTET 2012-11-14 20:22 ` Ivan Djelic 1 sibling, 0 replies; 26+ messages in thread From: Matthieu CASTET @ 2012-11-14 14:48 UTC (permalink / raw) To: Angus CLARK Cc: Bigler, Stefan, Brunck, Holger, Gerlando Falauto, linux-mtd@lists.infradead.org, Ricard Wanderlof, Ivan Djelic, Christopher Harvey Angus CLARK wrote: > Hi Gerlando, > > On 11/14/2012 10:12 AM, Gerlando Falauto wrote: >> Hi Ivan, >> >> >> Feasibility aside, would that make any sense? >> > > In general I am in favour of anything that facilitates the automatic probing of > devices. However, I can see a number of complications in trying to implement > what you suggest. Storing static information in a fixed location is never a > good idea on NAND. A further issue relates to the very information you are > trying to store. The data itself would need to be protected by ECC, but for it > to be useful, you need to be able to retrieve it without knowing what ECC/layout > was used when storing it. Perhaps, for this ECC/layout data, one could use a > special dedicated S/W ECC scheme, strong enough for any device. Yet another > layout of complexity though. You can use the approach that ONFI flash uses for its parameter page data [1]: - duplicate the data over n pages - use a CRC to detect corruption Matthieu [1] The host should issue the Read Parameter Page (ECh) command. This command returns information that includes the capabilities, features, and operating parameters of the device. When the information is read from the device, the host shall check the CRC to ensure that the data was received correctly and without error prior to taking action on that data. If the CRC of the first parameter page read is not valid (refer to section 5.7.1.24), the host should read redundant parameter page copies. 
The host can determine whether a redundant parameter page is present or not by checking if the first four bytes contain at least two bytes of the parameter page signature. If the parameter page signature is present, then the host should read the entirety of that redundant parameter page. The host should then check the CRC of that redundant parameter page. If the CRC is correct, the host may take action based on the contents of that redundant parameter page. If the CRC is incorrect, then the host should attempt to read the next redundant parameter page by the same procedure. The host should continue reading redundant parameter pages until the host is able to accurately reconstruct the parameter page contents. All parameter pages returned by the Target may have invalid CRC values; however, bit-wise majority or other ECC techniques may be used to recover the contents of the parameter page. The host may use bit-wise majority or other ECC techniques to recover the contents of the parameter page from the parameter page copies present. When the host determines that a parameter page signature is not present (refer to section 5.7.1.1), then all parameter pages have been read. ^ permalink raw reply [flat|nested] 26+ messages in thread
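The recovery procedure quoted above (try each redundant copy, check its CRC, fall back to bit-wise majority) could look roughly like this. The CRC-16 parameters used here (polynomial 0x8005, initial value 0x4F4E, computed over bytes 0..253 and stored little-endian in bytes 254..255 of a 256-byte page) are believed to match the ONFI specification but should be verified against it. The helper names are invented for the example, and the signature check mentioned in the quote is left out for brevity.

```python
def onfi_crc16(data: bytes, crc: int = 0x4F4E) -> int:
    """Bitwise CRC-16, MSB first, polynomial 0x8005 (assumed ONFI parameters)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def crc_ok(page: bytes) -> bool:
    """True if the CRC stored in the last two bytes matches the first 254 bytes."""
    if len(page) != 256:
        return False
    return onfi_crc16(page[:254]) == int.from_bytes(page[254:256], "little")


def bitwise_majority(copies):
    """For every bit position, keep the value held by more than half of the copies."""
    out = bytearray(len(copies[0]))
    for i in range(len(out)):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if 2 * ones > len(copies):
                out[i] |= 1 << bit
    return bytes(out)


def recover_parameter_page(copies):
    """Return the first copy with a valid CRC, else a majority vote, else None."""
    for page in copies:
        if crc_ok(page):
            return page
    voted = bitwise_majority(copies)
    return voted if crc_ok(voted) else None
```

The same pattern would apply to a small record describing the ECC scheme and OOB layout: it is readable without knowing the layout in advance, because it relies only on duplication and a CRC.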
* Re: state of support for "external ECC hardware" 2012-11-14 13:24 ` Angus CLARK 2012-11-14 14:48 ` Matthieu CASTET @ 2012-11-14 20:22 ` Ivan Djelic 1 sibling, 0 replies; 26+ messages in thread From: Ivan Djelic @ 2012-11-14 20:22 UTC (permalink / raw) To: Angus CLARK Cc: Ricard Wanderlof, Brunck, Holger, Gerlando Falauto, linux-mtd@lists.infradead.org, Bigler, Stefan, Christopher Harvey On Wed, Nov 14, 2012 at 01:24:43PM +0000, Angus CLARK wrote: > Hi Gerlando, > > On 11/14/2012 10:12 AM, Gerlando Falauto wrote: > > Hi Ivan, > > > > thanks once more. > > Speaking of compatibility, I was wondering: doesn't a NAND flash have > > *any* spare storage space at all, where software could store some > > information about the current OOB layout and/or ECC mechanism? > > Partition tables on hard drives for instance have a "partition type" > > byte which provides some hints about what to expect from the data within > > a partition. > > > > This would be especially useful for *future* compatibility (i.e. old > > software reading a NAND "formatted" with unknown mechanism could simply > > stop working, or force read-only mode disabling ECC altogether). > > > > Feasibility aside, would that make any sense? > > > > In general I am in favour of anything that facilitates the automatic probing of > devices. However, I can see a number of complications in trying to implement > what you suggest. Storing static information in a fixed location is never a > good idea on NAND. A further issue relates to the very information you are > trying to store. The data itself would need to be protected by ECC, but for it > to be useful, you need to be able to retrieve it without knowing what ECC/layout > was used when storing it. Perhaps, for this ECC/layout data, one could use a > special dedicated S/W ECC scheme, strong enough for any device. Yet another > layout of complexity though. FWIW, I have once implemented a kind of primitive "formatting" similar to what you are describing (i.e. 
storage of NAND parameters inside the device itself). For that, I used a dedicated SW BCH ECC, that adds 3 redundant bytes to each useful byte (effectively multiplying by 4 the required storage). The resulting data can sustain up to 4 bitflips in each 32-bit word; it is also stored redundantly in multiple blocks. BR, -- Ivan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-09 8:46 ` Ricard Wanderlof 2012-11-12 17:19 ` Gerlando Falauto @ 2012-11-20 11:13 ` Calvin Johnson 2012-11-20 11:35 ` Gerlando Falauto 2012-11-20 16:16 ` Ricard Wanderlof 1 sibling, 2 replies; 26+ messages in thread From: Calvin Johnson @ 2012-11-20 11:13 UTC (permalink / raw) To: Ricard Wanderlof Cc: Bigler, Stefan, linux-mtd@lists.infradead.org, Gerlando Falauto, Brunck, Holger, Christopher Harvey, Ivan Djelic Hi, I thought of sharing my recent experience with MLC NAND which requires 24-bit ECC. On Fri, Nov 9, 2012 at 2:16 PM, Ricard Wanderlof <ricard.wanderlof@axis.com> wrote: > > On Thu, 8 Nov 2012, Gerlando Falauto wrote: > >>> We had BCH8 code running, but it wasn't enough. The main reason we >>> switched away from host side ECC was because we were getting bitflips >>> within the ECC codeword data itself. >> >> >> Wow... I mean, I figured it wouldn't be that easy to (purposedly) get >> bitflips in any area, I wonder what kind of test you managed to come up with >> in order to get bitflips within the ECC area itself. In my case it takes >> several hours (of continuous reads) to get a single bitflip within a 1Gb >> (128MB) flash. > > > There are 1Gb flashes and 1Gb flashes. Depending on the technology used > during manufacture (essentially the scale of the on-chip structures, usually > specified as 'xxx nm technology') the bit error probabilities can vary. > > "Traditional" 1Gb flashes where the manufacturer recommends 1-bit ECC in > practice very rarely exhibit bit flips. I have seen bit flips in the OOB > area as well as the main area (there was a bug in nand_ecc.c many years ago > which didn't handle this correctly which is how I discovered what was going > on); indeed there's nothing different about the OOB area in terms of bit > flips, it's just another area of (the same type of) flash. 
The probability > for the whole OOB area is of course less than for the rest as it is smaller, > but it is the same per bit if I understand it correctly. > > Some manufacturers (Micron for instance I believe) have started to deliver 1 > Gb chips using a higher density technology where they specify a requirement > for 4-bit ECC. These naturally exhibit a much higher bitflip rate. > I'm using Micron's MT29F16G08CBACA. Minimum required ECC: 24-bit ECC per 1080 bytes of data. The H/W ECC controller (external to the NAND flash) I'm using supports 24-bit ECC. I had a tough time initially when I started working on this NAND flash. Without being aware of the minimum required ECC, I was using Hamming (1-bit) correction. This showed inconsistency at a rate of 1/6, i.e. 1 boot out of 6 failed. When I switched to 24-bit ECC with UBIFS, everything seemed to work properly, without any issue so far. But with JFFS2 there are still many issues. I assume that this can be due to bit flips in the OOB area, which are not covered by ECC. Also, for erased pages there is no ECC protection, and JFFS2 reads the first 256 bytes of data and checks for all 0xFF to confirm it is an erased page, along with checking the clean marker it read from the OOB. From various articles on the internet, it seems that NAND flashes are going to get denser and the bit flips are going to increase. Hence H/W ECC controllers are going to be in greater demand. The S/W BCH algorithm available in Linux will consume plenty of cycles, which can be offloaded to a H/W ECC controller. > At any rate, the ECC algorithm itself should be able to take care of bit > flips in the ECC codes. For the 1-bit algorithm in nand_ecc.c it does this > by comparing the computed ECC with the actual ECC; if there's a difference > of exactly one bit (rather than a more complex diff which after calculations > points out the flipped bit in the main area), it is assumed that the bitflip > is in the ECC area rather than the data. 
I don't know how BCH does this > though. > regards, Calvin ^ permalink raw reply [flat|nested] 26+ messages in thread
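The erased-page problem mentioned above can be made concrete. A strict all-0xFF test fails as soon as a single bit of an erased page reads back as 0. One possible mitigation, shown here purely as an assumption and not as what JFFS2 does, is to count the zero bits and still treat the page as erased when the count stays within a threshold. The function names and the threshold are hypothetical.

```python
def is_erased_strict(buf: bytes) -> bool:
    """Erased only if every byte is exactly 0xFF."""
    return all(b == 0xFF for b in buf)


def count_zero_bits(buf: bytes) -> int:
    """Number of bits that read back as 0, i.e. candidate bitflips in an erased page."""
    return sum(8 - bin(b).count("1") for b in buf)


def is_erased_tolerant(buf: bytes, max_bitflips: int) -> bool:
    """Treat the buffer as erased if at most max_bitflips bits are 0 (hypothetical policy)."""
    return count_zero_bits(buf) <= max_bitflips


if __name__ == "__main__":
    page = bytearray(b"\xff" * 256)
    page[10] &= ~0x04 & 0xFF          # simulate one bitflip in an erased page
    print(is_erased_strict(page))     # the strict check rejects the page
    print(is_erased_tolerant(page, max_bitflips=1))
```

A reasonable threshold would be bounded by the correction strength of the ECC in use, since a page with more zero bits than that could not be trusted after programming either.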
* Re: state of support for "external ECC hardware" 2012-11-20 11:13 ` Calvin Johnson @ 2012-11-20 11:35 ` Gerlando Falauto 2012-11-20 12:12 ` Calvin Johnson 2012-11-20 16:16 ` Ricard Wanderlof 1 sibling, 1 reply; 26+ messages in thread From: Gerlando Falauto @ 2012-11-20 11:35 UTC (permalink / raw) To: Calvin Johnson Cc: Ricard Wanderlof, Bigler, Stefan, linux-mtd@lists.infradead.org, Brunck, Holger, Christopher Harvey, Ivan Djelic Hi Calvin, thanks for sharing your experience. On 11/20/2012 12:13 PM, Calvin Johnson wrote: > Hi, > > I thought of sharing my recent experience with MLC NAND which requires > 24-bit ECC. When you say 24-bit, you mean ECC capable of correcting up to 24 bitflips within the same block, right? I guess that should be the case since I hear MLC NANDs are even less reliable than SLC. > > On Fri, Nov 9, 2012 at 2:16 PM, Ricard Wanderlof > <ricard.wanderlof@axis.com> wrote: >> >> On Thu, 8 Nov 2012, Gerlando Falauto wrote: >> >>>> We had BCH8 code running, but it wasn't enough. The main reason we >>>> switched away from host side ECC was because we were getting bitflips >>>> within the ECC codeword data itself. >>> >>> >>> Wow... I mean, I figured it wouldn't be that easy to (purposedly) get >>> bitflips in any area, I wonder what kind of test you managed to come up with >>> in order to get bitflips within the ECC area itself. In my case it takes >>> several hours (of continuous reads) to get a single bitflip within a 1Gb >>> (128MB) flash. >> >> >> There are 1Gb flashes and 1Gb flashes. Depending on the technology used >> during manufacture (essentially the scale of the on-chip structures, usually >> specified as 'xxx nm technology') the bit error probabilities can vary. >> >> "Traditional" 1Gb flashes where the manufacturer recommends 1-bit ECC in >> practice very rarely exhibit bit flips. 
I have seen bit flips in the OOB >> area as well as the main area (there was a bug in nand_ecc.c many years ago >> which didn't handle this correctly which is how I discovered what was going >> on); indeed there's nothing different about the OOB area in terms of bit >> flips, it's just another area of (the same type of) flash. The probability >> for the whole OOB area is of course less than for the rest as it is smaller, >> but it is the same per bit if I understand it correctly. >> >> Some manufacturers (Micron for instance I believe) have started to deliver 1 >> Gb chips using a higher density technology where they specify a requirement >> for 4-bit ECC. These naturally exhibit a much higher bitflip rate. >> > > I'm using Micron's MT29F16G08CBACA. > Minimum required ECC :- 24-bit ECC per 1080 bytes of data > The H/W ECC controller(external to NAND flash) I'm using supports 24-bit ECC. Could you please share, just for the record, what controller you are using? Do you also know what algorithm is being used? Is that already supported in the kernel or did you have to write the code for it? > Had a tough time initially when I started working on this NAND flash. > Without being aware of the minimum required ECC, I was using > Hamming(1-bit) correction. This showed inconsistency at a level of > 1/6, i.e 1 boot out of 6 failed. > > When I switched to 24-bit ECC with UBIFS, everything seems to work > properly without any issue so far. > > But with JFFS2 still there are many issues. I assume that this can be > due to the bit flips in the OOB area which are not covered by ECC. I'm not that familiar with the whole thing, but I thought you could specify what portions of the OOB area were to be used by the filesystem (like in the case of the on-die HW ECC for Micron as specified in their TN's and discussed here). Or perhaps JFFS2 is too demanding in terms of OOB data that you're also forced to use unprotected portions? 
> Also for the erased pages, there is no ECC protection and JFFS2 reads > first 256 bytes of data and checks for all 0xFF to confirm it is an > erased page along with the checking of clean marker it read from the > OOB. > > From various articles in the internet, it seems that NAND flashes are > going to get more denser and the bit flips are going to increase. > Hence the H/W ECC controllers are going to have more demand. The S/W > BCH algorithm available in Linux will consume plenty of cycles which > can be offloaded to the H/W ECC controller. Right, so... what is the current support then? >> At any rate, the ECC algorithm itself should be able to take care of bit >> flips in the ECC codes. For the 1-bit algorithm in nand_ecc.c it does this >> by comparing the computed ECC with the actual ECC; if there's a difference >> of exactly one bit (rather than a more complex diff which after calculations >> points out the flipped bit in the main area), it is assumed that the bitflip >> is in the ECC area rather than the data. I don't know how BCH does this >> though. >> > regards, > Calvin Thanks again, Gerlando ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: state of support for "external ECC hardware" 2012-11-20 11:35 ` Gerlando Falauto @ 2012-11-20 12:12 ` Calvin Johnson 0 siblings, 0 replies; 26+ messages in thread From: Calvin Johnson @ 2012-11-20 12:12 UTC (permalink / raw) To: Gerlando Falauto Cc: Ricard Wanderlof, Bigler, Stefan, linux-mtd@lists.infradead.org, Brunck, Holger, Christopher Harvey, Ivan Djelic Hi Gerlando, On Tue, Nov 20, 2012 at 5:05 PM, Gerlando Falauto <gerlando.falauto@keymile.com> wrote: > Hi Calvin, > > thanks for sharing your experience. > > > On 11/20/2012 12:13 PM, Calvin Johnson wrote: >> >> Hi, >> >> I thought of sharing my recent experience with MLC NAND which requires >> 24-bit ECC. > > > When you say 24-bit, you mean ECC capable of correcting up to 24 bitflips > within the same block, right? I guess that should be the case since I hear > MLC NANDs are even less reliable than SLC. Yes, 24-bit ECC means that up to 24 bit flips per ECC block can be corrected. The ECC block size is generally 512 bytes or 1 KiB, depending on the buffer capacity of the H/W ECC engine. >> >> On Fri, Nov 9, 2012 at 2:16 PM, Ricard Wanderlof >> <ricard.wanderlof@axis.com> wrote: >>> >>> >>> On Thu, 8 Nov 2012, Gerlando Falauto wrote: >>> >>>>> We had BCH8 code running, but it wasn't enough. The main reason we >>>>> switched away from host side ECC was because we were getting bitflips >>>>> within the ECC codeword data itself. >>>> >>>> >>>> >>>> Wow... I mean, I figured it wouldn't be that easy to (purposedly) get >>>> bitflips in any area, I wonder what kind of test you managed to come up >>>> with >>>> in order to get bitflips within the ECC area itself. In my case it takes >>>> several hours (of continuous reads) to get a single bitflip within a 1Gb >>>> (128MB) flash. >>> >>> >>> >>> There are 1Gb flashes and 1Gb flashes. 
Depending on the technology used >>> during manufacture (essentially the scale of the on-chip structures, >>> usually >>> specified as 'xxx nm technology') the bit error probabilities can vary. >>> >>> "Traditional" 1Gb flashes where the manufacturer recommends 1-bit ECC in >>> practice very rarely exhibit bit flips. I have seen bit flips in the OOB >>> area as well as the main area (there was a bug in nand_ecc.c many years >>> ago >>> which didn't handle this correctly which is how I discovered what was >>> going >>> on); indeed there's nothing different about the OOB area in terms of bit >>> flips, it's just another area of (the same type of) flash. The >>> probability >>> for the whole OOB area is of course less than for the rest as it is >>> smaller, >>> but it is the same per bit if I understand it correctly. >>> >>> Some manufacturers (Micron for instance I believe) have started to >>> deliver 1 >>> Gb chips using a higher density technology where they specify a >>> requirement >>> for 4-bit ECC. These naturally exhibit a much higher bitflip rate. >>> >> >> I'm using Micron's MT29F16G08CBACA. >> Minimum required ECC :- 24-bit ECC per 1080 bytes of data >> The H/W ECC controller(external to NAND flash) I'm using supports 24-bit >> ECC. > > > Could you please share, just for the record, what controller you are using? > Do you also know what algorithm is being used? > Is that already supported in the kernel or did you have to write the code > for it? The controller is inside the SoC. AFAIK, there are two popular error correction algorithms: Hamming and BCH. Hamming is used for 2-bit error detection and single-bit error correction. BCH can correct a larger number of bit errors per ECC block. I used BCH. Although the kernel has some H/W ECC support functions, I had to write the calculate and correct functions myself. >> Had a tough time initially when I started working on this NAND flash. 
>> Without being aware of the minimum required ECC, I was using >> Hamming(1-bit) correction. This showed inconsistency at a level of >> 1/6, i.e 1 boot out of 6 failed. >> >> When I switched to 24-bit ECC with UBIFS, everything seems to work >> properly without any issue so far. >> >> But with JFFS2 still there are many issues. I assume that this can be >> due to the bit flips in the OOB area which are not covered by ECC. > > > I'm not that familiar with the whole thing, but I thought you could specify > what portions of the OOB area were to be used by the filesystem (like in the > case of the on-die HW ECC for Micron as specified in their TN's and > discussed here). > Or perhaps JFFS2 is too demanding in terms of OOB data that you're also > forced to use unprotected portions? JFFS2 places clean markers in the OOB area, and the bits that make up this marker can flip at any time, resulting in inconsistent behaviour. > >> Also for the erased pages, there is no ECC protection and JFFS2 reads >> first 256 bytes of data and checks for all 0xFF to confirm it is an >> erased page along with the checking of clean marker it read from the >> OOB. >> >> From various articles in the internet, it seems that NAND flashes are >> going to get more denser and the bit flips are going to increase. >> Hence the H/W ECC controllers are going to have more demand. The S/W >> BCH algorithm available in Linux will consume plenty of cycles which >> can be offloaded to the H/W ECC controller. > > > Right, so... what is the current support then? Anyone concerned about freeing the processor from the S/W BCH calculation can get H/W ECC controllers on the market. I don't know who supplies them. >>> At any rate, the ECC algorithm itself should be able to take care of bit >>> flips in the ECC codes. 
For the 1-bit algorithm in nand_ecc.c it does >>> this >>> by comparing the computed ECC with the actual ECC; if there's a >>> difference >>> of exactly one bit (rather than a more complex diff which after >>> calculations >>> points out the flipped bit in the main area), it is assumed that the >>> bitflip >>> is in the ECC area rather than the data. I don't know how BCH does this >>> though. regards, Calvin ^ permalink raw reply [flat|nested] 26+ messages in thread
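The 1-bit check Ricard describes in the quoted text can be sketched as
follows. This is an illustration only, not the nand_ecc.c code: it assumes
a 3-byte ECC, and the names ecc_diff_class, ecc_diff_bits and
classify_ecc_diff are hypothetical. The decode step that locates a flipped
data bit is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
enum ecc_diff_class {
    ECC_DIFF_NONE,         /* stored and computed ECC are identical       */
    ECC_DIFF_IN_ECC,       /* exactly one bit differs: flip is in the ECC */
    ECC_DIFF_NEEDS_DECODE  /* several bits differ: decode the difference  */
};

/* Count the bits that differ between two 3-byte ECC values (assumed size). */
static unsigned ecc_diff_bits(const uint8_t stored[3], const uint8_t computed[3])
{
    unsigned count = 0;

    for (int i = 0; i < 3; i++) {
        uint8_t diff = stored[i] ^ computed[i];

        while (diff) {
            diff &= (uint8_t)(diff - 1); /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}

/* Classify the difference between the stored and the recomputed ECC. */
static enum ecc_diff_class classify_ecc_diff(const uint8_t stored[3],
                                             const uint8_t computed[3])
{
    unsigned bits = ecc_diff_bits(stored, computed);

    if (bits == 0)
        return ECC_DIFF_NONE;
    if (bits == 1)
        return ECC_DIFF_IN_ECC;
    return ECC_DIFF_NEEDS_DECODE;
}
```

A caller would treat ECC_DIFF_IN_ECC as "data is intact, but the page is a
candidate for rewriting", and hand everything else to the full decode.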
* Re: state of support for "external ECC hardware"
From: Ricard Wanderlof @ 2012-11-20 16:16 UTC
To: Calvin Johnson
Cc: Bigler, Stefan, Ricard Wanderlöf, Brunck, Holger, Gerlando Falauto,
    linux-mtd@lists.infradead.org, Christopher Harvey, Ivan Djelic

On Tue, 20 Nov 2012, Calvin Johnson wrote:

> I thought of sharing my recent experience with MLC NAND which requires
> 24-bit ECC.
> ...

Thanks for sharing your experiences.

> From various articles in the internet, it seems that NAND flashes are
> going to get more denser and the bit flips are going to increase.
> Hence the H/W ECC controllers are going to have more demand. The S/W
> BCH algorithm available in Linux will consume plenty of cycles which
> can be offloaded to the H/W ECC controller.

That is certainly the case for the newer and larger flashes. However, in
the past year or so it seems that manufacturers have appeared which are
offering "small" (i.e. 1 Gb and thereabouts) flashes with 1 bit ECC
requirements, and are planning to do so for a number of years. It seems
that while the big manufacturers (Micron, Samsung, etc) are moving on to
state-of-the-art higher densities, there is still a market interest for
smaller, more reliable flashes, I would think mostly for code storage for
embedded systems.

/Ricard
--
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30
* Re: state of support for "external ECC hardware"
From: Ivan Djelic @ 2012-11-08 18:59 UTC
To: Christopher Harvey
Cc: stefan.bigler@keymile.com, Brunck, Holger,
    linux-mtd@lists.infradead.org, Gerlando Falauto

On Thu, Nov 08, 2012 at 03:21:25PM +0000, Christopher Harvey wrote:
(...)
> We had BCH8 code running, but it wasn't enough. The main reason we
> switched away from host side ECC was because we were getting bitflips
> within the ECC codeword data itself.

But the ECC bytes are part of the BCH codeword, therefore I don't understand
what the issue could be? Are you sure bitflips were not in some unprotected
OOB area?

> Yes, it would have been possible
> to add a 1 byte hamming code to protect the main ECC data, but it was
> just easier to say, "hey, Micron knows their hardware, so we'll trust
> their algorithms", and enable the Micron ECC hardware. Although it
> didn't require too much work to enable it's all a total hack. I took
> the code that runs the "ECC disabled mode", and sprinkled in some
> extra init code and error checking code. Would be nice to add an
> "external ecc mode" to support these chips explicitly.
>
> > Support for software-based multiple-bit-resilient ECC mechanism (BCH)
> > was posted (http://lwn.net/Articles/426856/) by Ivan Djelic (which I
> > took liberty to Cc:) and merged in March last year.
> > I haven't been able to track how the situation evolved, but apparently
> > you need to enable it (in addition to within the kernel configuration),
> > also within your flash controller setup.
> > Micron gives an example of how to enable it on a sample NAND host
> > controller S3C6410 in this TN (rest of the code, mainly from the above
> > patch, would be already present in recent kernels):
> > http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2971_software_bch_ecc_on_linux.pdf
>
> I haven't looked into current software ECC algorithms in the
> kernel. Do the protect against corrupted ECC data? As in, corruptions
> in the out of bounds area?

Yes, BCH ECC works by generating a codeword containing data+ecc bytes.
Errors can be detected and corrected in any location of the codeword (data
and ecc). Note that in practice, we are interested in actually fixing errors
in data only (not ecc). When an error is detected in ECC bytes, it must
simply be reported to trigger block scrubbing.

The current software BCH implementation in MTD protects the page data area
(and ecc bytes). It does not protect additional bytes in the OOB area (like
the Micron on-die ECC does), but since the BCH library is not limited to any
particular size, a simple patch could achieve this.

BR,
--
Ivan
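The distinction between errors in data and errors in ECC bytes can be
sketched like this. It assumes a decoder that reports each error as a bit
offset into the codeword, with the data bits first and the ECC bits after
them, and bit n stored in byte n/8 at position n%8. The helper name
apply_error_locations is hypothetical, and this is not the MTD
implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Flip back the erroneous bits that fall inside the data area and count
 * the errors that fall in the ECC bytes. The latter need no correction,
 * only a report so that the block can be scrubbed.
 *
 * errloc[] holds nerr bit offsets into the codeword (assumed layout:
 * data bits first, then ECC bits). Returns the number of errors located
 * in the ECC bytes.
 */
static unsigned apply_error_locations(uint8_t *data, size_t data_len,
                                      const unsigned *errloc, unsigned nerr)
{
    unsigned in_ecc = 0;

    for (unsigned i = 0; i < nerr; i++) {
        if (errloc[i] < 8 * data_len)
            data[errloc[i] / 8] ^= (uint8_t)(1u << (errloc[i] % 8));
        else
            in_ecc++; /* nothing to fix in the data buffer */
    }
    return in_ecc;
}
```

Whenever nerr is non-zero the caller would report a corrected read, whether
the flips were in data or in ECC, so that scrubbing can be scheduled.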
* Re: state of support for "external ECC hardware"
From: Christopher Harvey @ 2012-11-08 19:22 UTC
To: Ivan Djelic
Cc: stefan.bigler@keymile.com, Brunck, Holger,
    linux-mtd@lists.infradead.org, Gerlando Falauto

On Thu, Nov 08, 2012 at 07:59:42PM +0100, Ivan Djelic wrote:
> On Thu, Nov 08, 2012 at 03:21:25PM +0000, Christopher Harvey wrote:
> (...)
> > We had BCH8 code running, but it wasn't enough. The main reason we
> > switched away from host side ECC was because we were getting bitflips
> > within the ECC codeword data itself.
>
> But the ECC bytes are part of the BCH codeword, therefore I don't understand
> what the issue could be ? Are you sure bitflips were not in some unprotected
> OOB area ?

Ok, the ECC bytes I had were stored in the OOB area and were
unprotected. Any bit flip in the OOB area was a disaster. This was
coming from a heavily modified forked kernel that had BCH8 bugs in the
past. For example, I had to fix this one before the patch came out:
http://arago-project.org/git/projects/linux-omap3.git?p=projects/linux-omap3.git;a=commitdiff;h=adc46d691d745604da1197d154fe712e10ec468d;hp=9e78267ed6302537474489e88bd59827315db15b
I can't explain why this implementation fails on ECC byte corruption.

-Chris
* Re: state of support for "external ECC hardware"
From: Ivan Djelic @ 2012-11-08 19:33 UTC
To: Christopher Harvey
Cc: stefan.bigler@keymile.com, Brunck, Holger,
    linux-mtd@lists.infradead.org, Gerlando Falauto

On Thu, Nov 08, 2012 at 07:22:50PM +0000, Christopher Harvey wrote:
> On Thu, Nov 08, 2012 at 07:59:42PM +0100, Ivan Djelic wrote:
> > On Thu, Nov 08, 2012 at 03:21:25PM +0000, Christopher Harvey wrote:
> > (...)
> > > We had BCH8 code running, but it wasn't enough. The main reason we
> > > switched away from host side ECC was because we were getting bitflips
> > > within the ECC codeword data itself.
> >
> > But the ECC bytes are part of the BCH codeword, therefore I don't understand
> > what the issue could be ? Are you sure bitflips were not in some unprotected
> > OOB area ?
>
> Ok, the ECC bytes I had were stored in the OOB area and were
> unprotected. Any bit flips in the OOB area was a disaster. This was
> coming from a heavily modified forked kernel that had BCH8 bugs in the
> past. For example, I had to fix this one before the patch came out:
> http://arago-project.org/git/projects/linux-omap3.git?p=projects/linux-omap3.git;a=commitdiff;h=adc46d691d745604da1197d154fe712e10ec468d;hp=9e78267ed6302537474489e88bd59827315db15b
> I can't explain why this implementation fails on ECC byte corruption.

Oooh, I think I understand now... I had very similar issues with some BCH8
code on an OMAP3630 board. The error correction code was buggy, and would
trip on errors located in ecc bytes.
Actually, this (and performance issues) is what pushed me into writing
lib/bch.c :)

BR,
--
Ivan
* Re: state of support for "external ECC hardware"
From: Ivan Djelic @ 2012-11-08 18:04 UTC
To: Gerlando Falauto
Cc: stefan.bigler@keymile.com, Christopher Harvey, Brunck, Holger,
    linux-mtd@lists.infradead.org

On Thu, Nov 08, 2012 at 11:02:27AM +0000, Gerlando Falauto wrote:
(...)
>
> Support for software-based multiple-bit-resilient ECC mechanism (BCH)
> was posted (http://lwn.net/Articles/426856/) by Ivan Djelic (which I
> took liberty to Cc:) and merged in March last year.
> I haven't been able to track how the situation evolved, but apparently
> you need to enable it (in addition to within the kernel configuration),
> also within your flash controller setup.
> Micron gives an example of how to enable it on a sample NAND host
> controller S3C6410 in this TN (rest of the code, mainly from the above
> patch, would be already present in recent kernels):
> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2971_software_bch_ecc_on_linux.pdf

Hi Gerlando,

The Micron TN2971 is a good step-by-step explanation; it just lacks a mention
of the BCH_CONST_PARAMS option that provides much better results (2x) than
what their benchmarks are showing.

> As for hardware-based (or on-die) ECC support, one of the application
> notes from Micron (TN-29-56 Enabling On-Die ECC for OMAP3 on
> Linux/Android OS,
> http://www.micron.com/~/media/Documents/Products/Technical%20Note/NAND%20Flash/tn2956_ondie_ecc_omap3_linux.pdf)
> shows how to enable that (rather, it shows how to disable software ECC
> altogether after enabling it on the chip). However, I haven't been able
> to find a code section where the information returned by the chip
> ("Rewrite recommended") is actually used to solicit scrubbing... Neither
> on the TN, nor on the upstream linux kernel... My next step would be to
> give it a go and see what happens.
>
> I'd love to hear some feedback, if anyone has had experience with this.
> I know it's not been a long time since your post, but perhaps you've
> heard something in the meantime?

We have been using several Micron parts with on-die ECC support. We basically
had to:
1. Disable SW ECC
2. Enable on-die ECC (SET FEATURE command)
3. Make sure the OOB layout does not conflict with the on-die ECC storage
4. Check Micron dedicated status bit (bit 0 in READ STATUS byte) to report
   ECC correction (and trigger scrubbing)

A tricky part is the initial ROM boot: since the on-die ECC is initially
disabled, a SoC ROM generally cannot take advantage of it (unless it is aware
of SET FEATURES extensions).

Some manufacturers also provide "almost transparent NANDs", in which an
internal on-die ECC is enabled at startup and stores its ECC codes in a (not
accessible) spare area. The device basically behaves like a memory without
bitflips, except that a status bit may indicate necessary scrubbing.
Some work would indeed be required in MTD to support those parts with various
on-die ECC strategies...

> I have one additional question though. Looking at the code I got the
> impression that decisions upon ECC seem to be based on the flash
> controller rather than on the flash chip itself.
> I mean, I would think of having a default 1-bit NAND_ECC_SOFT
> implementation; only when it is detected that the flash part either
> supports HW ECC or requires multiple-bit ECC, should the ECC mode get
> switched to NAND_ECC_NONE or NAND_ECC_SOFT_BCH respectively.
> No matter what the flash controller, I would say.
>
> Ivan, do you think that makes any sense?

Historically, not all NAND controllers had a HW Hamming engine; but (almost)
all NAND devices required 1-bit correction. So the decision was indeed
dependent on controller capabilities.

Today, _some_ devices are able to reliably report their ECC requirements
(e.g. through ONFI parameters). For those devices, your idea could apply.
But even so, you would need to map those requirements on the hardware
controller capabilities. For instance, some controllers can only do 8-bit
error correction on 1024-byte sectors. A NAND with an ECC requirement of
4-bit/512-byte would still be supported, if the driver implemented the
necessary heuristic.

Since a NAND flash is not a removable device, it does not necessarily require
the kind of flexibility you are describing for ECC mode selection. Most
people are happy with some platform data informing the driver of the required
ECC mode for a given board.

BR,
--
Ivan
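One possible form of the heuristic Ivan mentions, as a sketch under stated
assumptions rather than anything the kernel does: a controller correcting
ctrl.strength bits per ctrl.step bytes covers a chip requirement of
req.strength bits per req.step bytes if it can still correct the worst case
in which every required sector inside one controller step holds its maximum
number of errors. The struct and function names are hypothetical, and the
two step sizes are assumed to be aligned (one a multiple of the other).

```c
#include <stdbool.h>

/* Correctable bits ("strength") per block of "step" bytes. */
struct ecc_spec {
    unsigned strength;
    unsigned step;
};

/*
 * Return true if the controller capability covers the chip requirement.
 * Assumes aligned step sizes (one is a multiple of the other).
 */
static bool ecc_covers(struct ecc_spec ctrl, struct ecc_spec req)
{
    unsigned sectors;

    if (ctrl.step == 0 || req.step == 0)
        return false;

    /* Chip sectors covered by one controller step, rounded up. */
    sectors = (ctrl.step + req.step - 1) / req.step;

    /* Worst case: every covered sector carries req.strength errors. */
    return ctrl.strength >= req.strength * sectors;
}
```

With Ivan's numbers, an 8-bit/1024-byte controller covers a 4-bit/512-byte
part (2 sectors x 4 bits = 8), but not an 8-bit/512-byte part.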
* Re: state of support for "external ECC hardware"
From: Angus CLARK @ 2012-11-14 10:59 UTC
To: linux-mtd
Cc: charvey, gerlando.falauto, Ivan Djelic

[-- Attachment #1: Type: text/plain, Size: 4238 bytes --]

Hi Chris,

Sorry to come to this thread late (I have been working on non-Flash related
projects recently!), but I have used several Micron "on-die" ECC parts so I
thought I would share my experience. I will try to collate here my comments
to some of the issues raised in the thread.

On 11/08/2012 04:21 PM, Christopher Harvey wrote:
> We had BCH8 code running, but it wasn't enough. The main reason we
> switched away from host side ECC was because we were getting bitflips
> within the ECC codeword data itself.

I think this point has already been dealt with by others, but just to
confirm, the ECC algorithms handle bit-flips in the data or ECC code. (In
fact, as part of my test procedures, I manually insert bit-errors in the data
and ECC areas and check for correct fixups.)

On 11/08/2012 04:37 PM, Gerlando Falauto wrote:
> And BTW, wouldn't you also need to explicitly disable on-die ECC in
> order to force that, anyway?

Yes, to insert bit-errors your driver needs to support raw read and write
operations. You can then use a combination of nandwrite and nanddump to
inject 1 or more bitflips, something like:

    1. Write data to Flash, generating ECC data in the process
        nandwrite page.bin /dev/mtd?

    2. Read data + OOB in raw mode
        nanddump --noecc --oob --length=2048 --file=page_ecc.bin /dev/mtd?

    3. Check data for no *real* bit flips

    4. Inject bit-flips to 'page_ecc.bin'

    5. Write corrupted data to a new page, in raw mode
        nandwrite --noecc --oob page_ecc_err.bin

    6. Read back, using ECC
        nanddump --length --file=page_ecc_fix.bin

    7. Check bit-flips have been corrected

[I have a standalone program that implements the same procedure, testing
every bit and multiple bits, although it is not really fit for public
consumption I am afraid.]

On 11/08/2012 05:02 PM, Christopher Harvey wrote:
> I was surprised too. I was seeing about 30 bitflips per 512MB. Running
> at about 1/3 of max bus speed. No error codes on write.

That is probably a bit higher than we have experienced, but not
significantly so.

On 11/08/2012 05:02 PM, Christopher Harvey wrote:
> I don't know the details of BCH, but apparently not. I asked Micron if
> the OOB area was safer to write to, and they said no. Can somebody on
> this list confirm this?

The OOB area is the same as any other part of the page, in terms of
reliability, and therefore subject to the same ECC requirements.

One thing to look out for with the Micron devices is that the on-die ECC is
applied to some but not all of the OOB area. For the ECC-protected OOB, it
is important that any data here is written at the same time as the page
data -- this has consequences when using filesystems that store meta-data in
the OOB (eg YAFFS2 and JFFS2 to some extent). At the time, there was no
user-space tool, or IOCTL, that could write Page+OOB in one go. To support
writing YAFFS2 images, they had to invent their own IOCTL and a new tool!

On 11/12/2012 05:19 PM, Gerlando Falauto wrote:
> Would there be any reason *NOT* to use 4-bit ECC with parts which do not
> require it? Apart from performance, of course.

As long as your ECC potency matches or exceeds the reliability
characteristics of the NAND device, there should be no problem (except
perhaps performance.) Indeed, some have been known to use over-spec'ed ECC
schemes in an attempt to improve endurance and data retention -- the
qualification reports from the manufacturers tend to be a bit vague on how
effective this strategy might be though.

On 11/08/2012 11:02 AM, Gerlando Falauto wrote:
> As for hardware-based (or on-die) ECC support, one of the application
> notes from Micron (TN-29-56 Enabling On-Die ECC for OMAP3 on
> Linux/Android OS

The TN provides a good start, but neglects a few areas, including:

    * the default BBT pattern clashes with on-die ECC locations
    * it makes no attempt to support raw read/write operations
    * it does not handle the REWRITE status flag

For what it's worth, I have attached the patch we added to support the
Micron on-die ECC devices -- based on a rather old 2.6.32 kernel I am
afraid. We have since updated the probing code that detects on-die ECC
capabilities, but it might help if you are planning to do your own support.

Cheers,

Angus

[-- Attachment #2: 0001-mtd_nand-Support-for-Micron-on-die-4-bit-ECC-SLC-LP-.patch --]
[-- Type: text/plain, Size: 17977 bytes --]

From 31fb1df8757177e47e3368f6e735623748449ef4 Mon Sep 17 00:00:00 2001
From: Angus CLARK <angus.clark@st.com>
Date: Wed, 8 Jun 2011 17:31:24 +0100
Subject: [PATCH (sh-2.6.32.y)] mtd_nand: Support for Micron on-die 4-bit ECC SLC LP NAND devices

This patch adds support for the Micron on-die 4-bit ECC SLC LP family of
devices. The main changes are:

    * Add support for the SET/GET FEATURES NAND CMD (required to
      enable/disable on-die ECC).

    * BBT signature moved so as not to clash with the on-die ECC layout

    * Check STATUS after READ operations for correctable/non-correctable
      ECC errors

    * Add a new ECC layout for on-die 4-bit ECC devices. Note, the use of
      on-die ECC brings a number of limitations to the way in which the OOB
      area can be used. In particular, some bytes in OOB are ECC-protected,
      and some are not. The ECC-protected bytes must be written at the same
      time as the page data. This breaks a number of assumptions made by
      existing MTD clients. As a result, it is not possible to define a ECC
      layout that is compatible with both JFFS2 and YAFFS2.
Here we have chosen to support JFFS2, since it breaks fewer assumptions, and requires fewer changes to be made elsewhere. (UBI/UBIFS is fully supported, since it does not use the OOB area.) * Disable on-die ECC for 'RAW' page read/write operations * Disable on-die ECC for all OOB read/write operations (provides greatest level of compatibility with existing MTD utilities). * Extend nand_get_flash_type() to correctly distinguish between "legacy" and 4-bit on-die ECC Micron devices. Signed-off-by: Angus Clark <angus.clark@st.com> --- drivers/mtd/nand/nand_base.c | 277 +++++++++++++++++++++++++++++++++++--- drivers/mtd/nand/nand_bbt.c | 30 ++++- drivers/mtd/nand/stm_nand_flex.c | 7 + include/linux/mtd/nand.h | 7 + 4 files changed, 302 insertions(+), 19 deletions(-) diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 98060be..45634cb 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -103,6 +103,40 @@ static struct nand_ecclayout nand_oob_128 = { .length = 78}} }; +/* Micron 4-bit on-die ECC layout + * + * The 64-byte OOB is divided into 4 identical records. Each 16-byte record has + * the following layout: + * 0x00 - 0x01 : Reserved (for Bad Block Markers) + * 0x02 - 0x03 : User Metadata II (unprotected) + * 0x04 - 0x07 : User Metadata I (protected) + * 0x08 - 0x0f : ECC for main + Metadata I regions + * + * The use of on-die ECC brings a number of limitations to the way in which the + * OOB area can be used. In particular, some bytes in OOB are ECC-protected, + * and some are not. The ECC-protected bytes must be written at the same time + * as the page data. This breaks a number of assumptions made by existing MTD + * clients. As a result, it is not possible to define a ECC layout that is + * compatible with both JFFS2 and YAFFS2. Here we have chosen to support JFFS2, + * since it breaks fewer assumptions, and requires fewer changes to be made + * elsewhere. 
(UBI/UBIFS is fully supported, since it does not use the OOB + * area.) + */ +static struct nand_ecclayout nand_oob_64_4bitondie = { + .eccbytes = 32, + .eccpos = { + 8, 9, 10, 11, 12, 13, 14, 15, + 24, 25, 26, 27, 28, 29, 30, 31, + 40, 41, 42, 43, 44, 45, 46, 47, + 56, 57, 58, 59, 60, 61, 62, 63 }, + .oobfree = { + {.offset = 2, .length = 2}, + {.offset = 18, .length = 2}, + {.offset = 34, .length = 2}, + {.offset = 50, .length = 2} + } +}; + int nand_get_device(struct nand_chip *chip, struct mtd_info *mtd, int new_state); @@ -436,6 +470,104 @@ static int nand_block_checkbad(struct mtd_info *mtd, loff_t ofs, int getchip, return nand_isbad_bbt(mtd, ofs, allowbbt); } +/** + * nand_get_features - issue a "GET FEATURES" command + * @mtd: MTD device structure + * @feature: the feature address (FA) to be used + * @parameters: returned parameters (P1,P2,P3,P4) + * + * Send an entire "GET FEATURES" command to NAND device. This includes + * the feature address (FA), and reads back the set of 4 parameters (P1,P2,P3,P4).
+ */ +static int nand_get_features(struct mtd_info *mtd, int feature, + uint8_t *parameters) +{ + struct nand_chip *chip = mtd->priv; + + /* issue the appropriate command + address */ + chip->cmdfunc(mtd, NAND_CMD_GETFEATURES, feature, -1); + + /* short delay */ + ndelay(100); /* tWB = 100ns */ + + /* wait until "GET FEATURES" command is processed */ + if (!chip->dev_ready) + udelay(chip->chip_delay); + else + while (!chip->dev_ready(mtd)) + ; + + /* read the 4 parameters */ + chip->read_buf(mtd, parameters, 4); + + DEBUG(MTD_DEBUG_LEVEL0, + "%s: with FA=0x%02x, P1=0x%02x, P2=0x%02x, " + "P3=0x%02x, P4=0x%02x\n", + __func__, feature, + parameters[0], parameters[1], parameters[2], parameters[3]); + + return 0; +} + +/** + * nand_set_features - issue a "SET FEATURES" command + * @mtd: MTD device structure + * @feature: the feature address (FA) to be used + * @parameters: the set of 4 parameters to use (P1,P2,P3,P4) + * + * Send an entire "SET FEATURES" command to NAND device. This includes + * the feature address (FA), and the set of 4 parameters to use (P1,P2,P3,P4). 
+ */ +static int nand_set_features(struct mtd_info *mtd, int feature, + const uint8_t *parameters) +{ + struct nand_chip *chip = mtd->priv; + + DEBUG(MTD_DEBUG_LEVEL0, + "%s: with FA=0x%02x, P1=0x%02x, P2=0x%02x, " + "P3=0x%02x, P4=0x%02x\n", + __func__, feature, + parameters[0], parameters[1], parameters[2], parameters[3]); + + /* issue the appropriate command + address */ + chip->cmdfunc(mtd, NAND_CMD_SETFEATURES, feature, -1); + + /* write the 4 parameters */ + chip->write_buf(mtd, parameters, 4); + + /* short delay */ + ndelay(100); /* tWB = 100ns */ + + /* wait until "SET FEATURES" command is processed */ + if (!chip->dev_ready) + udelay(chip->chip_delay); + else + while (!chip->dev_ready(mtd)) + ; + + return 0; +} + +/* + * Micron 4-bit on-die ECC: enable/disable ECC Note, we use 'ecc.postpad' as a + * flag to indicate that on-die ECC is currently enabled; used by + * nand_command_lp() to check on-die ECC status after a read operation. + */ +static void nand_micron_4bit_ondie_ecc(struct mtd_info *mtd, int enable) +{ + struct nand_chip *chip = mtd->priv; + const uint8_t fp_ecc[2][4] = { + {0x0, 0x0, 0x0, 0x0}, + {0x8, 0x0, 0x0, 0x0} + }; + + BUG_ON(enable != 0 && enable != 1); + + nand_set_features(mtd, NAND_FEATURE_MICRON_ARRAY_OP_MODE, + fp_ecc[enable]); + chip->ecc.postpad = enable; +} + /* * Wait for the ready pin, after a command * The timeout is catched later. 
@@ -587,23 +719,30 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command, if (column != -1 || page_addr != -1) { int ctrl = NAND_CTRL_CHANGE | NAND_NCE | NAND_ALE; - /* Serially input address */ - if (column != -1) { - /* Adjust columns for 16 bit buswidth */ - if (chip->options & NAND_BUSWIDTH_16) - column >>= 1; - chip->cmd_ctrl(mtd, column, ctrl); - ctrl &= ~NAND_CTRL_CHANGE; - chip->cmd_ctrl(mtd, column >> 8, ctrl); - } - if (page_addr != -1) { - chip->cmd_ctrl(mtd, page_addr, ctrl); - chip->cmd_ctrl(mtd, page_addr >> 8, - NAND_NCE | NAND_ALE); - /* One more address cycle for devices > 128MiB */ - if (chip->chipsize > (128 << 20)) - chip->cmd_ctrl(mtd, page_addr >> 16, + if (command == NAND_CMD_SETFEATURES || + command == NAND_CMD_GETFEATURES) { + /* Write Feature Address */ + chip->cmd_ctrl(mtd, column & 0xff, ctrl); + } else { + /* Serially input address */ + if (column != -1) { + /* Adjust columns for 16 bit buswidth */ + if (chip->options & NAND_BUSWIDTH_16) + column >>= 1; + chip->cmd_ctrl(mtd, column, ctrl); + ctrl &= ~NAND_CTRL_CHANGE; + chip->cmd_ctrl(mtd, column >> 8, ctrl); + } + if (page_addr != -1) { + chip->cmd_ctrl(mtd, page_addr, ctrl); + chip->cmd_ctrl(mtd, page_addr >> 8, NAND_NCE | NAND_ALE); + /* One more address cycle for devices > 128MiB + */ + if (chip->chipsize > (128 << 20)) + chip->cmd_ctrl(mtd, page_addr >> 16, + NAND_NCE | NAND_ALE); + } } } chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE); @@ -624,6 +763,13 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command, case NAND_CMD_DEPLETE1: return; + case NAND_CMD_SETFEATURES: + ndelay(70); /* tADL = 70ns */ + return; + + case NAND_CMD_GETFEATURES: + break; + /* * read error status commands require only a short delay */ @@ -668,6 +814,30 @@ static void nand_command_lp(struct mtd_info *mtd, unsigned int command, chip->cmd_ctrl(mtd, NAND_CMD_NONE, NAND_NCE | NAND_CTRL_CHANGE); + /* If using 4-bit on-die ECC, check status for + * 
correctable/uncorrectable ECC errors. (ecc.postpad is used as + * a flag to indicate on-die ECC is currently enabled) + */ + if (chip->ecc.mode == NAND_ECC_4BITONDIE && chip->ecc.postpad) { + int status; + + status = chip->waitfunc(mtd, chip); + + if (status & NAND_STATUS_FAIL) + mtd->ecc_stats.failed++; + else if (status & NAND_STATUS_ECCREWRITE) + mtd->ecc_stats.corrected++; + + /* Re-issue CMD0 after STATUS Check */ + chip->cmd_ctrl(mtd, NAND_CMD_READ0, + NAND_NCE | NAND_CLE | NAND_CTRL_CHANGE); + chip->cmd_ctrl(mtd, NAND_CMD_NONE, + NAND_NCE | NAND_CTRL_CHANGE); + + /* Device now ready for reading, return immediately */ + return; + } + /* This applies to read commands */ default: /* @@ -1172,6 +1342,7 @@ int nand_do_read_ops(struct mtd_info *mtd, loff_t from, uint32_t readlen = ops->len; uint32_t oobreadlen = ops->ooblen; uint8_t *bufpoi, *oob, *buf; + int reenable_ondie_ecc = 0; stats = mtd->ecc_stats; @@ -1186,6 +1357,12 @@ int nand_do_read_ops(struct mtd_info *mtd, loff_t from, buf = ops->datbuf; oob = ops->oobbuf; + /* For 'RAW' reads, disable on-die ECC if necessary */ + if (ops->mode == MTD_OOB_RAW && chip->ecc.mode == NAND_ECC_4BITONDIE) { + nand_micron_4bit_ondie_ecc(mtd, 0); + reenable_ondie_ecc = 1; + } + while(1) { bytes = min(mtd->writesize - col, readlen); aligned = (bytes == mtd->writesize); @@ -1282,6 +1459,10 @@ int nand_do_read_ops(struct mtd_info *mtd, loff_t from, if (oob) ops->oobretlen = ops->ooblen - oobreadlen; + /* Re-enable on-die ECC if necessary */ + if (reenable_ondie_ecc) + nand_micron_4bit_ondie_ecc(mtd, 1); + if (ret) return ret; @@ -1485,6 +1666,7 @@ int nand_do_read_oob(struct mtd_info *mtd, loff_t from, int readlen = ops->ooblen; int len; uint8_t *buf = ops->oobbuf; + int reenable_ondie_ecc = 0; DEBUG(MTD_DEBUG_LEVEL3, "%s: from = 0x%08Lx, len = %i\n", __func__, (unsigned long long)from, readlen); @@ -1516,6 +1698,12 @@ int nand_do_read_oob(struct mtd_info *mtd, loff_t from, realpage = (int)(from >> chip->page_shift); page = 
realpage & chip->pagemask;
 
+	/* Disable on-die ECC if necessary */
+	if (chip->ecc.mode == NAND_ECC_4BITONDIE) {
+		nand_micron_4bit_ondie_ecc(mtd, 0);
+		reenable_ondie_ecc = 1;
+	}
+
 	while(1) {
 		sndcmd = chip->ecc.read_oob(mtd, chip, page, sndcmd);
 
@@ -1557,6 +1745,10 @@ int nand_do_read_oob(struct mtd_info *mtd, loff_t from,
 		sndcmd = 1;
 	}
 
+	/* Re-enable on-die ECC if necessary */
+	if (reenable_ondie_ecc)
+		nand_micron_4bit_ondie_ecc(mtd, 1);
+
 	ops->oobretlen = ops->ooblen;
 	return 0;
 }
@@ -1883,6 +2075,7 @@ int nand_do_write_ops(struct mtd_info *mtd, loff_t to,
 	uint8_t *oob = ops->oobbuf;
 	uint8_t *buf = ops->datbuf;
 	int ret, subpage;
+	int reenable_ondie_ecc = 0;
 
 	ops->retlen = 0;
 	if (!writelen)
@@ -1921,6 +2114,12 @@ int nand_do_write_ops(struct mtd_info *mtd, loff_t to,
 	if (likely(!oob))
 		memset(chip->oob_poi, 0xff, mtd->oobsize);
 
+	/* For 'RAW' writes, disable on-die ECC if necessary */
+	if (ops->mode == MTD_OOB_RAW && chip->ecc.mode == NAND_ECC_4BITONDIE) {
+		nand_micron_4bit_ondie_ecc(mtd, 0);
+		reenable_ondie_ecc = 1;
+	}
+
 	while(1) {
 		int bytes = mtd->writesize;
 		int cached = writelen > bytes && page != blockmask;
@@ -1961,6 +2160,10 @@ int nand_do_write_ops(struct mtd_info *mtd, loff_t to,
 		}
 	}
 
+	/* Re-enable on-die ECC if necessary */
+	if (reenable_ondie_ecc)
+		nand_micron_4bit_ondie_ecc(mtd, 1);
+
 	ops->retlen = ops->len - writelen;
 	if (unlikely(oob))
 		ops->oobretlen = ops->ooblen;
@@ -2018,6 +2221,7 @@ int nand_do_write_oob(struct mtd_info *mtd, loff_t to,
 {
 	int chipnr, page, status, len;
 	struct nand_chip *chip = mtd->priv;
+	int reenable_ondie_ecc = 0;
 
 	DEBUG(MTD_DEBUG_LEVEL3, "%s: to = 0x%08x, len = %i\n",
 			 __func__, (unsigned int)to, (int)ops->ooblen);
@@ -2072,11 +2276,21 @@ int nand_do_write_oob(struct mtd_info *mtd, loff_t to,
 	if (page == chip->pagebuf)
 		chip->pagebuf = -1;
 
+	/* Disable on-die ECC */
+	if (chip->ecc.mode == NAND_ECC_4BITONDIE) {
+		nand_micron_4bit_ondie_ecc(mtd, 0);
+		reenable_ondie_ecc = 1;
+	}
+
 	memset(chip->oob_poi, 0xff, mtd->oobsize);
 	nand_fill_oob(chip, ops->oobbuf, ops);
 	status = chip->ecc.write_oob(mtd, chip, page & chip->pagemask);
 	memset(chip->oob_poi, 0xff, mtd->oobsize);
 
+	/* Re-enable on-die ECC if necessary */
+	if (reenable_ondie_ecc)
+		nand_micron_4bit_ondie_ecc(mtd, 1);
+
 	if (status)
 		return status;
 
@@ -2566,6 +2780,18 @@ static struct nand_flash_dev *nand_get_flash_type(struct mtd_info *mtd,
 		/* Get buswidth information */
 		busw = (extid & 0x01) ? NAND_BUSWIDTH_16 : 0;
 
+		/* Micron device: check for 4-bit on-die ECC */
+		if (*maf_id == NAND_MFR_MICRON) {
+			u8 id4, id5;
+			id4 = chip->read_byte(mtd);
+			id5 = chip->read_byte(mtd);
+
+			/* Do we have a 5-byte ID ? */
+			if (!(id4 == *maf_id && id5 == dev_id))
+				/* ECC level in id4[1:0] */
+				if ((id4 & 0x3) == 0x2)
+					chip->ecc.mode = NAND_ECC_4BITONDIE;
+		}
 	} else {
 		/*
 		 * Old devices have chip data hardcoded in the device id table
@@ -2730,7 +2956,10 @@ int nand_scan_tail(struct mtd_info *mtd)
 			chip->ecc.layout = &nand_oob_16;
 			break;
 		case 64:
-			chip->ecc.layout = &nand_oob_64;
+			if (chip->ecc.mode == NAND_ECC_4BITONDIE)
+				chip->ecc.layout = &nand_oob_64_4bitondie;
+			else
+				chip->ecc.layout = &nand_oob_64;
 			break;
 		case 128:
 			chip->ecc.layout = &nand_oob_128;
@@ -2837,6 +3066,20 @@ int nand_scan_tail(struct mtd_info *mtd)
 		chip->ecc.bytes = 0;
 		break;
 
+	case NAND_ECC_4BITONDIE:
+		chip->ecc.read_page = nand_read_page_raw;
+		chip->ecc.write_page = nand_write_page_raw;
+		chip->ecc.read_page_raw = nand_read_page_raw;
+		chip->ecc.write_page_raw = nand_write_page_raw;
+		chip->ecc.read_oob = nand_read_oob_std;
+		chip->ecc.write_oob = nand_write_oob_std;
+		chip->ecc.size = 512;
+		chip->ecc.bytes = 8;
+
+		/* Turn on on-die ECC */
+		nand_micron_4bit_ondie_ecc(mtd, 1);
+		break;
+
 	default:
 		printk(KERN_WARNING "Invalid NAND_ECC_MODE %d\n",
 		       chip->ecc.mode);
diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 6b1942b..fee510a 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -1165,6 +1165,27 @@ static struct nand_bbt_descr bbt_mirror_descr = {
 	.pattern = mirror_pattern
 };
 
+/* BBT descriptors for (Micron) 4-bit on-die ECC */
+static struct nand_bbt_descr bbt_main_descr_ode = {
+	.options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
+		| NAND_BBT_2BIT | NAND_BBT_VERSION | NAND_BBT_PERCHIP,
+	.offs = 8 + 8, /* need to shift by 8 due to on-die ECC */
+	.len = 4,
+	.veroffs = 12 + 8, /* need to shift by 8 due to on-die ECC */
+	.maxblocks = 4,
+	.pattern = bbt_pattern
+};
+
+static struct nand_bbt_descr bbt_mirror_descr_ode = {
+	.options = NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | NAND_BBT_WRITE
+		| NAND_BBT_2BIT | NAND_BBT_VERSION | NAND_BBT_PERCHIP,
+	.offs = 8 + 8, /* need to shift by 8 due to on-die ECC */
+	.len = 4,
+	.veroffs = 12 + 8, /* need to shift by 8 due to on-die ECC */
+	.maxblocks = 4,
+	.pattern = mirror_pattern
+};
+
 /**
  * nand_default_bbt - [NAND Interface] Select a default bad block table for the device
  * @mtd: MTD device structure
@@ -1198,8 +1219,13 @@ int nand_default_bbt(struct mtd_info *mtd)
 	if (this->options & NAND_USE_FLASH_BBT) {
 		/* Use the default pattern descriptors */
 		if (!this->bbt_td) {
-			this->bbt_td = &bbt_main_descr;
-			this->bbt_md = &bbt_mirror_descr;
+			if (this->ecc.mode == NAND_ECC_4BITONDIE) {
+				this->bbt_td = &bbt_main_descr_ode;
+				this->bbt_md = &bbt_mirror_descr_ode;
+			} else {
+				this->bbt_td = &bbt_main_descr;
+				this->bbt_md = &bbt_mirror_descr;
+			}
 		}
 		if (!this->badblock_pattern) {
 			this->badblock_pattern = (mtd->writesize > 512) ? &largepage_flashbased : &smallpage_flashbased;
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 0471f01..67144b7 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -80,6 +80,8 @@ extern void nand_wait_ready(struct mtd_info *mtd);
 #define NAND_CMD_READID		0x90
 #define NAND_CMD_ERASE2		0xd0
 #define NAND_CMD_RESET		0xff
+#define NAND_CMD_SETFEATURES	0xef
+#define NAND_CMD_GETFEATURES	0xee
 
 /* Extended commands for large page devices */
 #define NAND_CMD_READSTART	0x30
@@ -107,12 +109,16 @@ extern void nand_wait_ready(struct mtd_info *mtd);
 
 #define NAND_CMD_NONE		-1
 
+/* Feature Addresses (for the "SET/GET FEATURES" commands) */
+#define NAND_FEATURE_MICRON_ARRAY_OP_MODE	0x90
+
 /* Status bits */
 #define NAND_STATUS_FAIL	0x01
 #define NAND_STATUS_FAIL_N1	0x02
 #define NAND_STATUS_TRUE_READY	0x20
 #define NAND_STATUS_READY	0x40
 #define NAND_STATUS_WP		0x80
+#define NAND_STATUS_ECCREWRITE	0x08
 
 /*
  * Constants for ECC_MODES
@@ -123,6 +129,7 @@ typedef enum {
 	NAND_ECC_HW,
 	NAND_ECC_HW_SYNDROME,
 	NAND_ECC_HW_OOB_FIRST,
+	NAND_ECC_4BITONDIE,
 } nand_ecc_modes_t;
 
 /*
-- 
1.7.7

^ permalink raw reply related [flat|nested] 26+ messages in thread
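For readers who want to try the detection logic from the nand_get_flash_type() hunk outside the kernel, here is a minimal standalone sketch. The helper name micron_has_4bit_ondie_ecc() is hypothetical and is not part of the patch. It only mirrors the two checks the patch makes: if the two bytes read after the extended ID simply repeat the manufacturer and device IDs, the chip is assumed to have a short ID that wrapped around, so there is no ECC field to look at; otherwise bits [1:0] of id4 equal to 0x2 are taken to mean 4-bit on-die ECC.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): decide from the READ ID bytes
 * whether a Micron part advertises 4-bit on-die ECC, following the same
 * two tests as the nand_get_flash_type() hunk.
 *
 * maf_id, dev_id: first two ID bytes (manufacturer, device)
 * id4, id5:       the two bytes read right after the extended ID byte
 */
static bool micron_has_4bit_ondie_ecc(uint8_t maf_id, uint8_t dev_id,
				      uint8_t id4, uint8_t id5)
{
	/* Short ID: the sequence wrapped around and repeats maf/dev. */
	if (id4 == maf_id && id5 == dev_id)
		return false;

	/* ECC level lives in id4[1:0]; 0x2 is what the patch tests for. */
	return (id4 & 0x3) == 0x2;
}
```

With purely illustrative ID bytes (not taken from any datasheet), micron_has_4bit_ondie_ecc(0x2c, 0xda, 0x06, 0x00) returns true, while a wrapped ID such as (0x2c, 0xda, 0x2c, 0xda) returns false.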