linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] mtd: gpmi: Bitflip support in erased regions
@ 2013-12-09 19:58 Elie De Brauwer
  2013-12-09 19:58 ` [PATCH v1] mtd: gpmi: Deal with bitflips in erased regions regions Elie De Brauwer
  2013-12-11 13:24 ` [PATCH v1] mtd: gpmi: Bitflip support in erased regions Huang Shijie
  0 siblings, 2 replies; 6+ messages in thread
From: Elie De Brauwer @ 2013-12-09 19:58 UTC (permalink / raw)
  To: b32955, dwmw2, dedekind1; +Cc: Elie De Brauwer, linux-mtd

Fixed cc to linux-mtd, please ignore my previous version.

Hello all,

I bumped into an issue on a custom board with an i.MX28 and a Micron 
MT29F4G08 NAND flash. My system running a 3.9.0 failed to boot during 
upgrade testing  due to UBI errors related to a bitflips in NAND:

[    3.831323] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
[    3.845026] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
[    3.858710] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
[    3.872408] UBI error: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read 16384 bytes
...
[    4.011529] UBIFS error (pid 36): ubifs_recover_leb: corrupt empty space LEB 27:237568, corruption starts at 9815
[    4.021897] UBIFS error (pid 36): ubifs_scanned_corruption: corruption at LEB 27:247383
[    4.030000] UBIFS error (pid 36): ubifs_scanned_corruption: first 6569 bytes from LEB 27:247383

Diving a bit deeper with nanddump:
root@(none):~# nanddump -a  /dev/mtd8  > /dev/null
ECC failed: 8
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 262144, page size 4096, OOB size 224
Dumping data starting at 0x00000000 and ending at 0x1ea00000...
ECC: 1 corrected bitflip(s) at offset 0x042c2000
ECC: 1 uncorrectable bitflip(s) at offset 0x06efe000
root@(none):~# nanddump  -s 116129792 -c --noecc     -l 262144 /dev/mtd8 
...
0x06efe6a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 7f  |................|

Which is points to a well know 'corrupt empty space' issue, which appears 
every now and then:
 - http://permalink.gmane.org/gmane.linux.drivers.mtd/46617
 - http://lists.infradead.org/pipermail/linux-mtd/2012-January/039254.html

Hence I went on a quest to teach my NAND driver how to do this, gpmi-nand in 
question. The problem is that although on properly written data which gets
streamed through the BCH block we get 16 bit ecc, if we erase block we git
like 0 bit ecc, since erase is a command, not a stream of data travelling 
through the BCH block. The BCH block (see i.MX28 reference manual chapters 
15 GPMI and 16 BCH) can tell us of protected chunks:
 - if they are error free (if ecc data is present)
 - the amount of bitflips they contain (if ecc data is present)
 - if they are fully erased (all 0xFF's)
 - if they are uncorrectable (# bitflips > ecc_strength, or 0xFF with 
bitflips).
In the current situation as soon as a single bitflip exists in a region 
where the parity information is all 0xFF (looking like it's erased) the 
block is marked as uncorrectable. Which is a pity since I can peform this 
kind of ECC by hand.

Quote datasheet:
"As the BCH decoder reads the data and parity blocks, it records a special condition, i.e.,
that all of the bits of a payload data block or metadata block are one, including any associated
parity bytes. The all-ones case for both parity and data indicates an erased block in the
NAND device."

Fortunately we can more or less tune this parameter by using the 
ERASE_THRESHOLD in HW_BCH_MODE register:
"This value indicates the maximum number of zero bits on a flash page for 
it to be considered erased. For SLC NAND devices, this value should be 
programmed to 0 (meaning that the entire page should consist of bytes of 
0xFF. For MLC NAND devices, bit errors may occur on reads (even on blank 
pages), so this threshold can be used to tune the erased page checking 
algorithm."

So as my solution I'm setting this erase threshold to the ecc_strength 
derived from the geometry, meaning that I will tolerate the same number of 
bitflips the BCH block would consider correctable.
The side effect is that whever I'm reading a page (gpmi_ecc_read_page() ) 
which the BCH block marked as "erased" I need to take a software approach. 
The software approach is inspired on what is currently
done in the omap2 driver (but not free from discussion). At that point I 
now that the page can contain up to ecc_strenght bitflips, so I need to 
count and correct them if necessary. This obviously gives a slight overhead
 when compared to a normal read of erased pages but is more polite towards 
upper layers.
On the other hand, the upper layers should also show some intelligence when 
it comes to reading erased pages which doesn't make much sense either. 

I considered alternatives based upon the 'let it fails as it does now, and 
try to intelligently figure out whether or not it's an erased page or not' 
possibly using additional byte in the metadata or something based
on fuzzy rules, but this is actually the solution which ended up giving 
most certainty. 

I have tested this on a 3.9/i.MX28 and after applying this patch my board 
went from a stubbornly-whining-about-corrupt-empty-space to happily 
mounting the partition and even the trace of my stuck bit disappeared:

root@(none):~# nanddump -a  /dev/mtd8  > /dev/null
ECC failed: 0
ECC corrected: 1
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 262144, page size 4096, OOB size 224
Dumping data starting at 0x00000000 and ending at 0x1ea00000...
ECC: 1 corrected bitflip(s) at offset 0x042c2000


I have also seen Pekon is eagerly trying to get the code removed from omap2,
 (e.g.  http://lists.infradead.org/pipermail/linux-mtd/2013-July/047548.html ) 
but even though his set of patches is currently in their 4th version I 
haven't seen any proper solution to handling bitflips in erased pages 
without iterating through them. 

Any suggestions or feedback is welcomed.
E.


Elie De Brauwer (1):
  mtd: gpmi: Deal with bitflips in erased regions regions

 drivers/mtd/nand/gpmi-nand/bch-regs.h  |    1 +
 drivers/mtd/nand/gpmi-nand/gpmi-lib.c  |    7 ++++++
 drivers/mtd/nand/gpmi-nand/gpmi-nand.c |   40 +++++++++++++++++++++++++++++---
 3 files changed, 45 insertions(+), 3 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v1] mtd: gpmi: Deal with bitflips in erased regions regions
  2013-12-09 19:58 [PATCH v1] mtd: gpmi: Bitflip support in erased regions Elie De Brauwer
@ 2013-12-09 19:58 ` Elie De Brauwer
  2013-12-10  9:37   ` Peter Korsgaard
  2013-12-11 13:24 ` [PATCH v1] mtd: gpmi: Bitflip support in erased regions Huang Shijie
  1 sibling, 1 reply; 6+ messages in thread
From: Elie De Brauwer @ 2013-12-09 19:58 UTC (permalink / raw)
  To: b32955, dwmw2, dedekind1; +Cc: Elie De Brauwer, linux-mtd

The BCH block typically used with a i.MX28 and GPMI block is only
able to correct bitflips on data actually streamed through the block.
When erasing a block the data does not stream through the BCH block
and therefore not ECC data is written to the NAND chip. This causes
gpmi_ecc_read_page to return failure as soon as a single non-1-bit is
found in an erased page. Typically causing problems at higher levels
(ubifs corrupted empty space warnings).

This patch configures the BCH block to mark a block as 'erased' if
no more than ecc_strength bitflips are found. The result is that
erased blocks are doublechecked by software for bitflips and that
bitflips are corrected and reported, allowing the upper layers to
take proper actions.

Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com>
---
 drivers/mtd/nand/gpmi-nand/bch-regs.h  |    1 +
 drivers/mtd/nand/gpmi-nand/gpmi-lib.c  |    7 ++++++
 drivers/mtd/nand/gpmi-nand/gpmi-nand.c |   40 +++++++++++++++++++++++++++++---
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/nand/gpmi-nand/bch-regs.h b/drivers/mtd/nand/gpmi-nand/bch-regs.h
index 588f537..b2104de 100644
--- a/drivers/mtd/nand/gpmi-nand/bch-regs.h
+++ b/drivers/mtd/nand/gpmi-nand/bch-regs.h
@@ -31,6 +31,7 @@
 
 #define HW_BCH_STATUS0				0x00000010
 #define HW_BCH_MODE				0x00000020
+#define BM_BCH_MODE_ERASE_THRESHOLD_MASK	0xff
 #define HW_BCH_ENCODEPTR			0x00000030
 #define HW_BCH_DATAPTR				0x00000040
 #define HW_BCH_METAPTR				0x00000050
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
index aaced29..4236739 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
@@ -286,6 +286,13 @@ int bch_set_geometry(struct gpmi_nand_data *this)
 			| BF_BCH_FLASH0LAYOUT1_DATAN_SIZE(block_size, this),
 			r->bch_regs + HW_BCH_FLASH0LAYOUT1);
 
+	/*
+	 * Set the tolerance for bitflips when reading erased blocks
+	 * equal to the ecc_strength.
+	 */
+	writel(bch_geo->ecc_strength & BM_BCH_MODE_ERASE_THRESHOLD_MASK,
+		r->bch_regs + HW_BCH_MODE);
+
 	/* Set *all* chip selects to use layout 0. */
 	writel(0, r->bch_regs + HW_BCH_LAYOUTSELECT);
 
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index dabbc14..766261f 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -1012,6 +1012,30 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
 	p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
 }
 
+/*
+ * Count the number of 0 bits in a supposed to be
+ * erased region and correct them. Return the number
+ * of bitflips or zero when the region was correct.
+ */
+static unsigned int erased_sector_bitflips(unsigned char *data,
+					unsigned int chunk,
+					struct bch_geometry *geo)
+{
+	unsigned int flip_bits = 0;
+	int i;
+	int base = geo->ecc_chunk_size * chunk;
+
+	/* Count bitflips */
+	for (i = 0; i < geo->ecc_chunk_size; i++)
+		flip_bits += hweight8(~data[base + i]);
+
+	/* Correct bitflips by 0xFF'ing this chunk. */
+	if (flip_bits)
+		memset(&data[base], 0xFF, geo->ecc_chunk_size);
+
+	return flip_bits;
+}
+
 static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 				uint8_t *buf, int oob_required, int page)
 {
@@ -1023,6 +1047,7 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	dma_addr_t    auxiliary_phys;
 	unsigned int  i;
 	unsigned char *status;
+	unsigned int  flips;
 	unsigned int  max_bitflips = 0;
 	int           ret;
 
@@ -1057,15 +1082,24 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 	status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
 
 	for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) {
-		if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED))
+		if (*status == STATUS_GOOD)
 			continue;
 
 		if (*status == STATUS_UNCORRECTABLE) {
 			mtd->ecc_stats.failed++;
 			continue;
 		}
-		mtd->ecc_stats.corrected += *status;
-		max_bitflips = max_t(unsigned int, max_bitflips, *status);
+
+		if (*status == STATUS_ERASED)
+			/* Erased block, check/correct bitflips manually. */
+			flips = erased_sector_bitflips(payload_virt, i,
+							nfc_geo);
+		else
+			/* BCH block corrected some errors for us. */
+			flips = *status;
+
+		mtd->ecc_stats.corrected += flips;
+		max_bitflips = max_t(unsigned int, max_bitflips, flips);
 	}
 
 	if (oob_required) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] mtd: gpmi: Deal with bitflips in erased regions regions
  2013-12-09 19:58 ` [PATCH v1] mtd: gpmi: Deal with bitflips in erased regions regions Elie De Brauwer
@ 2013-12-10  9:37   ` Peter Korsgaard
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Korsgaard @ 2013-12-10  9:37 UTC (permalink / raw)
  To: Elie De Brauwer; +Cc: b32955, dwmw2, linux-mtd, dedekind1

>>>>> "Elie" == Elie De Brauwer <eliedebrauwer@gmail.com> writes:

 > The BCH block typically used with a i.MX28 and GPMI block is only
 > able to correct bitflips on data actually streamed through the block.
 > When erasing a block the data does not stream through the BCH block
 > and therefore not ECC data is written to the NAND chip. This causes

s/not/no/

 > gpmi_ecc_read_page to return failure as soon as a single non-1-bit is
 > found in an erased page. Typically causing problems at higher levels
 > (ubifs corrupted empty space warnings).

 > This patch configures the BCH block to mark a block as 'erased' if
 > no more than ecc_strength bitflips are found. The result is that
 > erased blocks are doublechecked by software for bitflips and that
 > bitflips are corrected and reported, allowing the upper layers to
 > take proper actions.

 > Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com>

Looks good otherwise.

Acked-by: Peter Korsgaard <peter@korsgaard.com>

 > ---
 >  drivers/mtd/nand/gpmi-nand/bch-regs.h  |    1 +
 >  drivers/mtd/nand/gpmi-nand/gpmi-lib.c  |    7 ++++++
 >  drivers/mtd/nand/gpmi-nand/gpmi-nand.c |   40 +++++++++++++++++++++++++++++---
 >  3 files changed, 45 insertions(+), 3 deletions(-)

 > diff --git a/drivers/mtd/nand/gpmi-nand/bch-regs.h b/drivers/mtd/nand/gpmi-nand/bch-regs.h
 > index 588f537..b2104de 100644
 > --- a/drivers/mtd/nand/gpmi-nand/bch-regs.h
 > +++ b/drivers/mtd/nand/gpmi-nand/bch-regs.h
 > @@ -31,6 +31,7 @@
 
 >  #define HW_BCH_STATUS0				0x00000010
 >  #define HW_BCH_MODE				0x00000020
 > +#define BM_BCH_MODE_ERASE_THRESHOLD_MASK	0xff
 >  #define HW_BCH_ENCODEPTR			0x00000030
 >  #define HW_BCH_DATAPTR				0x00000040
 >  #define HW_BCH_METAPTR				0x00000050
 > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
 > index aaced29..4236739 100644
 > --- a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
 > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
 > @@ -286,6 +286,13 @@ int bch_set_geometry(struct gpmi_nand_data *this)
 >  			| BF_BCH_FLASH0LAYOUT1_DATAN_SIZE(block_size, this),
 r-> bch_regs + HW_BCH_FLASH0LAYOUT1);
 
 > +	/*
 > +	 * Set the tolerance for bitflips when reading erased blocks
 > +	 * equal to the ecc_strength.
 > +	 */
 > +	writel(bch_geo->ecc_strength & BM_BCH_MODE_ERASE_THRESHOLD_MASK,
 > +		r->bch_regs + HW_BCH_MODE);
 > +
 >  	/* Set *all* chip selects to use layout 0. */
 >  	writel(0, r->bch_regs + HW_BCH_LAYOUTSELECT);
 
 > diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
 > index dabbc14..766261f 100644
 > --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
 > +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
 > @@ -1012,6 +1012,30 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
 >  	p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
 >  }
 
 > +/*
 > + * Count the number of 0 bits in a supposed to be
 > + * erased region and correct them. Return the number
 > + * of bitflips or zero when the region was correct.
 > + */
 > +static unsigned int erased_sector_bitflips(unsigned char *data,
 > +					unsigned int chunk,
 > +					struct bch_geometry *geo)
 > +{
 > +	unsigned int flip_bits = 0;
 > +	int i;
 > +	int base = geo->ecc_chunk_size * chunk;
 > +
 > +	/* Count bitflips */
 > +	for (i = 0; i < geo->ecc_chunk_size; i++)
 > +		flip_bits += hweight8(~data[base + i]);
 > +
 > +	/* Correct bitflips by 0xFF'ing this chunk. */
 > +	if (flip_bits)
 > +		memset(&data[base], 0xFF, geo->ecc_chunk_size);
 > +
 > +	return flip_bits;
 > +}
 > +
 >  static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 >  				uint8_t *buf, int oob_required, int page)
 >  {
 > @@ -1023,6 +1047,7 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 >  	dma_addr_t    auxiliary_phys;
 >  	unsigned int  i;
 >  	unsigned char *status;
 > +	unsigned int  flips;
 >  	unsigned int  max_bitflips = 0;
 >  	int           ret;
 
 > @@ -1057,15 +1082,24 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 >  	status = auxiliary_virt + nfc_geo->auxiliary_status_offset;
 
 >  	for (i = 0; i < nfc_geo->ecc_chunk_count; i++, status++) {
 > -		if ((*status == STATUS_GOOD) || (*status == STATUS_ERASED))
 > +		if (*status == STATUS_GOOD)
 >  			continue;
 
 >  		if (*status == STATUS_UNCORRECTABLE) {
 mtd-> ecc_stats.failed++;
 >  			continue;
 >  		}
 > -		mtd->ecc_stats.corrected += *status;
 > -		max_bitflips = max_t(unsigned int, max_bitflips, *status);
 > +
 > +		if (*status == STATUS_ERASED)
 > +			/* Erased block, check/correct bitflips manually. */
 > +			flips = erased_sector_bitflips(payload_virt, i,
 > +							nfc_geo);
 > +		else
 > +			/* BCH block corrected some errors for us. */
 > +			flips = *status;
 > +
 > +		mtd->ecc_stats.corrected += flips;
 > +		max_bitflips = max_t(unsigned int, max_bitflips, flips);
 >  	}
 
 >  	if (oob_required) {
 > -- 
 > 1.7.10.4


 > ______________________________________________________
 > Linux MTD discussion mailing list
 > http://lists.infradead.org/mailman/listinfo/linux-mtd/


-- 
Bye, Peter Korsgaard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] mtd: gpmi: Bitflip support in erased regions
  2013-12-09 19:58 [PATCH v1] mtd: gpmi: Bitflip support in erased regions Elie De Brauwer
  2013-12-09 19:58 ` [PATCH v1] mtd: gpmi: Deal with bitflips in erased regions regions Elie De Brauwer
@ 2013-12-11 13:24 ` Huang Shijie
  2013-12-13  8:49   ` Huang Shijie
  1 sibling, 1 reply; 6+ messages in thread
From: Huang Shijie @ 2013-12-11 13:24 UTC (permalink / raw)
  To: Elie De Brauwer; +Cc: b32955, dwmw2, linux-mtd, dedekind1

On Mon, Dec 09, 2013 at 08:58:10PM +0100, Elie De Brauwer wrote:
> Fixed cc to linux-mtd, please ignore my previous version.
> 
> Hello all,
> 
> I bumped into an issue on a custom board with an i.MX28 and a Micron 
> MT29F4G08 NAND flash. My system running a 3.9.0 failed to boot during 
> upgrade testing  due to UBI errors related to a bitflips in NAND:
> 
> [    3.831323] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [    3.845026] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [    3.858710] UBI warning: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read only 16384 bytes, retry
> [    3.872408] UBI error: ubi_io_read: error -74 (ECC error) while reading 16384 bytes from PEB 443:245760, read 16384 bytes
> ...
> [    4.011529] UBIFS error (pid 36): ubifs_recover_leb: corrupt empty space LEB 27:237568, corruption starts at 9815
> [    4.021897] UBIFS error (pid 36): ubifs_scanned_corruption: corruption at LEB 27:247383
> [    4.030000] UBIFS error (pid 36): ubifs_scanned_corruption: first 6569 bytes from LEB 27:247383

thanks a lot for this patch. 

I met the "corrupt empty space" issue too.


> 
> Diving a bit deeper with nanddump:
> root@(none):~# nanddump -a  /dev/mtd8  > /dev/null
> ECC failed: 8
> ECC corrected: 0
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
> ECC: 1 uncorrectable bitflip(s) at offset 0x06efe000
> root@(none):~# nanddump  -s 116129792 -c --noecc     -l 262144 /dev/mtd8 
> ...
> 0x06efe6a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 7f  |................|
> 
> Which is points to a well know 'corrupt empty space' issue, which appears 
> every now and then:
>  - http://permalink.gmane.org/gmane.linux.drivers.mtd/46617
>  - http://lists.infradead.org/pipermail/linux-mtd/2012-January/039254.html
> 
> Hence I went on a quest to teach my NAND driver how to do this, gpmi-nand in 
> question. The problem is that although on properly written data which gets
> streamed through the BCH block we get 16 bit ecc, if we erase block we git
> like 0 bit ecc, since erase is a command, not a stream of data travelling 
> through the BCH block. The BCH block (see i.MX28 reference manual chapters 
> 15 GPMI and 16 BCH) can tell us of protected chunks:
>  - if they are error free (if ecc data is present)
>  - the amount of bitflips they contain (if ecc data is present)
>  - if they are fully erased (all 0xFF's)
>  - if they are uncorrectable (# bitflips > ecc_strength, or 0xFF with 
> bitflips).
> In the current situation as soon as a single bitflip exists in a region 
> where the parity information is all 0xFF (looking like it's erased) the 
> block is marked as uncorrectable. Which is a pity since I can peform this 
> kind of ECC by hand.
> 
> Quote datasheet:
> "As the BCH decoder reads the data and parity blocks, it records a special condition, i.e.,
> that all of the bits of a payload data block or metadata block are one, including any associated
> parity bytes. The all-ones case for both parity and data indicates an erased block in the
> NAND device."
> 
> Fortunately we can more or less tune this parameter by using the 
> ERASE_THRESHOLD in HW_BCH_MODE register:
> "This value indicates the maximum number of zero bits on a flash page for 
> it to be considered erased. For SLC NAND devices, this value should be 

I met the "correct empty space" with a Toshiba SLC nand.
The spec tells us it should be 0 for the SLC nand.	

I will double-check it tomorrow.
	
> programmed to 0 (meaning that the entire page should consist of bytes of 
> 0xFF. For MLC NAND devices, bit errors may occur on reads (even on blank 
> pages), so this threshold can be used to tune the erased page checking 
> algorithm."
> 
> So as my solution I'm setting this erase threshold to the ecc_strength 
> derived from the geometry, meaning that I will tolerate the same number of 
> bitflips the BCH block would consider correctable.
> The side effect is that whever I'm reading a page (gpmi_ecc_read_page() ) 
> which the BCH block marked as "erased" I need to take a software approach. 
> The software approach is inspired on what is currently
> done in the omap2 driver (but not free from discussion). At that point I 
> now that the page can contain up to ecc_strenght bitflips, so I need to 

The ecc_strength can be 40 sometimes.

I really donot know what is the proper value for the ERASE_THRESHOLD.

Maybe set ERASE_THRESHOLD with 2 is ok?
I think the ecc_strength is a little large.


> count and correct them if necessary. This obviously gives a slight overhead
>  when compared to a normal read of erased pages but is more polite towards 
> upper layers.
> On the other hand, the upper layers should also show some intelligence when 
> it comes to reading erased pages which doesn't make much sense either. 
> 
> I considered alternatives based upon the 'let it fails as it does now, and 
> try to intelligently figure out whether or not it's an erased page or not' 
> possibly using additional byte in the metadata or something based
> on fuzzy rules, but this is actually the solution which ended up giving 
> most certainty. 
> 
> I have tested this on a 3.9/i.MX28 and after applying this patch my board 
> went from a stubbornly-whining-about-corrupt-empty-space to happily 
> mounting the partition and even the trace of my stuck bit disappeared:
> 
> root@(none):~# nanddump -a  /dev/mtd8  > /dev/null
> ECC failed: 0
> ECC corrected: 1
> Number of bad blocks: 0
> Number of bbt blocks: 0
> Block size 262144, page size 4096, OOB size 224
> Dumping data starting at 0x00000000 and ending at 0x1ea00000...
> ECC: 1 corrected bitflip(s) at offset 0x042c2000
> 
> 
> I have also seen Pekon is eagerly trying to get the code removed from omap2,
>  (e.g.  http://lists.infradead.org/pipermail/linux-mtd/2013-July/047548.html ) 
> but even though his set of patches is currently in their 4th version I 
> haven't seen any proper solution to handling bitflips in erased pages 
> without iterating through them. 
> 
I will read it.


Please give us more time about this issue.
I will discuss it with out IC guy.

thanks
Huang Shijie

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] mtd: gpmi: Bitflip support in erased regions
  2013-12-11 13:24 ` [PATCH v1] mtd: gpmi: Bitflip support in erased regions Huang Shijie
@ 2013-12-13  8:49   ` Huang Shijie
  2013-12-13 10:00     ` Elie De Brauwer
  0 siblings, 1 reply; 6+ messages in thread
From: Huang Shijie @ 2013-12-13  8:49 UTC (permalink / raw)
  To: Huang Shijie; +Cc: dwmw2, Elie De Brauwer, linux-mtd, dedekind1

On Wed, Dec 11, 2013 at 09:24:58PM +0800, Huang Shijie wrote:
> On Mon, Dec 09, 2013 at 08:58:10PM +0100, Elie De Brauwer wrote:
> > 
> > Quote datasheet:
> > "As the BCH decoder reads the data and parity blocks, it records a special condition, i.e.,
> > that all of the bits of a payload data block or metadata block are one, including any associated
> > parity bytes. The all-ones case for both parity and data indicates an erased block in the
> > NAND device."
> > 
> > Fortunately we can more or less tune this parameter by using the 
> > ERASE_THRESHOLD in HW_BCH_MODE register:
> > "This value indicates the maximum number of zero bits on a flash page for 
> > it to be considered erased. For SLC NAND devices, this value should be 
> 
> I met the "correct empty space" with a Toshiba SLC nand.
> The spec tells us it should be 0 for the SLC nand.	
> 
> I will double-check it tomorrow.

I confirm that a SLC nand can meet this issue too.

> 	
> > programmed to 0 (meaning that the entire page should consist of bytes of 
> > 0xFF. For MLC NAND devices, bit errors may occur on reads (even on blank 
> > pages), so this threshold can be used to tune the erased page checking 
> > algorithm."
> > 
> > So as my solution I'm setting this erase threshold to the ecc_strength 
> > derived from the geometry, meaning that I will tolerate the same number of 
> > bitflips the BCH block would consider correctable.
> > The side effect is that whever I'm reading a page (gpmi_ecc_read_page() ) 
A solution is to check the HW_BCH_STATUS0:ALLONES.

If it is all ones, we do not check it by software.


> > which the BCH block marked as "erased" I need to take a software approach. 
> > The software approach is inspired on what is currently
> > done in the omap2 driver (but not free from discussion). At that point I 
> > now that the page can contain up to ecc_strenght bitflips, so I need to 
> 
> The ecc_strength can be 40 sometimes.

The only problem is what is the proper value for ERASE_THRESHOLD.

still need more investigate.

thanks
Huang Shijie

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] mtd: gpmi: Bitflip support in erased regions
  2013-12-13  8:49   ` Huang Shijie
@ 2013-12-13 10:00     ` Elie De Brauwer
  0 siblings, 0 replies; 6+ messages in thread
From: Elie De Brauwer @ 2013-12-13 10:00 UTC (permalink / raw)
  To: Huang Shijie
  Cc: David Woodhouse, Huang Shijie, linux-mtd@lists.infradead.org,
	Artem Bityutskiy

On Fri, Dec 13, 2013 at 9:49 AM, Huang Shijie <b32955@freescale.com> wrote:
> On Wed, Dec 11, 2013 at 09:24:58PM +0800, Huang Shijie wrote:
>> On Mon, Dec 09, 2013 at 08:58:10PM +0100, Elie De Brauwer wrote:
>> > programmed to 0 (meaning that the entire page should consist of bytes of
>> > 0xFF. For MLC NAND devices, bit errors may occur on reads (even on blank
>> > pages), so this threshold can be used to tune the erased page checking
>> > algorithm."
>> >
>> > So as my solution I'm setting this erase threshold to the ecc_strength
>> > derived from the geometry, meaning that I will tolerate the same number of
>> > bitflips the BCH block would consider correctable.
>> > The side effect is that whever I'm reading a page (gpmi_ecc_read_page() )
> A solution is to check the HW_BCH_STATUS0:ALLONES.
>
> If it is all ones, we do not check it by software.

Ah, the HW_BCH_STATUS0 register does indeed contain some interesting bits:
- ALLONES  1 = All data bits of this transaction are ONE
(source: imx datasheet page 1292)

Hence I will change gpmi_err_read_page() from my original patch
gpmi_ecc_read_page(). Going from something like:

if (*status == STATUS_ERASED)
    /* Erased block, check/correct bitflips manually. */
    flips = erased_sector_bitflips(payload_virt, i, nfc_geo);
else
   /* BCH block corrected some errors for us. */
   flips = *status;

Into something more

if (*status == STATUS_ERASED)
    if (readl(HW_BCH_STATUS0 & BM_BCH_STATUS0_ALLONES))
        flips = 0;
    else
        /* Erased block, check/correct bitflips manually. */
        flips = erased_sector_bitflips(payload_virt, i, nfc_geo);
else
   /* BCH block corrected some errors for us. */
   flips = *status;


(Hiding the readl() in a function in gpmi-lib.c since all register
peek-n-poke is occuring there). And this way only doing the
soft-bitflip-check where absolutely necessary (where necessary is
defined on a page granularity).

I will provide a v2 of my patch incorporating these changes after I've
had a chance to test them on hardware. (Somewhere this
evening/tomorrow).

>
>
>> > which the BCH block marked as "erased" I need to take a software approach.
>> > The software approach is inspired on what is currently
>> > done in the omap2 driver (but not free from discussion). At that point I
>> > now that the page can contain up to ecc_strenght bitflips, so I need to
>>
>> The ecc_strength can be 40 sometimes.
>
> The only problem is what is the proper value for ERASE_THRESHOLD.
>
> still need more investigate.

Okay, I'm awaiting input on this.

The reason why I set this to ecc_strength was because I assumed that
the BCH could correct up to ecc_strength biterrors, hence I could fix
this amount of bitflips myself. But I agree it can be rough estimate.
In my specific case hardcoding it to 1 would have already fixed it.


Thanks for your feedback !



-- 
Elie De Brauwer

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-12-13 10:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-09 19:58 [PATCH v1] mtd: gpmi: Bitflip support in erased regions Elie De Brauwer
2013-12-09 19:58 ` [PATCH v1] mtd: gpmi: Deal with bitflips in erased regions regions Elie De Brauwer
2013-12-10  9:37   ` Peter Korsgaard
2013-12-11 13:24 ` [PATCH v1] mtd: gpmi: Bitflip support in erased regions Huang Shijie
2013-12-13  8:49   ` Huang Shijie
2013-12-13 10:00     ` Elie De Brauwer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).