* [PATCH v0] mtd: gpmi: Use cached syndromes to speedup erased region bitflip detection.
From: Elie De Brauwer @ 2014-01-07 19:44 UTC (permalink / raw)
To: b32955, dwmw2, dedekind1, computersforpeace, shijie8
Cc: Elie De Brauwer, linux-mtd
Hello all,
This patch is incremental with respect to 'mtd: gpmi: Deal with bitflips in
erased regions', currently in its 7th version, which I consider more or less
stable.
The 7th version of that patch, however, removed a fast path, which resulted
in a drop in flash read throughput: every time an erased block is read, each
individual byte has its Hamming weight calculated and bitflips are corrected
in software. I'm testing this on a poor old i.MX28, which isn't a powerful
beast.
Hence I've been looking for a way to regain some of the performance that
was lost (the price paid for making ubifs more reliable), and I've been
somewhat inspired by Pekon Gupta's work on OMAP (from which I borrowed the
initial Hamming weight approach).
Apparently the BCH block has the ability to write the syndrome data to
auxiliary memory (where the status bytes are also stored) when
HW_BCH_CTRL:DEBUGSYNDROME is set. Hence I followed the approach where, the
first time a correctly erased block is found, these syndromes are cached,
and future reads of likely-to-be-erased blocks can be identified by
comparing these syndromes instead of checking each individual byte. For
example, on my 2k chip I would normally compute the Hamming weight of 4
chunks of 512 bytes, i.e. 2k bytes. But with ECC8 I can replace this with a
memcmp() of 4x32 bytes (4 sets of syndromes). The processor is obviously
much quicker at the latter, which regains some of the lost speed.
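To make the arithmetic concrete, here is a small standalone illustration
(a sketch only, not the driver code; the function name is made up):

#include <string.h>

/*
 * Each chunk yields 2*t 13-bit syndrome symbols, each stored as a
 * 16-bit halfword: 4 bytes per bit of ECC strength. With ECC8 that is
 * 32 bytes per chunk, so a 2k page (4 chunks of 512 bytes) needs a
 * memcmp() over 4 x 32 = 128 bytes instead of a hweight8() pass over
 * 2048 data bytes.
 */
static int chunk_syndrome_matches(const unsigned char *cached,
                                  const unsigned char *read_back,
                                  unsigned int ecc_strength)
{
        size_t syndrome_size = 4 * ecc_strength; /* 32 bytes for ECC8 */

        return memcmp(cached, read_back, syndrome_size) == 0;
}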
I did some benchmarking on the following 2k and 4k NAND chips:
NAND device: Manufacturer ID: 0x2c, Chip ID: 0xdc (Micron MT29F4G08ABAEAH4), 512MiB, page size: 4096, OOB size: 224
NAND device: Manufacturer ID: 0x2c, Chip ID: 0xda (Micron MT29F2G08ABAEAH4), 256MiB, page size: 2048, OOB size: 64
I benchmarked by simply running 'time dd if=/dev/mtd8 of=/dev/null bs=1M'
and calculating the throughput (in megabytes/s). This gave the following
results:
 2k  | 4k
=====+======
 7.0 | 11.3  <- v6 of the bitflip correction patch (broken fast path)
 4.7 |  5.9  <- v7 of the bitflip correction patch (no fast path)
 5.9 |  8.4  <- with this patch applied
(Some background info: I expect these chips to be less than half used,
hence plenty of erased blocks; also, the NAND timings are already optimized
for these chips.)
I have tested this on the chips above on Linux 3.9 and rebased the patch
onto l2-mtd (my testing, however, mainly covered the legacy geometry path).
A side note: two sets of syndromes are stored. The first set covers the
first data block AND the metadata (more 0xff's), while the second set
covers a plain data block only (and is reused for the remaining n-1 data
blocks).
Any feedback regarding this approach/patch is welcome.
Thanks
E.
Elie De Brauwer (1):
mtd: gpmi: Use cached syndromes to speedup erased region bitflip
detection.
drivers/mtd/nand/gpmi-nand/bch-regs.h | 1 +
drivers/mtd/nand/gpmi-nand/gpmi-lib.c | 4 ++
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 95 ++++++++++++++++++++++++++++++++--
drivers/mtd/nand/gpmi-nand/gpmi-nand.h | 6 +++
4 files changed, 101 insertions(+), 5 deletions(-)
--
1.8.5.2
* [PATCH v0] mtd: gpmi: Use cached syndromes to speedup erased region bitflip detection.
From: Elie De Brauwer @ 2014-01-07 19:44 UTC (permalink / raw)
To: b32955, dwmw2, dedekind1, computersforpeace, shijie8
Cc: Elie De Brauwer, linux-mtd
When reading an erased page, detection of bitflips in the page needs to be
done in software. This is currently done by calculating the Hamming weight
of each individual byte, which scales linearly with the page size and
results in a decreased read speed.
The BCH block has an option to write the computed syndromes to auxiliary
memory for each transaction (see HW_BCH_CTRL:DEBUGSYNDROME, or Figure 16-4
in the i.MX28 reference manual). If the syndromes of properly erased blocks
are known, it is possible to distinguish faster between a properly erased
page and an erased page with bitflips. This comparison scales linearly with
the ECC strength and the number of data blocks the page is split into (as
opposed to the page size).
Hence, the first time a properly erased block is found, its syndromes are
put in a cache, and all successive reads of an erased block compare the
syndromes (rather than the individual bytes). The only catch is that the
syndromes (SM) of the first block cover the first data block and the
metadata, while the successive syndromes (SD) cover the other data blocks.
So two sets of syndromes need to be stored.
+--------------------------------------------+
| Meta | Data0 | Data1 | Data2 | ... | Datan |
+--------------------------------------------+
<-----SM-----> <--SD-> <--SD-> ... <--SD->
Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com>
---
drivers/mtd/nand/gpmi-nand/bch-regs.h | 1 +
drivers/mtd/nand/gpmi-nand/gpmi-lib.c | 4 ++
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 95 ++++++++++++++++++++++++++++++++--
drivers/mtd/nand/gpmi-nand/gpmi-nand.h | 6 +++
4 files changed, 101 insertions(+), 5 deletions(-)
diff --git a/drivers/mtd/nand/gpmi-nand/bch-regs.h b/drivers/mtd/nand/gpmi-nand/bch-regs.h
index b2104de..e828c59 100644
--- a/drivers/mtd/nand/gpmi-nand/bch-regs.h
+++ b/drivers/mtd/nand/gpmi-nand/bch-regs.h
@@ -26,6 +26,7 @@
#define HW_BCH_CTRL_CLR 0x00000008
#define HW_BCH_CTRL_TOG 0x0000000c
+#define BM_BCH_CTRL_DEBUGSYNDROME (1 << 22)
#define BM_BCH_CTRL_COMPLETE_IRQ_EN (1 << 8)
#define BM_BCH_CTRL_COMPLETE_IRQ (1 << 0)
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
index b6a7aa8..440cd52 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-lib.c
@@ -307,6 +307,10 @@ int bch_set_geometry(struct gpmi_nand_data *this)
writel(erase_threshold & BM_BCH_MODE_ERASE_THRESHOLD_MASK,
r->bch_regs + HW_BCH_MODE);
+	/* Write the computed syndromes to auxiliary memory. */
+	writel(BM_BCH_CTRL_DEBUGSYNDROME,
+	       r->bch_regs + HW_BCH_CTRL_SET);
+
/* Set *all* chip selects to use layout 0. */
writel(0, r->bch_regs + HW_BCH_LAYOUTSELECT);
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index eac6714..f0a8489 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -219,8 +219,21 @@ static bool set_geometry_by_ecc_info(struct gpmi_nand_data *this)
geo->payload_size = mtd->writesize;
geo->auxiliary_status_offset = ALIGN(geo->metadata_size, 4);
+
+	/*
+	 * Each chunk's syndromes consist of 2*t 13-bit symbols, each
+	 * stored as a 16-bit halfword: 4 bytes per bit of ECC strength,
+	 * per chunk.
+	 */
+	geo->auxiliary_syndrome_size = 4 * geo->ecc_strength *
+				       geo->ecc_chunk_count;
+
 	geo->auxiliary_size = ALIGN(geo->metadata_size, 4)
-			    + ALIGN(geo->ecc_chunk_count, 4);
+			    + ALIGN(geo->ecc_chunk_count, 4)
+			    + geo->auxiliary_syndrome_size;
+
+	geo->auxiliary_syndrome_offset = ALIGN(geo->metadata_size, 4)
+				       + ALIGN(geo->ecc_chunk_count, 4);
if (!this->swap_block_mark)
return true;
@@ -286,8 +299,18 @@ static int legacy_set_geometry(struct gpmi_nand_data *this)
metadata_size = ALIGN(geo->metadata_size, 4);
status_size = ALIGN(geo->ecc_chunk_count, 4);
-	geo->auxiliary_size = metadata_size + status_size;
+	/*
+	 * Each chunk's syndromes consist of 2*t 13-bit symbols, each
+	 * stored as a 16-bit halfword: 4 bytes per bit of ECC strength,
+	 * per chunk.
+	 */
+	geo->auxiliary_syndrome_size = 4 * geo->ecc_strength *
+				       geo->ecc_chunk_count;
+
+	geo->auxiliary_size = metadata_size + status_size +
+			      geo->auxiliary_syndrome_size;
 	geo->auxiliary_status_offset = metadata_size;
+	geo->auxiliary_syndrome_offset = metadata_size + status_size;
if (!this->swap_block_mark)
return 0;
@@ -761,6 +784,18 @@ static void send_page_end(struct gpmi_nand_data *this,
dma_unmap_single(dev, used_phys, length, DMA_TO_DEVICE);
}
+static void gpmi_free_syndrome_buffer(struct gpmi_nand_data *this)
+{
+	/* kfree() tolerates NULL pointers, so no need to check. */
+	kfree(this->syndrome_data);
+	kfree(this->syndrome_metadata);
+
+	this->syndrome_data = NULL;
+	this->syndrome_metadata = NULL;
+}
+
+
static void gpmi_free_dma_buffer(struct gpmi_nand_data *this)
{
struct device *dev = this->dev;
@@ -959,25 +994,74 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
}
/*
+ * Count number of zero bits in a region.
+ */
+static unsigned int num_zero_bits(unsigned char *data, int len)
+{
+	int i;
+	int num = 0;
+
+	for (i = 0; i < len; i++)
+		num += hweight8(~data[i]);
+
+	return num;
+}
+
+/*
* Count the number of 0 bits in a supposed to be
* erased region and correct them. Return the number
* of bitflips or zero when the region was correct.
*/
 static unsigned int erased_sector_bitflips(unsigned char *data,
+					   unsigned char *aux,
 					   unsigned int chunk,
-					   struct bch_geometry *geo)
+					   struct gpmi_nand_data *this)
 {
+	struct bch_geometry *geo = &this->bch_geometry;
 	unsigned int flip_bits = 0;
-	int i;
 	int base = geo->ecc_chunk_size * chunk;
+	int syndrome_size = geo->auxiliary_syndrome_size / geo->ecc_chunk_count;
+	int aux_offset = geo->auxiliary_syndrome_offset + chunk * syndrome_size;
+	char **syndrome_cache = &this->syndrome_data;
+
+	/* Chunk 0 also covers the metadata, so it has its own cache. */
+	if (chunk == 0)
+		syndrome_cache = &this->syndrome_metadata;
+
+	if (*syndrome_cache) {
+		/* Fast path: compare against the cached syndromes. */
+		if (memcmp(*syndrome_cache, &aux[aux_offset], syndrome_size))
+			goto slow_path;
+		return 0;
+	}
+
+	/*
+	 * Cache the syndromes if the data is valid. For chunk 0 both the
+	 * data and the metadata should be all 0xff; for the other chunks
+	 * only the data should be all 0xff.
+	 */
+	if (num_zero_bits(&data[base], geo->ecc_chunk_size) ||
+	    (chunk == 0 && num_zero_bits(&aux[0], geo->metadata_size)))
+		goto slow_path;
+
+	*syndrome_cache = kmalloc(syndrome_size, GFP_KERNEL);
+	if (!*syndrome_cache)
+		goto slow_path;
+
+	memcpy(*syndrome_cache, &aux[aux_offset], syndrome_size);
+	return 0;
+
+slow_path:
 	/* Count bitflips */
-	for (i = 0; i < geo->ecc_chunk_size; i++)
-		flip_bits += hweight8(~data[base + i]);
+	flip_bits = num_zero_bits(&data[base], geo->ecc_chunk_size);
 
 	/* Correct bitflips by 0xFF'ing this chunk. */
 	if (flip_bits)
 		memset(&data[base], 0xFF, geo->ecc_chunk_size);
 
 	return flip_bits;
 }
@@ -1042,8 +1126,8 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
* the BCH block.
*/
if (*status == STATUS_ERASED)
- flips = erased_sector_bitflips(payload_virt, i,
- nfc_geo);
+ flips = erased_sector_bitflips(payload_virt,
+ auxiliary_virt, i, this);
else
flips = *status;
@@ -1561,6 +1645,7 @@ static int gpmi_set_geometry(struct gpmi_nand_data *this)
static void gpmi_nand_exit(struct gpmi_nand_data *this)
{
nand_release(&this->mtd);
+ gpmi_free_syndrome_buffer(this);
gpmi_free_dma_buffer(this);
}
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.h b/drivers/mtd/nand/gpmi-nand/gpmi-nand.h
index 4c801fa..136f8e9 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.h
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.h
@@ -62,6 +62,8 @@ struct bch_geometry {
unsigned int payload_size;
unsigned int auxiliary_size;
unsigned int auxiliary_status_offset;
+ unsigned int auxiliary_syndrome_offset;
+ unsigned int auxiliary_syndrome_size;
unsigned int block_mark_byte_offset;
unsigned int block_mark_bit_offset;
};
@@ -156,6 +158,10 @@ struct gpmi_nand_data {
uint8_t *upper_buf;
int upper_len;
+ /* Syndrome caches */
+ char *syndrome_data;
+ char *syndrome_metadata;
+
/* for DMA operations */
bool direct_dma_map_ok;
--
1.8.5.2
* Re: [PATCH v0] mtd: gpmi: Use cached syndromes to speedup erased region bitflip detection.
From: Huang Shijie @ 2014-01-08 5:38 UTC (permalink / raw)
To: Elie De Brauwer; +Cc: shijie8, computersforpeace, dwmw2, linux-mtd, dedekind1
On Tue, Jan 07, 2014 at 08:44:32PM +0100, Elie De Brauwer wrote:
> Hello all,
>
> This patch is incremental with respect to 'mtd: gpmi: Deal with bitflips in
> erased regions', currently in its 7th version, which I consider more or less
> stable.
> The 7th version of that patch, however, removed a fast path, which resulted
> in a drop in flash read throughput: every time an erased block is read, each
> individual byte has its Hamming weight calculated and bitflips are corrected
> in software. I'm testing this on a poor old i.MX28, which isn't a powerful
> beast.
>
> Hence I've been looking for a way to regain some of the performance that
> was lost (the price paid for making ubifs more reliable), and I've been
> somewhat inspired by Pekon Gupta's work on OMAP (from which I borrowed the
> initial Hamming weight approach).
>
> Apparently the BCH block has the ability to write the syndrome data to
> auxiliary memory (where the status bytes are also stored) when
> HW_BCH_CTRL:DEBUGSYNDROME is set. Hence I followed the approach where, the
> first time a correctly erased block is found, these syndromes are cached,
> and future reads of likely-to-be-erased blocks can be identified by
> comparing these syndromes instead of checking each individual byte. For
> example, on my 2k chip I would normally compute the Hamming weight of 4
> chunks of 512 bytes, i.e. 2k bytes. But with ECC8 I can replace this with a
> memcmp() of 4x32 bytes (4 sets of syndromes). The processor is obviously
> much quicker at the latter, which regains some of the lost speed.
>
> I did some benchmarking on the following 2k and 4k NAND chips:
> NAND device: Manufacturer ID: 0x2c, Chip ID: 0xdc (Micron MT29F4G08ABAEAH4), 512MiB, page size: 4096, OOB size: 224
> NAND device: Manufacturer ID: 0x2c, Chip ID: 0xda (Micron MT29F2G08ABAEAH4), 256MiB, page size: 2048, OOB size: 64
>
> I benchmarked by simply running 'time dd if=/dev/mtd8 of=/dev/null bs=1M'
> and calculating the throughput (in megabytes/s). This gave the following
> results:
>
>  2k  | 4k
> =====+======
>  7.0 | 11.3  <- v6 of the bitflip correction patch (broken fast path)
>  4.7 |  5.9  <- v7 of the bitflip correction patch (no fast path)
>  5.9 |  8.4  <- with this patch applied
>
thanks for the new patch.
I suddenly thought of a new solution to this issue:
[1] when a bitflip occurs in an erased page, the BCH block will report it
as uncorrectable,
[2] if we catch an uncorrectable error, we can check the whole buffer and
count the number of bitflips. Assume we get N bitflips.
[3] if N < gf_len, we can assume this is an erased page, memset the whole
buffer to 0xff, and tell the upper layer that this is a good empty page.
[4] since [1] is very rare, I think this method is much faster than the
current solution.
the patch is something like this:
---
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 25 +++++++++++++++++++++++++
1 files changed, 25 insertions(+), 0 deletions(-)
diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index e2f5820..575327d 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -958,6 +958,29 @@ static void block_mark_swapping(struct gpmi_nand_data *this,
p[1] = (p[1] & mask) | (from_oob >> (8 - bit));
}
+static bool gpmi_erased_check(struct gpmi_nand_data *this,
+ unsigned char *data, unsigned int chunk)
+{
+ struct bch_geometry *geo = &this->bch_geometry;
+ int base = geo->ecc_chunk_size * chunk;
+ unsigned int flip_bits = 0;
+	unsigned int threshold = geo->gf_len / 2;
+ int i;
+
+ /* Count bitflips */
+ for (i = 0; i < geo->ecc_chunk_size; i++)
+ flip_bits += hweight8(~data[base + i]);
+
+	if (threshold > geo->ecc_strength)
+		threshold = geo->ecc_strength;
+
+	if (flip_bits < threshold) {
+ memset(&data[base], 0xFF, geo->ecc_chunk_size);
+ return true;
+ }
+ return false;
+}
+
static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
uint8_t *buf, int oob_required, int page)
{
@@ -1007,6 +1030,8 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
continue;
if (*status == STATUS_UNCORRECTABLE) {
+ if (gpmi_erased_check(this, payload_virt, i))
+ continue;
mtd->ecc_stats.failed++;
continue;
}
--
thanks
Huang Shijie
* Re: [PATCH v0] mtd: gpmi: Use cached syndromes to speedup erased region bitflip detection.
From: Elie De Brauwer @ 2014-01-08 7:52 UTC (permalink / raw)
To: Huang Shijie
Cc: Huang Shijie, Brian Norris, David Woodhouse,
linux-mtd@lists.infradead.org, Artem Bityutskiy
On Wed, Jan 8, 2014 at 6:38 AM, Huang Shijie <b32955@freescale.com> wrote:
> On Tue, Jan 07, 2014 at 08:44:32PM +0100, Elie De Brauwer wrote:
>> Hello all,
>>
>> This patch is incremental with respect to 'mtd: gpmi: Deal with bitflips in
>> erased regions', currently in its 7th version, which I consider more or less
>> stable.
>> The 7th version of that patch, however, removed a fast path, which resulted
>> in a drop in flash read throughput: every time an erased block is read, each
>> individual byte has its Hamming weight calculated and bitflips are corrected
>> in software. I'm testing this on a poor old i.MX28, which isn't a powerful
>> beast.
>>
>> Hence I've been looking for a way to regain some of the performance that
>> was lost (the price paid for making ubifs more reliable), and I've been
>> somewhat inspired by Pekon Gupta's work on OMAP (from which I borrowed the
>> initial Hamming weight approach).
>>
>> Apparently the BCH block has the ability to write the syndrome data to
>> auxiliary memory (where the status bytes are also stored) when
>> HW_BCH_CTRL:DEBUGSYNDROME is set. Hence I followed the approach where, the
>> first time a correctly erased block is found, these syndromes are cached,
>> and future reads of likely-to-be-erased blocks can be identified by
>> comparing these syndromes instead of checking each individual byte. For
>> example, on my 2k chip I would normally compute the Hamming weight of 4
>> chunks of 512 bytes, i.e. 2k bytes. But with ECC8 I can replace this with a
>> memcmp() of 4x32 bytes (4 sets of syndromes). The processor is obviously
>> much quicker at the latter, which regains some of the lost speed.
>>
>> I did some benchmarking on the following 2k and 4k NAND chips:
>> NAND device: Manufacturer ID: 0x2c, Chip ID: 0xdc (Micron MT29F4G08ABAEAH4), 512MiB, page size: 4096, OOB size: 224
>> NAND device: Manufacturer ID: 0x2c, Chip ID: 0xda (Micron MT29F2G08ABAEAH4), 256MiB, page size: 2048, OOB size: 64
>>
>> I benchmarked by simply running 'time dd if=/dev/mtd8 of=/dev/null bs=1M'
>> and calculating the throughput (in megabytes/s). This gave the following
>> results:
>>
>>  2k  | 4k
>> =====+======
>>  7.0 | 11.3  <- v6 of the bitflip correction patch (broken fast path)
>>  4.7 |  5.9  <- v7 of the bitflip correction patch (no fast path)
>>  5.9 |  8.4  <- with this patch applied
>>
> thanks for the new patch.
>
> I suddenly thought of a new solution to this issue:
> [1] when a bitflip occurs in an erased page, the BCH block will report it
> as uncorrectable,
> [2] if we catch an uncorrectable error, we can check the whole buffer and
> count the number of bitflips. Assume we get N bitflips.
>
> [3] if N < gf_len, we can assume this is an erased page, memset the whole
> buffer to 0xff, and tell the upper layer that this is a good empty page.
>
> [4] since [1] is very rare, I think this method is much faster than the
> current solution.
>
> the patch is something like this:
What you suggest will obviously reach the maximum speed. I've been playing
with a similar idea too, but I always bumped into the issue that you cannot
know whether or not a page actually is an erased page. With your solution,
for example, you cannot distinguish between:
- an erased page which suffers from bitflips
- a genuine uncorrectable page whose original contents may be close to
all 0xff's, with only a handful of bits set to 0.
The risk of this approach is that a page which the system should mark as
uncorrectable (and deal with accordingly) is instead returned as a valid
all-0xff page, which the system will then treat as valid data.
I agree this may sound a bit theoretical, and the risk (an uncorrectable
error on a page with very few 0 bits) is small, but I'm not able to judge
whether or not it can/should be neglected.
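To make the two cases concrete, here is a small standalone demonstration
(illustrative only, nothing from the driver): a plain "count the zero
bits" test sees exactly the same thing in both buffers, so a threshold
alone cannot tell them apart.

#include <stdio.h>
#include <string.h>

/* Count zero bits, as the threshold test in step [3] would. */
static unsigned int zero_bits(const unsigned char *buf, size_t len)
{
        unsigned int n = 0;

        for (size_t i = 0; i < len; i++)
                for (int b = 0; b < 8; b++)
                        n += !((buf[i] >> b) & 1);
        return n;
}

int main(void)
{
        unsigned char erased[512], data[512];

        /* Case 1: an erased chunk that picked up two bitflips. */
        memset(erased, 0xFF, sizeof(erased));
        erased[17]  &= ~0x04;
        erased[300] &= ~0x20;

        /* Case 2: genuine (uncorrectable) data that is nearly all 0xff. */
        memset(data, 0xFF, sizeof(data));
        data[0] = 0xF7;
        data[1] = 0xBF;

        /* Both report 2 zero bits, well below a gf_len of 13. */
        printf("erased with flips: %u zero bits\n", zero_bits(erased, 512));
        printf("genuine data:      %u zero bits\n", zero_bits(data, 512));
        return 0;
}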
What do you think?
--
Elie De Brauwer
* Re: [PATCH v0] mtd: gpmi: Use cached syndromes to speedup erased region bitflip detection.
From: Bill Pringlemeir @ 2014-01-08 17:10 UTC (permalink / raw)
To: Elie De Brauwer; +Cc: Huang Shijie, linux-mtd
eliedebrauwer at gmail.com wrote:
> (Some background info: I expect these chips to be less than half used,
> hence plenty of erased blocks; also, the NAND timings are already
> optimized for these chips.)
That may not be true if the higher-layer file system does wear levelling,
especially dynamic wear levelling, where seldom-read inodes are moved in
order to spread the wear.
> On Wed, Jan 8, 2014 at 6:38 AM, Huang Shijie <b32955 at freescale.com> wrote:
>> thanks for the new patch.
>>
>> I suddenly thought of a new solution to this issue:
>> [1] when a bitflip occurs in an erased page, the BCH block will report it
>> as uncorrectable,
>> [2] if we catch an uncorrectable error, we can check the whole buffer and
>> count the number of bitflips. Assume we get N bitflips.
>>
>> [3] if N < gf_len, we can assume this is an erased page, memset the whole
>> buffer to 0xff, and tell the upper layer that this is a good empty page.
>>
>> [4] since [1] is very rare, I think this method is much faster than the
>> current solution.
>> Elie De Brauwer wrote:
> What you suggest will obviously reach the maximum speed. I've been playing
> with a similar idea too, but I always bumped into the issue that you cannot
> know whether or not a page actually is an erased page. With your solution,
> for example, you cannot distinguish between:
> - an erased page which suffers from bitflips
> - a genuine uncorrectable page whose original contents may be close to
> all 0xff's, with only a handful of bits set to 0.
> The risk of this approach is that a page which the system should mark as
> uncorrectable (and deal with accordingly) is instead returned as a valid
> all-0xff page, which the system will then treat as valid data.
> I agree this may sound a bit theoretical, and the risk (an uncorrectable
> error on a page with very few 0 bits) is small, but I'm not able to judge
> whether or not it can/should be neglected.
> What do you think?
I think you have a good rationale. However, there will always be errors
which can bypass the bitflip checking of any ECC (just like any hash has
some theoretical collisions/conflicts). The higher-layer file system should
still have some integrity checks, and the '-EUCLEAN' should have been a
clue that these pages/sectors were wearing previously. I.e., the NAND
device should fail gradually. I don't think it is fair to expect the MTD
drivers to perform on flash that is so excessively used that it has
bitflips way beyond the maximum specified by the vendor.
Probably there is no completely right answer, just as you analyze. How do
you tell between a mostly-0xff page with bitflips and a bad hardware-ECC
read of an erased page? It may be possible to re-read the page with the
hardware ECC turned off to validate that the whole page is 0xff (I'm
guessing you are comparing data with ECC applied?). But this would again
affect performance.
I think that we should have some reliability at the upper layers. How do
you combat power failure, etc.? The ECC should be able to correct errors.
This foists error detection onto another layer, where it is highly likely
to be caught. Running file systems on mtdblock, etc. might like this, but
those have other issues, don't they?
My 2 cents,
Bill Pringlemeir.
* Re: [PATCH v0] mtd: gpmi: Use cached syndromes to speedup erased region bitflip detection.
From: Huang Shijie @ 2014-01-09 2:00 UTC (permalink / raw)
To: Elie De Brauwer
Cc: Huang Shijie, Brian Norris, David Woodhouse,
linux-mtd@lists.infradead.org, Artem Bityutskiy
On Wed, Jan 08, 2014 at 08:52:35AM +0100, Elie De Brauwer wrote:
> >> I did some benchmarking on the following 2k and 4k NAND chips:
> >> NAND device: Manufacturer ID: 0x2c, Chip ID: 0xdc (Micron MT29F4G08ABAEAH4), 512MiB, page size: 4096, OOB size: 224
> >> NAND device: Manufacturer ID: 0x2c, Chip ID: 0xda (Micron MT29F2G08ABAEAH4), 256MiB, page size: 2048, OOB size: 64
> >>
> >> I benchmarked by simply running 'time dd if=/dev/mtd8 of=/dev/null bs=1M'
> >> and calculating the throughput (in megabytes/s). This gave the following
> >> results:
> >>
> >>  2k  | 4k
> >> =====+======
> >>  7.0 | 11.3  <- v6 of the bitflip correction patch (broken fast path)
> >>  4.7 |  5.9  <- v7 of the bitflip correction patch (no fast path)
> >>  5.9 |  8.4  <- with this patch applied
> >>
> > thanks for the new patch.
> >
> > I suddenly thought of a new solution to this issue:
> > [1] when a bitflip occurs in an erased page, the BCH block will report it
> > as uncorrectable,
> > [2] if we catch an uncorrectable error, we can check the whole buffer and
> > count the number of bitflips. Assume we get N bitflips.
> >
> > [3] if N < gf_len, we can assume this is an erased page, memset the whole
> > buffer to 0xff, and tell the upper layer that this is a good empty page.
> >
> > [4] since [1] is very rare, I think this method is much faster than the
> > current solution.
> >
> > the patch is something like this:
>
> What you suggest will obviously be able to reach the maximum speed,
> I've been playing with a similar idea too but always bumped into the issue
> that you cannot know whether or not a page is actually an erased page or
> not. In your solution for example you cannot distinguish between:
> - an erased page which suffers from bitflips
> - a genuine uncorrectable page whose original contents may be close to
> all 0xff's, but only has a handful of bits set to 0.
Actually, we do have a method to distinguish between the two cases:
if [3] is met, we can read the page out with ECC disabled and check the
bitflips again, like this:
<1> if [3] is met, read out the page again with ECC disabled, and count
the bitflips in this new page; assume we get N2.
<2> if it is an erased page, N2 will be less than gf_len.
<3> if it is a page full of 0xff (with only several bits set to 0), N2
will be greater than gf_len.
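As a standalone sketch of the <2>/<3> decision (illustrative only; the
buffer here is assumed to be the raw re-read of data plus OOB, and the
names are placeholders, not the real driver API):

#include <stdbool.h>
#include <stddef.h>

/* Count zero bits in a buffer (same job as num_zero_bits in the patch). */
static unsigned int count_zero_bits(const unsigned char *buf, size_t len)
{
        unsigned int n = 0;

        for (size_t i = 0; i < len; i++)
                for (int b = 0; b < 8; b++)
                        n += !((buf[i] >> b) & 1);
        return n;
}

/*
 * "raw" is the page re-read with ECC disabled (data + OOB). An erased
 * page has only a few zero bits; a genuine nearly-all-0xff data page
 * still carries its programmed ECC bytes in the OOB, so it shows many
 * more zero bits and fails the test.
 */
static bool raw_page_is_erased(const unsigned char *raw, size_t len,
                               unsigned int gf_len)
{
        return count_zero_bits(raw, len) < gf_len;
}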
I will give a patch later, and you can test it.
thanks
Huang Shijie