* [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips
@ 2016-06-09 21:17 Kamal Dasu
2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
2016-06-13 20:07 ` [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
0 siblings, 2 replies; 6+ messages in thread
From: Kamal Dasu @ 2016-06-09 21:17 UTC (permalink / raw)
To: linux-mtd, computersforpeace, boris.brezillon
Cc: f.fainelli, bcm-kernel-feedback-list, Kamal Dasu
Check for erased page bitflips in a page. And if well within
threshold return data as all 0xff. Apply sw check for controller
version < 7.2. Controller vesion >= 7.2 has hw support.
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
V3 changes
Return the max bitflips from a sector within a page
V2 changes
Added use of nand_check_erased_ecc_chunk
Restrict change to older controller < 7.2
---
drivers/mtd/nand/brcmnand/brcmnand.c | 62 ++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
index b76ad7c..7ee9617 100644
--- a/drivers/mtd/nand/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/brcmnand/brcmnand.c
@@ -1545,6 +1545,56 @@ static int brcmnand_read_by_pio(struct mtd_info *mtd, struct nand_chip *chip,
return ret;
}
+/*
+ * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
+ * error
+ *
+ * Because the HW ECC signals an ECC error if an erase paged has even a single
+ * bitflip, we must check each ECC error to see if it is actually an erased
+ * page with bitflips, not a truly corrupted page.
+ *
+ * On a real error, return a negative error code (-EBADMSG for ECC error), and
+ * buf will contain raw data.
+ * Otherwise, buf gets filled with 0xffs and return the maximum number of
+ * bitflips-per-ECC-sector to the caller.
+ *
+ */
+static int brcmstb_nand_verify_erased_page(struct mtd_info *mtd,
+ struct nand_chip *chip, void *buf, u64 addr)
+{
+ int i, sas;
+ void *oob = chip->oob_poi;
+ int bitflips = 0;
+ int page = addr >> chip->page_shift;
+ int ret;
+
+ if (!buf) {
+ buf = chip->buffers->databuf;
+ /* Invalidate page cache */
+ chip->pagebuf = -1;
+ }
+
+ sas = mtd->oobsize / chip->ecc.steps;
+
+ /* read without ecc for verification */
+ chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
+ ret = chip->ecc.read_page_raw(mtd, chip, buf, true, page);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
+ ret = nand_check_erased_ecc_chunk(buf, chip->ecc.size,
+ oob, sas, NULL, 0,
+ chip->ecc.strength);
+ if (ret < 0)
+ return ret;
+
+ bitflips = max(bitflips, ret);
+ }
+
+ return bitflips;
+}
+
static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
u64 addr, unsigned int trans, u32 *buf, u8 *oob)
{
@@ -1575,6 +1625,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
}
if (mtd_is_eccerr(err)) {
+ /*
+ * Controller version 7.2 has hw encoder to detect erased page
+ * bitflips, apply sw verification for older controllers only
+ */
+ if (ctrl->nand_version < 0x0702) {
+ err = brcmstb_nand_verify_erased_page(mtd, chip, buf,
+ addr);
+ /* erased page bitflips corrected */
+ if (err > 0)
+ return err;
+ }
+
dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
(unsigned long long)err_addr);
mtd->ecc_stats.failed++;
--
1.9.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
2016-06-09 21:17 [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
@ 2016-06-09 21:17 ` Kamal Dasu
2016-07-11 5:04 ` Brian Norris
2016-06-13 20:07 ` [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
1 sibling, 1 reply; 6+ messages in thread
From: Kamal Dasu @ 2016-06-09 21:17 UTC (permalink / raw)
To: linux-mtd, computersforpeace, boris.brezillon
Cc: f.fainelli, bcm-kernel-feedback-list, Kamal Dasu
This change provides a fix for controller bug where nand
controller could have a possible sticky error after a PIO
followed by a DMA read. The fix retries a read if we see
a uncorr_ecc after read to detect such sticky errors.
The fix applies to only controller version 7.0 and 7.1.
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
V3 changes
none
V2 Changes
Restrict controller bug workaround to version 7.0 and 7.1
---
drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
index 7ee9617..1156495 100644
--- a/drivers/mtd/nand/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/brcmnand/brcmnand.c
@@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
struct brcmnand_controller *ctrl = host->ctrl;
u64 err_addr = 0;
int err;
+ static bool retry = true;
dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
+try_dmaread:
brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
@@ -1626,6 +1628,22 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
if (mtd_is_eccerr(err)) {
/*
+ * On controller version and 7.0, 7.1 , DMA read after a
+ * prior PIO read that reported uncorrectable error,
+ * the DMA engine captures this error following DMA read
+ * cleared only on subsequent DMA read, so just retry once
+ * to clear a possible false error reported for current DMA
+ * read
+ */
+ if ((ctrl->nand_version == 0x0700) ||
+ (ctrl->nand_version == 0x0701)) {
+ if (retry) {
+ retry = false;
+ goto try_dmaread;
+ }
+ }
+
+ /*
* Controller version 7.2 has hw encoder to detect erased page
* bitflips, apply sw verification for older controllers only
*/
--
1.9.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
@ 2016-07-11 5:04 ` Brian Norris
2016-07-11 6:49 ` Boris Brezillon
0 siblings, 1 reply; 6+ messages in thread
From: Brian Norris @ 2016-07-11 5:04 UTC (permalink / raw)
To: Kamal Dasu
Cc: linux-mtd, boris.brezillon, f.fainelli, bcm-kernel-feedback-list
Hi,
Looking through Boris's pull request, I noticed an issue:
On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
> This change provides a fix for controller bug where nand
> controller could have a possible sticky error after a PIO
> followed by a DMA read. The fix retries a read if we see
> a uncorr_ecc after read to detect such sticky errors.
> The fix applies to only controller version 7.0 and 7.1.
>
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> ---
> V3 changes
> none
> V2 Changes
> Restrict controller bug workaround to version 7.0 and 7.1
> ---
> drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index 7ee9617..1156495 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> struct brcmnand_controller *ctrl = host->ctrl;
> u64 err_addr = 0;
> int err;
> + static bool retry = true;
Is this intentionally static? That means your retry will only be
performed exactly once *ever*. Probably not what you expected?
Boris, if this is indeed unintended, would you rather fix this up in the
original patch and send me a new pull request? Or have my apply the
trivial fixup (i.e., s/static//) as a separate patch on top?
Brian
>
> dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
>
> +try_dmaread:
> brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
>
> if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
> @@ -1626,6 +1628,22 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>
> if (mtd_is_eccerr(err)) {
> /*
> + * On controller version and 7.0, 7.1 , DMA read after a
> + * prior PIO read that reported uncorrectable error,
> + * the DMA engine captures this error following DMA read
> + * cleared only on subsequent DMA read, so just retry once
> + * to clear a possible false error reported for current DMA
> + * read
> + */
> + if ((ctrl->nand_version == 0x0700) ||
> + (ctrl->nand_version == 0x0701)) {
> + if (retry) {
> + retry = false;
> + goto try_dmaread;
> + }
> + }
> +
> + /*
> * Controller version 7.2 has hw encoder to detect erased page
> * bitflips, apply sw verification for older controllers only
> */
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
2016-07-11 5:04 ` Brian Norris
@ 2016-07-11 6:49 ` Boris Brezillon
2016-07-11 14:52 ` Kamal Dasu
0 siblings, 1 reply; 6+ messages in thread
From: Boris Brezillon @ 2016-07-11 6:49 UTC (permalink / raw)
To: Brian Norris; +Cc: Kamal Dasu, linux-mtd, f.fainelli, bcm-kernel-feedback-list
On Sun, 10 Jul 2016 22:04:52 -0700
Brian Norris <computersforpeace@gmail.com> wrote:
> Hi,
>
> Looking through Boris's pull request, I noticed an issue:
>
> On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
> > This change provides a fix for controller bug where nand
> > controller could have a possible sticky error after a PIO
> > followed by a DMA read. The fix retries a read if we see
> > a uncorr_ecc after read to detect such sticky errors.
> > The fix applies to only controller version 7.0 and 7.1.
> >
> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > ---
> > V3 changes
> > none
> > V2 Changes
> > Restrict controller bug workaround to version 7.0 and 7.1
> > ---
> > drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> > index 7ee9617..1156495 100644
> > --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> > +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> > @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> > struct brcmnand_controller *ctrl = host->ctrl;
> > u64 err_addr = 0;
> > int err;
> > + static bool retry = true;
>
> Is this intentionally static? That means your retry will only be
> performed exactly once *ever*. Probably not what you expected?
>
> Boris, if this is indeed unintended, would you rather fix this up in the
> original patch and send me a new pull request?
I'll fix the initial commit and send a new PR once Kamal has confirmed
this variable should not be static (which is also my opinion).
> Or have my apply the
> trivial fixup (i.e., s/static//) as a separate patch on top?
Thanks,
Boris
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads
2016-07-11 6:49 ` Boris Brezillon
@ 2016-07-11 14:52 ` Kamal Dasu
0 siblings, 0 replies; 6+ messages in thread
From: Kamal Dasu @ 2016-07-11 14:52 UTC (permalink / raw)
To: Boris Brezillon
Cc: Brian Norris, Kamal Dasu, linux-mtd, Florian Fainelli,
bcm-kernel-feedback-list
It's ok to fix it up.
Kamal
On Mon, Jul 11, 2016 at 2:49 AM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> On Sun, 10 Jul 2016 22:04:52 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
>
>> Hi,
>>
>> Looking through Boris's pull request, I noticed an issue:
>>
>> On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
>> > This change provides a fix for controller bug where nand
>> > controller could have a possible sticky error after a PIO
>> > followed by a DMA read. The fix retries a read if we see
>> > a uncorr_ecc after read to detect such sticky errors.
>> > The fix applies to only controller version 7.0 and 7.1.
>> >
>> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> > ---
>> > V3 changes
>> > none
>> > V2 Changes
>> > Restrict controller bug workaround to version 7.0 and 7.1
>> > ---
>> > drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
>> > 1 file changed, 18 insertions(+)
>> >
>> > diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
>> > index 7ee9617..1156495 100644
>> > --- a/drivers/mtd/nand/brcmnand/brcmnand.c
>> > +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
>> > @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>> > struct brcmnand_controller *ctrl = host->ctrl;
>> > u64 err_addr = 0;
>> > int err;
>> > + static bool retry = true;
>>
>> Is this intentionally static? That means your retry will only be
>> performed exactly once *ever*. Probably not what you expected?
>>
>> Boris, if this is indeed unintended, would you rather fix this up in the
>> original patch and send me a new pull request?
>
> I'll fix the initial commit and send a new PR once Kamal has confirmed
> this variable should not be static (which is also my opinion).
>
>> Or have my apply the
>> trivial fixup (i.e., s/static//) as a separate patch on top?
>
> Thanks,
>
> Boris
--
Kamal Dasu
Principal Engineer | Broadcom Ltd. 200 Brickstone Sq Andover | 978-719-1405
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips
2016-06-09 21:17 [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
@ 2016-06-13 20:07 ` Boris Brezillon
1 sibling, 0 replies; 6+ messages in thread
From: Boris Brezillon @ 2016-06-13 20:07 UTC (permalink / raw)
To: Kamal Dasu
Cc: linux-mtd, computersforpeace, f.fainelli,
bcm-kernel-feedback-list
On Thu, 9 Jun 2016 17:17:54 -0400
Kamal Dasu <kdasu.kdev@gmail.com> wrote:
> Check for erased page bitflips in a page. And if well within
> threshold return data as all 0xff. Apply sw check for controller
> version < 7.2. Controller vesion >= 7.2 has hw support.
>
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Applied both.
Thanks,
Boris
> ---
> V3 changes
> Return the max bitflips from a sector within a page
> V2 changes
> Added use of nand_check_erased_ecc_chunk
> Restrict change to older controller < 7.2
> ---
>
> drivers/mtd/nand/brcmnand/brcmnand.c | 62 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 62 insertions(+)
>
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index b76ad7c..7ee9617 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1545,6 +1545,56 @@ static int brcmnand_read_by_pio(struct mtd_info *mtd, struct nand_chip *chip,
> return ret;
> }
>
> +/*
> + * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
> + * error
> + *
> + * Because the HW ECC signals an ECC error if an erase paged has even a single
> + * bitflip, we must check each ECC error to see if it is actually an erased
> + * page with bitflips, not a truly corrupted page.
> + *
> + * On a real error, return a negative error code (-EBADMSG for ECC error), and
> + * buf will contain raw data.
> + * Otherwise, buf gets filled with 0xffs and return the maximum number of
> + * bitflips-per-ECC-sector to the caller.
> + *
> + */
> +static int brcmstb_nand_verify_erased_page(struct mtd_info *mtd,
> + struct nand_chip *chip, void *buf, u64 addr)
> +{
> + int i, sas;
> + void *oob = chip->oob_poi;
> + int bitflips = 0;
> + int page = addr >> chip->page_shift;
> + int ret;
> +
> + if (!buf) {
> + buf = chip->buffers->databuf;
> + /* Invalidate page cache */
> + chip->pagebuf = -1;
> + }
> +
> + sas = mtd->oobsize / chip->ecc.steps;
> +
> + /* read without ecc for verification */
> + chip->cmdfunc(mtd, NAND_CMD_READ0, 0x00, page);
> + ret = chip->ecc.read_page_raw(mtd, chip, buf, true, page);
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
> + ret = nand_check_erased_ecc_chunk(buf, chip->ecc.size,
> + oob, sas, NULL, 0,
> + chip->ecc.strength);
> + if (ret < 0)
> + return ret;
> +
> + bitflips = max(bitflips, ret);
> + }
> +
> + return bitflips;
> +}
> +
> static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> u64 addr, unsigned int trans, u32 *buf, u8 *oob)
> {
> @@ -1575,6 +1625,18 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> }
>
> if (mtd_is_eccerr(err)) {
> + /*
> + * Controller version 7.2 has hw encoder to detect erased page
> + * bitflips, apply sw verification for older controllers only
> + */
> + if (ctrl->nand_version < 0x0702) {
> + err = brcmstb_nand_verify_erased_page(mtd, chip, buf,
> + addr);
> + /* erased page bitflips corrected */
> + if (err > 0)
> + return err;
> + }
> +
> dev_dbg(ctrl->dev, "uncorrectable error at 0x%llx\n",
> (unsigned long long)err_addr);
> mtd->ecc_stats.failed++;
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-07-11 14:53 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-09 21:17 [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Kamal Dasu
2016-06-09 21:17 ` [V3, 2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads Kamal Dasu
2016-07-11 5:04 ` Brian Norris
2016-07-11 6:49 ` Boris Brezillon
2016-07-11 14:52 ` Kamal Dasu
2016-06-13 20:07 ` [V3, 1/2] mtd: brcmnand: Add check for erased page bitflips Boris Brezillon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).