linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* mtd: rawnand: atmel: ECC error after update to kernel 6.6
@ 2025-06-30 14:38 Zixun LI
  2025-07-01 15:48 ` Ada Couprie Diaz
  0 siblings, 1 reply; 4+ messages in thread
From: Zixun LI @ 2025-06-30 14:38 UTC (permalink / raw)
  To: Tudor Ambarus, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Nicolas Ferre, Alexandre Belloni,
	Claudiu Beznea, linux-mtd, SoC support', linux-kernel

Hi,

After updating our SAM9G25 device from kernel 3.16 to 6.6, we have
encountered UBIFS failures with ECC errors:

  ubi0 warning: ubi_io_read: error -74 (ECC error) while reading
126976 bytes from PEB 8:4096, read only 126976 bytes, retry

On the old system (kernel 3.16), nandtest passed successfully:

  nandtest -p 1 -l 0x40000 /dev/mtd2
  ECC corrections: 1
  ECC failures : 0
  Bad blocks : 0
  BBT blocks : 0
  00020000: checking...
  Finished pass 1 successfully

But on the new system (kernel 6.6), nandtest reports many errors:

  nandtest -p 1 -l 0x40000 /dev/mtd2
  ECC corrections: 1667
  ECC failures : 1205
  Bad blocks : 1
  BBT blocks : 0
  00000000: reading (1 of 4)...
  218 bit(s) ECC corrected at 00000000

After some diagnostics, I used devmem2 to compare SMC and PMECC
registers between the two kernels. All values match except for the
PMECC_CLK register, which is 2 in kernel 3.16 [1] and 0 in kernel 6.6
[2]. It appears the clock setting is missing since the kernel 4.14
refactor.

According to the SAM9G25 datasheet this field must be programmed with 2.

Manually setting PMECC_CLK to 2 (devmem2 0xffffe010 w 2) resolves the
nandtest issue.

Is the clock setting moved to elsewhere after the refactor ?

Best regards,
Zixun LI

[1] https://github.com/torvalds/linux/blob/19583ca584d6f574384e17fe7613dfaeadcdc4a6/drivers/mtd/nand/atmel_nand.c#L1058
[2] https://github.com/torvalds/linux/blob/ffc253263a1375a65fa6c9f62a893e9767fbebfa/drivers/mtd/nand/raw/atmel/pmecc.c#L772


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: mtd: rawnand: atmel: ECC error after update to kernel 6.6
  2025-06-30 14:38 mtd: rawnand: atmel: ECC error after update to kernel 6.6 Zixun LI
@ 2025-07-01 15:48 ` Ada Couprie Diaz
  2025-07-01 18:04   ` Zixun LI
  0 siblings, 1 reply; 4+ messages in thread
From: Ada Couprie Diaz @ 2025-07-01 15:48 UTC (permalink / raw)
  To: Zixun LI
  Cc: Claudiu Beznea, Alexandre Belloni, Vignesh Raghavendra,
	Richard Weinberger, Nicolas Ferre, linux-kernel, Tudor Ambarus,
	linux-mtd, Miquel Raynal, SoC support'

Hi,

I'm not really involved in this part of the kernel or drivers,
but I took some time to look into it. Hopefully I'm not completely
missing something !

On 30/06/2025 15:38, Zixun LI wrote:
> Hi,
>
> After updating our SAM9G25 device from kernel 3.16 to 6.6, we have
> encountered UBIFS failures with ECC errors
>
> [...]
>
> After some diagnostics, I used devmem2 to compare SMC and PMECC
> registers between the two kernels. All values match except for the
> PMECC_CLK register, which is 2 in kernel 3.16 [1] and 0 in kernel 6.6
> [2]. It appears the clock setting is missing since the kernel 4.14
> refactor.
>
> According to the SAM9G25 datasheet this field must be programmed with 2.
>
> Manually setting PMECC_CLK to 2 (devmem2 0xffffe010 w 2) resolves the
> nandtest issue.
>
> Is the clock setting moved to elsewhere after the refactor ?

As far as I can tell, the setting of the *PMECC_CLK* was lost in
f88fc122cc34 (mtd: nand: Cleanup/rework the atmel_nand driver),
which was merged for 4.12.

The register offset and values are  part of the defines it introduces
but are unused, so they might have been forgotten.
I couldn't find another place where this was done, and I think it would
make sense for it to be done here still.

The tricky part is that now the driver handles other PMECCs that do not have
this register at all in their datasheets :
  - SAMA5D2 series[3], page 713, 36.20 Register summary :
      0x80 (0x10 offset) is marked reserved,
  - SAMA5D4 series[4], page 512, Table 30-20 :
      0x080 (0x10 offset) is marked reserved,

So I think it's best not to add this write back for all cases.

I also checked, as some other SoCs' PMECCs are marked compatible with
the one used in your SAM9G25 (`atmel,at91sam9g45-pmecc`) :
  - SAM9x60 series[5], page 256, 21.6 Register Summary :
      0x10 offset has PMECC_CLK present, recommends setting to 2 at 133MHz,
  - SAM9x7 series[6], page 263, 21.6 Register Summary :
      0x10 offset has PMECC_CLK present, reccomends setting to 2 at 133MHz.

So it should be safe to write 2 to this register for all those matching
this /compatible/ string.

The datasheet itself is not clear as to what to do for other MCK frequencies,
though as 133MHz is the highest supported frequency and it worked before,
hopefully it just means a cycle or two of useless delay at worse,
but at least it keeps it simple.

I've written a small patch below that I think should fix the issue, but again
I don't know if that's upstreamable as-is.

> Best regards,
> Zixun LI
>
> [1] https://github.com/torvalds/linux/blob/19583ca584d6f574384e17fe7613dfaeadcdc4a6/drivers/mtd/nand/atmel_nand.c#L1058
> [2] https://github.com/torvalds/linux/blob/ffc253263a1375a65fa6c9f62a893e9767fbebfa/drivers/mtd/nand/raw/atmel/pmecc.c#L772
>
Hopefully that can help !
Best regards,
Ada


[3]: https://ww1.microchip.com/downloads/en/DeviceDoc/SAMA5D2-Series-Data-Sheet-DS60001476C.pdf
[4]: https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAMA5D4-Series-Data-Sheet-DS60001525.pdf
[5]: https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAM9X60-Data-Sheet-DS60001579.pdf
[6]: https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAM9X7-Series-Data-Sheet-DS60001813.pdf

------------------------ >8 ------------------------

diff --git a/drivers/mtd/nand/raw/atmel/pmecc.c b/drivers/mtd/nand/raw/atmel/pmecc.c
index 3c7dee1be21d..cd96b44f3f30 100644
--- a/drivers/mtd/nand/raw/atmel/pmecc.c
+++ b/drivers/mtd/nand/raw/atmel/pmecc.c
@@ -143,6 +143,7 @@ struct atmel_pmecc_caps {
  	int nstrengths;
  	int el_offset;
  	bool correct_erased_chunks;
+	bool need_clk_config;
  };
  
  struct atmel_pmecc {
@@ -787,6 +788,9 @@ int atmel_pmecc_enable(struct atmel_pmecc_user *user, int op)
  	writel(PMECC_CTRL_ENABLE, pmecc->regs.base + ATMEL_PMECC_CTRL);
  	writel(PMECC_CTRL_DATA, pmecc->regs.base + ATMEL_PMECC_CTRL);
  
+	if (pmecc->caps->need_clk_config)
+		writel(PMECC_CLK_133MHZ, pmecc->regs.base + ATMEL_PMECC_CLK);
+
  	return 0;
  }
  EXPORT_SYMBOL_GPL(atmel_pmecc_enable);
@@ -896,6 +900,7 @@ static struct atmel_pmecc_caps at91sam9g45_caps = {
  	.strengths = atmel_pmecc_strengths,
  	.nstrengths = 5,
  	.el_offset = 0x8c,
+	.need_clk_config = true,
  };
  
  static struct atmel_pmecc_caps sama5d4_caps = {



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: mtd: rawnand: atmel: ECC error after update to kernel 6.6
  2025-07-01 15:48 ` Ada Couprie Diaz
@ 2025-07-01 18:04   ` Zixun LI
  2025-07-02 17:20     ` Ada Couprie Diaz
  0 siblings, 1 reply; 4+ messages in thread
From: Zixun LI @ 2025-07-01 18:04 UTC (permalink / raw)
  To: Ada Couprie Diaz
  Cc: Alexandre Belloni, Vignesh Raghavendra, Tudor Ambarus,
	Nicolas Ferre, Richard Weinberger, Miquel Raynal, linux-mtd,
	linux-kernel, SoC support', Claudiu Beznea

On Tue, Jul 1, 2025 at 5:49 PM Ada Couprie Diaz <ada.coupriediaz@arm.com> wrote:
>
> Hi,
>
> I've written a small patch below that I think should fix the issue, but again
> I don't know if that's upstreamable as-is.
>
> > Best regards,
> > Zixun LI
> >
> > [1] https://github.com/torvalds/linux/blob/19583ca584d6f574384e17fe7613dfaeadcdc4a6/drivers/mtd/nand/atmel_nand.c#L1058
> > [2] https://github.com/torvalds/linux/blob/ffc253263a1375a65fa6c9f62a893e9767fbebfa/drivers/mtd/nand/raw/atmel/pmecc.c#L772
> >
> Hopefully that can help !
> Best regards,
> Ada
>

Hi Ada,
Thank you for looking into this. Internally I've written the same patch as
yours and it works fine.

What's more interesting is the issue happens depending on chip individual
difference or aging. Among 3 chips tested, two with date code of 1933 and
one with 2223. The 1st one has many ECC errors as in the mail, 2nd one
has less errors, while the 3rd one passed the nandtest without error.
Maybe that's why this issue is overlooked.

Best regards,
Zixun LI


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: mtd: rawnand: atmel: ECC error after update to kernel 6.6
  2025-07-01 18:04   ` Zixun LI
@ 2025-07-02 17:20     ` Ada Couprie Diaz
  0 siblings, 0 replies; 4+ messages in thread
From: Ada Couprie Diaz @ 2025-07-02 17:20 UTC (permalink / raw)
  To: Zixun LI
  Cc: Claudiu Beznea, Alexandre Belloni, Vignesh Raghavendra,
	Richard Weinberger, Nicolas Ferre, linux-kernel, Tudor Ambarus,
	linux-mtd, Miquel Raynal, SoC support'

On 01/07/2025 19:04, Zixun LI wrote:

> On Tue, Jul 1, 2025 at 5:49 PM Ada Couprie Diaz <ada.coupriediaz@arm.com> wrote:
>> Hi,
>>
>> I've written a small patch below that I think should fix the issue, but again
>> I don't know if that's upstreamable as-is.
>>
>>> Best regards,
>>> Zixun LI
>>>
>>> [1] https://github.com/torvalds/linux/blob/19583ca584d6f574384e17fe7613dfaeadcdc4a6/drivers/mtd/nand/atmel_nand.c#L1058
>>> [2] https://github.com/torvalds/linux/blob/ffc253263a1375a65fa6c9f62a893e9767fbebfa/drivers/mtd/nand/raw/atmel/pmecc.c#L772
>>>
>> Hopefully that can help !
>> Best regards,
>> Ada
>>
> Hi Ada,
> Thank you for looking into this. Internally I've written the same patch as
> yours and it works fine.

Glad to know that you already have a working solution and that I wasn't too far off !
I got interested and wanted to dig into it, I hope it didn't come off wrong.

> What's more interesting is the issue happens depending on chip individual
> difference or aging. Among 3 chips tested, two with date code of 1933 and
> one with 2223. The 1st one has many ECC errors as in the mail, 2nd one
> has less errors, while the 3rd one passed the nandtest without error.
> Maybe that's why this issue is overlooked.

Really interesting behaviour, that could indeed explain how it might 
have been missed during the original refactor as it seems it was tested 
with similar SoCs.

> Best regards,
> Zixun LI
Best regards,
Ada


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-07-02 17:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-30 14:38 mtd: rawnand: atmel: ECC error after update to kernel 6.6 Zixun LI
2025-07-01 15:48 ` Ada Couprie Diaz
2025-07-01 18:04   ` Zixun LI
2025-07-02 17:20     ` Ada Couprie Diaz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).