* Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Alexander Shiyan @ 2023-05-25 7:48 UTC
To: Miquel Raynal, JaimeLiao, linux-mtd

Hello.

Kernel boot fails after patch "mtd: rawnand: Support for sequential
cache reads" (thanks to git bisect).
Please advise what can be done here and where to look for a bug.

Thanks.

...
omap-gpmc 50000000.gpmc: GPMC revision 6.0
...
nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
nand: Micron MT29F2G08ABAEAWP
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
...
VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
devtmpfs: mounted
Freeing unused kernel image (initmem) memory: 1024K
Run /sbin/init as init process
SQUASHFS error: lzo decompression failed, data probably corrupt
SQUASHFS error: Failed to read block 0xd291c2: -5
SQUASHFS error: lzo decompression failed, data probably corrupt
SQUASHFS error: Failed to read block 0xd291c2: -5
SQUASHFS error: Unable to read data cache entry [d291c2]
SQUASHFS error: Unable to read page, block d291c2, size 14307
SQUASHFS error: Unable to read data cache entry [d291c2]
SQUASHFS error: Unable to read page, block d291c2, size 14307
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
Hardware name: Generic AM33XX (Flattened Device Tree)
unwind_backtrace from show_stack+0xb/0xc
show_stack from dump_stack_lvl+0x2b/0x34
dump_stack_lvl from panic+0xbd/0x230
panic from make_task_dead+0x1/0x120
make_task_dead from 0xc102ca80
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
* Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Miquel Raynal @ 2023-05-26 18:14 UTC
To: Alexander Shiyan; +Cc: JaimeLiao, linux-mtd

Hi Alexander,

eagle.alexander923@gmail.com wrote on Thu, 25 May 2023 10:48:39 +0300:

> Hello.
>
> Kernel boot fails after patch "mtd: rawnand: Support for sequential
> cache reads" (thanks to git bisect).
> Please advise what can be done here and where to look for a bug.

Thanks for the report, and sorry for the trouble. Right now I don't
know what's wrong with the driver, but as a first step you could just
try to reset chip->controller->supported_op.cont_read after
rawnand_check_cont_read_support(). It should just avoid using the
optimization and fix the boot. That's of course a very early fix; we
now need to understand further what's going on.

My first guess would be that the sequential read patterns are not
supported by the controller or are badly implemented by its driver. But
that would be strange given the simplicity of this controller. This
controller is meant to be versatile; I doubt it does not support these
operations. Plus, I would expect page accesses to be implemented
directly by the driver and not be affected by this logic. Could you
try to trace the actual calls made through the mtd layer that lead to
these errors? Is ->exec_op() involved in the process? Where? How?

Also, which kernel are you using exactly? I'm surprised there is no
mtd-related error. If you reboot with an older kernel, you get your
data, right?

Otherwise maybe the Micron chip is at fault, which would mean that
some commands are unsupported. I believed they were all standard;
maybe some of them are optional? Could you check in the chip datasheet
whether any of the commands used there is unsupported?

>
> Thanks.
>
> ...
> omap-gpmc 50000000.gpmc: GPMC revision 6.0
> ...
> nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> nand: Micron MT29F2G08ABAEAWP
> nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> ...
> VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> devtmpfs: mounted
> Freeing unused kernel image (initmem) memory: 1024K
> Run /sbin/init as init process
> SQUASHFS error: lzo decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0xd291c2: -5
> SQUASHFS error: lzo decompression failed, data probably corrupt
> SQUASHFS error: Failed to read block 0xd291c2: -5
> SQUASHFS error: Unable to read data cache entry [d291c2]
> SQUASHFS error: Unable to read page, block d291c2, size 14307
> SQUASHFS error: Unable to read data cache entry [d291c2]
> SQUASHFS error: Unable to read page, block d291c2, size 14307
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> Hardware name: Generic AM33XX (Flattened Device Tree)
> unwind_backtrace from show_stack+0xb/0xc
> show_stack from dump_stack_lvl+0x2b/0x34
> dump_stack_lvl from panic+0xbd/0x230
> panic from make_task_dead+0x1/0x120
> make_task_dead from 0xc102ca80
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---

Thanks,
Miquèl
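A minimal, self-contained C model of the early workaround suggested in this message: let rawnand_check_cont_read_support() run, then clear chip->controller->supported_op.cont_read so the sequential cache read path is never taken. The structures are hypothetical, trimmed-down stand-ins that only mirror the field path named in the thread; the stub does not reproduce the real check in the raw NAND core, and probe_without_cont_read() and will_use_cont_read() are illustrative names, not kernel functions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures named in the thread;
 * only the chip->controller->supported_op.cont_read path is mirrored. */
struct nand_controller {
	struct {
		unsigned int cont_read : 1;
	} supported_op;
};

struct nand_chip {
	struct nand_controller *controller;
};

/* Stub for the support check: here it simply pretends the controller
 * accepted the sequential cache read pattern. */
static void rawnand_check_cont_read_support(struct nand_chip *chip)
{
	chip->controller->supported_op.cont_read = 1;
}

/* The early workaround: run the check as usual, then force the flag
 * back to 0 so the optimization is never used. */
static void probe_without_cont_read(struct nand_chip *chip)
{
	rawnand_check_cont_read_support(chip);
	chip->controller->supported_op.cont_read = 0;
}

/* Simplified view of the decision made before choosing the
 * sequential cache read path. */
static bool will_use_cont_read(const struct nand_chip *chip)
{
	return chip->controller->supported_op.cont_read;
}
```

In a real tree the equivalent would be a single assignment placed right after the call; the model only shows the order of operations.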
* Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Alexander Shiyan @ 2023-05-29 6:10 UTC
To: Miquel Raynal; +Cc: JaimeLiao, linux-mtd

Hello Miquel.

On Fri, 26 May 2023 at 21:14, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> Hi Alexander,
> eagle.alexander923@gmail.com wrote on Thu, 25 May 2023 10:48:39 +0300:
> > Hello.
> > Kernel boot fails after patch "mtd: rawnand: Support for sequential
> > cache reads" (thanks to git bisect).
> > Please advise what can be done here and where to look for a bug.
> Thanks for the report, and sorry for the trouble. Right now I don't
> know what's wrong with the driver, but as a first step you could just
> try to reset chip->controller->supported_op.cont_read after
> rawnand_check_cont_read_support(). It should just avoid using the
> optimization and fix the boot. That's of course a very early fix; we
> now need to understand further what's going on.

When I comment out the line "rawnand_check_cont_read_support(chip);",
the board boots as expected.

> My first guess would be that the sequential read patterns are not
> supported by the controller or are badly implemented by its driver. But
> that would be strange given the simplicity of this controller. This
> controller is meant to be versatile; I doubt it does not support these
> operations. Plus, I would expect page accesses to be implemented
> directly by the driver and not be affected by this logic. Could you
> try to trace the actual calls made through the mtd layer that lead to
> these errors? Is ->exec_op() involved in the process? Where? How?

Yes. Everything looks as expected here: debugging shows that the
correct opcodes are passed; for NAND_CMD_READCACHESEQ it is 0x31.

> Also, which kernel are you using exactly? I'm surprised there is no
> mtd-related error. If you reboot with an older kernel, you get your
> data, right?

Right. This bug appeared in Linux 6.3. With 6.2 everything worked as
expected, so I used "git bisect" to find the point where the error occurs.

> Otherwise maybe the Micron chip is at fault, which would mean that
> some commands are unsupported. I believed they were all standard;
> maybe some of them are optional? Could you check in the chip datasheet
> whether any of the commands used there is unsupported?

According to the MT29F2G08ABAEAWP datasheet, the chip supports
the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
4. These commands supported only with ECC disabled.
5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
command was a READ PAGE (00h-30h) or READ PAGE CACHE series
command; otherwise, it is prohibited.

As far as I understand, the second remark is satisfied in our case,
since we issue the correct sequence. But the first remark could be a
problem here.

> > ...
> > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > ...
> > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > nand: Micron MT29F2G08ABAEAWP
> > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > ...
> > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > devtmpfs: mounted
> > Freeing unused kernel image (initmem) memory: 1024K
> > Run /sbin/init as init process
> > SQUASHFS error: lzo decompression failed, data probably corrupt
> > SQUASHFS error: Failed to read block 0xd291c2: -5
> > SQUASHFS error: lzo decompression failed, data probably corrupt
> > SQUASHFS error: Failed to read block 0xd291c2: -5
> > SQUASHFS error: Unable to read data cache entry [d291c2]
> > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > SQUASHFS error: Unable to read data cache entry [d291c2]
> > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > Hardware name: Generic AM33XX (Flattened Device Tree)
> > unwind_backtrace from show_stack+0xb/0xc
> > show_stack from dump_stack_lvl+0x2b/0x34
> > dump_stack_lvl from panic+0xbd/0x230
> > panic from make_task_dead+0x1/0x120
> > make_task_dead from 0xc102ca80
> > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---
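Datasheet note 5 quoted above is effectively a rule about which command may precede a READ PAGE CACHE series command while the array is busy. A small C sketch of that rule, under stated assumptions: the enum groups and function names are hypothetical, the sequence checker pessimistically assumes the array is busy every time a cache command is issued, and note 4 (ECC) and all timing parameters are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical grouping of the commands named in the datasheet note. */
enum nand_read_cmd {
	CMD_READ_PAGE,         /* 00h ... 30h */
	CMD_READ_CACHE_SEQ,    /* 31h */
	CMD_READ_CACHE_RANDOM, /* 00h ... 31h */
	CMD_READ_CACHE_LAST,   /* 3Fh */
	CMD_OTHER,             /* anything else (status, program, ...) */
};

static bool is_cache_series(enum nand_read_cmd c)
{
	return c == CMD_READ_CACHE_SEQ || c == CMD_READ_CACHE_RANDOM ||
	       c == CMD_READ_CACHE_LAST;
}

/* Note 5: while the array is busy (RDY = 1, ARDY = 0), a cache series
 * command is only allowed right after READ PAGE or another cache series
 * command. Outside that situation this sketch places no restriction. */
static bool cache_cmd_allowed(enum nand_read_cmd prev, enum nand_read_cmd next,
			      bool array_busy)
{
	if (!is_cache_series(next) || !array_busy)
		return true;
	return prev == CMD_READ_PAGE || is_cache_series(prev);
}

/* Check a whole sequence, assuming (worst case) that the array is busy
 * whenever a cache command is issued. The first command has no
 * predecessor, so it is treated as following CMD_OTHER. */
static bool sequence_respects_note5(const enum nand_read_cmd *seq, size_t n)
{
	enum nand_read_cmd prev = CMD_OTHER;

	for (size_t i = 0; i < n; i++) {
		if (!cache_cmd_allowed(prev, seq[i], true))
			return false;
		prev = seq[i];
	}
	return true;
}
```

With this model, the pattern discussed in the thread (READ PAGE, then 31h repeatedly, then 3Fh) passes, while a 31h issued right after an unrelated command is rejected, which matches the reading that note 5 is satisfied here.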
* Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Miquel Raynal @ 2023-05-29 8:12 UTC
To: Alexander Shiyan; +Cc: JaimeLiao, linux-mtd

Hi Alexander,

eagle.alexander923@gmail.com wrote on Mon, 29 May 2023 09:10:32 +0300:

> Hello Miquel.
>
> On Fri, 26 May 2023 at 21:14, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > Hi Alexander,
> > eagle.alexander923@gmail.com wrote on Thu, 25 May 2023 10:48:39 +0300:
> > > Hello.
> > > Kernel boot fails after patch "mtd: rawnand: Support for sequential
> > > cache reads" (thanks to git bisect).
> > > Please advise what can be done here and where to look for a bug.
> > Thanks for the report, and sorry for the trouble. Right now I don't
> > know what's wrong with the driver, but as a first step you could just
> > try to reset chip->controller->supported_op.cont_read after
> > rawnand_check_cont_read_support(). It should just avoid using the
> > optimization and fix the boot. That's of course a very early fix; we
> > now need to understand further what's going on.
>
> When I comment out the line "rawnand_check_cont_read_support(chip);",
> the board boots as expected.
>
> > My first guess would be that the sequential read patterns are not
> > supported by the controller or are badly implemented by its driver. But
> > that would be strange given the simplicity of this controller. This
> > controller is meant to be versatile; I doubt it does not support these
> > operations. Plus, I would expect page accesses to be implemented
> > directly by the driver and not be affected by this logic. Could you
> > try to trace the actual calls made through the mtd layer that lead to
> > these errors? Is ->exec_op() involved in the process? Where? How?
>
> Yes. Everything looks as expected here: debugging shows that the
> correct opcodes are passed; for NAND_CMD_READCACHESEQ it is 0x31.
>
> > Also, which kernel are you using exactly? I'm surprised there is no
> > mtd-related error. If you reboot with an older kernel, you get your
> > data, right?
>
> Right. This bug appeared in Linux 6.3. With 6.2 everything worked as
> expected, so I used "git bisect" to find the point where the error occurs.
>
> > Otherwise maybe the Micron chip is at fault, which would mean that
> > some commands are unsupported. I believed they were all standard;
> > maybe some of them are optional? Could you check in the chip datasheet
> > whether any of the commands used there is unsupported?
>
> According to the MT29F2G08ABAEAWP datasheet, the chip supports
> the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
> 4. These commands supported only with ECC disabled.
> 5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
> when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
> command was a READ PAGE (00h-30h) or READ PAGE CACHE series
> command; otherwise, it is prohibited.
>
> As far as I understand, the second remark is satisfied in our case,
> since we issue the correct sequence.

Exactly, we do:

READ0 (0), READSTART (30),
READCACHESEQ (31), data,
READCACHESEQ (31), data,
...
READCACHEEND (3f), data.

which is, I believe, what the datasheet tells us.

> But the first remark could be a problem here.

I was not aware of this limitation; it's only written in the summary,
not in the detailed command descriptions. Nice finding. We need to
prevent on-die ECC users from enabling this feature.

But given the trace below, you're not using the on-die ECC engine,
right? It looks like you're using the controller's ELM engine to
perform ECC correction, so I don't see why this specific limitation
would hit us. Can you confirm the ECC engine of the chip is disabled?

> > > ...
> > > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > > ...
> > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > nand: Micron MT29F2G08ABAEAWP
> > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > > ...
> > > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > > devtmpfs: mounted
> > > Freeing unused kernel image (initmem) memory: 1024K
> > > Run /sbin/init as init process
> > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > unwind_backtrace from show_stack+0xb/0xc
> > > show_stack from dump_stack_lvl+0x2b/0x34
> > > dump_stack_lvl from panic+0xbd/0x230
> > > panic from make_task_dead+0x1/0x120
> > > make_task_dead from 0xc102ca80
> > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---

Thanks,
Miquèl
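The command sequence listed in this message can be written as a short C sketch that builds the ordered list of command and data phases for reading npages consecutive pages. Address cycles, waits on R/B# and ECC handling are deliberately left out. The opcode values are the ones quoted in the thread; STEP_DATA_OUT and build_seq_cache_read() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values as quoted in the thread. */
#define NAND_CMD_READ0        0x00
#define NAND_CMD_READSTART    0x30
#define NAND_CMD_READCACHESEQ 0x31
#define NAND_CMD_READCACHEEND 0x3f
/* Hypothetical marker for "transfer one page of data"; not an opcode. */
#define STEP_DATA_OUT         0x100

/* Fill steps[] with the phases for a sequential cache read of npages
 * pages. Returns the number of steps written, or 0 if npages < 2 or the
 * buffer is too small. */
static size_t build_seq_cache_read(unsigned int npages, uint16_t *steps,
				   size_t cap)
{
	size_t needed, n = 0;

	if (npages < 2)
		return 0;
	/* 00h + 30h, then (npages - 1) x (31h + data), then 3Fh + data. */
	needed = 2 + 2 * (size_t)(npages - 1) + 2;
	if (cap < needed)
		return 0;

	steps[n++] = NAND_CMD_READ0;     /* address cycles would follow here */
	steps[n++] = NAND_CMD_READSTART; /* start reading the first page */
	for (unsigned int i = 0; i < npages - 1; i++) {
		/* Make page i available for output and fetch the next one. */
		steps[n++] = NAND_CMD_READCACHESEQ;
		steps[n++] = STEP_DATA_OUT;
	}
	/* Make the last page available without starting a new array read. */
	steps[n++] = NAND_CMD_READCACHEEND;
	steps[n++] = STEP_DATA_OUT;
	return n;
}
```

For three pages this yields 00h, 30h, 31h, data, 31h, data, 3Fh, data: one data phase per page and one fewer 31h than the number of pages.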
* Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Alexander Shiyan @ 2023-05-29 10:33 UTC
To: Miquel Raynal; +Cc: JaimeLiao, linux-mtd

On Mon, 29 May 2023 at 11:12, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
...
> > According to the MT29F2G08ABAEAWP datasheet, the chip supports
> > the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
> > 4. These commands supported only with ECC disabled.
> > 5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
> > when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
> > command was a READ PAGE (00h-30h) or READ PAGE CACHE series
> > command; otherwise, it is prohibited.
> >
> > As far as I understand, the second remark is satisfied in our case,
> > since we issue the correct sequence.
>
> Exactly, we do:
>
> READ0 (0), READSTART (30),
> READCACHESEQ (31), data,
> READCACHESEQ (31), data,
> ...
> READCACHEEND (3f), data.
>
> which is, I believe, what the datasheet tells us.
>
> > But the first remark could be a problem here.
>
> I was not aware of this limitation; it's only written in the summary,
> not in the detailed command descriptions. Nice finding. We need to
> prevent on-die ECC users from enabling this feature.
>
> But given the trace below, you're not using the on-die ECC engine,
> right? It looks like you're using the controller's ELM engine to
> perform ECC correction, so I don't see why this specific limitation
> would hit us. Can you confirm the ECC engine of the chip is disabled?

Yes, on-die ECC is disabled.
Please advise where I can insert some debug messages to clear things up.

> > > > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > > > ...
> > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > > nand: Micron MT29F2G08ABAEAWP
> > > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > > > ...
> > > > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > > > devtmpfs: mounted
> > > > Freeing unused kernel image (initmem) memory: 1024K
> > > > Run /sbin/init as init process
> > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > > > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > > unwind_backtrace from show_stack+0xb/0xc
> > > > show_stack from dump_stack_lvl+0x2b/0x34
> > > > dump_stack_lvl from panic+0xbd/0x230
> > > > panic from make_task_dead+0x1/0x120
> > > > make_task_dead from 0xc102ca80
> > > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---
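The conclusion that on-die ECC users must not get sequential cache reads, together with the confirmation that on-die ECC is disabled on this board, can be sketched as a simple gating check. The enum below only loosely mirrors the kernel's ECC engine types, and cont_read_allowed() is a hypothetical helper, not an existing kernel function.

```c
#include <stdbool.h>

/* Local, simplified copy of the ECC engine kinds; names loosely mirror
 * the kernel's ECC engine types but this is a standalone sketch. */
enum ecc_engine_type {
	ECC_ENGINE_NONE,
	ECC_ENGINE_SOFT,
	ECC_ENGINE_ON_HOST, /* e.g. the GPMC/ELM BCH8 engine on this board */
	ECC_ENGINE_ON_DIE,
};

/* Hypothetical policy: allow sequential cache reads only if the
 * controller passed the support check and the chip's on-die ECC is not
 * in use, since the datasheet summary says these commands are supported
 * only with ECC disabled. */
static bool cont_read_allowed(bool controller_supports_cont_read,
			      enum ecc_engine_type engine)
{
	return controller_supports_cont_read && engine != ECC_ENGINE_ON_DIE;
}
```

For this setup (OMAP_ECC_BCH8_CODE_HW, i.e. an on-host engine, with on-die ECC off) the check would still allow sequential reads, which is why the ECC caveat alone does not explain the corruption seen here.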
* Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Miquel Raynal @ 2023-05-29 11:46 UTC
To: Alexander Shiyan; +Cc: JaimeLiao, linux-mtd, Bean Huo (beanhuo)

Hi Bean,

I'm adding you to this thread because I'm clueless about what's
happening on Alexander's side.

Short recap: sequential page reads seem to fail with an MT29F Micron
chip. Alexander is using a GPMC controller without on-die ECC. I doubt
the error comes from the controller, so I would like to know whether
anything is known to fail with these chips when using sequential
reads. We can easily work around that situation if we identify the
problem.

Thanks a lot,
Miquèl

eagle.alexander923@gmail.com wrote on Mon, 29 May 2023 13:33:04 +0300:

> On Mon, 29 May 2023 at 11:12, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> ...
> > > According to the MT29F2G08ABAEAWP datasheet, the chip supports
> > > the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
> > > 4. These commands supported only with ECC disabled.
> > > 5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
> > > when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
> > > command was a READ PAGE (00h-30h) or READ PAGE CACHE series
> > > command; otherwise, it is prohibited.
> > >
> > > As far as I understand, the second remark is satisfied in our case,
> > > since we issue the correct sequence.
> >
> > Exactly, we do:
> >
> > READ0 (0), READSTART (30),
> > READCACHESEQ (31), data,
> > READCACHESEQ (31), data,
> > ...
> > READCACHEEND (3f), data.
> >
> > which is, I believe, what the datasheet tells us.
> >
> > > But the first remark could be a problem here.
> >
> > I was not aware of this limitation; it's only written in the summary,
> > not in the detailed command descriptions. Nice finding. We need to
> > prevent on-die ECC users from enabling this feature.
> >
> > But given the trace below, you're not using the on-die ECC engine,
> > right? It looks like you're using the controller's ELM engine to
> > perform ECC correction, so I don't see why this specific limitation
> > would hit us. Can you confirm the ECC engine of the chip is disabled?
>
> Yes, on-die ECC is disabled.
> Please advise where I can insert some debug messages to clear things up.
>
> > > > > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > > > > ...
> > > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > > > nand: Micron MT29F2G08ABAEAWP
> > > > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > > > > ...
> > > > > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > > > > devtmpfs: mounted
> > > > > Freeing unused kernel image (initmem) memory: 1024K
> > > > > Run /sbin/init as init process
> > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > > > > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > > > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > > > unwind_backtrace from show_stack+0xb/0xc
> > > > > show_stack from dump_stack_lvl+0x2b/0x34
> > > > > dump_stack_lvl from panic+0xbd/0x230
> > > > > panic from make_task_dead+0x1/0x120
> > > > > make_task_dead from 0xc102ca80
> > > > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---
* RE: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Bean Huo @ 2023-05-30 14:48 UTC
To: Miquel Raynal, Alexander Shiyan; +Cc: JaimeLiao, linux-mtd@lists.infradead.org

Hi Miquel,

Thanks for reaching out to me. Here is TN-29-01, "Increasing NAND Flash
Performance":
https://media-www.micron.com/-/media/client/global/documents/products/technical-note/nand-flash/tn2901.pdf?rev=a228fd154c274ef78669b67ea097ecda

If Alexander can trace the parallel NAND bus with an oscilloscope,
that would help check whether this is a timing issue.

Kind regards,
Bean

> -----Original Message-----
> From: Miquel Raynal <miquel.raynal@bootlin.com>
> Sent: Monday, May 29, 2023 1:46 PM
> To: Alexander Shiyan <eagle.alexander923@gmail.com>
> Cc: JaimeLiao <jaimeliao.tw@gmail.com>; linux-mtd@lists.infradead.org; Bean Huo
> <beanhuo@micron.com>
> Subject: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential
> cache reads"
>
> Hi Bean,
>
> I'm adding you to this thread because I'm clueless about what's
> happening on Alexander's side.
>
> Short recap: sequential page reads seem to fail with an MT29F Micron
> chip. Alexander is using a GPMC controller without on-die ECC. I doubt
> the error comes from the controller, so I would like to know whether
> anything is known to fail with these chips when using sequential
> reads. We can easily work around that situation if we identify the
> problem.
>
> Thanks a lot,
> Miquèl
>
> eagle.alexander923@gmail.com wrote on Mon, 29 May 2023 13:33:04 +0300:
>
> > On Mon, 29 May 2023 at 11:12, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > ...
> > > > According to the MT29F2G08ABAEAWP datasheet, the chip supports
> > > > the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
> > > > 4. These commands supported only with ECC disabled.
> > > > 5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
> > > > when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
> > > > command was a READ PAGE (00h-30h) or READ PAGE CACHE series
> > > > command; otherwise, it is prohibited.
> > > >
> > > > As far as I understand, the second remark is satisfied in our case,
> > > > since we issue the correct sequence.
> > >
> > > Exactly, we do:
> > >
> > > READ0 (0), READSTART (30),
> > > READCACHESEQ (31), data,
> > > READCACHESEQ (31), data,
> > > ...
> > > READCACHEEND (3f), data.
> > >
> > > which is, I believe, what the datasheet tells us.
> > >
> > > > But the first remark could be a problem here.
> > >
> > > I was not aware of this limitation; it's only written in the summary,
> > > not in the detailed command descriptions. Nice finding. We need to
> > > prevent on-die ECC users from enabling this feature.
> > >
> > > But given the trace below, you're not using the on-die ECC engine,
> > > right? It looks like you're using the controller's ELM engine to
> > > perform ECC correction, so I don't see why this specific limitation
> > > would hit us. Can you confirm the ECC engine of the chip is disabled?
> >
> > Yes, on-die ECC is disabled.
> > Please advise where I can insert some debug messages to clear things up.
> >
> > > > > > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > > > > > ...
> > > > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > > > > nand: Micron MT29F2G08ABAEAWP
> > > > > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > > > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > > > > > ...
> > > > > > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > > > > > devtmpfs: mounted
> > > > > > Freeing unused kernel image (initmem) memory: 1024K
> > > > > > Run /sbin/init as init process
> > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > > > > > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > > > > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > > > > unwind_backtrace from show_stack+0xb/0xc
> > > > > > show_stack from dump_stack_lvl+0x2b/0x34
> > > > > > dump_stack_lvl from panic+0xbd/0x230
> > > > > > panic from make_task_dead+0x1/0x120
> > > > > > make_task_dead from 0xc102ca80
> > > > > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---
* Re: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Alexander Shiyan @ 2023-05-31 9:02 UTC
To: Bean Huo; +Cc: Miquel Raynal, JaimeLiao, linux-mtd@lists.infradead.org

Hello.

I'm not quite sure what I should have measured or how to catch the right moment.
Here's what I got: the first shot was taken immediately after the board started,
the second just before the kernel crashed. Yellow trace: R/~B; blue trace: AD1.

On Tue, 30 May 2023 at 17:49, Bean Huo <beanhuo@micron.com> wrote:
>
> Hi Miquel,
>
> Thanks for reaching out to me. Here is TN-29-01: Increasing NAND Flash Performance.
> https://media-www.micron.com/-/media/client/global/documents/products/technical-note/nand-flash/tn2901.pdf?rev=a228fd154c274ef78669b67ea097ecda
>
> If Alexander can trace the parallel NAND bus with an oscilloscope, that would help to check whether this is a timing issue.
>
> Kind regards,
> Bean
>
> > -----Original Message-----
> > From: Miquel Raynal <miquel.raynal@bootlin.com>
> > Sent: Monday, May 29, 2023 1:46 PM
> > To: Alexander Shiyan <eagle.alexander923@gmail.com>
> > Cc: JaimeLiao <jaimeliao.tw@gmail.com>; linux-mtd@lists.infradead.org; Bean Huo
> > <beanhuo@micron.com>
> > Subject: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential
> > cache reads"
> >
> > Hi Bean,
> >
> > I'm adding you to this thread because I'm clueless regarding what's happening to
> > Alexander.
> >
> > Short recap: sequential page reads seem to fail with an MT29F Micron chip.
> > Alexander is using a GPMC controller without on-die ECC. I doubt the error comes
> > from the controller, so I would like to know if there is anything known to fail with
> > these chips regarding the use of sequential reads. We can easily work around that
> > situation if we identify the problem.
> >
> > Thanks a lot,
> > Miquèl
> >
> > eagle.alexander923@gmail.com wrote on Mon, 29 May 2023
> > 13:33:04 +0300:
> >
> > > On Mon, 29 May 2023 at 11:12, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > > ...
> > > > > According to the MT29F2G08ABAEAWP datasheet, the chip supports
> > > > > the READ PAGE CACHE SEQUENTIAL opcode, but with two caveats:
> > > > > 4. These commands supported only with ECC disabled.
> > > > > 5. Issuing a READ PAGE CACHE series (31h, 00h-31h, 3Fh) command
> > > > > when the array is busy (RDY = 1, ARDY = 0) is supported if the previous
> > > > > command was a READ PAGE (00h-30h) or READ PAGE CACHE series
> > > > > command; otherwise, it is prohibited.
> > > > >
> > > > > As far as I understand, the second remark suits us, since we
> > > > > create the correct sequence.
> > > >
> > > > Exactly, we do:
> > > >
> > > > READ0 (0), READSTART (30),
> > > > READCACHESEQ (31), data,
> > > > READCACHESEQ (31), data,
> > > > ...
> > > > READCACHEEND (3f), data.
> > > >
> > > > which is what the datasheet tells us, I believe.
> > > >
> > > > > But the first remark can be a problem in this case.
> > > >
> > > > I was not aware of this limitation; it's only written in the
> > > > summary, not in the details about the commands. Nice finding. We
> > > > need to prevent on-die ECC users from enabling this feature.
> > > >
> > > > But given the below trace, you're not using the on-die ECC engine,
> > > > right? It looks like you're using the controller's ELM engine to
> > > > perform ECC correction, so I don't see why this specific limitation
> > > > would hit us. Can you confirm the ECC engine of the chip is disabled?
> > >
> > > Yes, on-die ECC is disabled.
> > > Please advise where I can insert some debug messages to clear things up.
> > >
> > > > > > > omap-gpmc 50000000.gpmc: GPMC revision 6.0
> > > > > > > ...
> > > > > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > > > > > nand: Micron MT29F2G08ABAEAWP
> > > > > > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > > > > nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
> > > > > > > ...
> > > > > > > VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
> > > > > > > devtmpfs: mounted
> > > > > > > Freeing unused kernel image (initmem) memory: 1024K
> > > > > > > Run /sbin/init as init process
> > > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > > SQUASHFS error: lzo decompression failed, data probably corrupt
> > > > > > > SQUASHFS error: Failed to read block 0xd291c2: -5
> > > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > > SQUASHFS error: Unable to read data cache entry [d291c2]
> > > > > > > SQUASHFS error: Unable to read page, block d291c2, size 14307
> > > > > > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> > > > > > > CPU: 0 PID: 1 Comm: init Not tainted 6.3.0+ #105
> > > > > > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > > > > > unwind_backtrace from show_stack+0xb/0xc
> > > > > > > show_stack from dump_stack_lvl+0x2b/0x34
> > > > > > > dump_stack_lvl from panic+0xbd/0x230
> > > > > > > panic from make_task_dead+0x1/0x120
> > > > > > > make_task_dead from 0xc102ca80
> > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---

[-- Attachment #2: F0000TEK.png --]
[-- Type: image/png, Size: 4209 bytes --]

[-- Attachment #3: F0001TEK.png --]
[-- Type: image/png, Size: 3948 bytes --]
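The command flow Miquel lists and the two datasheet caveats Alexander quotes can be modelled in a few lines. The following is a hypothetical Python model, not the kernel implementation: it builds the opcode sequence for an N-page sequential cache read (00h, address, 30h, then one 31h per page except the last, which is fetched with 3Fh) and checks a simplified form of caveat 5. Ready/busy waits are omitted, the function names are illustrative only, and the 00h-31h random-cache variant is not modelled.

```python
# Opcodes as named in the thread (values from the datasheet excerpt).
READ0 = 0x00         # READ PAGE, first cycle
READSTART = 0x30     # READ PAGE, confirm cycle
READCACHESEQ = 0x31  # READ PAGE CACHE SEQUENTIAL
READCACHEEND = 0x3F  # READ PAGE CACHE LAST


def cont_read_sequence(first_page, npages, on_die_ecc=False):
    """Return the (kind, value) steps of an npages-long sequential cache read.

    Hypothetical model: "cmd" steps carry an opcode, "addr" the page address,
    and "data" the page whose contents are read out at that point.
    Ready/busy waits between steps are left out for brevity.
    """
    if on_die_ecc:
        # Caveat 4: these commands are only supported with on-die ECC disabled.
        raise ValueError("sequential cache reads require on-die ECC to be disabled")
    if npages < 2:
        raise ValueError("a sequential cache read spans at least two pages")
    last = first_page + npages - 1
    steps = [("cmd", READ0), ("addr", first_page), ("cmd", READSTART)]
    # Each 31h moves the previously loaded page to the cache register
    # (so it can be read out) and starts loading the next page.
    for page in range(first_page, last):
        steps += [("cmd", READCACHESEQ), ("data", page)]
    # 3Fh moves the final page to the cache register without loading another.
    steps += [("cmd", READCACHEEND), ("data", last)]
    return steps


def cache_reads_well_ordered(steps):
    """Simplified caveat 5: each 31h/3Fh must follow a READ PAGE (00h-30h)
    or another READ PAGE CACHE command."""
    allowed_before = {READSTART, READCACHESEQ, READCACHEEND}
    previous = None
    for kind, value in steps:
        if kind != "cmd":
            continue
        if value in (READCACHESEQ, READCACHEEND) and previous not in allowed_before:
            return False
        previous = value
    return True


steps = cont_read_sequence(first_page=64, npages=3)
print([hex(v) for k, v in steps if k == "cmd"])  # ['0x0', '0x30', '0x31', '0x31', '0x3f']
```

For three pages the model emits two 31h commands and one 3Fh, matching the READ0/READSTART/READCACHESEQ.../READCACHEEND listing above.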
* Re: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Miquel Raynal @ 2023-06-01 7:37 UTC
To: Alexander Shiyan; +Cc: Bean Huo, JaimeLiao, linux-mtd@lists.infradead.org

Hello,

eagle.alexander923@gmail.com wrote on Wed, 31 May 2023 12:02:08 +0300:

> Hello.
>
> I'm not quite sure what I should have measured or how to catch the right moment.
> Here's what I got: the first shot was taken immediately after the board started,
> the second just before the kernel crashed. Yellow trace: R/~B; blue trace: AD1.

Bean, can you be more specific about what timings you need to see?
Where do you think we might have a timing issue? Maybe we can just add
delays and see if we get different results. I did not understand which
part of the link below needs to be looked at specifically.

What bothers me, though, is the absence of ECC errors, as if the data
was fine. Alexander, how is it possible that the NAND controller does
not complain about errors while squashfs does?

Can you also dump a buffer and compare it with what you expect? Is the
data fully random? Smashed somewhere specific...?

Thanks,
Miquèl

> On Tue, 30 May 2023 at 17:49, Bean Huo <beanhuo@micron.com> wrote:
> >
> > Hi Miquel,
> >
> > Thanks for reaching out to me. Here is TN-29-01: Increasing NAND Flash Performance.
> > https://media-www.micron.com/-/media/client/global/documents/products/technical-note/nand-flash/tn2901.pdf?rev=a228fd154c274ef78669b67ea097ecda
> >
> > If Alexander can trace the parallel NAND bus with an oscilloscope, that would help to check whether this is a timing issue.
> >
> > Kind regards,
> > Bean
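Miquel's request to dump a buffer and compare it with the expected data can be scripted. The hypothetical helpers below assume you already have two equal-length dumps that both start on a page boundary (for example the squashfs image and what was read back from the partition). They report the differing byte ranges, the 2048-byte pages those ranges touch (page size taken from the boot log), and whether a corrupted page is an exact copy of some other expected page, which is one way to tell random garbage apart from data that ended up in the wrong place.

```python
PAGE_SIZE = 2048  # from the boot log: "page size: 2048"


def mismatched_ranges(expected, actual):
    """Return half-open (start, end) byte ranges where two dumps differ."""
    if len(expected) != len(actual):
        raise ValueError("dumps must have the same length")
    ranges, start = [], None
    for offset, (a, b) in enumerate(zip(expected, actual)):
        if a != b and start is None:
            start = offset  # a differing run begins here
        elif a == b and start is not None:
            ranges.append((start, offset))  # the run ended just before here
            start = None
    if start is not None:
        ranges.append((start, len(expected)))
    return ranges


def bad_pages(expected, actual, page_size=PAGE_SIZE):
    """Return the sorted indexes of pages containing at least one differing byte."""
    pages = set()
    for start, end in mismatched_ranges(expected, actual):
        pages.update(range(start // page_size, (end - 1) // page_size + 1))
    return sorted(pages)


def matching_expected_pages(expected, page, page_size=PAGE_SIZE):
    """Return indexes of pages in `expected` that are byte-identical to `page`.

    A corrupted page equal to another expected page hints at data landing
    in the wrong place rather than at random corruption.
    """
    count = len(expected) // page_size
    return [i for i in range(count)
            if expected[i * page_size:(i + 1) * page_size] == page]


# Synthetic example: page 1 of the read-back data is a copy of page 0.
expected = bytes([0] * 2048 + [1] * 2048 + [2] * 2048)
actual = expected[:2048] * 2 + expected[4096:]
print(mismatched_ranges(expected, actual))  # [(2048, 4096)]
```

On real dumps, running `bad_pages()` first and then `matching_expected_pages()` on each bad page shows whether the damage is confined to whole pages and whether those pages came from elsewhere in the image.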
* RE: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Bean Huo @ 2023-06-01 16:55 UTC
To: Miquel Raynal, Alexander Shiyan; +Cc: JaimeLiao, linux-mtd@lists.infradead.org

Hi Miquel,

As you mentioned, there is no ECC error, yet SquashFS complains: "Unable to read data cache".
We want to see I/Ox, RE#, WE# and R/B#, to check that command input and data output work properly.
It is best to capture the command 31h and the data that follows it.

Kind regards,
Bean
* Re: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Alexander Shiyan @ 2023-06-02 9:52 UTC
To: Bean Huo; +Cc: Miquel Raynal, JaimeLiao, linux-mtd@lists.infradead.org

Hello.

> As you mentioned, there is no ECC error, yet SquashFS complains: "Unable to read data cache".
> We want to see I/Ox, RE#, WE# and R/B#, to check that command input and data output work properly.
> It is best to capture the command 31h and the data that follows it.

I only have a two-channel oscilloscope :)
In any case, to capture the right moment, I need to somehow loop
command 0x31 when an error occurs, so that I can take a picture. Right?

Please tell me where I can patch nand_base to get such an
infinite loop on error.

Thanks!
* Re: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Miquel Raynal @ 2023-06-05 8:39 UTC
To: Alexander Shiyan; +Cc: Bean Huo, JaimeLiao, linux-mtd@lists.infradead.org

Hi Alexander,

eagle.alexander923@gmail.com wrote on Fri, 2 Jun 2023 12:52:08 +0300:

> Hello.
>
> > As you mentioned, there is no ECC error, yet SquashFS complains: "Unable to read data cache".
> > We want to see I/Ox, RE#, WE# and R/B#, to check that command input and data output work properly.
> > It is best to capture the command 31h and the data that follows it.
>
> I only have a two-channel oscilloscope :)
> In any case, to capture the right moment, I need to somehow loop
> command 0x31 when an error occurs, so that I can take a picture. Right?
>
> Please tell me where I can patch nand_base to get such an
> infinite loop on error.

Anywhere in the core, you could just build your own exec_op sequence and
call it in a loop, I guess?

Bean, what are you trying to picture precisely? Have you ever had any
issues with these commands? Can Alexander try to add a surgical delay
somewhere?

Thanks,
Miquèl
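Complementary to the in-kernel exec_op loop Miquel suggests, a userspace loop can keep re-issuing the same multi-page read, so the bus activity repeats while the scope is armed, and stop at the first read that differs from a known-good copy. This is a hypothetical sketch: the device path, offset, and reference file are placeholders, and it assumes that a multi-page read through the MTD character device goes through the raw NAND core's read path and may therefore use sequential cache reads. That assumption is not stated in the thread.

```python
def read_exact(dev, size):
    """Read exactly `size` bytes from `dev`, looping over short reads."""
    chunks = []
    while size:
        chunk = dev.read(size)
        if not chunk:
            raise EOFError("device returned fewer bytes than requested")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def read_until_mismatch(path, offset, reference, max_iterations=1000):
    """Re-read len(reference) bytes at `offset` from `path` until they differ
    from `reference`.

    Returns the 0-based iteration of the first mismatching read, or None if
    all `max_iterations` reads matched.
    """
    with open(path, "rb", buffering=0) as dev:
        for iteration in range(max_iterations):
            dev.seek(offset)
            if read_exact(dev, len(reference)) != reference:
                return iteration
    return None


# Hypothetical usage on the target (path, offset and file name are placeholders):
# with open("good_region.bin", "rb") as f:
#     reference = f.read()
# print(read_until_mismatch("/dev/mtd4", 0x100000, reference))
```

Reading several pages per iteration (for example a few KiB with the 2048-byte pages reported here) is what gives the core a chance to chain 31h commands at all; a single-page read would not exercise the sequential path.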
* RE: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Bean Huo @ 2023-06-05 15:45 UTC
To: Miquel Raynal, Alexander Shiyan; +Cc: JaimeLiao, linux-mtd@lists.infradead.org

Hi Miquel,

All I can say is that we have not experienced this kind of issue yet. My suggestion is
the same as yours: first, let Alex add debug prints in MTD, especially in the NAND driver,
to check exactly which error causes this SquashFS complaint. Is it an I/O error, a CRC
error, an ECC error, or is the NAND device not responding and timing out?

Kind regards,
Bean
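Bean's question maps onto the return codes the MTD layer uses: -EUCLEAN when bitflips were corrected and -EBADMSG when ECC could not correct the data (the conventions behind the kernel's mtd_is_bitflip() and mtd_is_eccerr() helpers). A debug print of the return value in the NAND read path could then be classified with a small helper like the hypothetical one below. It assumes the printed values are negative errno numbers as on a Linux host, treats -ETIMEDOUT as the usual timeout code (an assumption; a given driver may report timeouts differently), and has no separate CRC category. Note that the -5 (-EIO) in the boot log appears right after squashfs's own "lzo decompression failed" message, so on its own it does not show which layer went wrong.

```python
import errno

# EUCLEAN is Linux-specific; fall back to its Linux value (117) elsewhere.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def classify_nand_read_ret(ret):
    """Map a kernel-style return value (>= 0 or -errno) to a rough error class."""
    if ret >= 0:
        return "ok"
    code = -ret
    if code == EUCLEAN:
        return "bitflips corrected by ECC"
    if code == errno.EBADMSG:
        return "uncorrectable ECC error"
    if code == errno.ETIMEDOUT:
        return "timeout waiting for the NAND chip"
    if code == errno.EIO:
        return "generic I/O error"
    return "other error ({})".format(errno.errorcode.get(code, code))


print(classify_nand_read_ret(-errno.EBADMSG))  # uncorrectable ECC error
```

If every failing read classifies as "ok" while squashfs still reports corruption, that would match Miquel's earlier observation that the controller sees no ECC errors.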
* RE: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads"
From: Bean Huo @ 2023-06-26 9:22 UTC
To: 'Miquel Raynal', 'Alexander Shiyan'
Cc: 'JaimeLiao', 'linux-mtd@lists.infradead.org'

Hi Alexander,

Do you have any update on this issue?

Kind regards,
Bean
* Re: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential cache reads" 2023-06-26 9:22 ` Bean Huo @ 2023-07-03 9:16 ` Alexander Shiyan 0 siblings, 0 replies; 15+ messages in thread From: Alexander Shiyan @ 2023-07-03 9:16 UTC (permalink / raw) To: Bean Huo; +Cc: Miquel Raynal, JaimeLiao, linux-mtd@lists.infradead.org Hello. For now I just changed rawnand_check_cont_read_support() function to turn off supported_op.cont_read variable. I haven't tried anything new. пн, 26 июн. 2023 г. в 12:22, Bean Huo <beanhuo@micron.com>: > > Hi Alexander, > > Do you have any update on this issue? > > > Kind regards, > Bean > > > -----Original Message----- > > From: Bean Huo > > Sent: Monday, June 5, 2023 5:46 PM > > To: Miquel Raynal <miquel.raynal@bootlin.com>; Alexander Shiyan > > <eagle.alexander923@gmail.com> > > Cc: JaimeLiao <jaimeliao.tw@gmail.com>; linux-mtd@lists.infradead.org > > Subject: RE: [EXT] Re: Boot failed after patch "mtd: rawnand: Support for sequential > > cache reads" > > > > Hi Miquel > > > > I only can say, we have not experienced this kind issue yet. My suggestion as yours, > > please firstly Let Alex add debug print in mtd, especially in NAND driver. To check > > what exactly error causes this SquashFS complain, is it IO error, or CRC, or ECC > > error, or NAND device no response and timeout? > > > > Kind regards, > > Bean > > > > > -----Original Message----- > > > From: Miquel Raynal <miquel.raynal@bootlin.com> > > > Sent: Monday, June 5, 2023 10:40 AM > > > To: Alexander Shiyan <eagle.alexander923@gmail.com> > > > Cc: Bean Huo <beanhuo@micron.com>; JaimeLiao <jaimeliao.tw@gmail.com>; > > > linux-mtd@lists.infradead.org > > > Subject: Re: [EXT] Re: Boot failed after patch "mtd: rawnand: Support > > > for sequential cache reads" > > > > > > CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless > > > you recognize the sender and were expecting this message. 
> > >
> > > Hi Alexander,
> > >
> > > eagle.alexander923@gmail.com wrote on Fri, 2 Jun 2023 12:52:08 +0300:
> > >
> > > > Hello.
> > > >
> > > > > As you mentioned, there is no ECC error. And SquashFS complains: Unable to
> > > > > read data cache.
> > > > > We want to see I/Ox, RE#, WE# and R/B#, to check if command input
> > > > > and data output properly.
> > > > > It is better to capture the command 31h, and its following data.
> > > >
> > > > I only have a two-channel oscilloscope :) In any case, in order to
> > > > capture the right moment, I need to somehow loop command 0x31 when
> > > > an error occurs in order to take a picture. Right?
> > > >
> > > > Please, tell me where I can patch the nand_base to get such an
> > > > infinite loop on error?
> > >
> > > Anywhere in the core you could just make your own exec_op sequence and
> > > call it in a loop I guess?
> > >
> > > Bean, what are you trying to picture precisely? Have you ever had any
> > > issues with these commands? Can Alexander try to add a surgical delay somewhere?
> > >
> > > Thanks,
> > > Miquèl
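Alexander's workaround (keeping supported_op.cont_read off in rawnand_check_cont_read_support()) points at the sequential cache read path. One way to narrow this down from user space, offered here as a hypothetical helper rather than anything proposed in the thread, is to read the same region of the MTD character device once with a single multi-page read and once page by page, then compare the results. The sketch assumes (our reading of nand_base.c, to be checked against the running kernel) that only reads spanning two or more pages take the continuous-read path, that a Python 3 interpreter is available on the target, and that the offset passed in is page-aligned. The script name mtd_seqread_check.py and the PAGE_SIZE constant (2048, taken from the boot log) are illustrative assumptions.

```python
#!/usr/bin/env python3
# Hypothetical diagnostic (not from the thread): compare one multi-page read
# of an MTD character device with page-by-page reads of the same region.
# Assumption: a read spanning >= 2 pages may use the raw NAND continuous
# ("sequential cache") read path, while a single-page read does not.
import os
import sys
from typing import List

PAGE_SIZE = 2048  # from the boot log: "page size: 2048"


def read_bulk(fd: int, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` with one pread(), repeated only on short reads."""
    chunks = []
    while length > 0:
        chunk = os.pread(fd, length, offset)
        if not chunk:  # end of device or file
            break
        chunks.append(chunk)
        offset += len(chunk)
        length -= len(chunk)
    return b"".join(chunks)


def read_per_page(fd: int, offset: int, length: int,
                  page_size: int = PAGE_SIZE) -> bytes:
    """Read the same region with one pread() per page (single-page reads only)."""
    out = bytearray()
    end = offset + length
    for pos in range(offset, end, page_size):
        out += os.pread(fd, min(page_size, end - pos), pos)
    return bytes(out)


def mismatched_pages(a: bytes, b: bytes,
                     page_size: int = PAGE_SIZE) -> List[int]:
    """Return indexes (relative to the buffer start) of pages that differ."""
    size = max(len(a), len(b))
    return [i // page_size for i in range(0, size, page_size)
            if a[i:i + page_size] != b[i:i + page_size]]


def main(dev: str, offset: str, length: str) -> int:
    off, size = int(offset, 0), int(length, 0)
    fd = os.open(dev, os.O_RDONLY)
    try:
        bulk = read_bulk(fd, off, size)
        single = read_per_page(fd, off, size)
    finally:
        os.close(fd)
    bad = mismatched_pages(bulk, single)
    for page in bad:
        print(f"page {page} (offset {off + page * PAGE_SIZE:#x}) differs")
    print(f"{len(bad)} differing page(s)")
    return 1 if bad else 0


# Run only when invoked as: mtd_seqread_check.py DEVICE OFFSET LENGTH
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(*sys.argv[1:]))
```

For example, `python3 mtd_seqread_check.py /dev/mtdN 0 0x100000` (device node hypothetical) prints every page whose two copies differ and exits with status 1 if any do. A clean result does not prove the continuous-read path is correct, since the corruption may depend on timing or on which pages are read.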
Thread overview: 15+ messages
2023-05-25 7:48 Boot failed after patch "mtd: rawnand: Support for sequential cache reads" Alexander Shiyan
2023-05-26 18:14 ` Miquel Raynal
2023-05-29 6:10 ` Alexander Shiyan
2023-05-29 8:12 ` Miquel Raynal
2023-05-29 10:33 ` Alexander Shiyan
2023-05-29 11:46 ` Miquel Raynal
2023-05-30 14:48 ` [EXT] " Bean Huo
2023-05-31 9:02 ` Alexander Shiyan
2023-06-01 7:37 ` Miquel Raynal
2023-06-01 16:55 ` Bean Huo
2023-06-02 9:52 ` Alexander Shiyan
2023-06-05 8:39 ` Miquel Raynal
2023-06-05 15:45 ` Bean Huo
2023-06-26 9:22 ` Bean Huo
2023-07-03 9:16 ` Alexander Shiyan