From: Miquel Raynal <miquel.raynal@bootlin.com>
To: "Stefan Riedmüller" <S.Riedmueller@phytec.de>
Cc: "festevam@gmail.com" <festevam@gmail.com>,
	"guillaume.tucker@collabora.com" <guillaume.tucker@collabora.com>,
	"kernel@pengutronix.de" <kernel@pengutronix.de>,
	"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: imx27: No space left to write bad block table
Date: Tue, 4 May 2021 10:34:53 +0200
Message-ID: <20210504103453.15786c69@xps13>
In-Reply-To: <50c7f5d96dd4faaabfcf1e8cbf9248a7646d4f9a.camel@phytec.de>

Hi Stefan,

Stefan Riedmüller <S.Riedmueller@phytec.de> wrote on Mon, 26 Apr 2021
15:53:39 +0000:

> Hi Miquel,
> 
> On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
> > Hi Stefan,
> >   
> > > > Interesting. Maybe I overlooked the below commit when applying. Indeed,
> > > > BBT may be considered as bad blocks, so I wonder if the below change is
> > > > valid now...
> > > > 
> > > > Guillaume, would you have a way to revert this patch on top of
> > > > linux-next? Stefan, would you mind giving more details on the testing
> > > > procedure?    
> > > 
> > > > I have tested this on an i.MX 6 by simulating two bad BBT blocks, simply
> > > > returning -EIO in nand_erase_nand when the block to be erased is one of
> > > > the first two BBT blocks.
> > > > 
> > > > I have seen this once on a customer board but was not able to reproduce
> > > > it anymore, thus the simulation of the two bad blocks.
> > > > 
> > > > Without the patch below, new versions of the BBT can no longer be written
> > > > to the first two blocks reserved for the BBT, but those blocks are still
> > > > read during boot because nothing tests whether they are bad. So changes
> > > > to the BBT after these two blocks turn bad are only kept and used until
> > > > the next reboot, when the old version in the two worn blocks is again
> > > > used as a basis.
> > > > 
> > > > I tried to use the same mechanism that is used to identify bad blocks
> > > > during a scan for bad blocks. But maybe I missed something there? Or were
> > > > my assumptions wrong in the first place?
> > 
> > Honestly I don't know what is wrong exactly in this patch.
> > 
> > We will revert the commit as it clearly breaks something fundamental
> > and the merge window is too close to adopt a hackish attitude.
> > 
> > I would propose the following tests with your board:
> > - Hack the core to allow yourself to access bad blocks from userspace
> >   for testing purposes.
> > - With the below commit, you should have the same behavior as
> >   reported by Fabio.
> 
> On my imx6 board the patch does not lead to the behavior reported by Fabio.
> The BBT is found and can be read:
> 
> [    1.520501] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xd3
> [    1.526944] nand: Macronix MX60LF8G18AC
> [    1.530803] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> [    1.539412] Bad block table found at page 524224, version 0x01
> [    1.545790] Bad block table found at page 524160, version 0x01
> [    1.551796] nand_read_bbt: bad block at 0x000001b60000
> [    1.557032] nand_read_bbt: bad block at 0x000008cc0000
> [    1.562204] nand_read_bbt: bad block at 0x00000f480000
> [    1.567395] nand_read_bbt: bad block at 0x0000111c0000
> [    1.572588] nand_read_bbt: bad block at 0x0000205c0000
> [    1.577802] nand_read_bbt: bad block at 0x00002dfc0000
> 
> I dug a little deeper and I think I found the cause for the failure on the
> imx27 board.
> 
> The mxc_nand driver (used by the imx27) uses its own nand_bbt_descr with an
> offset of 0 in the OOB area. This is the same place the bad block marker is
> located on worn or factory bad blocks.
> 
> This explains why the BBT is no longer found with my patch. scan_block_fast
> checks whether the bad block marker contains anything other than 0xff and
> finds the 'B' from 'Bbt0'. The same occurs for the mirrored version, where it
> finds the '1' from '1tbB'.

Ok, that's the reason why the original logic failed, thanks for looking
into it.
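
To make the collision concrete, here is a minimal standalone sketch in
userspace C (not kernel code). It assumes a 64-byte OOB and reduces the bad
block marker test to "the byte at the marker offset is not 0xff".
marker_says_bad() and fill_oob() are made-up names for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the bad block marker test:
 * a block is reported bad if the marker byte is anything but 0xff.
 */
static bool marker_says_bad(const uint8_t *oob, size_t marker_offs)
{
	return oob[marker_offs] != 0xff;
}

/*
 * Fill an OOB buffer with 0xff (erased state) and place a BBT signature
 * such as "Bbt0" or "1tbB" at sig_offs. Does nothing beyond the fill if
 * the signature would not fit.
 */
static void fill_oob(uint8_t *oob, size_t oob_len,
		     const char *sig, size_t sig_offs)
{
	size_t sig_len = strlen(sig);

	memset(oob, 0xff, oob_len);
	if (sig_offs + sig_len <= oob_len)
		memcpy(oob + sig_offs, sig, sig_len);
}
```

With the signature at offset 0, the marker byte is 'B' (or '1' for the
mirror), so marker_says_bad(oob, 0) returns true. With the signature placed
clear of the marker (offset 8 is used here purely as an example), the marker
byte stays 0xff and the block looks good.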

> This also explains why the original BBT is detected as bad blocks in the scan
> after the BBT was not found, which results in the BBT being written to the
> remaining two blocks reserved for the BBT.
> 
> 19:38:23.001385  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> 19:38:23.002635  nand: ST Micro NAND01GR3B2CZA6
> 19:38:23.006666  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> 19:38:23.028413  Bad block table not found for chip 0
> 19:38:23.035625  random: fast init done
> 19:38:23.049144  Bad block table not found for chip 0
> 19:38:23.050024  Scanning device for bad blocks
> 19:38:23.330999  Bad eraseblock 329 at 0x000002920000
> 19:38:23.345958  Bad eraseblock 330 at 0x000002940000
> 19:38:23.356024  Bad eraseblock 331 at 0x000002960000
> 19:38:23.365738  Bad eraseblock 332 at 0x000002980000
> 19:38:23.375590  Bad eraseblock 333 at 0x0000029a0000
> 19:38:23.385505  Bad eraseblock 334 at 0x0000029c0000
> 19:38:23.395548  Bad eraseblock 335 at 0x0000029e0000
> 19:38:23.405501  Bad eraseblock 336 at 0x000002a00000
> 19:38:23.415551  Bad eraseblock 337 at 0x000002a20000
> 19:38:23.425937  Bad eraseblock 338 at 0x000002a40000
> 19:38:23.436028  Bad eraseblock 339 at 0x000002a60000
> 19:38:23.445959  Bad eraseblock 340 at 0x000002a80000
> 19:38:23.456008  Bad eraseblock 341 at 0x000002aa0000
> 19:38:23.466006  Bad eraseblock 342 at 0x000002ac0000
> 19:38:23.475912  Bad eraseblock 343 at 0x000002ae0000
> 19:38:23.486064  Bad eraseblock 344 at 0x000002b00000
> 19:38:23.495925  Bad eraseblock 345 at 0x000002b20000
> 19:38:24.048053  Bad eraseblock 1022 at 0x000007fc0000
> 19:38:24.056117  Bad eraseblock 1023 at 0x000007fe0000
> 19:38:24.067953  Bad block table written to 0x000007fa0000, version 0x01
> 19:38:24.087637  Bad block table written to 0x000007f80000, version 0x01
> 
> 
> On the next boot all four BBT versions in flash are skipped for the same
> reason as before, and the two blocks containing the latest BBT are also
> detected as bad blocks. As a result, no blocks remain to write the BBT to.
> 
> 
> 21:22:55.032595  nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> 21:22:55.033333  nand: ST Micro NAND01GR3B2CZA6
> 21:22:55.037804  nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> 21:22:55.088475  Bad block table not found for chip 0
> 21:22:55.093807  Bad block table not found for chip 0
> 21:22:55.105995  Scanning device for bad blocks
> 21:22:55.109049  random: fast init done
> 21:22:55.395488  Bad eraseblock 329 at 0x000002920000
> 21:22:55.406832  Bad eraseblock 330 at 0x000002940000
> 21:22:55.416885  Bad eraseblock 331 at 0x000002960000
> 21:22:55.426736  Bad eraseblock 332 at 0x000002980000
> 21:22:55.436732  Bad eraseblock 333 at 0x0000029a0000
> 21:22:55.446864  Bad eraseblock 334 at 0x0000029c0000
> 21:22:55.456662  Bad eraseblock 335 at 0x0000029e0000
> 21:22:55.466785  Bad eraseblock 336 at 0x000002a00000
> 21:22:55.476801  Bad eraseblock 337 at 0x000002a20000
> 21:22:55.486772  Bad eraseblock 338 at 0x000002a40000
> 21:22:55.496768  Bad eraseblock 339 at 0x000002a60000
> 21:22:55.506607  Bad eraseblock 340 at 0x000002a80000
> 21:22:55.516965  Bad eraseblock 341 at 0x000002aa0000
> 21:22:55.526621  Bad eraseblock 342 at 0x000002ac0000
> 21:22:55.536702  Bad eraseblock 343 at 0x000002ae0000
> 21:22:55.546660  Bad eraseblock 344 at 0x000002b00000
> 21:22:55.556745  Bad eraseblock 345 at 0x000002b20000
> 21:22:56.172928  Bad eraseblock 1020 at 0x000007f80000
> 21:22:56.187043  Bad eraseblock 1021 at 0x000007fa0000
> 21:22:56.197437  Bad eraseblock 1022 at 0x000007fc0000
> 21:22:56.212665  Bad eraseblock 1023 at 0x000007fe0000
> 21:22:56.213356  No space left to write bad block table
> 21:22:56.215012  nand_bbt: error while writing bad block table -28
> 21:22:56.239353  mxc_nand: probe of d8000000.nand-controller failed with error -28
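
For anyone cross-checking the two logs, the arithmetic can be sketched as
below. This is a small userspace illustration, not kernel code; the helper
names are made up, and the sizes come from the "erase size: 128 KiB, page
size: 2048" lines above.

```c
#include <stdint.h>

/* Byte offset of the start of an eraseblock. */
static uint64_t eraseblock_to_offset(uint32_t block, uint32_t erase_size)
{
	return (uint64_t)block * erase_size;
}

/* Eraseblock number that contains a given byte offset. */
static uint32_t offset_to_eraseblock(uint64_t offs, uint32_t erase_size)
{
	return (uint32_t)(offs / erase_size);
}

/* Eraseblock number that contains a given page number. */
static uint32_t page_to_eraseblock(uint32_t page, uint32_t page_size,
				   uint32_t erase_size)
{
	uint32_t pages_per_block = erase_size / page_size;

	return page / pages_per_block;
}
```

With a 128 KiB erase size, the offsets 0x000007fa0000 and 0x000007f80000 that
the first boot wrote the BBT to are eraseblocks 1021 and 1020, which are
exactly the two extra blocks reported bad on the second boot. On the i.MX 6
log, pages 524224 and 524160 fall in eraseblocks 8191 and 8190, the last two
blocks of the 1024 MiB chip.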
> 
> I'm not sure of the best way to address this issue. A few ideas came to
> mind:
> 
> - Shift the offset of the nand_bbt_descr of mxc_nand to make room for the bad
> block marker. I'm not sure whether this would conflict with the ECC
> hardware, but the ooblayout functions suggest that it could work.

There are thousands of boards out there that would be broken by such a
change: it's too late to make changes in this driver, unfortunately.

> Unfortunately I don't have any hardware at hand at the moment to test it. I
> think the distinction between small and large pagesizes needs to be reflected
> on the bbt_descr as well.
> 
> - Use NAND_BBT_NO_OOB with the mxc_nand driver since there is a comment saying
> there is an overlap between the generic bbt descriptors and the ECC hardware.
> I'm not sure what other effects it might have to set NAND_BBT_NO_OOB.

Same here: that's not an option.

> - Explicitly check for the bad block marker during a search for the BBT
> instead of using scan_block_fast

This looks more reasonable. You can create a helper which does the
scan_block_fast(), then, if needed, checks the beginning of the OOB
buffer and tries to match with the ->td and ->md descriptors. This
should work with all the legacy drivers implementing their own
descriptors - hopefully.
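
Roughly, the decision logic could look like the sketch below. This is
standalone userspace C that only models the idea: struct bbt_sig,
oob_has_sig() and bbt_block_is_bad() are hypothetical names, the marker test
is reduced to a single byte compared against 0xff, and a real helper would
work on the kernel's own descriptors and call scan_block_fast() instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of a BBT descriptor: pattern and OOB offset. */
struct bbt_sig {
	const uint8_t *pattern;
	size_t offs;
	size_t len;
};

/* True if the OOB buffer carries the descriptor's pattern at its offset. */
static bool oob_has_sig(const uint8_t *oob, size_t oob_len,
			const struct bbt_sig *d)
{
	if (!d || !d->pattern || d->offs + d->len > oob_len)
		return false;
	return memcmp(oob + d->offs, d->pattern, d->len) == 0;
}

/*
 * Treat a BBT candidate block as bad only if the marker byte is set AND
 * the OOB does not match either the main (td) or the mirror (md) pattern.
 */
static bool bbt_block_is_bad(const uint8_t *oob, size_t oob_len,
			     size_t marker_offs,
			     const struct bbt_sig *td,
			     const struct bbt_sig *md)
{
	if (oob[marker_offs] == 0xff)
		return false;	/* marker is clean */
	if (oob_has_sig(oob, oob_len, td) || oob_has_sig(oob, oob_len, md))
		return false;	/* non-0xff bytes are a BBT signature */
	return true;
}
```

With patterns "Bbt0" and "1tbB" at offset 0, a block holding a valid table is
no longer mistaken for a bad one, while a block whose marker byte is, say,
0x00 and which carries no signature is still rejected.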

Other drivers are impacted as well, so maybe you'll find a board for
testing (or someone kind enough to test it for you).

Thanks,
Miquèl

