From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from co202.xi-lite.net ([149.6.83.202])
	by canuck.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux))
	id 1QPH9U-0004D4-Fz for linux-mtd@lists.infradead.org;
	Wed, 25 May 2011 16:41:45 +0000
Date: Wed, 25 May 2011 18:41:07 +0200
From: Ivan Djelic
To: Brian Norris
Subject: Re: dangerous NAND_BBT_SCANBYTE1AND6
Message-ID: <20110525164107.GA16801@parrot.com>
References: <4DB052DB.7040308@parrot.com> <4DB06A6B.2080806@gmail.com>
	<4DB14439.1050507@parrot.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Cc: linux-mtd@lists.infradead.org, Ricard Wanderlof, Matthieu Castet,
	Artem Bityutskiy
List-Id: Linux MTD discussion mailing list

On Tue, May 24, 2011 at 02:09:10AM +0100, Brian Norris wrote:
> >>> So I bet
> >>> your device is actually an x8 device and so the 1st/6th byte pattern is
> >>> correct. I think the fact that this conflicts with your ECC patterns is
> >>> something you must deal with.
> >>
> >> I don't agree, that's a big mtd regression. If you update your kernel on such
> >> flash, you brick it.
> >
> > I agree, even if the behavior may have been incorrect in the past, we should think very carefully about changing this for exactly this reason.
>
> Right, I see how this could be a problem. So for a resolution, I'd ask
> for suggestions on which of the following seems best:
> 1) Completely revert the SCANBYTE1AND6 change
> 2) Remove the option from nand_get_flash_type(), still allowing
> drivers to enable the scan option themselves
> 3) Have nand_get_flash_type() use ECC layout information to decide to
> scan bytes 1+6 or just byte 1 only
>
> Regarding correctness:
> As far as I can tell, no one has found a definitive answer on the
> manufacturer intention, right?
> I'm now leaning toward the intention
> that software only needs to scan *either* byte 1 *or* byte 6, but I
> don't know for sure.

Hello Brian,

Here is a relevant excerpt from a 2004 STM application note (AN1819):

  RECOGNIZING BAD BLOCKS

  The devices are supplied with all the locations inside valid blocks erased
  (FFh). The Bad Block Information is written prior to shipping.
  For 528 Byte/256 Word Page (NANDxxx-A) devices, any block where the 6th
  Byte/1st Word in the spare area of the 1st page does not contain FFh is a
  Bad Block.
  For 2112 Byte/1056 Word Page devices, any block, where the 1st and 6th
  Bytes, or 1st Word, in the spare area of the 1st page, does not contain
  FFh, is a Bad Block.

If we check only the 1st byte, we just need to make sure that there is no
possibility of having a good erased block with:
- 1st byte == bad block marker (usually 0x00)
and
- 6th byte == 0xff

I believe this is unlikely; or rather, it _was_ totally unlikely in 2004 when
the application note was written.

Therefore, I think we can safely use only the 1st marker byte to detect factory
bad blocks in that case (STM large page); the manufacturer simply guarantees
that both markers are written when a factory bad block is marked. It does not
require you to check both bytes.

The above note is probably not applicable to recent devices. Because bitflips
are much more likely to appear, saying that a specific byte marks a bad block if
it "does not contain FFh" is not realistic. ONFI 2.2 clarifies the issue and
states that a marker is 0x00, not just a byte that "does not contain FFh". And
recent Micron devices do not store markers in flash; they just return 0x00 for
any byte read in a bad block (instead of the real data), using an internal bad
block table.

I suggest we revert the SCANBYTE1AND6 change, because:
- it breaks existing ecc layouts
- factory bad blocks in relevant STM NANDs can be detected without checking the
  6th byte

Best Regards,

Ivan
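
For illustration only, here is a minimal sketch of the two marker checks discussed above: the strict "does not contain FFh" rule from AN1819, and a bitflip-tolerant rule that assumes the marker was written as either 0x00 or 0xff. The function names, the offset macros and the min_good_bits threshold are hypothetical; none of this is existing nand_bbt code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets into the spare (OOB) area of a block's 1st page (hypothetical names). */
#define MARKER_BYTE1 0	/* "1st byte" */
#define MARKER_BYTE6 5	/* "6th byte" */

/* Count the bits set in one byte. */
unsigned int bits_set(uint8_t b)
{
	unsigned int n = 0;

	for (; b; b &= (uint8_t)(b - 1))
		n++;
	return n;
}

/* AN1819-style rule: any value other than 0xff marks the block bad. */
bool marker_bad_strict(const uint8_t *oob, size_t pos)
{
	assert(oob != NULL);
	return oob[pos] != 0xff;
}

/*
 * Bitflip-tolerant rule (an assumption, not taken from any datasheet):
 * the marker is expected to be 0x00 (bad) or 0xff (good), so the block
 * is reported bad only when fewer than min_good_bits bits are still set.
 */
bool marker_bad_tolerant(const uint8_t *oob, size_t pos,
			 unsigned int min_good_bits)
{
	assert(oob != NULL);
	return bits_set(oob[pos]) < min_good_bits;
}
```

With a spare area whose 1st byte reads 0xfe (a single bitflip in a good block), marker_bad_strict(oob, MARKER_BYTE1) reports a bad block, while marker_bad_tolerant(oob, MARKER_BYTE1, 5) does not. Both functions look at one byte only, so scanning byte 6 as well would just be a second call with MARKER_BYTE6.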