* dangerous NAND_BBT_SCANBYTE1AND6
From: Matthieu CASTET @ 2011-04-21 15:52 UTC
To: linux-mtd@lists.infradead.org, Brian Norris

Hi,

I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
We have an ST flash where the ECC was put on bytes 5 and 6.
With the new kernel, all blocks are bad.

Is this option really needed?
The ST datasheet says [1]. We already check the first word.
Why do we need to check the 6th byte?

Matthieu

PS: the code checks the 1st, 2nd, 6th and 7th bytes, so it checks too many bytes.

[1]
The devices are supplied with all the locations inside valid blocks erased
(FFh). The Bad Block Information is written prior to shipping. Any block,
where the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st page,
does not contain FFh, is a Bad Block.
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ivan Djelic @ 2011-04-21 17:10 UTC
To: Matthieu CASTET
Cc: linux-mtd@lists.infradead.org, Brian Norris

On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
> Hi,
>
> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> We have an ST flash where the ECC was put on bytes 5 and 6.
> With the new kernel, all blocks are bad.
>
> Is this option really needed?
> The ST datasheet says [1]. We already check the first word.
> Why do we need to check the 6th byte?

I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.

Old small page nand devices used to have their bad block marker in 6th byte of
the spare area of the first page.

ST datasheet says that factory bad blocks will have _both_ bytes cleared
(1st and 6th); I guess this was done to allow choosing which marker to check
(but I may be wrong). Maybe to be compatible with large page marker location
scheme (again, just guessing).

The NAND_BBT_SCANBYTE1AND6 option was introduced in commit
58373ff0afff4cc8ac40608872995f4d87eb72ec, but the commit message does not
clearly explain why both markers should be checked.

My understanding of bad block markers is (please correct me if I am wrong):
small page => check 6th byte of spare area of first page
large page, non-ONFI => check first word of spare area of first page
ONFI => see ONFI spec

Ivan
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Brian Norris @ 2011-04-22 4:50 UTC
To: Ivan Djelic
Cc: Brian Norris, linux-mtd@lists.infradead.org, Matthieu CASTET

Hi Ivan,

(FYI, please use my @gmail.com, not my @broadcom address.)

I can't say I know everything about the intentions and history of all
statements in various NAND flash data sheets, but I have read many of
them and will try to explain my view. Of course, I may be wrong.

On 4/21/2011 10:10 AM, Ivan Djelic wrote:
> Old small page nand devices used to have their bad block marker in 6th byte of
> the spare area of the first page.

Correct

> ST datasheet says that factory bad blocks will have _both_ bytes cleared
> (1st and 6th); I guess this was done to allow choosing which marker to check
> (but I may be wrong). Maybe to be compatible with large page marker location
> scheme (again, just guessing).

The actual statement is one of these two (pulled from various ST and
Numonyx sheets):

"Any block, where the 1st and 6th bytes or the 1st word in the spare area
of the 1st page, does not contain FFh, is a bad block."

"Any block, where the 1st and 6th bytes, in the spare area of the first
page, does not contain FFh is a bad block."

Strictly speaking, neither of these "sentences" uses correct grammar, as
the commas are placed arbitrarily. Most importantly, though, I don't
think they make clear the following:

1) Does the manufacturer guarantee that BOTH bytes are non-FFh?
2) Does the manufacturer guarantee that the combined bytes ("1st and
6th") contain a non-FFh byte?

I understood it as the latter, and so decided the scan needed both bytes
(perhaps one byte was written successfully but not the other). However,
your argument for choice (1) ("this was done to allow choosing which
marker to check") makes just as much sense (or more) to me.

In trying to decide why I came to conclude choice (2) and not (1), I
recall that some Hynix and Samsung parts explicitly declare that the
first OR second page may be used, in case the first page is bad. I may
have subconsciously applied this 1st/2nd page concept to the 1st/6th
bytes logic.

> My understanding of bad block markers is (please correct me if I am wrong):
> small page => check 6th byte of spare area of first page
> large page, non-ONFI => check first word of spare area of first page
> ONFI => see ONFI spec

Unfortunately, small page, large page, and ONFI are 3 classifications
that oversimplify bad block markers. Some people (especially Samsung and
Hynix, but even some Micron) got creative. Some of their chips use:

  1st or 2nd page
  the last page
  the 1st or last page
  the last or (last - 2)th page

And of course, there's the controversial 1st/6th byte usage - that I'm
still not clear on. Some of these scanning patterns are rare, but they
do exist.

Sorry for any confusion, but I guess it's better late than never for
this sort of discussion...

Brian
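The two readings Brian lists can be made concrete. This is a minimal C sketch, not MTD code: it assumes an x8 part, takes the "1st" and "6th" spare bytes to mean 0-based offsets 0 and 5, and the function names are hypothetical. Under reading (1) one byte is enough; under reading (2) either byte being non-FFh marks the block bad.

```c
#include <stdbool.h>

/* Assumed 0-based spare-area offsets of the "1st" and "6th" bytes (x8 part). */
#define BBM_OFF_1ST 0
#define BBM_OFF_6TH 5

/* Reading (1): the factory clears BOTH bytes, so checking one is enough.
 * This sketch checks only the 1st byte. */
static bool bad_reading1(const unsigned char *oob)
{
	return oob[BBM_OFF_1ST] != 0xFF;
}

/* Reading (2): only one of the two bytes is guaranteed to be non-FFh,
 * so both are checked and either one being non-FFh marks the block bad. */
static bool bad_reading2(const unsigned char *oob)
{
	return oob[BBM_OFF_1ST] != 0xFF || oob[BBM_OFF_6TH] != 0xFF;
}
```

For a good block whose 6th spare byte holds ECC data (say 0x3C), reading (1) reports it good while reading (2) reports it bad, which is the symptom Matthieu describes; a factory-bad block with both bytes cleared is bad under either reading.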
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Artem Bityutskiy @ 2011-04-22 8:23 UTC
To: Ivan Djelic
Cc: linux-mtd@lists.infradead.org, Brian Norris, Matthieu CASTET

On Thu, 2011-04-21 at 19:10 +0200, Ivan Djelic wrote:
> On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
> > Hi,
> >
> > I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> > We have an ST flash where the ECC was put on bytes 5 and 6.
> > With the new kernel, all blocks are bad.
> >
> > Is this option really needed?
> > The ST datasheet says [1]. We already check the first word.
> > Why do we need to check the 6th byte?
>
> I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.

This just means that we need a better way for drivers to inform the
generic code about how exactly blocks are marked as bad. Probably
drivers could describe this with a data structure, and sometimes even
provide an "is_block_bad()" function.

The options do not seem to be enough.

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
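A driver-supplied description of the kind Artem suggests could look roughly like this. It is a hypothetical sketch: struct bbm_desc and desc_is_bad() are not MTD names, and page selection (1st, 2nd, last page, ...) is left out for brevity. The driver lists which spare-area offsets carry markers and may optionally supply its own is_block_bad() check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker description a driver could hand to generic code. */
struct bbm_desc {
	const size_t *offsets;   /* spare-area byte offsets holding markers */
	size_t noffsets;
	/* Optional driver override; when non-NULL it replaces the generic check. */
	bool (*is_block_bad)(const unsigned char *oob);
};

/* Generic check driven by the descriptor: any listed byte that is not
 * 0xFF marks the block bad, unless the driver supplied its own function. */
static bool desc_is_bad(const struct bbm_desc *d, const unsigned char *oob)
{
	if (d->is_block_bad)
		return d->is_block_bad(oob);
	for (size_t i = 0; i < d->noffsets; i++)
		if (oob[d->offsets[i]] != 0xFF)
			return true;
	return false;
}
```

With this shape, "check the 1st byte" and "check the 1st and 6th bytes" are just two offset lists ({0} and {0, 5}) rather than separate option flags.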
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Matthieu CASTET @ 2011-04-22 8:53 UTC
To: dedekind1@gmail.com
Cc: Ivan Djelic, linux-mtd@lists.infradead.org, Brian Norris

Artem Bityutskiy wrote:
> On Thu, 2011-04-21 at 19:10 +0200, Ivan Djelic wrote:
>> On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
>>> Hi,
>>>
>>> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
>>> We have an ST flash where the ECC was put on bytes 5 and 6.
>>> With the new kernel, all blocks are bad.
>>>
>>> Is this option really needed?
>>> The ST datasheet says [1]. We already check the first word.
>>> Why do we need to check the 6th byte?
>> I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.
>
> This just means that we need a better way for drivers to inform the
> generic code about how exactly blocks are marked as bad. Probably
> drivers could describe this with a data structure, and sometimes even
> provide an "is_block_bad()" function.
>
> The options do not seem to be enough.
>
I think we should also unify bad block scanning.

In the current code, bad block scanning can be done by:
- chip->block_bad (default: nand_block_bad)
- nand_isbad_bbt

Why doesn't nand_isbad_bbt call chip->block_bad instead of implementing
its own scanning code (scan_block_full or scan_block_fast)?

This is bad because a driver can override chip->block_bad, but
nand_isbad_bbt won't use it.

Also, nand_block_bad and nand_isbad_bbt don't use the same scanning
pattern.

Matthieu
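Matthieu's point is that the table-building path has its own scanning code, so a driver's chip->block_bad override is ignored there. Below is a hypothetical sketch of the unified approach: struct fake_chip is a simplified stand-in, not the real struct nand_chip, and default_block_bad() and build_bbt() are illustrative names. The only idea shown is that the table builder calls the same hook used for runtime checks.

```c
#include <stdbool.h>
#include <stddef.h>

#define SPARE_LEN 8 /* assumed, shortened spare-area size for the sketch */

/* Simplified stand-in for a NAND chip: NOT the real struct nand_chip. */
struct fake_chip {
	size_t nblocks;
	const unsigned char (*oob)[SPARE_LEN]; /* 1st-page spare area per block */
	bool (*block_bad)(const struct fake_chip *chip, size_t block);
};

/* Default marker check on the 1st spare byte, playing the role of nand_block_bad. */
static bool default_block_bad(const struct fake_chip *chip, size_t block)
{
	return chip->oob[block][0] != 0xFF;
}

/* Fills bbt[] by calling chip->block_bad for every block, so a driver
 * override is honoured here too. Returns the number of bad blocks. */
static size_t build_bbt(const struct fake_chip *chip, bool *bbt)
{
	size_t nbad = 0;

	for (size_t b = 0; b < chip->nblocks; b++) {
		bbt[b] = chip->block_bad(chip, b);
		if (bbt[b])
			nbad++;
	}
	return nbad;
}
```

A driver whose markers live elsewhere would point block_bad at its own function, and build_bbt() would pick it up without any change to the table-building code.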
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Artem Bityutskiy @ 2011-04-22 9:28 UTC
To: Matthieu CASTET
Cc: Ivan Djelic, linux-mtd@lists.infradead.org, Brian Norris

On Fri, 2011-04-22 at 10:53 +0200, Matthieu CASTET wrote:
> Artem Bityutskiy wrote:
> > On Thu, 2011-04-21 at 19:10 +0200, Ivan Djelic wrote:
> >> On Thu, Apr 21, 2011 at 04:52:59PM +0100, Matthieu Castet wrote:
> >>> Hi,
> >>>
> >>> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> >>> We have an ST flash where the ECC was put on bytes 5 and 6.
> >>> With the new kernel, all blocks are bad.
> >>>
> >>> Is this option really needed?
> >>> The ST datasheet says [1]. We already check the first word.
> >>> Why do we need to check the 6th byte?
> >> I agree with Matthieu, NAND_BBT_SCANBYTE1AND6 code also seems wrong to me.
> >
> > This just means that we need a better way for drivers to inform the
> > generic code about how exactly blocks are marked as bad. Probably
> > drivers could describe this with a data structure, and sometimes even
> > provide an "is_block_bad()" function.
> >
> > The options do not seem to be enough.
> >
> I think we should also unify bad block scanning.

Sure, just do this in small incremental steps: send small, tested,
incremental patches with a good description. The point is, you should
not wait for someone else to fix this for you; I do not think that will
happen.

One more thing: if you use MTD and are interested in its stability,
review other people's patches that touch your areas of interest :-)

--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Brian Norris @ 2011-04-21 17:33 UTC
To: Matthieu CASTET
Cc: linux-mtd@lists.infradead.org, Brian Norris

Hi

On 4/21/2011 8:52 AM, Matthieu CASTET wrote:
> I believe NAND_BBT_SCANBYTE1AND6 behavior is very dangerous.
> We have an ST flash where the ECC was put on bytes 5 and 6.
> With the new kernel, all blocks are bad.
>
> Is this option really needed?
> The ST datasheet says [1]. We already check the first word.
> Why do we need to check the 6th byte?
>
> Matthieu
>
> PS: the code checks the 1st, 2nd, 6th and 7th bytes, so it checks too many bytes.
>
> [1]
> The devices are supplied with all the locations inside valid blocks erased
> (FFh). The Bad Block Information is written prior to shipping. Any block,
> where the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st page,
> does not contain FFh, is a Bad Block.

I've tried my best to verify that any modifications I have made to bad
block scanning comply with the data sheets, but I very well could have
made mistakes (especially since there are so many different types of
scanning patterns, and very few manufacturers are actually being
consistent with these things).

That being said, I believe that the data sheet you quoted has some answer:
"Any block, where the 1st and 6th Bytes, or 1st Word, in the spare area
of the 1st page, does not contain FFh, is a Bad Block."
AFAICT, this description means that x8 buswidth devices must scan bytes
1 and 6 while x16 devices only need to scan the first word. So I bet
your device is actually an x8 device and so the 1st/6th byte pattern is
correct. I think the fact that this conflicts with your ECC patterns is
something you must deal with.

> PS: the code checks the 1st, 2nd, 6th and 7th bytes, so it checks too many bytes.

I've seen this before. This may be incorrect. Are you sure it's not 1st,
2nd, 5th, 6th though? I believe the "2-byte scans" were chosen before to
keep from having to differentiate between x8/x16 buses. Perhaps this
should be changed. (volunteers?)

While we're on the subject: do people use x16 buses on NAND anymore?

Brian
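The "too many bytes" effect can be sketched by assuming, as Matthieu reports, that the scan reads two bytes starting at each of offsets 0 and 5 (Brian wonders whether it is offsets 0, 1, 4 and 5 instead). The hypothetical helpers below contrast that fixed 2-byte length with a length that follows the bus width: one byte on x8, one 16-bit word on x16.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any of the `len` bytes starting at each offset in
 * `offs` differs from 0xFF. */
static bool scan_positions(const unsigned char *oob, const size_t *offs,
			   size_t noffs, size_t len)
{
	for (size_t i = 0; i < noffs; i++)
		for (size_t j = 0; j < len; j++)
			if (oob[offs[i] + j] != 0xFF)
				return true;
	return false;
}

/* Marker length that matches the bus width instead of a fixed 2 bytes. */
static size_t marker_len(bool x16)
{
	return x16 ? 2 : 1;
}
```

With an ECC byte at offset 6 (the 7th spare byte) of a good block, a 2-byte scan at offsets 0 and 5 flags the block as bad, while a one-byte x8 scan at the same offsets does not.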
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Matthieu CASTET @ 2011-04-22 9:02 UTC
To: Brian Norris
Cc: linux-mtd@lists.infradead.org, Brian Norris

Hi,

Brian Norris wrote:
> Hi
>
> On 4/21/2011 8:52 AM, Matthieu CASTET wrote:
>>
>> [1]
>> The devices are supplied with all the locations inside valid blocks erased
>> (FFh). The Bad Block Information is written prior to shipping. Any block,
>> where the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st page,
>> does not contain FFh, is a Bad Block.
>
> I've tried my best to verify that any modifications I have made to bad
> block scanning comply with the data sheets, but I very well could have
> made mistakes (especially since there are so many different types of
> scanning patterns, and very few manufacturers are actually being
> consistent with these things).

Did you ask the manufacturers for clarification?

>
> That being said, I believe that the data sheet you quoted has some answer:
> "Any block, where the 1st and 6th Bytes, or 1st Word, in the spare area
> of the 1st page, does not contain FFh, is a Bad Block."
> AFAICT, this description means that x8 buswidth devices must scan bytes
> 1 and 6 while x16 devices only need to scan the first word.

Have you seen a real case where byte 6 was not 0xff but byte 1 was 0xff?

> So I bet
> your device is actually an x8 device and so the 1st/6th byte pattern is
> correct. I think the fact that this conflicts with your ECC patterns is
> something you must deal with.

I don't agree, that's a big mtd regression. If you update your kernel on such
flash, you brick it.

Matthieu
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ricard Wanderlof @ 2011-04-26 7:30 UTC
To: Matthieu CASTET
Cc: Brian Norris, linux-mtd@lists.infradead.org, Brian Norris

On Fri, 22 Apr 2011, Matthieu CASTET wrote:

>> So I bet
>> your device is actually an x8 device and so the 1st/6th byte pattern is
>> correct. I think the fact that this conflicts with your ECC patterns is
>> something you must deal with.
> I don't agree, that's a big mtd regression. If you update your kernel on such
> flash, you brick it.

I agree, even if the behavior may have been incorrect in the past, we should think very carefully about changing this for exactly this reason.

/Ricard
--
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Brian Norris @ 2011-05-24 1:09 UTC
To: Ricard Wanderlof
Cc: Ivan Djelic, linux-mtd@lists.infradead.org, Matthieu CASTET, Artem Bityutskiy

Hi,

Sorry this thread has been sitting for a long time. I've been very busy
and haven't had time for MTD stuff.

From Matthieu:
"Did you ask the manufacturers for clarification?"
No, unfortunately, I did not. I didn't realize the "regression" issues
at the time, so I didn't think to look further than my interpretation
of the datasheets.

On Tue, Apr 26, 2011 at 12:30 AM, Ricard Wanderlof
<ricard.wanderlof@axis.com> wrote:
>
> On Fri, 22 Apr 2011, Matthieu CASTET wrote:
>
>>> So I bet
>>> your device is actually an x8 device and so the 1st/6th byte pattern is
>>> correct. I think the fact that this conflicts with your ECC patterns is
>>> something you must deal with.
>>
>> I don't agree, that's a big mtd regression. If you update your kernel on such
>> flash, you brick it.
>
> I agree, even if the behavior may have been incorrect in the past, we should think very carefully about changing this for exactly this reason.

Right, I see how this could be a problem. So for a resolution, I'd ask
for suggestions on which of the following seems best:
1) Completely revert the SCANBYTE1AND6 change
2) Remove the option from nand_get_flash_type(), still allowing
drivers to enable the scan option themselves
3) Have nand_get_flash_type() use ECC layout information to decide to
scan bytes 1+6 or just byte 1 only

Regarding correctness:
As far as I can tell, no one has found a definitive answer on the
manufacturer intention, right? I'm now leaning toward the intention
that software only needs to scan *either* byte 1 *or* byte 6, but I
don't know for sure.

Thanks,
Brian
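Option 3 could be approximated as follows. This sketch is hypothetical: eccpos stands for whatever list of ECC byte offsets the driver's layout provides, and ecc_uses() and choose_bbm_offsets() are illustrative names, not existing MTD functions. It keeps the 6th-byte check (offset 5) only when the ECC layout leaves that byte free.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if spare-area offset `off` appears in the ECC position list. */
static bool ecc_uses(const unsigned *eccpos, size_t n, unsigned off)
{
	for (size_t i = 0; i < n; i++)
		if (eccpos[i] == off)
			return true;
	return false;
}

/* Writes the marker offsets to scan into out[] and returns their count:
 * always offset 0 (1st byte), plus offset 5 (6th byte) only if the ECC
 * layout does not use it. */
static size_t choose_bbm_offsets(const unsigned *eccpos, size_t n,
				 unsigned out[2])
{
	size_t k = 0;

	out[k++] = 0;
	if (!ecc_uses(eccpos, n, 5))
		out[k++] = 5;
	return k;
}
```

A layout like Matthieu's, with ECC bytes over the 6th spare byte, would then fall back to the 1st-byte check alone, while layouts that leave offset 5 free keep the 1+6 scan.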
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ivan Djelic @ 2011-05-25 16:41 UTC
To: Brian Norris
Cc: linux-mtd@lists.infradead.org, Ricard Wanderlof, Matthieu Castet, Artem Bityutskiy

On Tue, May 24, 2011 at 02:09:10AM +0100, Brian Norris wrote:
> >>> So I bet
> >>> your device is actually an x8 device and so the 1st/6th byte pattern is
> >>> correct. I think the fact that this conflicts with your ECC patterns is
> >>> something you must deal with.
> >>
> >> I don't agree, that's a big mtd regression. If you update your kernel on such
> >> flash, you brick it.
> >
> > I agree, even if the behavior may have been incorrect in the past, we should think very carefully about changing this for exactly this reason.
>
> Right, I see how this could be a problem. So for a resolution, I'd ask
> for suggestions on which of the following seems best:
> 1) Completely revert the SCANBYTE1AND6 change
> 2) Remove the option from nand_get_flash_type(), still allowing
> drivers to enable the scan option themselves
> 3) Have nand_get_flash_type() use ECC layout information to decide to
> scan bytes 1+6 or just byte 1 only
>
> Regarding correctness:
> As far as I can tell, no one has found a definitive answer on the
> manufacturer intention, right? I'm now leaning toward the intention
> that software only needs to scan *either* byte 1 *or* byte 6, but I
> don't know for sure.

Hello Brian,

Here is a relevant excerpt from a 2004 STM application note (AN1819):

  RECOGNIZING BAD BLOCKS
  The devices are supplied with all the locations inside valid blocks
  erased (FFh). The Bad Block Information is written prior to shipping.
  For 528 Byte/256 Word Page (NANDxxx-A) devices, any block where the
  6th Byte/ 1st Word in the spare area of the 1st page does not contain
  FFh is a Bad Block. For 2112 Byte/1056 Word Page devices, any block,
  where the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st
  page, does not contain FFh, is a Bad Block.

If we check only the 1st byte, we just need to make sure that there is no
possibility of having a good erased block with:
- 1st byte == bad block marker (usually 0x00) and
- 6th byte == 0xff

I believe this is unlikely; or rather, it _was_ totally unlikely in 2004 when
the application note was written.

Therefore, I think we can safely use only the 1st marker byte to detect factory
bad blocks in that case (STM large page); the manufacturer simply guarantees
that both markers are written when a factory bad block is marked. It does not
require you to check both bytes.

<digression>
The above note is probably not applicable to recent devices. Because bitflips
are much more likely to appear, saying that a specific byte marks a bad block
if it "does not contain FFh" is not realistic. ONFI 2.2 clarifies the issue and
states that a marker is 0x00, not just a byte that "does not contain FFh".
And recent Micron devices do not store markers in flash; they just return 0x00
for any byte read in a bad block (instead of the real data), using an internal
bad block table.
</digression>

I suggest we revert the SCANBYTE1AND6 change, because:
- it breaks existing ecc layouts
- factory bad blocks in relevant STM nands can be detected without checking
  the 6th byte

Best Regards,

Ivan
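Ivan's reading of the AN1819 excerpt can be written down as a per-device rule. This is a hypothetical sketch, not MTD code: page_size is assumed to be the main-area size in bytes (512 for the 528-byte-page parts, 2048 for the 2112-byte-page parts), large-page x8 parts are checked on the 1st byte only as Ivan proposes, and on x16 parts the marker is taken to be the 1st word, i.e. the first two spare bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Factory bad block check following the AN1819 excerpt, using only the
 * 1st byte on large-page x8 parts (Ivan's proposal). */
static bool stm_factory_bad(const unsigned char *oob, size_t page_size, bool x16)
{
	if (x16)                       /* 1st word: spare bytes 0 and 1 */
		return oob[0] != 0xFF || oob[1] != 0xFF;
	if (page_size == 512)          /* small page: 6th byte */
		return oob[5] != 0xFF;
	return oob[0] != 0xFF;         /* large page: 1st byte only */
}
```

Under this rule, a good large-page block carrying ECC data in its 6th spare byte (Matthieu's case) is no longer reported bad, while a factory-bad block, which has both markers written, still is.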
* RE: dangerous NAND_BBT_SCANBYTE1AND6
From: Atlant Schmidt @ 2011-05-25 18:04 UTC
To: 'Ivan Djelic', Brian Norris
Cc: Ricard Wanderlof, linux-mtd@lists.infradead.org, Matthieu Castet, Artem Bityutskiy

Ivan:

> <digression> ...
> And recent Micron devices do not store markers in flash; they just
> return 0x00 for any byte read in a bad block (instead of the real
> data), using an internal bad block table.
> </digression>

Does this mean that it is impossible to mark additional
bad blocks in these devices as blocks go hard-bad during
use? Or do commands exist to extend the internal bad
block table? (And do our MTD drivers know how to do that?)

Atlant
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ivan Djelic @ 2011-05-25 18:31 UTC
To: Atlant Schmidt
Cc: Ricard Wanderlof, Brian Norris, linux-mtd@lists.infradead.org, Matthieu Castet, Artem Bityutskiy

On Wed, May 25, 2011 at 07:04:40PM +0100, Atlant Schmidt wrote:
> Ivan:
>
> > <digression> ...
> > And recent Micron devices do not store markers in flash; they just
> > return 0x00 for any byte read in a bad block (instead of the real
> > data), using an internal bad block table.
> > </digression>
>
> Does this mean that it is impossible to mark additional
> bad blocks in these devices as blocks go hard-bad during
> use? Or do commands exist to extend the internal bad
> block table? (And do our MTD drivers know how to do that?)
>

Note that the usual bad block detection still works on those Micron devices.
They just do not store markers in flash.

You can still mark a block gone bad either by writing your own marker into the
block or (better) in a separate BBT. The internal Micron table is hard-wired
and only used to shortcut access to factory bad blocks AFAIK.

Regards,

Ivan
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ricard Wanderlof @ 2011-05-26 7:09 UTC
To: Ivan Djelic
Cc: Artem Bityutskiy, Matthieu Castet, Ricard Wanderlöf, linux-mtd@lists.infradead.org, Atlant Schmidt, Brian Norris

On Wed, 25 May 2011, Ivan Djelic wrote:

>>> <digression> ...
>>> And recent Micron devices do not store markers in flash; they just
>>> return 0x00 for any byte read in a bad block (instead of the real
>>> data), using an internal bad block table.
>>> </digression>
>> ...
> Note that the usual bad block detection still works on those Micron devices.
> They just do not store markers in flash.
>
> You can still mark a block gone bad either by writing your own marker into the
> block or (better) in a separate BBT. The internal Micron table is hard-wired
> and only used to shortcut access to factory bad blocks AFAIK.

Does this also mean that if you for some reason screw up and mark lots of
(good) blocks as bad, you can just erase all blocks in the flash; the
factory-bad ones will refuse to be erased thanks to the on-chip bbt?

/Ricard
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ivan Djelic @ 2011-05-26 7:58 UTC
To: Ricard Wanderlof
Cc: Artem Bityutskiy, Matthieu Castet, Ricard Wanderlöf, linux-mtd@lists.infradead.org, Atlant Schmidt, Brian Norris

On Thu, May 26, 2011 at 08:09:15AM +0100, Ricard Wanderlof wrote:
>
> On Wed, 25 May 2011, Ivan Djelic wrote:
>
> >>> <digression> ...
> >>> And recent Micron devices do not store markers in flash; they just
> >>> return 0x00 for any byte read in a bad block (instead of the real
> >>> data), using an internal bad block table.
> >>> </digression>
> >> ...
> > Note that the usual bad block detection still works on those Micron devices.
> > They just do not store markers in flash.
> >
> > You can still mark a block gone bad either by writing your own marker into the
> > block or (better) in a separate BBT. The internal Micron table is hard-wired
> > and only used to shortcut access to factory bad blocks AFAIK.
>
> Does this also mean that if you for some reason screw up and mark lots of
> (good) blocks as bad, you can just erase all blocks in the flash; the
> factory-bad ones will refuse to be erased thanks to the on-chip bbt?

Exactly.

Ivan
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ricard Wanderlof @ 2011-05-26 7:07 UTC
To: Ivan Djelic
Cc: Ricard Wanderlöf, Brian Norris, linux-mtd@lists.infradead.org, Matthieu Castet, Artem Bityutskiy

On Wed, 25 May 2011, Ivan Djelic wrote:

> Here is a relevant excerpt from a 2004 STM application note (AN1819):
>
>   RECOGNIZING BAD BLOCKS
>   The devices are supplied with all the locations inside valid blocks
>   erased (FFh). The Bad Block Information is written prior to shipping.
>   For 528 Byte/256 Word Page (NANDxxx-A) devices, any block where the
>   6th Byte/ 1st Word in the spare area of the 1st page does not contain
>   FFh is a Bad Block. For 2112 Byte/1056 Word Page devices, any block,
>   where the 1st and 6th Bytes, or 1st Word, in the spare area of the 1st
>   page, does not contain FFh, is a Bad Block.
> ...
> <digression>
> The above note is probably not applicable to recent devices. Because bitflips
> are much more likely to appear, saying that a specific byte marks a bad block
> if it "does not contain FFh" is not realistic. ONFI 2.2 clarifies the issue and
> states that a marker is 0x00, not just a byte that "does not contain FFh".
> And recent Micron devices do not store markers in flash; they just return 0x00
> for any byte read in a bad block (instead of the real data), using an internal
> bad block table.
> </digression>

I'm probably wrong here, but I just want to throw this thought into the
pot: I always thought that the reason for the 'not contain FFh' phrasing
was that there could be something physically wrong in a bad block so that
some bits could not be programmed to 0. Saying 'not contain FFh' would be
a way of saying 'we try to set all bytes to 0 but if for some reason some
bits are stuck at 1 still treat a non-FFh word as a bad block marker'.

This of course does not harmonize with ONFI 2.2. It's just me trying to
read between the lines of the specs.

/Ricard
* Re: dangerous NAND_BBT_SCANBYTE1AND6
From: Ivan Djelic @ 2011-05-26 7:57 UTC
To: Ricard Wanderlof
Cc: Ricard Wanderlöf, Brian Norris, linux-mtd@lists.infradead.org, Matthieu Castet, Artem Bityutskiy

On Thu, May 26, 2011 at 08:07:36AM +0100, Ricard Wanderlof wrote:
> > ...
> > <digression>
> > The above note is probably not applicable to recent devices. Because bitflips
> > are much more likely to appear, saying that a specific byte marks a bad block
> > if it "does not contain FFh" is not realistic. ONFI 2.2 clarifies the issue and
> > states that a marker is 0x00, not just a byte that "does not contain FFh".
> > And recent Micron devices do not store markers in flash; they just return 0x00
> > for any byte read in a bad block (instead of the real data), using an internal
> > bad block table.
> > </digression>
>
> I'm probably wrong here, but I just want to throw this thought into the
> pot: I always thought that the reason for the 'not contain FFh' phrasing
> was that there could be something physically wrong in a bad block so that
> some bits could not be programmed to 0. Saying 'not contain FFh' would be
> a way of saying 'we try to set all bytes to 0 but if for some reason some
> bits are stuck at 1 still treat a non-FFh word as a bad block marker'.

I agree. The safest way to check a bb marker is probably to count the number
of set bits and compare it to a threshold, instead of just comparing it to
0xff or 0x00.

But even if you stick to a simple comparison with 0x00 or 0xff on old nand
devices, I guess the probability that a bit is stuck in the marker is very low
and would maybe result in a few spurious bad blocks in a large set of devices.

Ivan
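Ivan's bit-counting idea can be sketched under stated assumptions: the "fewer than half the bits set" threshold is an arbitrary choice for illustration, not something taken from a datasheet or from ONFI, and bits_set() and marker_is_bad() are hypothetical names. An erased marker byte reads 0xFF (8 bits set) and a programmed one 0x00, so a single flipped or stuck bit does not change the verdict.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of bits set in one byte. */
static unsigned bits_set(unsigned char b)
{
	unsigned n = 0;

	while (b) {
		n += b & 1u;
		b >>= 1;
	}
	return n;
}

/* A marker of `len` bytes is treated as "bad" when fewer than half of its
 * bits are set (assumed threshold). A plain "!= 0xFF" test would flag any
 * single bitflip, and "== 0x00" would miss a bit stuck at 1. */
static bool marker_is_bad(const unsigned char *marker, size_t len)
{
	size_t set = 0;

	for (size_t i = 0; i < len; i++)
		set += bits_set(marker[i]);
	return set < 4 * len;
}
```

With 0xFE (one bitflip in an erased marker) the block is kept, and with 0x01 (one bit stuck at 1 in a programmed marker) it is still treated as bad.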