* U-Boot <-> Kernel; NAND operation proposal
From: Leon Pollak @ 2013-12-18 11:12 UTC
To: linux-mtd

Hello, all.

I apologize in advance for any stupidity or inconsistency in what I am
going to say - it may simply be due to my lack of experience.
Below is my story and the proposal that results from it.

For the last 2 years, my product, which is based on the DM368 (an
ARM9-based TI CPU) and Micron NAND flash (256 MiB, 2K page), has behaved
unstably. That is, some units from time to time refuse to boot, for
different reasons.

Today, after so much time and so many corrections, I can say that most
of the problems (not all!) that left a unit unable to start all the way
up (to the application) were caused by incompatible modes of NAND
operation between u-boot and the kernel.

For example, in the configuration I started from, which was supplied by
a vendor as an evaluation board, u-boot was configured to use 4-bit HW
ECC, while the kernel used 1-bit SW ECC.

The OOB layouts used in the two systems were different.
The BBTs were also configured differently.

There were several other "small things" whose combination was
inconsistent and produced incorrect NAND operation, which in some cases
finally made the unit inoperative.

The major issue here is that such inconsistencies do not show up in any
way until the unit suddenly refuses to boot after 2 weeks or 2 years.

All this led me to the following thought (a very rough draft):

Each NAND has a "spare free" area in the first (zero) block, which is
used for storing CIS information. This information does not occupy the
whole block, which is usually several hundred kilobytes.
So, this "spare" space could be used for storing a description of ALL
possible NAND flash and service parameters.
I am speaking about ECC bits, SW/HW, OOB layout, BBT layout, pattern
places, bad block marks, and everything else you can imagine.

Further, this information must be used by both u-boot and the kernel. Or
even by other components, for example, RBL/UBL in DM36x from TI.

Thanks to all who read this.
Best Regards
--
Leon Pollak
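As a rough illustration of the proposal above, a shared descriptor could be a small fixed-layout record carrying a magic number and a checksum, so that each component (boot loader, kernel) can tell a valid descriptor from erased or corrupted flash. The sketch below is hypothetical throughout: the field set, the `NDSC` magic, the numeric codes and the byte layout are assumptions made for illustration, not an existing U-Boot or kernel format.

```python
import struct
import zlib

# Hypothetical on-flash layout (little-endian), invented for illustration:
#   4s  magic           b"NDSC"
#   B   version
#   B   ecc_bits        correctable bits per ECC step
#   B   ecc_mode        0 = software, 1 = hardware
#   B   bbt_in_flash    0 = no BBT, 1 = BBT stored in flash
#   H   ecc_step        bytes of data covered by one ECC step
#   H   oob_ecc_offset  offset of the ECC bytes inside the OOB area
#   I   crc32           CRC32 over all preceding bytes
_BODY = struct.Struct("<4sBBBBHH")
_CRC = struct.Struct("<I")
MAGIC = b"NDSC"


def pack_descriptor(ecc_bits, ecc_mode, bbt_in_flash, ecc_step,
                    oob_ecc_offset, version=1):
    """Serialize the parameters into a 16-byte record with a trailing CRC."""
    body = _BODY.pack(MAGIC, version, ecc_bits, ecc_mode, bbt_in_flash,
                      ecc_step, oob_ecc_offset)
    return body + _CRC.pack(zlib.crc32(body) & 0xFFFFFFFF)


def unpack_descriptor(raw):
    """Return a dict of parameters, or None if the record is not valid."""
    size = _BODY.size + _CRC.size
    if len(raw) < size:
        return None
    body = raw[:_BODY.size]
    (crc,) = _CRC.unpack(raw[_BODY.size:size])
    magic, version, ecc_bits, ecc_mode, bbt, step, offset = _BODY.unpack(body)
    # Erased flash (all 0xFF) fails the magic test; bit flips fail the CRC.
    if magic != MAGIC or crc != (zlib.crc32(body) & 0xFFFFFFFF):
        return None
    return {
        "version": version,
        "ecc_bits": ecc_bits,
        "ecc_mode": ecc_mode,
        "bbt_in_flash": bbt,
        "ecc_step": step,
        "oob_ecc_offset": offset,
    }
```

Because such a record would have to be readable before any ECC scheme is known, a real design would probably also need several redundant copies; that part is left out of the sketch.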
* Re: U-Boot <-> Kernel; NAND operation proposal
From: Ricard Wanderlof @ 2013-12-18 11:54 UTC
To: Leon Pollak; +Cc: linux-mtd@lists.infradead.org

On Wed, 18 Dec 2013, Leon Pollak wrote:

> I apologize in advance for any stupidity or inconsistency in what I am
> going to say - it may simply be due to my lack of experience.
> Below is my story and the proposal that results from it.
>
> For the last 2 years, my product, which is based on the DM368 (an
> ARM9-based TI CPU) and Micron NAND flash (256 MiB, 2K page), has
> behaved unstably. That is, some units from time to time refuse to
> boot, for different reasons.
>
> Today, after so much time and so many corrections, I can say that most
> of the problems (not all!) that left a unit unable to start all the
> way up (to the application) were caused by incompatible modes of NAND
> operation between u-boot and the kernel.
>
> For example, in the configuration I started from, which was supplied
> by a vendor as an evaluation board, u-boot was configured to use 4-bit
> HW ECC, while the kernel used 1-bit SW ECC.
>
> The OOB layouts used in the two systems were different.
> The BBTs were also configured differently.
>
> There were several other "small things" whose combination was
> inconsistent and produced incorrect NAND operation, which in some
> cases finally made the unit inoperative.

It would seem to me that if parameters such as ECC strength and BBT were
configured differently between the boot loader and kernel, you would get
a system that wouldn't boot even the first time, rather than one that
works for a while and then fails.

> The major issue here is that such inconsistencies do not show up in
> any way until the unit suddenly refuses to boot after 2 weeks or 2
> years.
>
> All this led me to the following thought (a very rough draft):
>
> Each NAND has a "spare free" area in the first (zero) block, which is
> used for storing CIS information. This information does not occupy the
> whole block, which is usually several hundred kilobytes.
> So, this "spare" space could be used for storing a description of ALL
> possible NAND flash and service parameters.
> I am speaking about ECC bits, SW/HW, OOB layout, BBT layout, pattern
> places, bad block marks, and everything else you can imagine.
>
> Further, this information must be used by both u-boot and the kernel.
> Or even by other components, for example, RBL/UBL in DM36x from TI.

I'm not sure I follow you. First of all, what is CIS?

Secondly, the first block in a NAND flash is no different from the other
blocks when it comes to the data it can hold. True, in systems where
NAND flash is the boot medium, the boot loader out of necessity resides
in the first block, but a boot loader could fill the whole block,
leaving no free space there.

/Ricard
--
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30
* Re: U-Boot <-> Kernel; NAND operation proposal
From: Leon Pollak @ 2013-12-18 12:11 UTC
To: Ricard Wanderlof; +Cc: linux-mtd@lists.infradead.org

On Wednesday 18 December 2013 12:54:45 Ricard Wanderlof wrote:
> On Wed, 18 Dec 2013, Leon Pollak wrote:
> > I apologize in advance for any stupidity or inconsistency in what I
> > am going to say - it may simply be due to my lack of experience.
> > Below is my story and the proposal that results from it.
> >
> > For the last 2 years, my product, which is based on the DM368 (an
> > ARM9-based TI CPU) and Micron NAND flash (256 MiB, 2K page), has
> > behaved unstably. That is, some units from time to time refuse to
> > boot, for different reasons.
> >
> > Today, after so much time and so many corrections, I can say that
> > most of the problems (not all!) that left a unit unable to start
> > all the way up (to the application) were caused by incompatible
> > modes of NAND operation between u-boot and the kernel.
> >
> > For example, in the configuration I started from, which was
> > supplied by a vendor as an evaluation board, u-boot was configured
> > to use 4-bit HW ECC, while the kernel used 1-bit SW ECC.
> >
> > The OOB layouts used in the two systems were different.
> > The BBTs were also configured differently.
> >
> > There were several other "small things" whose combination was
> > inconsistent and produced incorrect NAND operation, which in some
> > cases finally made the unit inoperative.
>
> It would seem to me that if parameters such as ECC strength and BBT
> were configured differently between the boot loader and kernel, you
> would get a system that wouldn't boot even the first time, rather
> than one that works for a while and then fails.

It worked... :-(
And confused everybody.
For example - the ROM boot loader used 4-bit HW ECC to burn and bring up
U-Boot, but U-Boot itself used 1-bit SW ECC to burn YAFFS.
Everything worked until there was a second error in the YAFFS partition.

The OOB layout was also different.

BBT was not used at all.

There were more issues... :(

> > The major issue here is that such inconsistencies do not show up in
> > any way until the unit suddenly refuses to boot after 2 weeks or 2
> > years.
> >
> > All this led me to the following thought (a very rough draft):
> >
> > Each NAND has a "spare free" area in the first (zero) block, which
> > is used for storing CIS information. This information does not
> > occupy the whole block, which is usually several hundred kilobytes.
> > So, this "spare" space could be used for storing a description of
> > ALL possible NAND flash and service parameters.
> > I am speaking about ECC bits, SW/HW, OOB layout, BBT layout, pattern
> > places, bad block marks, and everything else you can imagine.
> >
> > Further, this information must be used by both u-boot and the
> > kernel. Or even by other components, for example, RBL/UBL in DM36x
> > from TI.
>
> I'm not sure I follow you. First of all, what is CIS?

CIS stands for Card Information Structure.

> Secondly, the first block in a NAND flash is no different from the
> other blocks when it comes to the data it can hold.

Well, I am not a big guru in this.
But I have seen that all of the vendors I have worked with declare the
first block to be more robust and to require only 1-bit ECC.
For example, our Micron chip promises that block 0 works with 1-bit ECC,
while all the rest require 4-bit.

> True, in systems where NAND flash is the boot medium, the boot loader
> out of necessity resides in the first block, but a boot loader could
> fill the whole block, leaving no free space there.

Hmmm... You probably have much more experience than me.
But in my case (DM36x CPU from TI) the CPU ROM boot loader reads block
#1 (!!! - not 0) to look for the User Boot Loader (UBL), which normally
is 14-16 KiB in size.

But I am speaking about block zero, which contains the CIS and some
leftover space.

Again, the whole idea is to have some standard description that unifies
all components.
Maybe block zero is not the ideal place to store it - I have too little
experience to judge here...

Thank you.
--
Leon
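The remark above that everything worked until a second error appeared is what one would expect from any single-error-correcting code. The sketch below illustrates this with a deliberately simplified position-XOR code; it is an assumption-laden toy, not the Hamming variant used by the kernel's software ECC or by the DM36x hardware, and it assumes the stored check value itself is never damaged. Real implementations typically also detect (but cannot repair) a double error, which this toy does not.

```python
def position_xor(bits):
    """XOR together the 1-based positions of all bits that are set."""
    acc = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            acc ^= pos
    return acc


def decode(bits, stored_check):
    """Try to repair `bits` using the check value written at program time.

    Returns (repaired_bits, status), where status is "clean", "corrected"
    or "uncorrectable". The check value is assumed to be intact.
    """
    syndrome = position_xor(bits) ^ stored_check
    repaired = list(bits)
    if syndrome == 0:
        return repaired, "clean"
    if syndrome <= len(bits):
        repaired[syndrome - 1] ^= 1  # flip the bit the syndrome points at
        return repaired, "corrected"
    return repaired, "uncorrectable"


original = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
check = position_xor(original)

# One flipped bit: the syndrome equals its position, so it is repaired.
one_error = list(original)
one_error[6] ^= 1
repaired_one, status_one = decode(one_error, check)

# Two flipped bits: the syndrome is the XOR of both positions and points
# at a third, innocent bit, so the "repair" makes the data worse.
two_errors = list(original)
two_errors[2] ^= 1
two_errors[4] ^= 1
repaired_two, status_two = decode(two_errors, check)
```

In the two-error case the decoder still reports a successful correction while returning wrong data, which is one way a unit can appear healthy for a long time and then fail.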
* Re: U-Boot <-> Kernel; NAND operation proposal
From: Peter Barada @ 2013-12-18 16:43 UTC
To: linux-mtd

On 12/18/2013 07:11 AM, Leon Pollak wrote:
> On Wednesday 18 December 2013 12:54:45 Ricard Wanderlof wrote:
>> It would seem to me that if parameters such as ECC strength and BBT
>> were configured differently between the boot loader and kernel, you
>> would get a system that wouldn't boot even the first time, rather
>> than one that works for a while and then fails.
> It worked... :-(
> And confused everybody.
> For example - the ROM boot loader used 4-bit HW ECC to burn and bring
> up U-Boot, but U-Boot itself used 1-bit SW ECC to burn YAFFS.
> Everything worked until there was a second error in the YAFFS
> partition.
>
> The OOB layout was also different.
>
> BBT was not used at all.
>
> There were more issues... :(

Which exact NAND parts are you using, and what ECC does the manufacturer
recommend to maintain an acceptable NAND UBER (uncorrectable bit error
rate) - is it one bit or four bits?

If 4-bit HW ECC is used for u-boot, why wouldn't the kernel use it as
well for its filesystems? I think your system would become much more
stable if the kernel and the YAFFS filesystems used 4-bit HW ECC as
well...

--
Peter Barada
peter.barada@gmail.com
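The kind of mismatch discussed above could, in principle, be caught at build or provisioning time by comparing the NAND settings each component is built with. A minimal sketch, assuming the settings have already been extracted into plain dictionaries; the keys and values are hypothetical and are not real U-Boot or kernel configuration symbols.

```python
def find_mismatches(configs):
    """Return {parameter: {component: value}} for parameters that differ.

    `configs` maps a component name to its NAND settings dictionary.
    A parameter missing from a component is reported with value None.
    """
    params = sorted({key for cfg in configs.values() for key in cfg})
    mismatches = {}
    for param in params:
        values = {name: cfg.get(param) for name, cfg in configs.items()}
        if len(set(values.values())) > 1:
            mismatches[param] = values
    return mismatches


# Hypothetical settings modelled on the situation described in the thread.
configs = {
    "u-boot": {"ecc_bits": 4, "ecc_mode": "hw", "bbt": "flash"},
    "kernel": {"ecc_bits": 1, "ecc_mode": "sw", "bbt": "none"},
}
report = find_mismatches(configs)
```

An empty result means all listed components agree on every parameter they declare; anything else is a candidate for the silent inconsistency described earlier in the thread.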
* RE: U-Boot <-> Kernel; NAND operation proposal
From: Gupta, Pekon @ 2013-12-19 19:59 UTC
To: Leon Pollak, Ricard Wanderlof; +Cc: linux-mtd@lists.infradead.org

Hi Leon,

>From: Leon Pollak
>>On Wednesday 18 December 2013 12:54:45 Ricard Wanderlof wrote:
>>> On Wed, 18 Dec 2013, Leon Pollak wrote:
>> It would seem to me that if parameters such as ECC strength and BBT
>> were configured differently between the boot loader and kernel, you
>> would get a system that wouldn't boot even the first time, rather
>> than one that works for a while and then fails.
>It worked... :-(
>And confused everybody.
>For example - the ROM boot loader used 4-bit HW ECC to burn and bring
>up U-Boot, but U-Boot itself used 1-bit SW ECC to burn YAFFS.
>Everything worked until there was a second error in the YAFFS partition.
>
>The OOB layout was also different.
>
>BBT was not used at all.
>
>There were more issues... :(
>
So the problem is that you used 1-bit Hamming for the kernel and the
file system, and it failed due to wear and tear in the NAND.
It is therefore always advisable to use the strongest ecc-scheme
available, to increase the lifetime of your product in the field.

In case you can still upgrade your kernel and u-boot to mainline
versions, you have the opportunity to use the BCH8_SW ecc-scheme as of
today.

>> > The major issue here is that such inconsistencies do not show up in
>> > any way until the unit suddenly refuses to boot after 2 weeks or 2
>> > years.
>> >
>> > All this led me to the following thought (a very rough draft):
>> >
>> > Each NAND has a "spare free" area in the first (zero) block, which
>> > is used for storing CIS information. This information does not
>> > occupy the whole block, which is usually several hundred kilobytes.
>> > So, this "spare" space could be used for storing a description of
>> > ALL possible NAND flash and service parameters.
>> > I am speaking about ECC bits, SW/HW, OOB layout, BBT layout, pattern
>> > places, bad block marks, and everything else you can imagine.
>> >
>> > Further, this information must be used by both u-boot and the
>> > kernel. Or even by other components, for example, RBL/UBL in DM36x
>> > from TI.
>>
There are other alternatives for dynamic switching of the ecc-scheme:

*for u-boot*
In earlier versions of u-boot, there was a 'nandecc' command that could
be used to switch ecc-schemes on the fly in u-boot.
So, if you flash the above information to _any_ NAND block as raw data,
you can use the 'nand read.raw' command to read the information back
without depending on the ecc-scheme, and then dynamically change your
ecc-scheme based on it.

*for kernel*
You can keep multiple DTBs flashed in NAND, each selecting a different
ecc-scheme. Then, based on the 'raw' data read from NAND in u-boot, you
can pre-load the DTB for your choice of ecc-scheme.

>> I'm not sure I follow you. First of all, what is CIS?
>CIS stands for Card Information Structure.
>
>> Secondly, the first block in a NAND flash is no different from the
>> other blocks when it comes to the data it can hold.
>Well, I am not a big guru in this.
>But I have seen that all of the vendors I have worked with declare the
>first block to be more robust and to require only 1-bit ECC.
>For example, our Micron chip promises that block 0 works with 1-bit
>ECC, while all the rest require 4-bit.
>
Is it possible for you to share the datasheet of such a NAND device
(along with the section where this detail is mentioned; I'd like to
understand the background of this approach)?
In most of the devices I have encountered, all the NAND blocks are the
same. However, vendors may provide a separate memory array (an OTP
memory array, alongside the normal NAND memory array) which can be
programmed just once, and hence has more endurance.
However, you can do this for any other block as well.
Refer to the "Block Lock" and "Block Unlock" commands in the Micron
datasheet.

>> True, in systems where NAND flash is the boot medium, the boot loader
>> out of necessity resides in the first block, but a boot loader could
>> fill the whole block, leaving no free space there.
>Hmmm... You probably have much more experience than me.
>But in my case (DM36x CPU from TI) the CPU ROM boot loader reads block
>#1 (!!! - not 0) to look for the User Boot Loader (UBL), which normally
>is 14-16 KiB in size.
>
>But I am speaking about block zero, which contains the CIS and some
>leftover space.
>
>Again, the whole idea is to have some standard description that unifies
>all components.
>Maybe block zero is not the ideal place to store it - I have too little
>experience to judge here...
>
There are many alternatives for doing what you want.
But is it possible for you to upgrade your kernel and u-boot?
These approaches would at least require a DT-based kernel and a u-boot
that supports the 'nandecc' command.

with regards, pekon
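The kernel-side suggestion above (several DTBs, one per ecc-scheme, chosen by the boot loader from raw data) amounts to a lookup from a raw-read scheme identifier to a DTB. The following is only a sketch of that selection logic, written in Python for illustration; the one-byte scheme codes, the DTB file names, and the majority vote over several raw copies are assumptions of this example, not U-Boot behaviour.

```python
from collections import Counter

# Hypothetical scheme codes and DTB names, invented for this sketch.
DTB_BY_SCHEME = {
    0x01: "board-ham1.dtb",
    0x04: "board-bch4.dtb",
    0x08: "board-bch8.dtb",
}


def select_dtb(raw_copies):
    """Pick a DTB from several raw (ECC-less) copies of the scheme byte.

    Raw reads are not protected by ECC, so a single copy may be wrong.
    Returns the DTB name, or None when no value has a strict majority
    or the winning value is not a known scheme code.
    """
    if not raw_copies:
        return None
    value, count = Counter(raw_copies).most_common(1)[0]
    if count * 2 <= len(raw_copies):
        return None
    return DTB_BY_SCHEME.get(value)
```

For example, `select_dtb([0x04, 0x04, 0x0C])` tolerates one damaged copy and still resolves to the 4-bit entry, while an even split between two values yields `None` so that the caller can fall back to a safe default.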
* Re: U-Boot <-> Kernel; NAND operation proposal
From: Leon Pollak @ 2013-12-20 21:49 UTC
To: Gupta, Pekon; +Cc: linux-mtd@lists.infradead.org, Ricard Wanderlof

Hello, Pekon and others.

As an example of NANDs with a different policy for some blocks, please
see the "Features" list on the first page of, for example,
http://download.micron.com/pdf/datasheets/flash/nand/2gb_nand_m29b.pdf

And yes, I have already reworked all components to use one ECC type and
layout, thank you. Although it is VERY difficult now to upgrade the
systems in the field, I think I will find a way to do this...

But the main reason I turned to the list was not to solve my personal
problem, but to raise the question of finding some technique to
synchronize all components on one way of treating the flash.
This would help avoid problems like mine in the future and would also
simplify the configuration of embedded systems. IMHO.

My case arose because the components were put together by a vendor who
did not take care of this synchronization. As a result the system looked
like it was working perfectly, while...
And as these kinds of problems are almost impossible to predict and test
for, my unification approach may be useful, I think.

--
Leon