* ECC configuration of NAND from Linux (MEMSETOOBSEL)
From: Gudjon I. Gudjonsson @ 2018-01-12 0:29 UTC
To: linux-mtd

Hi list,

I am trying to upgrade a few embedded Linux systems remotely
and increase the number of ECC bits at the same time.

I read your FAQ and found the reference to the MEMSETOOBSEL ioctl [1],
but it seems to have been removed from the kernel:

/*
 * Note, the following ioctl existed in the past and was removed:
 * #define MEMSETOOBSEL _IOW('M', 9, struct nand_oobinfo)
 * Try to avoid adding a new ioctl with the same ioctl number.
 */

I assume this is an error in the documentation, but do you know of
any solution to my problem?

Regards,
Gudjon

[1] http://www.linux-mtd.infradead.org/faq/nand.html

* Re: ECC configuration of NAND from Linux (MEMSETOOBSEL)
From: Boris Brezillon @ 2018-01-12 14:46 UTC
To: Gudjon I. Gudjonsson; +Cc: linux-mtd

Hello Gudjon,

On Fri, 12 Jan 2018 01:29:55 +0100
"Gudjon I. Gudjonsson" <gudjon@gudjon.org> wrote:

> I am trying to upgrade a few embedded Linux systems remotely
> and increasing the number of ECC bits at the same time.
>
> I read your FAQ and found the reference to ioctl (MEMSETOOBSEL) [1]
> but it seems to be removed from the kernel.
[...]
> I assume this is an error in the documentation but I wonder if you
> know any solution to my problem?

It's not something you can change dynamically. Changing the ECC config
makes the existing content unreadable, because the ECC bytes already
stored in the OOB area were computed for the old configuration. To
change this setting, you'll have to erase the whole flash and then
change the ECC config in your DT or board file (note that not all
drivers support adjusting the ECC strength/step-size).

Regards,

Boris
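User space can check Boris's point that the ECC configuration is fixed once the driver has probed. The sketch below assumes a kernel that exposes the read-only `ecc_strength` and `ecc_step_size` attributes under `/sys/class/mtd/mtdN/` (older kernels may lack `ecc_step_size`). It reads and prints them. `read_mtd_attr` and `print_mtd_ecc` are hypothetical helper names, not part of any library.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one decimal value from an MTD sysfs
 * attribute, e.g. /sys/class/mtd/mtd0/ecc_strength.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_mtd_attr(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Print the ECC settings the running kernel reports for /dev/mtdN.
 * These attributes only report the configuration chosen when the
 * driver probed (from the DT or board file); they cannot change it.
 */
void print_mtd_ecc(int mtd_index)
{
    static const char *const attrs[] = { "ecc_strength", "ecc_step_size" };
    char path[128];

    for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
        long v;
        snprintf(path, sizeof(path), "/sys/class/mtd/mtd%d/%s",
                 mtd_index, attrs[i]);
        if (read_mtd_attr(path, &v) == 0)
            printf("mtd%d %s = %ld\n", mtd_index, attrs[i], v);
        else
            printf("mtd%d %s: not available\n", mtd_index, attrs[i]);
    }
}
```

On the target, `print_mtd_ecc(0)` would show the strength the running kernel was configured with. The only related writable attribute is `bitflip_threshold`. It tunes when corrected bitflips are reported to upper layers, not the ECC strength itself.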

* Re: ECC configuration of NAND from Linux (MEMSETOOBSEL)
From: Gudjon I. Gudjonsson @ 2018-01-12 18:50 UTC
To: Boris Brezillon; +Cc: linux-mtd

Hi Boris,

Thanks for the answer.

> It's not something you can change dynamically. When you change the ECC
> config, it makes existing content unreadable. In order to change this
> setting you'll have to erase the whole flash and then change the ECC
> config in your DT or board file (note that not all drivers support
> adjusting the ECC strength/step-size).

I will have to accept that, but can you please tell me how to change
the ECC strength if my driver supports it? My plan is to use swupdate
and update the system from an SD card that is already installed, but I
could not find any reference to changing the ECC strength.

I am using the Atmel SAMA5D36 CPU and Micron MT29F2G08ABAEAWP NAND
flash.

Regards,
Gudjon
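Before picking a new strength, it helps to check that the ECC bytes still fit in the OOB area. Steve describes the page geometry of his similar Micron part as 2 KiB of data plus 64 bytes of OOB. The sketch below makes several assumptions:

- It uses a BCH code over GF(2^13) for 512-byte sectors and GF(2^14) for 1024-byte sectors. That gives ceil(m * t / 8) ECC bytes per sector, which is how the Atmel PMECC is commonly sized.
- It reserves 2 OOB bytes for the bad-block marker.

The function names are hypothetical, and the real OOB layout is decided by the driver.

```c
#include <stdbool.h>

/*
 * Assumed BCH field size: GF(2^13) for 512-byte sectors,
 * GF(2^14) for 1024-byte sectors (only these two are handled).
 */
static unsigned bch_m(unsigned sector_size)
{
    return sector_size == 512 ? 13 : 14;
}

/* ECC bytes for one sector: ceil(m * t / 8). */
unsigned ecc_bytes_per_sector(unsigned strength, unsigned sector_size)
{
    return (bch_m(sector_size) * strength + 7) / 8;
}

/* ECC bytes for a whole page made of page_size / sector_size sectors. */
unsigned ecc_bytes_per_page(unsigned page_size, unsigned strength,
                            unsigned sector_size)
{
    return (page_size / sector_size) *
           ecc_bytes_per_sector(strength, sector_size);
}

/*
 * True if the ECC bytes fit in the OOB area once `reserved` bytes
 * (e.g. the bad-block marker) are set aside.
 */
bool ecc_fits_in_oob(unsigned page_size, unsigned oob_size,
                     unsigned reserved, unsigned strength,
                     unsigned sector_size)
{
    return ecc_bytes_per_page(page_size, strength, sector_size) + reserved
           <= oob_size;
}
```

Under these assumptions, a 2048 + 64 byte page with 512-byte sectors works out as follows:

- 4-bit correction needs 4 x 7 = 28 bytes, which fits.
- 8-bit correction needs 4 x 13 = 52 bytes, which fits.
- 12-bit correction would need 4 x 20 = 80 bytes, which does not fit.

Whether the driver accepts a given strength is a separate question, as Boris notes.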

* Re: ECC configuration of NAND from Linux (MEMSETOOBSEL)
From: Steve deRosier @ 2018-01-13 2:41 UTC
To: gudjon; +Cc: Boris Brezillon, linux-mtd

Hi Gudjon,

On Fri, Jan 12, 2018 at 10:50 AM, Gudjon I. Gudjonsson
<gudjon@gudjon.org> wrote:
[...]
> I will have to accept that but can you please tell me how to change the
> ECC strength if my driver supports it? My plan is to use swupdate and
> update the system using an SD-card that is already installed but I could
> not find any reference to changing the ECC strength.
> I am using the Atmel SAMA5d36 CPU and Micron mt29F2G08abaeawp
> NAND flash.

I might be wrong, but I don't think there's any mechanism to change
the ECC strength on the fly with that processor and flash combination.
To do it, you have to adjust it in your device tree. I went through
this in an upgrade scenario on a similar system a few years ago and
concluded that it wasn't viable.

As background, we had two slots on flash for the kernel (kernel-a,
kernel-b) and two for a UBIFS rootfs (rootfs-a, rootfs-b). Our upgrade
procedure was to run on -a and flash -b; the next time, run on -b and
flash -a, and so on. Here's what would have had to be done:

1. Change the ECC strength in the DT, which then gets appended to the
   kernel image. This means the new ECC takes effect when the new
   kernel boots, and not before; the running kernel keeps using
   whatever ECC it was configured for.
2. Change our update script to _not_ write the ECC bits when it
   flashes. This is critical.
3. Now (assuming we're running on the -a partitions), erase kernel-b
   and rootfs-b. Then flash the new kernel and new rootfs to the -b
   partitions _without_ ECC bits!
4. Reboot to the -b partitions. You're now running a kernel that
   supports the new ECC layout, but no ECC is actually being
   performed.
5. Now erase and reflash -a with the same new kernel and rootfs,
   _with_ ECC bits.
6. Boot to -a. You're now running with the new ECC layout, and ECC is
   actually being done.

I'm going from memory, so I might have missed a step or done something
out of order, but you get the point.

Why all of the above? The problem is that the ECC bytes that get
flashed depend on the ECC configuration of the kernel doing the
flashing, so a kernel running 4-bit ECC can't flash an image that
needs 8-bit ECC. The solution is to force all the written ECC bytes to
0xFF by turning off ECC when flashing with nandwrite. The kernel will
read the data and ignore ECC, whatever the configured strength, if no
ECC bits are set. So, essentially, you have to write the new images
without any ECC bytes actually written, boot into them, and then write
them correctly a second time.

Additionally, you have to remember the bootloaders: the bootstrap and
U-Boot also need to be changed and upgraded. And if you've got a mix
of versions in the field, managing them and freely allowing upgrades
and downgrades becomes a headache too. It's easier if you don't have
to boot into the kernel+rootfs and can instead use an upgrade process
that leverages U-Boot, but our system didn't work that way. I also
came up with a number of other options, but they're not worth
detailing here, and they were also painful and risky to implement.

Basically, it's a major pain, and since we were already using the
recommended minimum for our MT29Fxxxxxx flash, we left it alone. And
honestly, we haven't had any significant problems with the flash that
could be attributed to ECC strength.

All that said, this was a few years ago, on a 3.8 kernel, with the
bootstrap and U-Boot as bootloaders and the rootfs running directly
out of flash on UBIFS. We were also using our own, not very
sophisticated, software update program. So your situation may be
different, and there may be more options available on newer kernels.

Hopefully that story helps a bit.

- Steve

Steve deRosier
Cal-Sierra Consulting LLC
https://www.cal-sierra.com/
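Steve's steps 2 and 3 depend on writing page data with ECC disabled, which he did with nandwrite (it has a `--noecc` option). At the MTD character-device level, a comparable request can be expressed with the `MEMWRITE` ioctl and `MTD_OPS_RAW` from the Linux UAPI header `<mtd/mtd-abi.h>`.

The sketch below illustrates this under a few assumptions:

- The helper names are hypothetical.
- It supplies no OOB buffer, so no ECC bytes are computed. What ends up in the OOB is left to the NAND core and driver.
- It does not make such pages readable on every kernel. As Boris points out, mainline drivers report ECC errors on pages whose ECC bytes are all 0xFF.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <mtd/mtd-abi.h>

/*
 * Hypothetical helper: build a MEMWRITE request that writes `len`
 * bytes of page data at `offset` in raw mode (data transferred as-is,
 * no error correction) without any OOB buffer.
 */
struct mtd_write_req make_raw_page_write(uint64_t offset,
                                         const void *data, uint64_t len)
{
    struct mtd_write_req req;

    memset(&req, 0, sizeof(req));
    req.start = offset;
    req.len = len;
    req.ooblen = 0;                            /* no OOB data supplied */
    req.usr_data = (uint64_t)(uintptr_t)data;
    req.usr_oob = 0;
    req.mode = MTD_OPS_RAW;                    /* skip ECC computation */
    return req;
}

/*
 * Issue the request on an already opened /dev/mtdN descriptor.
 * The target block must have been erased beforehand and `offset`
 * should be page aligned. Returns the ioctl() result (0 on success,
 * -1 with errno set on failure).
 */
int write_page_raw(int fd, uint64_t offset, const void *data, uint64_t len)
{
    struct mtd_write_req req = make_raw_page_write(offset, data, len);
    return ioctl(fd, MEMWRITE, &req);
}
```

Keeping the request construction separate from the ioctl makes the raw-mode parameters easy to inspect before anything touches the flash.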

* Re: ECC configuration of NAND from Linux (MEMSETOOBSEL)
From: Boris Brezillon @ 2018-01-13 8:24 UTC
To: Steve deRosier; +Cc: gudjon, linux-mtd

On Fri, 12 Jan 2018 18:41:58 -0800
Steve deRosier <derosier@gmail.com> wrote:

[...]
> Now, why all of the above? The
> problem is the number of ECC bits that gets flashed is dependent on
> the kernel running flashing it. So, having a kernel running 4 bits
> trying to flash 8, doesn't work. The solution is by forcing all the
> written ECC bits to 0xffs by turning off the ECC bits when flashing
> with nandwrite. The kernel will read and ignore ECC, no matter the set
> strength, if there's no ECC bits set.

That's not true. If you have all ECC bytes set to 0xff, it will simply
not boot (or at least it should not), because the ECC engine will
report errors everywhere.

> So, essentially, you have to
> write the new stuff with the enhanced bits with no bits actually
> written, in order to boot into it and then write it correctly a second
> time.

And this trick only works if your NAND supports subpage writes.

[...]
> Basically, it's a major pain and since we were already using the
> recommended minimum for our MT29Fxxxxxx flash, we left it alone. And
> honestly, we haven't had any significant problems with the flash that
> could be attributed to ECC strength.

That is true. Once you have chosen a specific strength, it's not easy
to change it: you have to erase everything and write it again with a
different config.

* Re: ECC configuration of NAND from Linux (MEMSETOOBSEL)
From: Steve deRosier @ 2018-01-13 17:34 UTC
To: Boris Brezillon; +Cc: gudjon, linux-mtd

Hi Boris,

On Sat, Jan 13, 2018 at 12:24 AM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> On Fri, 12 Jan 2018 18:41:58 -0800
> Steve deRosier <derosier@gmail.com> wrote:
[...]
>> The solution is by forcing all the
>> written ECC bits to 0xffs by turning off the ECC bits when flashing
>> with nandwrite. The kernel will read and ignore ECC, no matter the set
>> strength, if there's no ECC bits set.
>
> That's not true. If you have all ECC bytes set to 0xff it will simply
> not boot (or at least it should not), because the ECC engine will report
> errors everywhere.

Well, I'm glad you say it shouldn't work that way, because I happen to
agree. However, I can unequivocally confirm that on at least one Atmel
processor, with one specific NAND and kernel version 3.8, it does
indeed work this way in practice. It's very clear from the behavior
that a page with ECC configured but an OOB area full of 0xFFs is
interpreted as "I have no ECC data, so don't bother trying to do ECC".
Of course, if there are bitflips, what is read is invalid and can
cause random behavior, which, unfortunately, is how I know what the
behavior is.

I do not know whether newer kernels behave this way on the platform in
question. I solved the configuration and process issues long ago, so I
never had to debug the problem on the newer 4.4 and 4.9 kernels the
product uses.

>> So, essentially, you have to
>> write the new stuff with the enhanced bits with no bits actually
>> written, in order to boot into it and then write it correctly a second
>> time.
>
> And this trick only works if your NAND supports subpage writes.

The layout of this SLC NAND doesn't allow subpage writes. It has
2 KiB pages plus a 64-byte OOB area, with 64 pages per block. Standard
operation is as expected: you must erase whole blocks, and you may
program individual pages. It is possible to write the 2 KiB page with
or without ECC and leave the erased 0xFFs in the OOB; this can be
confirmed by working directly with the NAND using U-Boot's nand
commands. The NAND itself has no built-in ECC, and the PMECC
controller on the processor only handles the algorithms. So what gets
written, including the OOB, is all constructed in software and placed
in the page program cache, and then the program command is issued.
So, even without subpage writes, it's quite easy to write the data
without writing the OOB.

And remember, we're not writing the same page twice. The first write
of the rootfs, with the erased OOB, goes to mtd7 in this case, and the
second write, with the correct new ECC data in the OOB, goes to mtd6.
Perhaps that's the misunderstanding here.

I'm not trying to be argumentative; I'm just describing what does
happen on the specific platform I worked with. I shared the details of
my experience because the OP has a similar platform, but what I
experienced may or may not apply to his case. I wanted to explain
_why_ it is such a pain, and that changing the ECC strength cannot be
undertaken lightly.

- Steve

Steve deRosier
Cal-Sierra Consulting LLC
https://www.cal-sierra.com/
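Whether a second write ever lands on an already programmed page is what separates Boris's reading from Steve's. NAND cells are erased to 1, so an erased page reads as 0xFF. Programming can only clear bits, and getting a bit back to 1 takes a block erase.

The toy model below is an illustration, not driver code, and its function names are hypothetical. It shows that ECC bytes could only be added later to an OOB left at 0xFF if the chip allowed the page to be programmed again. Steve's procedure avoids this by writing the second copy to a different, freshly erased partition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model: erasing sets every bit to 1 (every byte reads 0xFF). */
void nand_erase(uint8_t *cells, size_t len)
{
    memset(cells, 0xFF, len);
}

/* Toy model: programming can only clear bits, so cells become old & new. */
void nand_program(uint8_t *cells, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        cells[i] &= data[i];
}

/*
 * True if programming `data` over the current contents would store
 * exactly `data`, i.e. the write only needs to clear bits.
 */
bool can_program(const uint8_t *cells, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((cells[i] & data[i]) != data[i])
            return false;
    return true;
}
```

Real chips also limit how many times a page may be programmed between erases (the partial-page programming count in the datasheet). The model is therefore more permissive than the hardware.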

* Re: ECC configuration of NAND from Linux (MEMSETOOBSEL)
From: Boris Brezillon @ 2018-01-14 15:10 UTC
To: Steve deRosier; +Cc: gudjon, linux-mtd

Hi Steve,

On Sat, 13 Jan 2018 09:34:52 -0800
Steve deRosier <derosier@gmail.com> wrote:

[...]
> Well, I'm glad you say it shouldn't work that way, because I happen to
> agree that it shouldn't. However, I can unequivocally confirm that on
> at least one Atmel processor with one specific NAND with kernel
> version 3.8, it does indeed work this way in practice. It's very clear
> from the behavior that ECC-configured, but with the OOB area being
> 0xffs is being interpreted as "I have no ECC data, so don't bother
> trying to do ECC". Now, obviously if there are bit-flips, what is read
> is invalid and can cause random operations. Which, unfortunately, is
> how I know what the behavior is.

You're right: it seems that this test [1], which is meant to detect
erased pages, has the side effect of completely disabling ECC
correction when the ECC bytes are all set to 0xff, which is obviously
wrong!

> I do not know if newer kernels behave this way on the platform in
> question. I solved the configuration and process issues long ago and
> so I never had to debug the problem on the newer 4.4 and 4.9 kernels
> the product uses.

I confirm that this trick does not work in mainline :-).

[...]
> And, remember - we're not writing the same page twice. First write,
> with the erased OOB, of the rootfs in this case is to mtd7, and the
> second write, the one with the correct new ECC data to the OOB, is to
> mtd6. Perhaps that's the misunderstanding here.

Indeed, I thought you were overwriting already programmed pages.

> I'm not trying to be argumentative, I'm just saying what does indeed
> happen on this specific platform I worked with. I shared the details
> of my experience as the OP has a similar platform, but what I
> experienced may or may not be applicable to his case. I wanted to
> explain _why_ it is such a pain. And that changing the ECC strength
> can not be undertaken lightly.

It's clearly a pain to change the ECC config after the products have
been shipped.

Regards,

Boris
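Boris traces Steve's observation to an erased-page test. The sketch below contrasts two simplified checks:

- A naive check that looks only at the ECC bytes, modelling the behaviour Steve observed on 3.8.
- A threshold-based check in the spirit of mainline's `nand_check_erased_ecc_chunk()`. It counts zero bits across both data and ECC bytes and treats the chunk as erased only if there are at most `threshold` bitflips.

Both functions are hypothetical stand-ins, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits that are 0 in a buffer (an erased NAND byte is 0xFF). */
static unsigned zero_bits(const uint8_t *buf, size_t len)
{
    unsigned n = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = (uint8_t)~buf[i];   /* cleared bits become set bits */
        while (b) {
            n += b & 1u;
            b >>= 1;
        }
    }
    return n;
}

/*
 * Naive check: if the ECC bytes are all 0xFF, the chunk is treated as
 * erased and no correction is attempted, even if the data area was
 * programmed.
 */
bool naive_is_erased(const uint8_t *ecc, size_t ecclen)
{
    return zero_bits(ecc, ecclen) == 0;
}

/*
 * Threshold check: count bitflips over data AND ECC bytes. Returns the
 * number of bitflips if the chunk looks erased (<= threshold), or -1
 * if it does not, in which case the caller should report an
 * uncorrectable ECC error.
 */
int erased_chunk_bitflips(const uint8_t *data, size_t datalen,
                          const uint8_t *ecc, size_t ecclen,
                          unsigned threshold)
{
    unsigned flips = zero_bits(data, datalen) + zero_bits(ecc, ecclen);
    return flips <= threshold ? (int)flips : -1;
}
```

Take programmed data with untouched 0xFF ECC bytes. The naive check reports "erased" and skips correction, while the threshold check rejects the chunk. This matches Boris's statement that the trick does not work in mainline. A genuinely erased chunk with a single bitflip is still accepted as erased.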