* UBI/UBIFS: dealing with MLC's paired pages
@ 2015-09-17 13:22 Boris Brezillon
2015-09-17 15:20 ` Artem Bityutskiy
` (3 more replies)
0 siblings, 4 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-09-17 13:22 UTC (permalink / raw)
To: Artem Bityutskiy, Richard Weinberger
Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
Qi Wang 王起 (qiwang), Iwo Mergler,
Jeff Lauruhn (jlauruhn)
Hello,
I'm currently working on the paired pages problem we have on MLC chips.
I remember discussing it with Artem earlier this year when I was
preparing my talk for ELC.
I now have some time I can spend working on this problem and I started
looking at how this can be solved.
First let's take a look at the UBI layer.
There's one basic thing we have to care about: protecting UBI metadata.
There are two kinds of metadata:
1/ those stored at the beginning of each erase block (EC and VID
headers)
2/ those stored in specific volumes (layout and fastmap volumes)
We don't have to worry about #2 since those are written using atomic
update, and atomic updates are immune to this paired page corruption
problem (either the whole write is valid, or none of it is valid).
This leaves problem #1.
For this case, Artem suggested duplicating the EC header in the VID
header, so that if page 0 is corrupted we can recover the EC info from
page 1 (which will then contain both VID and EC info).
Doing that is fine for dealing with EC header corruption since, AFAIK,
none of the NAND vendors pair page 0 with page 1.
That still leaves the VID header corruption problem. To prevent it we
have several solutions:
a/ skip the page paired with the VID header. This is doable and can be
   hidden from UBI users, but it also means that we're losing another
   page to metadata (not a negligible overhead)
b/ store the VID info (PEB <-> LEB association) somewhere else. Fastmap
   seems the right place to put it, since fastmap is already storing
   this information for almost all blocks. Still, we would have to
   modify fastmap a bit to store information about all erase blocks,
   not only those that are not part of the fastmap pool.
   Also, updating that in real time would require a log approach,
   instead of the atomic update currently used by fastmap when it runs
   out of PEBs in its free PEB pool. Note that the log approach does
   not have to be applied to all fastmap data (we just need it for the
   PEB <-> LEB info).
   Another off-topic note regarding the suggested log approach: we
   could also use it to log which PEB was last written/erased, and use
   that to handle the unstable bits issue.
c/ (also suggested by Artem) delay the VID write until we have enough
   data to write to the LEB, and thus guarantee that it cannot be
   corrupted (at least by programming the paired page ;-)) anymore.
   Doing that would also require logging the data to be written to
   those LEBs somewhere, not to mention the impact of copying the data
   twice (once to the log, and then, when we have enough data, to the
   real block).
I don't have a strong opinion about which solution is best, and I may
be missing other aspects or better solutions, so feel free to comment
and share your thoughts.
That's all for the UBI layer. We will likely need new functions (and
new fields in existing structures) to help UBI users deal with MLC
NANDs: for example a field exposing the storage type or a function
helping users skip one (or several) blocks to secure the data they have
written so far. Anyway, those are things we can discuss after deciding
which approach we want to take.
Now, let's talk about the UBIFS layer. We are facing pretty much the
same problem in there: we need to protect the data we have already
written from time to time.
AFAIU (correct me if I'm wrong), data should be secured when we sync
the file system or commit the UBIFS journal (feel free to correct me
if I'm not using the right terms in my explanation).
As explained earlier, the only way to secure data is to skip some pages
(those that are paired with the already written ones).
I see two approaches here (there might be more):
1/ do not skip any pages until we are asked to secure the data, and
   then skip as many pages as needed to ensure nobody can ever corrupt
   the data. With this approach you can lose a non-negligible amount
   of space. For example, with this paired pages scheme [1], if you
   only write up to page 2 and want to secure your data, you'll have
   to skip pages 3 to 8.
2/ use the NAND in 'SLC mode' (i.e. only write to half the pages in a
   block). With this solution you always lose half the NAND capacity,
   but in the case of small writes it's still more efficient than #1.
   Of course, using that solution alone is not acceptable, because
   you'd only be able to use half the NAND capacity, but the plan is
   to use it in conjunction with the GC, so that from time to time
   UBIFS data chunks/nodes can be put in a single erase block without
   skipping half the pages.
   Note that the GC does not currently work this way: it tries to
   collect chunks one by one and write them to the journal to free a
   dirty LEB. What we would need here is a way to collect enough data
   to fill an entire block, and after that release the LEBs that were
   previously using only half the LEB capacity.
Of course, both of those solutions imply marking the skipped regions
as dirty so that the GC can account for the padded space. For #1 we
should probably also use padding nodes to reflect how much space is
lost on the media, though I'm not sure how this can be done. For #2,
we may have to differentiate 'full' and 'half' LEBs in the LPT.
Anyway, all the above are just some ideas I had or suggestions I got
from other people and I wanted to share. I'm open to any new
suggestions, because none of the proposed solutions are easy to
implement.
Best Regards,
Boris
P.S.: Note that I'm not discussing the WP solution on purpose: I'd like
to have a solution that is completely HW independent.
[1]https://www.olimex.com/Products/Components/IC/H27UBG8T2BTR/resources/H27UBG8T2BTR.pdf,
chapter 6.1. Paired Page Address Information
--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Artem Bityutskiy @ 2015-09-17 15:20 UTC (permalink / raw)

On Thu, 2015-09-17 at 15:22 +0200, Boris Brezillon wrote:
> Hello,
>
> I'm currently working on the paired pages problem we have on MLC
> chips.
> I remember discussing it with Artem earlier this year when I was
> preparing my talk for ELC.

Hi Boris,

excellent summary, very structured. I won't generate any new ideas now,
just a suggestion on implementation tactics.

For an implementation, I'd start with a power-cut emulator which
emulates paired pages. I'd probably do it in UBI, maybe lower. I'd also
write a good UBI power-cut test application. And then I'd start playing
with various implementation approaches. I'd use the test-driven
approach.

Artem.
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Boris Brezillon @ 2015-09-17 15:46 UTC (permalink / raw)

Hi Artem,

On Thu, 17 Sep 2015 18:20:39 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> excellent summary, very structured. I won't generate any new idea now,
> just an implementation tactics suggestion.

I'm taking that too :-).

> For an implementation, I'd started with a power cut emulator which
> emulates paired pages. I'd probably do it in UBI, may be lower.

I actually implemented this kind of emulation in nandsim, though I
currently generate a kernel Oops when the emulated power-cut occurs,
which is not really easy to use (still, both paired pages are corrupted
in the file used by nandsim to store the NAND data). I'm considering
changing the behavior to return -EROFS instead of BUG(), but I'm still
not sure upper layers are expecting this error...

I know that using an emulation layer is the only way to go if we want
to test the implementation, but I managed to trigger those paired-page
problems manually by launching a reset in the middle of a page program
operation. Even though this 'page program interruption' code is still
hacky, I think we'll be able to 'easily' validate the solution in real
world use cases when it's ready.

> I'd also write a good UBI power-cut test application.

Not sure what you mean by a UBI power-cut application?
> And then I'd start playing with various implementation approaches.

Yep, that was the plan. I was hoping you could help me exclude some of
them, but I guess testing all of them is the only way to find the best
one :-/.

> I'd use the test-driven approach.

Hm, yep, I guess that's the only way to test as many cases as possible,
but even with that I doubt I'll be able to think of all the cases that
could happen in the real world.

Thanks for the feedback.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Richard Weinberger @ 2015-09-17 16:47 UTC (permalink / raw)

Boris,

Am 17.09.2015 um 17:46 schrieb Boris Brezillon:
>> I'd also write a good UBI power-cut test application.
>
> Not sure what you mean by a UBI power-cut application?

UBI has a mechanism to emulate a power-cut. Userspace can trigger it.
I assume Artem meant that we could extend the mechanism to emulate
paired page related issues in UBI.

>> And then I'd start playing with various implementation approaches.
>
> Yep, that was the plan, I was hoping you could help me exclude some of
> them, but I guess testing all of them is the only way to find the
> best one :-/.
>
>> I'd use the test-driven approach.
>
> Hm, yep I guess that's the only way to test as much cases as possible,
> but even with that I doubt I'll be able to think of all the cases that
> could happen in real world.

Yeah, the crucial point is that we have to emulate paired pages very
well. Testing using emulation is nice, but we need bare metal tests
too. I have one board with MLC NAND, I'll happily wear it to death. B-)

Thanks,
//richard
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Andrea Scian @ 2015-09-18 7:17 UTC (permalink / raw)

Dear all,

Il 17/09/2015 18:47, Richard Weinberger ha scritto:
> Yeah, the crucial point is that we have to emulate paired pages very good.
> Testing using emulation is nice but we need bare metal tests too.
> I have one board with MLC NAND, I'll happily wear it do death. B-)

I think Boris has the same board somewhere ;-)

I perfectly understand the reason for using nandsim (and a power-cut
simulator in general) but, AFAIK, the power-cut problem is hard to
"simulate" because the main issue is when the device sees a loss of
power in the middle of an operation (page write or block erase).

I think the best approach for bare metal tests is something like the
following:
- connect a real power-cut device (a simple relay that cuts the main
  power supply, driven by a GPIO)
- drive this device from inside the MTD code (probably with a random
  delay after issuing a NAND command)

I think that I (as DAVE) can provide this kind of hardware, with an
easy plug-in connector on our hostboard (if those are the ones Richard
speaks about). Please let me know if you're interested in it; if so,
I'll forward this request to our hardware guys and give you an official
confirmation.

While running this kind of test, I would also increase the CPU load,
to reduce the bypass capacitors' contribution (which may lead to wrong
results in the generic case).

Kind Regards,

-- 
Andrea SCIAN
DAVE Embedded Systems
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Boris Brezillon @ 2015-09-18 7:41 UTC (permalink / raw)

Hi Andrea,

On Fri, 18 Sep 2015 09:17:02 +0200
Andrea Scian <rnd4@dave-tech.it> wrote:

> > I have one board with MLC NAND, I'll happily wear it do death. B-)
>
> I think Boris has the same board somewhere ;-)

Yep :-).

> I perfectly understand the reason why using nandsim (and powercut
> simulator in general) but, AFAIK, the powercut problem is hard to
> "simulate" because the main issue is when the device see a loss of power
> in the middle of an operation (page write or block erase)

Well, it can be easily simulated in nandsim. Here is a dirty hack [1]
doing that. Of course my implementation is far from perfect, and a lot
of things are hardcoded (like the paired pages scheme), but I'm pretty
sure it is able to emulate the behavior of a power cut when a specific
page in a block is accessed.

The other reason we want to simulate it is because we need to test what
happens if a corruption occurs at specific places: corruption of the
UBI EC header, the VID header, or the payload data. This means we need
to be able to simulate a power-cut when a specific page (relative to a
block) is accessed.

> I think that the best approach for bare metal test is something like the
> following:
> - connect a real powercut device (a simple relais that cut the main
>   power supply driven by a GPIO)
> - drive this device inside the MTD code (probably with random delay
>   after issuing a NAND command)

Hm, that seems like a complicated infrastructure. All you need to
trigger corruptions in paired pages is to interrupt the program
operation in the middle, and this can be done by simply sending a reset
command while it's taking place (I tested that method, and if I reset
the chip after tPROG / 2 it always corrupts both paired pages).

> I think that I (as DAVE) can provide this kind of hardware, with an easy
> plug-in connector on our hostboard (if those are the one that Richard
> speak about).
>
> While running this kind of test, I would also increase CPU load, to
> reduce bypass capacitor intrusion (which may lead to wrong result in a
> generic case)

Of course, real world tests are welcome, but I don't think we can rely
on them while developing the solution.

Anyway, thanks for the proposition.

Best Regards,

Boris

[1]http://code.bulix.org/73xjfn-88945

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Artem Bityutskiy @ 2015-09-18 7:54 UTC (permalink / raw)

Hi Andrea,

On Fri, 2015-09-18 at 09:17 +0200, Andrea Scian wrote:
> I perfectly understand the reason why using nandsim (and powercut
> simulator in general) but, AFAIK, the powercut problem is hard to
> "simulate" because the main issue is when the device see a loss of
> power
> in the middle of an operation (page write or block erase)

This is right, and no doubt real power-cut testing is the most
important thing.

However, at the beginning, it is very hard to develop if you do not
have a quick way to verify your ideas. Simulation is exactly for this -
to make the first reliable draft. Once that works, you go to the second
stage - real HW testing.

Real HW testing requires a real power cycle, with no guarantee the
power cut happens at the right moment, so you may spend hours emulating
just one paired-page case. Compare this to just running a script, which
emulates 100 paired-page cases for you in 10 minutes. And you can
emulate it easily at the interesting places, not just during the main
data writes.

So, to recap, I suggest emulation to make the first draft, and then
start heavy real testing to shape the final solution.

Artem.
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Bityutskiy, Artem @ 2015-09-18 7:57 UTC (permalink / raw)

On Fri, 2015-09-18 at 10:54 +0300, Artem Bityutskiy wrote:
> Real HW testing requires a real power cycle, no guarantees power cut
> happens at the right moment, so you may spend hours emulating just
> one paired-page case.

Sorry, I meant reproducing just one paired-page case.

-- 
Best Regards,
Artem Bityutskiy

---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki
Business Identity Code: 0357606 - 4
Domiciled in Helsinki

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Andrea Scian @ 2015-09-18 9:38 UTC (permalink / raw)

Boris, Artem,

thanks to both of you for your detailed descriptions. I'll follow this
development, for sure I'll learn a lot :-)

Kind Regards,

-- 
Andrea SCIAN
DAVE Embedded Systems
* RE: UBI/UBIFS: dealing with MLC's paired pages
From: Karl Zhang 张双锣 (karlzhang) @ 2015-09-24 1:57 UTC (permalink / raw)

Hello,

Actually, we are working on the paired pages problem too. We work on
MLC chips and have developed a hardware power control board to simulate
real power-loss cycles.

We have some ideas to share with you.

1. Emulating the paired-page case
HW: Develop a power control daughter board to control the power supply
to the NAND, including voltage/ramp control.
SW: Add a module in the NAND controller that uses the power board to
shut down the NAND power when programming the paired upper page.

This makes it easy for us to reproduce the paired-page case; in order
to guarantee the power-loss moment, we use FPGA logic to control the
power board and detect the status of the NAND.

2. EC/VID header corruption
As Boris's excellent summary mentioned, "duplicate the EC header in the
VID header"; I also believe this is a good solution to protect the EC
header, and we are implementing and testing it on MLC.

For the VID header, I think skipping pages will waste too much
capacity, and SLC mode in conjunction with GC will make P/E cycling
higher.

We are developing another solution that stores the VID info in other
pages' OOB area within its own block, because UBI does not use the OOB
and the ECC code does not always use the whole OOB area.

We are still developing and testing these solutions to protect the EC
and VID headers on MLC.

All the above is my limited work on paired pages, and I am open to any
new suggestions and cooperation.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Richard Weinberger @ 2015-09-24 6:31 UTC (permalink / raw)

Hi!

Am 24.09.2015 um 03:57 schrieb Karl Zhang 张双锣 (karlzhang):
> We are developing another solution to store VID info into other page's
> OOB area in its own block, because UBI does not use OOB and ECC code
> always not use all OOB area.

Sorry, I really detest this idea. Not using the OOB area is one of the
design principles behind UBI. We have learned from JFFS and YAFFS that
using the OOB is problematic. I'd give it up only if nothing else is
applicable.

Thanks,
//richard
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Boris Brezillon @ 2015-09-24 7:43 UTC (permalink / raw)

Hi Karl,

On Thu, 24 Sep 2015 01:57:41 +0000
Karl Zhang 张双锣 (karlzhang) <karlzhang@micron.com> wrote:

> Actually, we are working on the paired pages problem too. We have on MLC
> chips and developed a hardware power control board to simulate the real
> power loss cycle.
>
> 1. emulating the paired-page case
> HW: Develop a power control daughter board to control the power supply
> to NAND, including voltage/ramp control.
> SW: Add a module in NAND controller, utilize power board to shut down
> NAND power when programming paired upper page.

Even the SW solution sounds like a HW solution to me :-). When I say SW
emulation, I mean something that doesn't require a reboot or power-off
operation.

> This is easy for us to reproduce paired-page case, in order to guarantee
> the power loss moment, we add use FPGA logic to control the power board
> and detect the status of NAND.

That's not so easy for me ;-). Anyway, as I answered to Andrea, testing
on real HW is definitely necessary, but doing it while evaluating the
different options is not as efficient as emulating the paired pages
behavior.

> 2. EC/VID header corruption
> As Boris's excellent summary mentioned, "duplicate the EC header in the
> VID header", I also believe this is a good solution to protect EC, and
> we are doing this and testing on MLC.
>
> For VID header, I think skip pages will waste too many capacity, and SLC
> mode conjugation with GC will make PE cycling higher.
>
> We are developing another solution to store VID info into other page's
> OOB area in its own block, because UBI does not use OOB and ECC code
> always not use all OOB area.

Hm, using the OOB area to do that is not such a good idea IMO, and I
see at least 2 reasons:

1/ You're supposing that you'll always have enough space to store the
VID info (the header currently takes 64 bytes, even if we could
compress it by removing the padding), and this is not necessarily true
(particularly with some NAND controllers which reserve as much space as
possible for ECC bytes).

2/ Most of the time the OOB bytes are not ECC-protected (and even if
some controllers are able to protect a few of them, we're not sure to
have 64 bytes of protected OOB bytes per page), which means you're
writing something that can be corrupted by bitflips. I know that the
header is protected by a CRC, but that won't help recovering the data,
it will just let you know when the header is corrupted.

To summarize, you'll have to duplicate the VID info in all pages, and
you're not guaranteed to have a valid copy (even if, the more pages you
write, the less chance you have of getting all of them corrupted).

Regarding the fact that you'll have to lose/reserve at least one page
to protect the VID info, I actually had another (crazy?) idea (which I
didn't expose in my previous mail, because I thought it would be a
major change in the UBI design): how about completely getting rid of
the VID header and relying on the fastmap database (not the current
one, but something close to it) plus a UBI journal logging the
different changes (PEB erase, LEB map, ...). This way we should be able
to recover all the information (even the EC info) even after a power
cut. Of course, there still remains the problem of the fastmap volume
corruption.

> We are still developing and testing these solutions to protect EC and
> VID on MLC.

Okay, let us know about the results. Also, can you share the code
publicly, or is this something you want to keep private until you have
a stable version?

> All the above is my limited work on paired pages, and I am open to any
> new suggestions and cooperation.

Thanks for sharing your ideas.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-09-24 7:43 ` Boris Brezillon @ 2015-09-24 9:44 ` Stefan Roese 0 siblings, 0 replies; 43+ messages in thread From: Stefan Roese @ 2015-09-24 9:44 UTC (permalink / raw) To: Boris Brezillon, Karl Zhang 张双锣 (karlzhang) Cc: Iwo Mergler, Jeff Lauruhn (jlauruhn), dedekind1@gmail.com, Richard Weinberger, shuangshuo@gmail.com, Andrea Scian, Qi Wang 王起 (qiwang), linux-mtd@lists.infradead.org, Brian Norris, David Woodhouse Hi, On 24.09.2015 09:43, Boris Brezillon wrote: >> 2. EC/VID header corruption >> As Boris's excellent summary mentioned, "duplicate the EC header in the VID header", I also believe this is a good >> solution to protect EC, and we are doing this and testing on MLC. >> >> For VID header, I think skip pages will waste too many capacity, and SLC mode conjugation with GC will make PE cycling higher. >> >> We are developing another solution to store VID info into other page's OOB area in its own block, because UBI does not >> use OOB and ECC code always not use all OOB area. >> > > Hm, using the OOB area to do that is not such a good idea IMO, and I see > at least 2 reasons: > > 1/ You're supposing that you'll always have enough space to store the > VID info (the header is currently taking 64 bytes, even if we could > compress it by removing the padding), and this is not necessarily true > (particularly with some NAND controllers which are allowing as much > space as possible for ECC bytes). > > 2/ Most of the time ECC bytes are not protected (and even if some > controllers are able to protect a few of them, we're not sure to have 64 > bytes of protected OOB bytes per page), which means you're writing > something that can be corrupted by bitflips. I know that the header is > protected by a CRC, but that won't help recovering the data, just let > you know when the header is corruption. And 3: Only NAND provides an OOB area. Other flash devices like parallel or SPI NOR don't. 
And we definitely want to continue supporting platforms with such flash devices and UBI (and UBIFS). Thanks, Stefan ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon 2015-09-17 15:20 ` Artem Bityutskiy @ 2015-09-29 11:19 ` Richard Weinberger 2015-09-29 12:51 ` Boris Brezillon 2015-10-23 8:14 ` Boris Brezillon 2015-10-28 12:06 ` Artem Bityutskiy 3 siblings, 1 reply; 43+ messages in thread From: Richard Weinberger @ 2015-09-29 11:19 UTC (permalink / raw) To: Boris Brezillon, Artem Bityutskiy Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian, Qi Wang 王起 (qiwang), Iwo Mergler, Jeff Lauruhn (jlauruhn) Hi! Am 17.09.2015 um 15:22 schrieb Boris Brezillon: > Hello, > > I'm currently working on the paired pages problem we have on MLC chips. > I remember discussing it with Artem earlier this year when I was > preparing my talk for ELC. > > I now have some time I can spend working on this problem and I started > looking at how this can be solved. > > First let's take a look at the UBI layer. > There's one basic thing we have to care about: protecting UBI metadata. > There are two kind of metadata: > 1/ those stored at the beginning of each erase block (EC and VID > headers) > 2/ those stored in specific volumes (layout and fastmap volumes) > > We don't have to worry about #2 since those are written using atomic > update, and atomic updates are immune to this paired page corruption > problem (either the whole write is valid, or none of it is valid). > > This leaves problem #1. > For this case, Artem suggested to duplicate the EC header in the VID > header so that if page 0 is corrupted we can recover the EC info from > page 1 (which will contain both VID and EC info). > Doing that is fine for dealing with EC header corruption, since, AFAIK, > none of the NAND vendors are pairing page 0 with page 1. > Still remains the VID header corruption problem. Do prevent that we > still have several solutions: > a/ skip the page paired with the VID header. 
This is doable and can be > hidden from UBI users, but it also means that we're loosing another > page for metadata (not a negligible overhead) > b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap > seems the right place to put that in, since fastmap is already > storing those information for almost all blocks. Still we would have > to modify fastmap a bit to store information about all erase blocks > and not only those that are not part of the fastmap pool. > Also, updating that in real-time would require using a log approach, > instead of the atomic update currently used by fastmap when it runs > out of PEBs in it's free PEB pool. Note that the log approach does > not have to be applied to all fastmap data (we just need it for the > PEB <-> LEB info). > Another off-topic note regarding the suggested log approach: we > could also use it to log which PEB was last written/erased, and use > that to handle the unstable bits issue. > c/ (also suggested by Artem) delay VID write until we have enough data > to write on the LEB, and thus guarantee that it cannot be corrupted > (at least by programming on the paired page ;-)) anymore. > Doing that would also require logging data to be written on those > LEBs somewhere, not to mention the impact of copying the data twice > (once in the log, and then when we have enough data, in the real > block). Let's start with UBI; as soon as it is stable on MLC NAND we can focus on UBIFS. Solution a) sounds very promising to me as it can be implemented easily and losing another page for metadata is IMHO acceptable on MLC. Especially as MLC NANDs are anyway bigger and cheaper than SLC. b) is tricky as fastmap follows the design principle that UBI can fall back to a full scan if the fastmap is corrupted or a self check fails. If the ability to full scan suddenly depends on fastmap it can become messy. 
In terms of computer science c) is the most elegant solution but converting UBI to a log based "block layer" is not trivial and as you wrote the write overhead is not negligible. So, I'd vote for a) and see how well it does in our powercut tests. :-) Thanks, //richard ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-09-29 11:19 ` Richard Weinberger @ 2015-09-29 12:51 ` Boris Brezillon 0 siblings, 0 replies; 43+ messages in thread From: Boris Brezillon @ 2015-09-29 12:51 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, linux-mtd, David Woodhouse, Brian Norris, Andrea Scian, Qi Wang 王起 (qiwang), Iwo Mergler, Jeff Lauruhn (jlauruhn) Hi Richard, On Tue, 29 Sep 2015 13:19:01 +0200 Richard Weinberger <richard@nod.at> wrote: > Hi! > > Am 17.09.2015 um 15:22 schrieb Boris Brezillon: > > Hello, > > > > I'm currently working on the paired pages problem we have on MLC chips. > > I remember discussing it with Artem earlier this year when I was > > preparing my talk for ELC. > > > > I now have some time I can spend working on this problem and I started > > looking at how this can be solved. > > > > First let's take a look at the UBI layer. > > There's one basic thing we have to care about: protecting UBI metadata. > > There are two kind of metadata: > > 1/ those stored at the beginning of each erase block (EC and VID > > headers) > > 2/ those stored in specific volumes (layout and fastmap volumes) > > > > We don't have to worry about #2 since those are written using atomic > > update, and atomic updates are immune to this paired page corruption > > problem (either the whole write is valid, or none of it is valid). > > > > This leaves problem #1. > > For this case, Artem suggested to duplicate the EC header in the VID > > header so that if page 0 is corrupted we can recover the EC info from > > page 1 (which will contain both VID and EC info). > > Doing that is fine for dealing with EC header corruption, since, AFAIK, > > none of the NAND vendors are pairing page 0 with page 1. > > Still remains the VID header corruption problem. Do prevent that we > > still have several solutions: > > a/ skip the page paired with the VID header. 
This is doable and can be > > hidden from UBI users, but it also means that we're loosing another > > page for metadata (not a negligible overhead) > > b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap > > seems the right place to put that in, since fastmap is already > > storing those information for almost all blocks. Still we would have > > to modify fastmap a bit to store information about all erase blocks > > and not only those that are not part of the fastmap pool. > > Also, updating that in real-time would require using a log approach, > > instead of the atomic update currently used by fastmap when it runs > > out of PEBs in it's free PEB pool. Note that the log approach does > > not have to be applied to all fastmap data (we just need it for the > > PEB <-> LEB info). > > Another off-topic note regarding the suggested log approach: we > > could also use it to log which PEB was last written/erased, and use > > that to handle the unstable bits issue. > > c/ (also suggested by Artem) delay VID write until we have enough data > > to write on the LEB, and thus guarantee that it cannot be corrupted > > (at least by programming on the paired page ;-)) anymore. > > Doing that would also require logging data to be written on those > > LEBs somewhere, not to mention the impact of copying the data twice > > (once in the log, and then when we have enough data, in the real > > block). > > Let's start with UBI, as soon it is stable on MLC NAND we can focus on > UBIFS. I wish it was that simple, but the decision we take at the UBI layer will most likely impact the choices we'll have at the UBIFS layer. So yes, focusing on the UBI layer for the implementation sounds sensible, but I think we have to carefully think about the solution we want to test first, and what the impact on the UBIFS implementation will be. > > Solution a) sounds very promising to me as the can be implemented easily > and loosing another page for meta data is IMHO acceptable on MLC. 
> Especially as MLC NANDs are anyways bigger and cheaper than SLC. Yes, solution a) is definitely the simplest one (and probably the one I'll try first). Regarding the overhead, we go from 2/number_of_pages_per_block to 4/number_of_pages_per_block (not counting the overhead of internal volumes, since they should be pretty much the same in both cases), so I wouldn't say it's negligible even for MLCs. But I agree that having a reliable solution at the cost of more overhead can be a good match for our first implementation. > > b) is tricky as fastmap follows the design principle that UBI can fall > back to a full scan if the fastmap is corrupted or a self check fails. > If the ability to full scan suddenly depends on fastmap it can become > messy. We are only talking about paired pages corruption here, so I hope both pieces of information will not be corrupted at the same time: the VID header should be valid unless a power-cut occurred while writing on the page paired with the VID header one, which means the fastmap volume should still be valid (unless we are experiencing data-retention issues, which can be true for the SLC case too). And even if the LEB and fastmap information are corrupted, we should be able to reconstruct it and discard the LEB with the corrupted VID header. Anyway, this approach is way more complicated to implement, and I reserve it as a "going further" topic ;-). > > In terms of computer science c) is the most elegant solution but converting > UBI to a log based "block layer" is not trivial and as you wrote the write > overhead is not negligible. Hm, I don't know if it's the most elegant solution (I'm still concerned by the write overhead caused by the extra copy step, though solution b) has some overhead too), but I agree that implementing that one is not trivial. > > So, I'd vote for a) and see how well it does in our powercut tests. :-) a) it is. I'll focus on that solution first. Thanks for the advice. 
Best Regards, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon 2015-09-17 15:20 ` Artem Bityutskiy 2015-09-29 11:19 ` Richard Weinberger @ 2015-10-23 8:14 ` Boris Brezillon 2015-10-27 20:16 ` Richard Weinberger 2015-10-28 12:24 ` Artem Bityutskiy 2015-10-28 12:06 ` Artem Bityutskiy 3 siblings, 2 replies; 43+ messages in thread From: Boris Brezillon @ 2015-10-23 8:14 UTC (permalink / raw) To: Artem Bityutskiy, Richard Weinberger Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn), Bean Huo 霍斌斌 (beanhuo) Hi, Here is a quick status update of my progress and a few questions to UBI/UBIFS experts. On Thu, 17 Sep 2015 15:22:40 +0200 Boris Brezillon <boris.brezillon@free-electrons.com> wrote: > Hello, > > I'm currently working on the paired pages problem we have on MLC chips. > I remember discussing it with Artem earlier this year when I was > preparing my talk for ELC. > > I now have some time I can spend working on this problem and I started > looking at how this can be solved. > > First let's take a look at the UBI layer. > There's one basic thing we have to care about: protecting UBI metadata. > There are two kind of metadata: > 1/ those stored at the beginning of each erase block (EC and VID > headers) > 2/ those stored in specific volumes (layout and fastmap volumes) > > We don't have to worry about #2 since those are written using atomic > update, and atomic updates are immune to this paired page corruption > problem (either the whole write is valid, or none of it is valid). > > This leaves problem #1. > For this case, Artem suggested to duplicate the EC header in the VID > header so that if page 0 is corrupted we can recover the EC info from > page 1 (which will contain both VID and EC info). > Doing that is fine for dealing with EC header corruption, since, AFAIK, > none of the NAND vendors are pairing page 0 with page 1. 
> Still remains the VID header corruption problem. Do prevent that we > still have several solutions: > a/ skip the page paired with the VID header. This is doable and can be > hidden from UBI users, but it also means that we're loosing another > page for metadata (not a negligible overhead) > b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap > seems the right place to put that in, since fastmap is already > storing those information for almost all blocks. Still we would have > to modify fastmap a bit to store information about all erase blocks > and not only those that are not part of the fastmap pool. > Also, updating that in real-time would require using a log approach, > instead of the atomic update currently used by fastmap when it runs > out of PEBs in it's free PEB pool. Note that the log approach does > not have to be applied to all fastmap data (we just need it for the > PEB <-> LEB info). > Another off-topic note regarding the suggested log approach: we > could also use it to log which PEB was last written/erased, and use > that to handle the unstable bits issue. > c/ (also suggested by Artem) delay VID write until we have enough data > to write on the LEB, and thus guarantee that it cannot be corrupted > (at least by programming on the paired page ;-)) anymore. > Doing that would also require logging data to be written on those > LEBs somewhere, not to mention the impact of copying the data twice > (once in the log, and then when we have enough data, in the real > block). > > I don't have any strong opinion about which solution is the best, also > I'm maybe missing other aspects or better solutions, so feel free to > comment on that and share your thoughts. I decided to go for the simplest solution (but I can't promise I won't change my mind if this approach appears to be wrong), which is using a LEB in either MLC or SLC mode. In SLC mode, only the first page of each pair is used, which completely addresses the paired pages problem. 
For now the SLC mode logic is hidden in the MTD/NAND layers, which provide functions to write/read in SLC mode. Thanks to this differentiation, UBI is now exposing two kinds of LEBs: - the secure (small) LEBs (those accessed in SLC mode) - the unsecure (big) LEBs (those accessed in MLC mode) The secure LEBs are marked as such with a flag in the VID header, which allows tracking secure/unsecure LEBs and controlling the maximum size a UBI user can read/write from/to a LEB. This approach assumes LEB 0 and 1 are never paired together (which AFAICT is always true), because VID is stored on page 1 and we need the secure_flag information to know how to access the LEB (SLC or MLC mode). Of course I expose a few new helpers in the kernel API, and we'll probably have to do it for the ioctl interface too if this approach is validated. That's all I got for the UBI layer. Richard, Artem, any feedback so far? > > That's all for the UBI layer. We will likely need new functions (and > new fields in existing structures) to help UBI users deal with MLC > NANDs: for example a field exposing the storage type or a function > helping users skip one (or several) blocks to secure the data they have > written so far. Anyway, those are things we can discuss after deciding > which approach we want to take. > > Now, let's talk about the UBIFS layer. We are facing pretty much the > same problem in there: we need to protect the data we have already > written from time to time. > AFAIU (correct me if I'm wrong), data should be secure when we sync the > file system, or commit the UBIFS journal (feel free to correct me if > I'm not using the right terms in my explanation). > As explained earlier, the only way to secure data is to skip some pages > (those that are paired with the already written ones). 
> > I see two approaches here (there might be more): > 1/ do not skip any pages until we are asked to secure the data, and > then skip as much pages as needed to ensure nobody can ever corrupt > the data. With this approach you can loose a non negligible amount > of space. For example, with this paired pages scheme [1], if you > only write page on page 2 and want to secure your data, you'll have > to skip pages 3 to 8. > 2/ use the NAND in 'SLC mode' (AKA only write on half the pages in a > block). With this solution you always loose half the NAND capacity, > but in case of small writes, it's still more efficient than #1. > Of course using that solution is not acceptable, because you'll > only be able to use half the NAND capacity, but the plan is to use > it in conjunction with the GC, so that from time to time UBIFS > data chunks/nodes can be put in a single erase block without > skipping half the pages. > Note that currently the GC does not work this way: it tries to > collect chunks one by one and write them to the journal to free a > dirty LEB. What we would need here is a way to collect enough data > to fill an entire block and after that release the LEBs that where > previously using half the LEB capacity. > > Of course both of those solutions implies marking the skipped regions > as dirty so that the GC can account for the padded space. For #1 we > should probably also use padding nodes to reflect how much space is lost > on the media, though I'm not sure how this can be done. For #2, we may > have to differentiate 'full' and 'half' LEBs in the LPT. If you followed my un/secure LEB approach described above, you probably know that we don't have much solutions for the UBIFS layer. My idea here is to use a garbage collection mechanism which will consolidate data LEBs (LEBs containing valid data nodes). By default all LEBs are used in secure (SLC) mode, which makes the UBIFS layer reliable. 
From time to time the consolidation GC will choose a few secure LEBs and move their nodes to an unsecure LEB. The idea is to fill the entire unsecure LEB, so that we never write on it afterwards, thus preventing any paired page corruption. Once this copy is finished we can release/unmap the secure LEBs we have consolidated (after adding a bud node to reference the unsecure LEB of course). Here are a few details about the implementation I started to develop (questions will come after ;-)). I added a new category (called LPROPS_FULL) to track the LEBs that are almost full (lp->dirty + lp->free < leb_size / 4), so that we can easily consolidate 2 to 3 full LEBs into a single unsecure LEB. The consolidation is done by filling as many nodes as possible into an unsecure LEB, and after a single pass, this should result in at least one freed LEB: the consolidation moves nodes from at least 2 secure LEBs into a single one, so you're freeing 2 LEBs but need to keep one for the next consolidation iteration, hence the single LEB freed. Now come the questions to the UBIFS experts: - should I create a new journal head to do what's described above? AFAICT I can't use the GC head, because the GC can still do its job in parallel with the consolidation-GC, and the GC LEB might already be filled with some data nodes, right? I thought about using the data head, but again, it might already point to a partially filled data LEB. I added a journal head called BIG_DATA_HEAD, but I'm not sure this is acceptable, so let me know what you think about that. - when should we run the consolidation-GC? 
- I still need to understand the races between TNC and GC, since I'm pretty sure I'll face the same kind of problems with the consolidation-GC. Can someone explain that to me, or should I dig further into the code :-)? I'm pretty sure I forgot a lot of problems here, also note that my implementation is not finished yet, so this consolidation-GC concept has not been validated. If you see anything that could defeat this approach, please let me know so that I can adjust my development. Thanks. Best Regards, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-23 8:14 ` Boris Brezillon @ 2015-10-27 20:16 ` Richard Weinberger 2015-10-28 9:24 ` Boris Brezillon 2015-10-28 12:24 ` Artem Bityutskiy 1 sibling, 1 reply; 43+ messages in thread From: Richard Weinberger @ 2015-10-27 20:16 UTC (permalink / raw) To: Boris Brezillon, Artem Bityutskiy Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn), Bean Huo 霍斌斌 (beanhuo) Boris, Am 23.10.2015 um 10:14 schrieb Boris Brezillon: >> I'm currently working on the paired pages problem we have on MLC chips. >> I remember discussing it with Artem earlier this year when I was >> preparing my talk for ELC. >> >> I now have some time I can spend working on this problem and I started >> looking at how this can be solved. >> >> First let's take a look at the UBI layer. >> There's one basic thing we have to care about: protecting UBI metadata. >> There are two kind of metadata: >> 1/ those stored at the beginning of each erase block (EC and VID >> headers) >> 2/ those stored in specific volumes (layout and fastmap volumes) >> >> We don't have to worry about #2 since those are written using atomic >> update, and atomic updates are immune to this paired page corruption >> problem (either the whole write is valid, or none of it is valid). >> >> This leaves problem #1. >> For this case, Artem suggested to duplicate the EC header in the VID >> header so that if page 0 is corrupted we can recover the EC info from >> page 1 (which will contain both VID and EC info). >> Doing that is fine for dealing with EC header corruption, since, AFAIK, >> none of the NAND vendors are pairing page 0 with page 1. >> Still remains the VID header corruption problem. Do prevent that we >> still have several solutions: >> a/ skip the page paired with the VID header. 
This is doable and can be >> hidden from UBI users, but it also means that we're loosing another >> page for metadata (not a negligible overhead) >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap >> seems the right place to put that in, since fastmap is already >> storing those information for almost all blocks. Still we would have >> to modify fastmap a bit to store information about all erase blocks >> and not only those that are not part of the fastmap pool. >> Also, updating that in real-time would require using a log approach, >> instead of the atomic update currently used by fastmap when it runs >> out of PEBs in it's free PEB pool. Note that the log approach does >> not have to be applied to all fastmap data (we just need it for the >> PEB <-> LEB info). >> Another off-topic note regarding the suggested log approach: we >> could also use it to log which PEB was last written/erased, and use >> that to handle the unstable bits issue. >> c/ (also suggested by Artem) delay VID write until we have enough data >> to write on the LEB, and thus guarantee that it cannot be corrupted >> (at least by programming on the paired page ;-)) anymore. >> Doing that would also require logging data to be written on those >> LEBs somewhere, not to mention the impact of copying the data twice >> (once in the log, and then when we have enough data, in the real >> block). >> >> I don't have any strong opinion about which solution is the best, also >> I'm maybe missing other aspects or better solutions, so feel free to >> comment on that and share your thoughts. > > I decided to go for the simplest solution (but I can't promise I won't > change my mind if this approach appears to be wrong), which is either > using a LEB is MLC or SLC mode. In SLC modes, only the first page of > each pair is used, which completely address the paired pages problem. 
> For now the SLC mode logic is hidden in the MTD/NAND layers which are > providing functions to write/read in SLC mode. > > Thanks to this differentiation, UBI is now exposing two kind of LEBs: > - the secure (small) LEBS (those accessed in SLC mode) > - the unsecure (big) LEBS (those accessed in MLC mode) > > The secure LEBs are marked as such with a flag in the VID header, which > allows tracking secure/unsecure LEBs and controlling the maximum size a > UBI user can read/write from/to a LEB. > This approach assume LEB 0 and 1 are never paired together (which You mean page 0 and 1? > AFAICT is always true), because VID is stored on page 1 and we need the > secure_flag information to know how to access the LEB (SLC or MLC mode). > Of course I expose a few new helpers in the kernel API, and we'll > probably have to do it for the ioctl interface too if this approach is > validated. > > That's all I got for the UBI layer. > Richard, Artem, any feedback so far? Changing the on-flash format of UBI is a rather big thing. If it needs to be done I'm fine with it but we have to give our best to change it only once. :-) >> >> That's all for the UBI layer. We will likely need new functions (and >> new fields in existing structures) to help UBI users deal with MLC >> NANDs: for example a field exposing the storage type or a function >> helping users skip one (or several) blocks to secure the data they have >> written so far. Anyway, those are things we can discuss after deciding >> which approach we want to take. >> >> Now, let's talk about the UBIFS layer. We are facing pretty much the >> same problem in there: we need to protect the data we have already >> written from time to time. >> AFAIU (correct me if I'm wrong), data should be secure when we sync the >> file system, or commit the UBIFS journal (feel free to correct me if >> I'm not using the right terms in my explanation). 
>> As explained earlier, the only way to secure data is to skip some pages >> (those that are paired with the already written ones). >> >> I see two approaches here (there might be more): >> 1/ do not skip any pages until we are asked to secure the data, and >> then skip as much pages as needed to ensure nobody can ever corrupt >> the data. With this approach you can loose a non negligible amount >> of space. For example, with this paired pages scheme [1], if you >> only write page on page 2 and want to secure your data, you'll have >> to skip pages 3 to 8. >> 2/ use the NAND in 'SLC mode' (AKA only write on half the pages in a >> block). With this solution you always loose half the NAND capacity, >> but in case of small writes, it's still more efficient than #1. >> Of course using that solution is not acceptable, because you'll >> only be able to use half the NAND capacity, but the plan is to use >> it in conjunction with the GC, so that from time to time UBIFS >> data chunks/nodes can be put in a single erase block without >> skipping half the pages. >> Note that currently the GC does not work this way: it tries to >> collect chunks one by one and write them to the journal to free a >> dirty LEB. What we would need here is a way to collect enough data >> to fill an entire block and after that release the LEBs that where >> previously using half the LEB capacity. >> >> Of course both of those solutions implies marking the skipped regions >> as dirty so that the GC can account for the padded space. For #1 we >> should probably also use padding nodes to reflect how much space is lost >> on the media, though I'm not sure how this can be done. For #2, we may >> have to differentiate 'full' and 'half' LEBs in the LPT. > > If you followed my un/secure LEB approach described above, you probably > know that we don't have much solutions for the UBIFS layer. 
> > My idea here is to use a garbage collection mechanism which will > consolidate data LEBs (LEBs containing valid data nodes). > By default all LEBs are used in secure (SLC) mode, which makes the > UBIFS layer reliable. From time to time the consolidation GC will > choose a few secure LEBs and move their nodes to an unsecure LEB. > The idea is to fill the entire unsecure LEB, so that we never write on > it afterwards, thus preventing any paired page corruption. Once this > copy is finished we can release/unmap the secure LEBs we have > consolidated (after adding a bud node to reference the unsecure LEB of > course). > > Here are a few details about the implementation I started to develop > (questions will come after ;-)). > I added a new category (called LPROPS_FULL) to track the LEBs that are > almost full (lp->dirty + lp->free < leb_size / 4), so that we can > easily consolidate 2 to 3 full LEBs into a single unsecure LEB. > The consolidation is done by filling as much nodes as possible into an > unsecure LEB, and after a single pass, this should results in at least > one freed LEB freed: the consolidation moves nodes from at least 2 > secure LEBs into a single one, so you're freeing 2 LEBs but need to > keep one for the next consolidation iteration, hence the single LEB > freed. > > Now comes the questions to the UBIFS experts: > - should I create a new journal head to do what's described above? > AFAICT I can't use the GC head, because the GC can still do it's job > in parallel of the consolidation-GC, and the GC LEB might already be > filled with some data nodes, right? > I thought about using the data head, but again, it might already > point to a partially filled data LEB. > I added a journal head called BIG_DATA_HEAD, but I'm not sure this is > acceptable, so let me know what you think about that? I'd vote for a new head. If it turns out to be similar enough to another head we can still merge it to that head. > - when should we run the consolidation-GC? 
After the standard GC > pass, when this one didn't make any progress, or should we launch > it as soon as we have enough full LEBs to fill an unsecure LEB? The > second solution might have a small impact on performance of an empty > FS (below the half capacity size), but OTOH, it will scale better when > the FS size exceeds this limit (no need to run the GC each time we > want to write new data). I'd go for a hybrid approach. Run the consolidation-GC if standard GC was unable to produce free space and if more than X small LEBs are full. > - I still need to understand the races between TNC and GC, since I'm > pretty sure I'll face the same kind of problems with the > consolidation-GC. Can someone explain that to me, or should I dig > further into the code :-)? Not sure if I understand this question correctly. What you need for sure is i) a way to find out whether a LEB can be packed and ii) lock it while packing. > I'm pretty sure I forgot a lot of problems here, also note that my > implementation is not finished yet, so this consolidation-GC concept > has not been validated. If you see anything that could defeat this > approach, please let me know so that I can adjust my development. Please share your patches as soon as possible. Just mark them as RFC (really flaky code). I'll happily test them on my MLC boards and review them. Thanks, //richard ^ permalink raw reply [flat|nested] 43+ messages in thread
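The two rules settled on above, Boris's LPROPS_FULL categorization and Richard's hybrid trigger, could be sketched as follows; `lprops_is_full()`, `should_run_consolidation()` and `MIN_FULL_LEBS` are hypothetical names, and `MIN_FULL_LEBS` is only a guess at Richard's "X":

```c
#include <assert.h>

/*
 * Boris's categorization rule: a LEB is "full" when less than a
 * quarter of it is reclaimable (free + dirty space).
 */
static int lprops_is_full(int free, int dirty, int leb_size)
{
	return free + dirty < leb_size / 4;
}

/*
 * Richard's hybrid trigger: consolidate only when standard GC made
 * no progress AND enough full (SLC-mode) LEBs have accumulated to
 * fill one unsecure LEB.
 */
#define MIN_FULL_LEBS 2

static int should_run_consolidation(int std_gc_progressed, int nr_full_lebs)
{
	return !std_gc_progressed && nr_full_lebs >= MIN_FULL_LEBS;
}
```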
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-27 20:16 ` Richard Weinberger @ 2015-10-28 9:24 ` Boris Brezillon 2015-10-28 10:44 ` Michal Suchanek 0 siblings, 1 reply; 43+ messages in thread From: Boris Brezillon @ 2015-10-28 9:24 UTC (permalink / raw) To: Richard Weinberger Cc: Artem Bityutskiy, linux-mtd, David Woodhouse, Brian Norris, Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn), Bean Huo 霍斌斌 (beanhuo) Hi Richard, On Tue, 27 Oct 2015 21:16:28 +0100 Richard Weinberger <richard@nod.at> wrote: > Boris, > > Am 23.10.2015 um 10:14 schrieb Boris Brezillon: > >> I'm currently working on the paired pages problem we have on MLC chips. > >> I remember discussing it with Artem earlier this year when I was > >> preparing my talk for ELC. > >> > >> I now have some time I can spend working on this problem and I started > >> looking at how this can be solved. > >> > >> First let's take a look at the UBI layer. > >> There's one basic thing we have to care about: protecting UBI metadata. > >> There are two kind of metadata: > >> 1/ those stored at the beginning of each erase block (EC and VID > >> headers) > >> 2/ those stored in specific volumes (layout and fastmap volumes) > >> > >> We don't have to worry about #2 since those are written using atomic > >> update, and atomic updates are immune to this paired page corruption > >> problem (either the whole write is valid, or none of it is valid). > >> > >> This leaves problem #1. > >> For this case, Artem suggested to duplicate the EC header in the VID > >> header so that if page 0 is corrupted we can recover the EC info from > >> page 1 (which will contain both VID and EC info). > >> Doing that is fine for dealing with EC header corruption, since, AFAIK, > >> none of the NAND vendors are pairing page 0 with page 1. > >> Still remains the VID header corruption problem. Do prevent that we > >> still have several solutions: > >> a/ skip the page paired with the VID header. 
This is doable and can be > >> hidden from UBI users, but it also means that we're losing another > >> page for metadata (not a negligible overhead) > >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap > >> seems the right place to put that in, since fastmap is already > >> storing that information for almost all blocks. Still we would have > >> to modify fastmap a bit to store information about all erase blocks > >> and not only those that are not part of the fastmap pool. > >> Also, updating that in real-time would require using a log approach, > >> instead of the atomic update currently used by fastmap when it runs > >> out of PEBs in its free PEB pool. Note that the log approach does > >> not have to be applied to all fastmap data (we just need it for the > >> PEB <-> LEB info). > >> Another off-topic note regarding the suggested log approach: we > >> could also use it to log which PEB was last written/erased, and use > >> that to handle the unstable bits issue. > >> c/ (also suggested by Artem) delay VID write until we have enough data > >> to write on the LEB, and thus guarantee that it cannot be corrupted > >> (at least by programming on the paired page ;-)) anymore. > >> Doing that would also require logging data to be written on those > >> LEBs somewhere, not to mention the impact of copying the data twice > >> (once in the log, and then when we have enough data, in the real > >> block). > >> > >> I don't have any strong opinion about which solution is the best, also > >> I'm maybe missing other aspects or better solutions, so feel free to > >> comment on that and share your thoughts. > > > > I decided to go for the simplest solution (but I can't promise I won't > > change my mind if this approach appears to be wrong), which is using a > > LEB in either MLC or SLC mode. In SLC mode, only the first page of > > each pair is used, which completely addresses the paired pages problem.
> > For now the SLC mode logic is hidden in the MTD/NAND layers which are > > providing functions to write/read in SLC mode. > > > > Thanks to this differentiation, UBI is now exposing two kinds of LEBs: > > - the secure (small) LEBs (those accessed in SLC mode) > > - the unsecure (big) LEBs (those accessed in MLC mode) > > > > The secure LEBs are marked as such with a flag in the VID header, which > > allows tracking secure/unsecure LEBs and controlling the maximum size a > > UBI user can read/write from/to a LEB. > > This approach assumes LEB 0 and 1 are never paired together (which > > You mean page 0 and 1? Yes. > > > AFAICT is always true), because VID is stored on page 1 and we need the > > secure_flag information to know how to access the LEB (SLC or MLC mode). > > Of course I expose a few new helpers in the kernel API, and we'll > > probably have to do it for the ioctl interface too if this approach is > > validated. > > > > That's all I got for the UBI layer. > > Richard, Artem, any feedback so far? > > Changing the on-flash format of UBI is a rather big thing. > If it needs to be done I'm fine with it but we have to give our best > to change it only once. :-) Yes, I know that, and I don't pretend I chose the right solution ;-). Any other suggestions to avoid changing the on-flash format? Note that I only added a new flag, and this flag is only set when you map a LEB in SLC mode, which is not the default case, which in turn means you'll be able to attach to an existing UBI partition.
We will likely need new functions (and > >> new fields in existing structures) to help UBI users deal with MLC > >> NANDs: for example a field exposing the storage type or a function > >> helping users skip one (or several) blocks to secure the data they have > >> written so far. Anyway, those are things we can discuss after deciding > >> which approach we want to take. > >> > >> Now, let's talk about the UBIFS layer. We are facing pretty much the > >> same problem in there: we need to protect the data we have already > >> written from time to time. > >> AFAIU (correct me if I'm wrong), data should be secure when we sync the > >> file system, or commit the UBIFS journal (feel free to correct me if > >> I'm not using the right terms in my explanation). > >> As explained earlier, the only way to secure data is to skip some pages > >> (those that are paired with the already written ones). > >> > >> I see two approaches here (there might be more): > >> 1/ do not skip any pages until we are asked to secure the data, and > >> then skip as much pages as needed to ensure nobody can ever corrupt > >> the data. With this approach you can loose a non negligible amount > >> of space. For example, with this paired pages scheme [1], if you > >> only write page on page 2 and want to secure your data, you'll have > >> to skip pages 3 to 8. > >> 2/ use the NAND in 'SLC mode' (AKA only write on half the pages in a > >> block). With this solution you always loose half the NAND capacity, > >> but in case of small writes, it's still more efficient than #1. > >> Of course using that solution is not acceptable, because you'll > >> only be able to use half the NAND capacity, but the plan is to use > >> it in conjunction with the GC, so that from time to time UBIFS > >> data chunks/nodes can be put in a single erase block without > >> skipping half the pages. 
> >> Note that currently the GC does not work this way: it tries to > >> collect chunks one by one and write them to the journal to free a > >> dirty LEB. What we would need here is a way to collect enough data > >> to fill an entire block and after that release the LEBs that where > >> previously using half the LEB capacity. > >> > >> Of course both of those solutions implies marking the skipped regions > >> as dirty so that the GC can account for the padded space. For #1 we > >> should probably also use padding nodes to reflect how much space is lost > >> on the media, though I'm not sure how this can be done. For #2, we may > >> have to differentiate 'full' and 'half' LEBs in the LPT. > > > > If you followed my un/secure LEB approach described above, you probably > > know that we don't have much solutions for the UBIFS layer. > > > > My idea here is to use a garbage collection mechanism which will > > consolidate data LEBs (LEBs containing valid data nodes). > > By default all LEBs are used in secure (SLC) mode, which makes the > > UBIFS layer reliable. From time to time the consolidation GC will > > choose a few secure LEBs and move their nodes to an unsecure LEB. > > The idea is to fill the entire unsecure LEB, so that we never write on > > it afterwards, thus preventing any paired page corruption. Once this > > copy is finished we can release/unmap the secure LEBs we have > > consolidated (after adding a bud node to reference the unsecure LEB of > > course). > > > > Here are a few details about the implementation I started to develop > > (questions will come after ;-)). > > I added a new category (called LPROPS_FULL) to track the LEBs that are > > almost full (lp->dirty + lp->free < leb_size / 4), so that we can > > easily consolidate 2 to 3 full LEBs into a single unsecure LEB. 
> > The consolidation is done by filling as much nodes as possible into an > > unsecure LEB, and after a single pass, this should results in at least > > one freed LEB freed: the consolidation moves nodes from at least 2 > > secure LEBs into a single one, so you're freeing 2 LEBs but need to > > keep one for the next consolidation iteration, hence the single LEB > > freed. > > > > Now comes the questions to the UBIFS experts: > > - should I create a new journal head to do what's described above? > > AFAICT I can't use the GC head, because the GC can still do it's job > > in parallel of the consolidation-GC, and the GC LEB might already be > > filled with some data nodes, right? > > I thought about using the data head, but again, it might already > > point to a partially filled data LEB. > > I added a journal head called BIG_DATA_HEAD, but I'm not sure this is > > acceptable, so let me know what you think about that? > > I'd vote for a new head. > If it turns out to be similar enough to another head we can still > merge it to that head. Yep, that's what I chose too. Actually, AFAIU, if we want the standard and consolidation GC to work concurrently we need to add a new journal head anyway. > > > - when should we run the consolidation-GC? After the standard GC > > pass, when this one didn't make any progress, or should we launch > > it as soon as we have enough full LEBs to fill an unsecure LEB? The > > second solution might have a small impact on performances of an empty > > FS (below the half capacity size), but ITOH, it will scale better when > > the FS size exceed this limit (no need to run the GC each time we > > want to write new data). > > I'd go for a hybrid approach. > Run the consolidation-GC if standard GC was unable to produce free space > and if more than X small LEBs are full. That's probably the best solution indeed. 
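Choosing which LPROPS_FULL LEBs to consolidate in one pass could look like the first-fit sketch below (the function name and the first-fit policy are illustrative assumptions, not the actual implementation):

```c
#include <assert.h>

#define MAX_CONSOLIDATION_SRCS 3

/*
 * Greedily pick up to 3 full LEBs (used[i] = bytes of valid nodes in
 * candidate i) whose data fits into one unsecure LEB of mlc_leb_size
 * bytes. At least two sources are required for the pass to be
 * profitable: it frees the sources but keeps one LEB busy as the
 * consolidation target, so the net gain is cnt - 1 LEBs.
 * Returns the number of picked sources, 0 if consolidation would not
 * free anything.
 */
static int pick_consolidation_srcs(const int *used, int n,
				   int mlc_leb_size, int *picked)
{
	int total = 0, cnt = 0, i;

	for (i = 0; i < n && cnt < MAX_CONSOLIDATION_SRCS; i++) {
		if (total + used[i] <= mlc_leb_size) {
			total += used[i];
			picked[cnt++] = i;
		}
	}
	return cnt >= 2 ? cnt : 0;
}
```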
> > > - I still need to understand the races between TNC and GC, since I'm > > pretty sure I'll face the same kind of problems with the > > consolidation-GC. Can someone explains that to me, or should I dig > > further into the code :-)? > > Not sure if I understand this questions correctly. > > What you need for sure is i) a way to find out whether a LEB can be packed > and ii) lock it while packing. Hm, locking the whole TNC while we are consolidating several LEBs seems a bit extreme (writing a whole unsecure LEB can take a non-negligible amount of time). I think we can do this consolidation without taking the TNC lock by first writing all the nodes on the new LEB without updating the TNC, and once the unsecure LEB is filled update the TNC in one go (that's what I'm trying to do here [1]). > > > I'm pretty sure I forgot a lot of problematics here, also note that my > > implementation is not finished yet, so this consolidation-GC concept > > has not been validated. If you see anything that could defeat this > > approach, please let me know so that I can adjust my development. > > Please share your patches as soon as possible. Just mark them as RFC > (really flaky code). I'll happily test them on my MLC boards and review them. I can share the code (actually it's already on my github repo [2]), but it's not even tested so don't expect to make it work on your board ;-). Thanks for your first suggestions. Best Regards, Boris [1]https://github.com/bbrezillon/linux-sunxi/blob/23cb262f1c73d24b2a52f41f91fb4c6c1305e8e7/fs/ubifs/gc.c#L739 [2]https://github.com/bbrezillon/linux-sunxi/tree/mlc-wip -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 43+ messages in thread
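The lock-free copy plus batched index update described above boils down to a two-phase shape like this (toy structures only; the real TNC is an index tree updated with per-node helpers, and the batch below would run under the TNC mutex):

```c
#include <assert.h>

/* Toy stand-in for the TNC: node id -> on-flash location. */
struct toy_loc { int lnum, offs; };

/* One recorded relocation: where a copied node now lives in the
 * consolidated unsecure LEB. */
struct reloc { int node, new_lnum, new_offs; };

/*
 * Phase 1 (not shown) copies nodes into the unsecure LEB and fills a
 * relocation list without touching the index, so readers and the
 * regular GC keep working. Phase 2 below applies all relocations in
 * one batch; the critical section stays short because no flash I/O
 * happens here.
 */
static void apply_relocations(struct toy_loc *tnc,
			      const struct reloc *r, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		tnc[r[i].node].lnum = r[i].new_lnum;
		tnc[r[i].node].offs = r[i].new_offs;
	}
}
```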
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-28 9:24 ` Boris Brezillon @ 2015-10-28 10:44 ` Michal Suchanek 2015-10-28 11:14 ` Boris Brezillon 0 siblings, 1 reply; 43+ messages in thread From: Michal Suchanek @ 2015-10-28 10:44 UTC (permalink / raw) To: Boris Brezillon Cc: Richard Weinberger, Iwo Mergler, Jeff Lauruhn (jlauruhn), Artem Bityutskiy, Andrea Scian, MTD Maling List, Brian Norris, David Woodhouse, Bean Huo 霍斌斌 (beanhuo) On 28 October 2015 at 10:24, Boris Brezillon <boris.brezillon@free-electrons.com> wrote: > Hi Richard, > > On Tue, 27 Oct 2015 21:16:28 +0100 > Richard Weinberger <richard@nod.at> wrote: > >> Boris, >> >> Am 23.10.2015 um 10:14 schrieb Boris Brezillon: >> >> I'm currently working on the paired pages problem we have on MLC chips. >> >> I remember discussing it with Artem earlier this year when I was >> >> preparing my talk for ELC. >> >> >> >> I now have some time I can spend working on this problem and I started >> >> looking at how this can be solved. >> >> >> >> First let's take a look at the UBI layer. >> >> There's one basic thing we have to care about: protecting UBI metadata. >> >> There are two kind of metadata: >> >> 1/ those stored at the beginning of each erase block (EC and VID >> >> headers) >> >> 2/ those stored in specific volumes (layout and fastmap volumes) >> >> >> >> We don't have to worry about #2 since those are written using atomic >> >> update, and atomic updates are immune to this paired page corruption >> >> problem (either the whole write is valid, or none of it is valid). >> >> >> >> This leaves problem #1. >> >> For this case, Artem suggested to duplicate the EC header in the VID >> >> header so that if page 0 is corrupted we can recover the EC info from >> >> page 1 (which will contain both VID and EC info). >> >> Doing that is fine for dealing with EC header corruption, since, AFAIK, >> >> none of the NAND vendors are pairing page 0 with page 1. 
>> >> Still remains the VID header corruption problem. Do prevent that we >> >> still have several solutions: >> >> a/ skip the page paired with the VID header. This is doable and can be >> >> hidden from UBI users, but it also means that we're loosing another >> >> page for metadata (not a negligible overhead) >> >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap >> >> seems the right place to put that in, since fastmap is already >> >> storing those information for almost all blocks. Still we would have >> >> to modify fastmap a bit to store information about all erase blocks >> >> and not only those that are not part of the fastmap pool. >> >> Also, updating that in real-time would require using a log approach, >> >> instead of the atomic update currently used by fastmap when it runs >> >> out of PEBs in it's free PEB pool. Note that the log approach does >> >> not have to be applied to all fastmap data (we just need it for the >> >> PEB <-> LEB info). >> >> Another off-topic note regarding the suggested log approach: we >> >> could also use it to log which PEB was last written/erased, and use >> >> that to handle the unstable bits issue. >> >> c/ (also suggested by Artem) delay VID write until we have enough data >> >> to write on the LEB, and thus guarantee that it cannot be corrupted >> >> (at least by programming on the paired page ;-)) anymore. >> >> Doing that would also require logging data to be written on those >> >> LEBs somewhere, not to mention the impact of copying the data twice >> >> (once in the log, and then when we have enough data, in the real >> >> block). >> >> >> >> I don't have any strong opinion about which solution is the best, also >> >> I'm maybe missing other aspects or better solutions, so feel free to >> >> comment on that and share your thoughts. 
>> > >> > I decided to go for the simplest solution (but I can't promise I won't >> > change my mind if this approach appears to be wrong), which is either >> > using a LEB is MLC or SLC mode. In SLC modes, only the first page of >> > each pair is used, which completely address the paired pages problem. >> > For now the SLC mode logic is hidden in the MTD/NAND layers which are >> > providing functions to write/read in SLC mode. >> > >> > Thanks to this differentiation, UBI is now exposing two kind of LEBs: >> > - the secure (small) LEBS (those accessed in SLC mode) >> > - the unsecure (big) LEBS (those accessed in MLC mode) >> > >> > The secure LEBs are marked as such with a flag in the VID header, which >> > allows tracking secure/unsecure LEBs and controlling the maximum size a >> > UBI user can read/write from/to a LEB. >> > This approach assume LEB 0 and 1 are never paired together (which >> >> You mean page 0 and 1? > > Yes. > >> >> > AFAICT is always true), because VID is stored on page 1 and we need the >> > secure_flag information to know how to access the LEB (SLC or MLC mode). >> > Of course I expose a few new helpers in the kernel API, and we'll >> > probably have to do it for the ioctl interface too if this approach is >> > validated. >> > >> > That's all I got for the UBI layer. >> > Richard, Artem, any feedback so far? >> >> Changing the on-flash format of UBI is a rather big thing. >> If it needs to be done I'm fine with it but we have to give our best >> to change it only once. :-) > > Yes, I know that, and I don't pretend I chose the right solution ;-), > any other suggestions to avoid changing the on-flash format? > > Note that I only added a new flag, and this flag is only set when you > map a LEB in SLC mode, which is not the default case, which in turn > means you'll be able to attach to an existing UBI partition. 
Of course > the reverse is not true, once you've started using the secure LEB > feature you can't attach this image with an UBI implementation that does > not support this feature. Isn't a secure LEB just a plain LEB with half its pages unused? Since secure LEBs are the ones written during normal operation, unsecure LEBs are only written by the garbage collector, and you can tell a secure LEB from the layout of its used pages, there isn't really a need for special marking, AFAICT. It might be a good idea to not allow mounting a flash which is supposed to be protected against page corruption with a driver that does not support that protection. On the other hand, if backwards compatibility is desired and the information can be stored without introducing a new flag it might be a good idea to allow that as well. Thanks Michal
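For comparison, the flag-based variant lets the attach code derive the LEB mode from the VID header page it reads anyway, with no extra page reads. A toy version of that check (the real struct ubi_vid_hdr layout and the flag name differ; this only illustrates deriving the usable size from one bit on page 1):

```c
#include <assert.h>

#define TOY_VID_FLAG_SLC 0x01 /* hypothetical "secure LEB" flag bit */

struct toy_vid_hdr {
	unsigned char flags;
	/* real VID headers carry much more (vol_id, lnum, sqnum, ...) */
};

/*
 * Attach-time decision made from page 1 alone: a secure (SLC-mode)
 * LEB only uses the low page of each pair, so its usable size is
 * half that of an unsecure (MLC-mode) LEB.
 */
static int usable_leb_size(const struct toy_vid_hdr *vh, int mlc_leb_size)
{
	return (vh->flags & TOY_VID_FLAG_SLC) ? mlc_leb_size / 2
					      : mlc_leb_size;
}
```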
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-28 10:44 ` Michal Suchanek @ 2015-10-28 11:14 ` Boris Brezillon 2015-10-28 15:50 ` Michal Suchanek 0 siblings, 1 reply; 43+ messages in thread From: Boris Brezillon @ 2015-10-28 11:14 UTC (permalink / raw) To: Michal Suchanek Cc: Richard Weinberger, Iwo Mergler, Jeff Lauruhn (jlauruhn), Artem Bityutskiy, Andrea Scian, MTD Maling List, Brian Norris, David Woodhouse, Bean Huo 霍斌斌 (beanhuo) On Wed, 28 Oct 2015 11:44:49 +0100 Michal Suchanek <hramrach@gmail.com> wrote: > On 28 October 2015 at 10:24, Boris Brezillon > <boris.brezillon@free-electrons.com> wrote: > > Hi Richard, > > > > On Tue, 27 Oct 2015 21:16:28 +0100 > > Richard Weinberger <richard@nod.at> wrote: > > > >> Boris, > >> > >> Am 23.10.2015 um 10:14 schrieb Boris Brezillon: > >> >> I'm currently working on the paired pages problem we have on MLC chips. > >> >> I remember discussing it with Artem earlier this year when I was > >> >> preparing my talk for ELC. > >> >> > >> >> I now have some time I can spend working on this problem and I started > >> >> looking at how this can be solved. > >> >> > >> >> First let's take a look at the UBI layer. > >> >> There's one basic thing we have to care about: protecting UBI metadata. > >> >> There are two kind of metadata: > >> >> 1/ those stored at the beginning of each erase block (EC and VID > >> >> headers) > >> >> 2/ those stored in specific volumes (layout and fastmap volumes) > >> >> > >> >> We don't have to worry about #2 since those are written using atomic > >> >> update, and atomic updates are immune to this paired page corruption > >> >> problem (either the whole write is valid, or none of it is valid). > >> >> > >> >> This leaves problem #1. > >> >> For this case, Artem suggested to duplicate the EC header in the VID > >> >> header so that if page 0 is corrupted we can recover the EC info from > >> >> page 1 (which will contain both VID and EC info). 
> >> >> Doing that is fine for dealing with EC header corruption, since, AFAIK, > >> >> none of the NAND vendors are pairing page 0 with page 1. > >> >> Still remains the VID header corruption problem. Do prevent that we > >> >> still have several solutions: > >> >> a/ skip the page paired with the VID header. This is doable and can be > >> >> hidden from UBI users, but it also means that we're loosing another > >> >> page for metadata (not a negligible overhead) > >> >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap > >> >> seems the right place to put that in, since fastmap is already > >> >> storing those information for almost all blocks. Still we would have > >> >> to modify fastmap a bit to store information about all erase blocks > >> >> and not only those that are not part of the fastmap pool. > >> >> Also, updating that in real-time would require using a log approach, > >> >> instead of the atomic update currently used by fastmap when it runs > >> >> out of PEBs in it's free PEB pool. Note that the log approach does > >> >> not have to be applied to all fastmap data (we just need it for the > >> >> PEB <-> LEB info). > >> >> Another off-topic note regarding the suggested log approach: we > >> >> could also use it to log which PEB was last written/erased, and use > >> >> that to handle the unstable bits issue. > >> >> c/ (also suggested by Artem) delay VID write until we have enough data > >> >> to write on the LEB, and thus guarantee that it cannot be corrupted > >> >> (at least by programming on the paired page ;-)) anymore. > >> >> Doing that would also require logging data to be written on those > >> >> LEBs somewhere, not to mention the impact of copying the data twice > >> >> (once in the log, and then when we have enough data, in the real > >> >> block). 
> >> >> > >> >> I don't have any strong opinion about which solution is the best, also > >> >> I'm maybe missing other aspects or better solutions, so feel free to > >> >> comment on that and share your thoughts. > >> > > >> > I decided to go for the simplest solution (but I can't promise I won't > >> > change my mind if this approach appears to be wrong), which is either > >> > using a LEB is MLC or SLC mode. In SLC modes, only the first page of > >> > each pair is used, which completely address the paired pages problem. > >> > For now the SLC mode logic is hidden in the MTD/NAND layers which are > >> > providing functions to write/read in SLC mode. > >> > > >> > Thanks to this differentiation, UBI is now exposing two kind of LEBs: > >> > - the secure (small) LEBS (those accessed in SLC mode) > >> > - the unsecure (big) LEBS (those accessed in MLC mode) > >> > > >> > The secure LEBs are marked as such with a flag in the VID header, which > >> > allows tracking secure/unsecure LEBs and controlling the maximum size a > >> > UBI user can read/write from/to a LEB. > >> > This approach assume LEB 0 and 1 are never paired together (which > >> > >> You mean page 0 and 1? > > > > Yes. > > > >> > >> > AFAICT is always true), because VID is stored on page 1 and we need the > >> > secure_flag information to know how to access the LEB (SLC or MLC mode). > >> > Of course I expose a few new helpers in the kernel API, and we'll > >> > probably have to do it for the ioctl interface too if this approach is > >> > validated. > >> > > >> > That's all I got for the UBI layer. > >> > Richard, Artem, any feedback so far? > >> > >> Changing the on-flash format of UBI is a rather big thing. > >> If it needs to be done I'm fine with it but we have to give our best > >> to change it only once. :-) > > > > Yes, I know that, and I don't pretend I chose the right solution ;-), > > any other suggestions to avoid changing the on-flash format? 
> > > > Note that I only added a new flag, and this flag is only set when you > > map a LEB in SLC mode, which is not the default case, which in turn > > means you'll be able to attach to an existing UBI partition. Of course > > the reverse is not true, once you've started using the secure LEB > > feature you can't attach this image with an UBI implementation that does > > not support this feature. > > Isn't a secure LEB just a plain LEB with half pages unused? Since you > only write secure LEBs normally and unsecure LEBs only in garbage > collector and you can tell secure LEB by the layout of used pages > there isn't really need for special marking AFAICFT This implies scanning several pages per block to determine which type of LEB is in use, which will drastically increase the attach time. The whole point of this flag is to avoid scanning anything else but the EC and VID headers (or the fastmap LEBs if fastmap is in use). > > It might be a good idea to not allow mounting a flash which is > supposed to be protected against page corruption with a driver that > does not support that protection. That can be done by incrementing the UBI_VERSION value... > > On the other hand, if backwards compatibility is desired and the > information can be stored without introducing a new flag it might be a > good idea to allow that as well. ... but I agree that we should avoid breaking the backward compatibility if that's possible. -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-28 11:14 ` Boris Brezillon @ 2015-10-28 15:50 ` Michal Suchanek 0 siblings, 0 replies; 43+ messages in thread From: Michal Suchanek @ 2015-10-28 15:50 UTC (permalink / raw) To: Boris Brezillon Cc: Richard Weinberger, Iwo Mergler, Jeff Lauruhn (jlauruhn), Artem Bityutskiy, Andrea Scian, MTD Maling List, Brian Norris, David Woodhouse, Bean Huo 霍斌斌 (beanhuo) On 28 October 2015 at 12:14, Boris Brezillon <boris.brezillon@free-electrons.com> wrote: > On Wed, 28 Oct 2015 11:44:49 +0100 > Michal Suchanek <hramrach@gmail.com> wrote: > >> On 28 October 2015 at 10:24, Boris Brezillon >> <boris.brezillon@free-electrons.com> wrote: >> > Hi Richard, >> > >> > On Tue, 27 Oct 2015 21:16:28 +0100 >> > Richard Weinberger <richard@nod.at> wrote: >> > >> >> Boris, >> >> >> >> Am 23.10.2015 um 10:14 schrieb Boris Brezillon: >> >> >> I'm currently working on the paired pages problem we have on MLC chips. >> >> >> I remember discussing it with Artem earlier this year when I was >> >> >> preparing my talk for ELC. >> >> >> >> >> >> I now have some time I can spend working on this problem and I started >> >> >> looking at how this can be solved. >> >> >> >> >> >> First let's take a look at the UBI layer. >> >> >> There's one basic thing we have to care about: protecting UBI metadata. >> >> >> There are two kind of metadata: >> >> >> 1/ those stored at the beginning of each erase block (EC and VID >> >> >> headers) >> >> >> 2/ those stored in specific volumes (layout and fastmap volumes) >> >> >> >> >> >> We don't have to worry about #2 since those are written using atomic >> >> >> update, and atomic updates are immune to this paired page corruption >> >> >> problem (either the whole write is valid, or none of it is valid). >> >> >> >> >> >> This leaves problem #1. 
>> >> >> For this case, Artem suggested to duplicate the EC header in the VID >> >> >> header so that if page 0 is corrupted we can recover the EC info from >> >> >> page 1 (which will contain both VID and EC info). >> >> >> Doing that is fine for dealing with EC header corruption, since, AFAIK, >> >> >> none of the NAND vendors are pairing page 0 with page 1. >> >> >> Still remains the VID header corruption problem. Do prevent that we >> >> >> still have several solutions: >> >> >> a/ skip the page paired with the VID header. This is doable and can be >> >> >> hidden from UBI users, but it also means that we're loosing another >> >> >> page for metadata (not a negligible overhead) >> >> >> b/ storing VID info (PEB <-> LEB association) somewhere else. Fastmap >> >> >> seems the right place to put that in, since fastmap is already >> >> >> storing those information for almost all blocks. Still we would have >> >> >> to modify fastmap a bit to store information about all erase blocks >> >> >> and not only those that are not part of the fastmap pool. >> >> >> Also, updating that in real-time would require using a log approach, >> >> >> instead of the atomic update currently used by fastmap when it runs >> >> >> out of PEBs in it's free PEB pool. Note that the log approach does >> >> >> not have to be applied to all fastmap data (we just need it for the >> >> >> PEB <-> LEB info). >> >> >> Another off-topic note regarding the suggested log approach: we >> >> >> could also use it to log which PEB was last written/erased, and use >> >> >> that to handle the unstable bits issue. >> >> >> c/ (also suggested by Artem) delay VID write until we have enough data >> >> >> to write on the LEB, and thus guarantee that it cannot be corrupted >> >> >> (at least by programming on the paired page ;-)) anymore. 
>> >> >> Doing that would also require logging data to be written on those >> >> >> LEBs somewhere, not to mention the impact of copying the data twice >> >> >> (once in the log, and then when we have enough data, in the real >> >> >> block). >> >> >> >> >> >> I don't have any strong opinion about which solution is the best, also >> >> >> I'm maybe missing other aspects or better solutions, so feel free to >> >> >> comment on that and share your thoughts. >> >> > >> >> > I decided to go for the simplest solution (but I can't promise I won't >> >> > change my mind if this approach appears to be wrong), which is either >> >> > using a LEB is MLC or SLC mode. In SLC modes, only the first page of >> >> > each pair is used, which completely address the paired pages problem. >> >> > For now the SLC mode logic is hidden in the MTD/NAND layers which are >> >> > providing functions to write/read in SLC mode. >> >> > >> >> > Thanks to this differentiation, UBI is now exposing two kind of LEBs: >> >> > - the secure (small) LEBS (those accessed in SLC mode) >> >> > - the unsecure (big) LEBS (those accessed in MLC mode) >> >> > >> >> > The secure LEBs are marked as such with a flag in the VID header, which >> >> > allows tracking secure/unsecure LEBs and controlling the maximum size a >> >> > UBI user can read/write from/to a LEB. >> >> > This approach assume LEB 0 and 1 are never paired together (which >> >> >> >> You mean page 0 and 1? >> > >> > Yes. >> > >> >> >> >> > AFAICT is always true), because VID is stored on page 1 and we need the >> >> > secure_flag information to know how to access the LEB (SLC or MLC mode). >> >> > Of course I expose a few new helpers in the kernel API, and we'll >> >> > probably have to do it for the ioctl interface too if this approach is >> >> > validated. >> >> > >> >> > That's all I got for the UBI layer. >> >> > Richard, Artem, any feedback so far? >> >> >> >> Changing the on-flash format of UBI is a rather big thing. 
>> >> If it needs to be done I'm fine with it but we have to give our best
>> >> to change it only once. :-)
>> >
>> > Yes, I know that, and I don't pretend I chose the right solution ;-),
>> > any other suggestions to avoid changing the on-flash format?
>> >
>> > Note that I only added a new flag, and this flag is only set when you
>> > map a LEB in SLC mode, which is not the default case, which in turn
>> > means you'll be able to attach to an existing UBI partition. Of course
>> > the reverse is not true: once you've started using the secure LEB
>> > feature you can't attach this image with a UBI implementation that does
>> > not support this feature.
>>
>> Isn't a secure LEB just a plain LEB with half its pages unused? Since
>> you only write secure LEBs normally and unsecure LEBs only in the
>> garbage collector, and you can tell a secure LEB by the layout of its
>> used pages, there isn't really a need for special marking AFAICT
>
> This implies scanning several pages per block to determine which type
> of LEB is in use, which will drastically increase the attach time.
> The whole point of this flag is to avoid scanning anything else but the
> EC and VID headers (or the fastmap LEBs if fastmap is in use).

Why do you need to scan anything more than you would normally? You
assume that any blocks already written are written correctly, you will
write any new blocks securely, and you perform garbage collection to
condense blocks which have unused pages that cannot be written securely
or were already used and have stale data. The current data format
supposedly already allows determining which pages are in use and which
are not without extra scanning, so there is no more work to be done.

The only issue I see is that blocks which are writable by the current
driver cannot be written securely in some cases. You have to deal with
that while upgrading a filesystem from the old format anyway, so not
changing the format will just use the upgrade code path more.
Thanks Michal ^ permalink raw reply [flat|nested] 43+ messages in thread
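The pairing constraints discussed above are easier to follow with a concrete model. The sketch below uses a purely illustrative pairing table for an 8-page block; real pairing schemes are vendor-specific and must be taken from the chip's datasheet, so treat the table, the function names, and the "lower page has the smaller index" convention as assumptions made for demonstration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative pairing table for an 8-page MLC block: entry p holds
 * the page paired with page p. This table is invented for the sake of
 * the example; real schemes differ per vendor and per chip. */
static const int pair_of[8] = { 2, 4, 0, 6, 1, 7, 3, 5 };

/* Return the page paired with page p. */
int paired_page(int p)
{
    return pair_of[p];
}

/* Treat the page with the smaller index in each pair as the "lower"
 * page, i.e. the one that is programmed first and the only one that
 * SLC-mode accesses would use. */
bool is_lower_page(int p)
{
    return p < pair_of[p];
}
```

With such a model, an SLC-mode LEB would simply never program the pages for which is_lower_page() returns false, which is why it exposes roughly half the capacity of an MLC-mode LEB.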
* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-23  8:14 ` Boris Brezillon
  2015-10-27 20:16 ` Richard Weinberger
@ 2015-10-28 12:24 ` Artem Bityutskiy
  2015-10-30  8:15 ` Boris Brezillon
  1 sibling, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-28 12:24 UTC (permalink / raw)
To: Boris Brezillon, Richard Weinberger
Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
	Iwo Mergler, Jeff Lauruhn (jlauruhn), Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-23 at 10:14 +0200, Boris Brezillon wrote:
> I decided to go for the simplest solution (but I can't promise I won't
> change my mind if this approach appears to be wrong), which is using a
> LEB in either MLC or SLC mode. In SLC mode, only the first page of
> each pair is used, which completely addresses the paired pages problem.
> For now the SLC mode logic is hidden in the MTD/NAND layers, which are
> providing functions to write/read in SLC mode.

Most of the writes go through the journalling subsystem.

There are some non-journal writes, related to internal meta-data
management, from other subsystems: log, the master node, LPT, index,
GC.

In the case of the journal subsystem, in MLC mode you just skip pages
every time the "flush write-buffer" API call is used.

In the LPT subsystem, you invent a custom solution, skip pages as
needed.

In master - probably nothing needs to be done, since we have 2 copies.

Index, GC - data also goes via journal, so the journal subsystem
solution will probably cover it.

> Thanks to this differentiation, UBI is now exposing two kinds of LEBs:
> - the secure (small) LEBs (those accessed in SLC mode)
> - the unsecure (big) LEBs (those accessed in MLC mode)

Is this really necessary? Feels like a bit of over-complication to the
UBI layer.

Can UBI care about itself WRT MLC safeness, and let UBIFS care about
itself?

^ permalink raw reply	[flat|nested] 43+ messages in thread
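Artem's "skip pages on every write-buffer flush" idea can be sketched with the same kind of illustrative pairing model (the table and the helper below are invented for demonstration; none of this is UBIFS code). After a flush, writing resumes at the next lower page, so the just-flushed data can no longer be disturbed by programming its pair:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented pairing table for an 8-page block (vendor schemes differ). */
static const int pair_of[8] = { 2, 4, 0, 6, 1, 7, 3, 5 };

static bool is_lower_page(int p)
{
    return p < pair_of[p];
}

/* Hypothetical "skip on demand" step: after flushing the write-buffer
 * at page last_written, return the next lower page to resume writing
 * at, or -1 if the block has no lower pages left. */
int next_safe_page(int last_written)
{
    for (int p = last_written + 1; p < 8; p++)
        if (is_lower_page(p))
            return p;
    return -1;
}
```

Note that this is a simplification: as Boris points out later in the thread, skipped lower pages cannot simply be left unwritten, because NAND chips require the lower page of a pair to be programmed before the higher one.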
* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-28 12:24 ` Artem Bityutskiy
@ 2015-10-30  8:15 ` Boris Brezillon
  2015-10-30  8:21 ` Boris Brezillon
  ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-10-30 8:15 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 "(beanhuo)"

Hi Artem,

Don't take the following answer as an attempt to teach you how
UBI/UBIFS work or should work with MLC NANDs. I'm still listening to
your suggestions, but when I had a look at how this "skip pages on
demand" approach could be implemented I realized it was not so simple.

Also, if you don't mind I'd like to finish my consolidation-GC
implementation before trying a new approach, which doesn't mean I won't
consider the "skip pages on demand" one.

On Wed, 28 Oct 2015 14:24:45 +0200
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Fri, 2015-10-23 at 10:14 +0200, Boris Brezillon wrote:
> > I decided to go for the simplest solution (but I can't promise I won't
> > change my mind if this approach appears to be wrong), which is using a
> > LEB in either MLC or SLC mode. In SLC mode, only the first page of
> > each pair is used, which completely addresses the paired pages problem.
> > For now the SLC mode logic is hidden in the MTD/NAND layers, which are
> > providing functions to write/read in SLC mode.
>
> Most of the writes go through the journalling subsystem.
>
> There are some non-journal writes, related to internal meta-data
> management, from other subsystems: log, the master node, LPT,
> index, GC.
>
> In the case of the journal subsystem, in MLC mode you just skip pages
> every time the "flush write-buffer" API call is used.
>
> In the LPT subsystem, you invent a custom solution, skip pages as
> needed.
>
> In master - probably nothing needs to be done, since we have 2 copies.
>
> Index, GC - data also goes via journal, so the journal subsystem
> solution will probably cover it.

For the general concept I agree that it should probably work, but here
are my concerns (maybe you'll prove me wrong ;-)):

1/ will you ever be able to use a full LEB without skipping any pages?
I mean, when using the "skip pages on demand" approach you can easily
have more than half the pages in your LEB skipped, because when you
write only one page, you'll have to skip between 3 and 8 pages (it
depends on the pairing scheme). I'll try to gather some statistics on
how often wbufs are synced to see if that's a real problem.
The consolidation approach has the advantage of being able to
consolidate existing LEBs to completely fill them, but the
consolidation stuff could probably work with "skip pages on demand"
too.

2/ skipping pages on demand is not as easy as only writing on the lower
pages of each pair. As you might know, when skipping pages to secure
your data, you'll also have to skip some lower pages so that you end up
with an offset to a memory region that can be contiguously written to,
and when you skip those lower pages, you have to write to them, because
NAND chips require that the lower page of each pair be programmed
before the higher one (ignoring this will just render some pages
unreliable).

3/ UBIFS is really picky when it comes to corrupted node detection,
and there are a few cases where it refuses to mount the FS when a
corrupted node is detected. One such case is when the corrupted page
(filled with one or several nodes) is filled with non-ff data, which is
likely to happen with MLC NANDs (paired pages are not contiguous). We
discussed relaxing this policy a few weeks ago, but what should we do
when such a corruption is detected? Drop all nodes with a sequence
number higher than or equal to the last valid node on the LEB?
Note that with the consolidation-GC approach we don't have this problem
because the consolidated LEB is added to the journal after it has been
completely filled with data, and marked as full (->free = 0) so that
nobody can reclaim it to write data on it.

> > Thanks to this differentiation, UBI is now exposing two kinds of LEBs:
> > - the secure (small) LEBs (those accessed in SLC mode)
> > - the unsecure (big) LEBs (those accessed in MLC mode)
>
> Is this really necessary? Feels like a bit of over-complication to the
> UBI layer.

Hm, it's actually not so complicated: SLC mode is implemented by the
NAND layer and UBI is just using MTD functions to access the NAND in
SLC mode. I'm more concerned about the on-flash format change problem
raised by Richard.

> Can UBI care about itself WRT MLC safeness, and let UBIFS care about
> itself?

Sorry but I don't agree here. By exposing the secure LEB concept, UBI
does not specifically care about UBIFS, it just provides a way for all
UBI users to address the problem brought by paired pages in a generic
way.
Maybe the secure LEB approach is wrong, but in the end UBI will expose
other functions to handle those paired pages problems
(ubi_secure_data() to skip pages for example), and this layering
(NAND/MTD/UBI/UBIFS) is IMO the only sane way to let each layer handle
what it's supposed to handle and let the upper layers use the new
features to mitigate the problems.
So, no matter which solution is chosen, it will impact the UBI, MTD,
and NAND layers.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread
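The secure/unsecure LEB split Boris describes comes down to one bit in the VID header selecting how much of the LEB is usable. The flag name and bit value below are invented for illustration; they are not part of the real UBI on-flash format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VID header flag marking a LEB as "secure" (accessed in
 * SLC mode). The name and bit position are made up for this sketch. */
#define VID_FLG_SECURE 0x01

/* On a 2-bit-per-cell MLC chip, SLC mode uses only the lower page of
 * each pair, so a secure LEB exposes half the size of an unsecure one. */
size_t usable_leb_size(uint8_t vid_flags, size_t leb_size)
{
    return (vid_flags & VID_FLG_SECURE) ? leb_size / 2 : leb_size;
}
```

This is also why the attach-time argument matters: with such a flag, UBI can learn the usable size of a LEB from the VID header alone, without scanning any data pages.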
* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  8:15 ` Boris Brezillon
@ 2015-10-30  8:21 ` Boris Brezillon
  2015-10-30  8:50 ` Bean Huo 霍斌斌 (beanhuo)
  2015-10-30  9:08 ` Artem Bityutskiy
  2 siblings, 0 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-10-30 8:21 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 "(beanhuo)"

On Fri, 30 Oct 2015 09:15:21 +0100
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

>
> 2/ skipping pages on demand is not as easy as only writing on the lower
> pages of each pair. As you might know, when skipping pages to secure
> your data, you'll also have to skip some lower pages so that you end up
> with an offset to a memory region that can be contiguously written to,
> and when you skip those lower pages, you have to write to them, because
> NAND chips require that the lower page of each pair be programmed
> before the higher one (ignoring this will just render some pages
> unreliable).
>
> 3/ UBIFS is really picky when it comes to corrupted node detection,
> and there are a few cases where it refuses to mount the FS when a
> corrupted node is detected. One such case is when the corrupted
> page (filled with one or several nodes) is filled with non-ff data,

I meant, "One such case is when the corrupted page is followed by a
page filled with non-ff data"

> which is likely to happen with MLC NANDs (paired pages are not
> contiguous). We discussed relaxing this policy a few weeks ago,
> but what should we do when such a corruption is detected? Drop all
> nodes with a sequence number higher than or equal to the last valid
> node on the LEB?
> Note that with the consolidation-GC approach we don't have this
> problem because the consolidated LEB is added to the journal after it
> has been completely filled with data, and marked as full (->free = 0)
> so that nobody can reclaim it to write data on it.
-- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 43+ messages in thread
* RE: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  8:15 ` Boris Brezillon
  2015-10-30  8:21 ` Boris Brezillon
@ 2015-10-30  8:50 ` Bean Huo 霍斌斌 (beanhuo)
  2015-10-30  9:08 ` Artem Bityutskiy
  2 siblings, 0 replies; 43+ messages in thread
From: Bean Huo 霍斌斌 (beanhuo) @ 2015-10-30 8:50 UTC (permalink / raw)
To: Boris Brezillon, Artem Bityutskiy
Cc: Richard Weinberger, linux-mtd@lists.infradead.org,
	David Woodhouse, Brian Norris, Andrea Scian, Iwo Mergler,
	Jeff Lauruhn (jlauruhn)

> Hi Artem,
>
> Don't take the following answer as an attempt to teach you how UBI/UBIFS
> work or should work with MLC NANDs. I'm still listening to your
> suggestions, but when I had a look at how this "skip pages on demand"
> approach could be implemented I realized it was not so simple.
>
> Also, if you don't mind I'd like to finish my consolidation-GC
> implementation before trying a new approach, which doesn't mean I won't
> consider the "skip pages on demand" one.

"Skip page" is really a tough solution to code in UBI. Generally, I
know that page skipping is implemented in the FTL layer to solve the
MLC paired page issue, where it is easy to code. Is it necessary to
implement it in Linux? I don't know.

> On Wed, 28 Oct 2015 14:24:45 +0200
> Artem Bityutskiy <dedekind1@gmail.com> wrote:
>
> > On Fri, 2015-10-23 at 10:14 +0200, Boris Brezillon wrote:
> > > I decided to go for the simplest solution (but I can't promise I
> > > won't change my mind if this approach appears to be wrong), which is
> > > using a LEB in either MLC or SLC mode. In SLC mode, only the first
> > > page of each pair is used, which completely addresses the paired
> > > pages problem.
> > > For now the SLC mode logic is hidden in the MTD/NAND layers, which
> > > are providing functions to write/read in SLC mode.
> >
> > Most of the writes go through the journalling subsystem.
> >
> > There are some non-journal writes, related to internal meta-data
> > management, from other subsystems: log, the master node, LPT,
> > index, GC.
> > > > In case of journal subsystem, in MLC mode you just skip pages every > > time the "flush write-buffer" API call is used. > > > > In LPT subsystem, you invent a custom solution, skip pages as needed. > > > > In master - probably nothing needs to be done, since we have 2 copies. > > > > Index, GC - data also goes via journal, so the journal subsystem > > solution will probably cover it. > > For the general concept I agree that it should probably work, but here are my > concerns (maybe you'll prove me wrong ;-)): > > 1/ will you ever be able to use a full LEB without skipping any pages? > I mean, when use the "skip pages on demand" you can easily have more than > half the page in your LEB skipped, because when you write only on one page, > you'll have to skip between 3 to 8 pages (it depends on the pairing scheme). I'll > try to run gather some statistics to see how often wbuf are synced to see if > that's a real problem. > The consolidation approach has the advantage of being able to consolidate > existing LEBs to completely fill them, but the consolidation stuff could probably > work with the "skip pages on demand". > > 2/ skipping pages on demand is not as easy as only writing on lower pages of > each pair. As you might know, when skipping pages to secure your data, you'll > also have to skip some lower pages so that you end up with an offset to a > memory region that can be contiguously written to, and when you skip those > lower pages, you have to write on it, because NAND chips require that the > lower page of each pair be programmed before the higher one (ignoring this > will just render some pages unreliable). > > 3/ UBIFS is really picky when it comes to corrupted nodes detection, and there > are a few cases where it refuses to mount the FS when a corrupted node is > detected. 
One of this case is when the corrupted page (filled with one or > several nodes) is filled with non-ff data, which is likely to happen with MLC > NANDs (paired pages are not contiguous). We discussed about relaxing this > policy a few weeks ago, but what should we do when such a corruption is > detected? Drop all nodes with a sequence higher or equal to the last valid > node on the LEB? > Note that with the consolidation-GC approach we don't have this problem > because the consolidate LEB is added to journal after it has been completely > filled with data, and marked as full (->free = 0) so that nobody can reclaim it > to write data on it. > > > > > > > > Thanks to this differentiation, UBI is now exposing two kind of LEBs: > > > - the secure (small) LEBS (those accessed in SLC mode) > > > - the unsecure (big) LEBS (those accessed in MLC mode) > > > > Is this really necessary? Feels like a bit of over-complication to the > > UBI layer. > > Hm, it's actually not so complicated: SLC mode is implemented by the NAND > layer and UBI is just using MTD functions to access the NAND in SLC mode. I'm > more concerned by the on-flash format changes problem raised by Richard. > > > > > Can UBI care about itself WRT MLC safeness, and let UBIFS care about > > itself? > > > > Sorry but I don't agree here. By exposing the secure LEB concept, UBI does not > specifically care about UBIFS, it just provides a way for all UBI users to address > the problem brought by paired pages in a generic way. > Maybe the secure LEB approach is wrong, but in the end UBI will expose other > functions to handle those paired pages problems > (ubi_secure_data() to skip pages for example), and this layering > (NAND/MTD/UBI/UBIFS) is IMO the only sane way to let each layer handle > what it's supposed to handle and let the upper layers use the new features to > mitigate the problems. > So, no matter which solution is chosen, it will impact the UBI, MTD, and NAND > layers. 
> > Best Regards, > > Boris > > -- > Boris Brezillon, Free Electrons > Embedded Linux and Kernel engineering > http://free-electrons.com ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  8:15 ` Boris Brezillon
  2015-10-30  8:21 ` Boris Brezillon
  2015-10-30  8:50 ` Bean Huo 霍斌斌 (beanhuo)
@ 2015-10-30  9:08 ` Artem Bityutskiy
  2015-10-30  9:45 ` Boris Brezillon
  2 siblings, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30 9:08 UTC (permalink / raw)
To: Boris Brezillon
Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 "(beanhuo)"

On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote:
> Hi Artem,
>
> Don't take the following answer as an attempt to teach you how
> UBI/UBIFS work or should work with MLC NANDs. I'm still listening to
> your suggestions, but when I had a look at how this "skip pages on
> demand" approach could be implemented I realized it was not so simple.

Sure.

Could you verify my understanding, please?

You realized that "skip on demand" is not easy, and you suggest that we
simply write all the data twice - the first time we skip pages, and
then we garbage collect everything. At the end, roughly speaking, we
trade off half of the IO speed, power, and NAND lifetime.

About secure LEBs - do you suggest UBI exposes 2 different LEB sizes at
the same time - secure and unsecure - or could it be in only one of the
modes?

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  9:08 ` Artem Bityutskiy
@ 2015-10-30  9:45 ` Boris Brezillon
  2015-10-30 10:09 ` Artem Bityutskiy
  2015-10-30 11:43 ` Artem Bityutskiy
  0 siblings, 2 replies; 43+ messages in thread
From: Boris Brezillon @ 2015-10-30 9:45 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 ""(beanhuo)""

On Fri, 30 Oct 2015 11:08:10 +0200
Artem Bityutskiy <dedekind1@gmail.com> wrote:

> On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote:
> > Hi Artem,
> >
> > Don't take the following answer as an attempt to teach you how
> > UBI/UBIFS work or should work with MLC NANDs. I'm still listening to
> > your suggestions, but when I had a look at how this "skip pages on
> > demand" approach could be implemented I realized it was not so
> > simple.
>
> Sure.
>
> Could you verify my understanding please.
>
> You realized that "skip on demand" is not easy, and you suggest that we
> simply write all the data twice - first time we skip pages, and then we
> garbage collect everything. At the end, roughly speaking, we trade off
> half of the IO speed, power, and NAND lifetime.

That will be pretty much the same with the "skip on demand" approach,
because you'll probably lose a lot of space when syncing the wbuf.
Remember that you have to skip between 3 and 8 pages, so if your
buffers are regularly synced (either manually or by the timer), you'll
increase the dirty space in those LEBs, and in the end you'll just rely
on the regular GC to collect those partially written LEBs. Except that
the regular GC will in turn lose some space when syncing its wbuf.
Moreover, the standard GC only takes place when you can't find a free
LEB anymore, which will probably happen when you reach something close
to half the partition size in the case of MLC chips (it may be a bit
higher if you managed to occupy more than half of each LEB's capacity).
This means that your FS will become slower when you reach this limit,
though maybe this can be addressed by triggering the GC before we run
out of free LEBs.

>
> About secure LEBs - do you suggest UBI exposes 2 different LEB sizes at
> the same time - secure and unsecure - or could it be in only one of the
> modes?

A given LEB can only be in secure or unsecure mode, but a UBI volume
can expose both unsecure and secure LEBs, and those LEBs have different
sizes.
The secure/unsecure mode is chosen when mapping the LEB, and the LEB
stays in this mode until it's unmapped.

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 43+ messages in thread
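Triggering GC before free LEBs run out, as Boris suggests above, could be as simple as a watermark check. The 25% threshold below is an arbitrary example, not a value taken from UBI/UBIFS:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical early-GC policy: start consolidating partially filled
 * LEBs once the free-LEB count drops below a percentage watermark,
 * instead of waiting for an allocation to fail. */
bool should_trigger_gc(unsigned int free_lebs, unsigned int total_lebs)
{
    /* Compare free/total < 25/100 without integer-division rounding. */
    return free_lebs * 100 < total_lebs * 25;
}
```

Running GC from a background thread once this returns true would spread the consolidation cost out, instead of paying it all at once when the pool is exhausted.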
* Re: UBI/UBIFS: dealing with MLC's paired pages
  2015-10-30  9:45 ` Boris Brezillon
@ 2015-10-30 10:09 ` Artem Bityutskiy
  2015-10-30 11:49 ` Michal Suchanek
  1 sibling, 1 reply; 43+ messages in thread
From: Artem Bityutskiy @ 2015-10-30 10:09 UTC (permalink / raw)
To: Boris Brezillon
Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
	Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
	Bean Huo 霍斌斌 ""(beanhuo)""

On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote:
> On Fri, 30 Oct 2015 11:08:10 +0200
> Artem Bityutskiy <dedekind1@gmail.com> wrote:
>
> > On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote:
> > > Hi Artem,
> > >
> > > Don't take the following answer as an attempt to teach you how
> > > UBI/UBIFS work or should work with MLC NANDs. I'm still listening
> > > to your suggestions, but when I had a look at how this "skip pages
> > > on demand" approach could be implemented I realized it was not so
> > > simple.
> >
> > Sure.
> >
> > Could you verify my understanding please.
> >
> > You realized that "skip on demand" is not easy, and you suggest that
> > we simply write all the data twice - first time we skip pages, and
> > then we garbage collect everything. At the end, roughly speaking, we
> > trade off half of the IO speed, power, and NAND lifetime.

So I guess the answer is generally "yes", right? I just want to be
clear about the trade-off.

> That will be pretty much the same with the "skip on demand" approach,
> because you'll probably lose a lot of space when syncing the wbuf.

Write buffer is designed to optimize space usage. Instead of wasting
the rest of the NAND page, we wait for more data to arrive and put it
to the same NAND page with the previous piece of data.

This suggests that we do not sync it too often, or at least that
efforts were taken not to do this.

Off the top of my head, we sync the write-buffer (AKA wbuf) in these
cases:
1.
Journal commit, which happens once in a while, depending on journal
size.
2. User-initiated sync, like fsync(), sync(), remount, etc.
3. Write-buffer timer, which fires when there were no writes within a
certain interval, like 5 seconds. The time can be tuned.
4. Other situations like the end of GC, etc. - these are related to
meta-data management.

Now, imagine you're writing a lot of data, like uncompressing a big
tarball, or compressing, or just backing up your /home. In this
situation you have a continuous flow of data from VFS to UBIFS.

UBIFS will keep writing the data to the journal, and there won't be any
wbuf syncs. The syncs will happen only on journal commit. So you end up
with LEBs full of data and not requiring any GC.

But yes, if we are talking about, say, an idle system, which
occasionally writes something, there will be a wbuf sync after every
write.

So in the "I need all your capacity" kind of situations where IO speed
matters, and a lot of data is written - we'd be optimal, no double
writes. In the "I am mostly idle" type of situations we'll do double
writes.

SIGLUNCH, colleagues waiting, sorry, I guess I wrote enough :-)

> A given LEB can only be in secure or unsecure mode, but a UBI volume
> can expose both unsecure and secure LEBs, and those LEBs have
> different sizes.
> The secure/unsecure mode is chosen when mapping the LEB, and the LEB
> stays in this mode until it's unmapped.

This is not going to be a little value add to UBI, this is going to be
a big change in my opinion. If UBIFS ends up using this - it may be
worth the effort. Otherwise, I'd argue that this would need an
important customer to be worth the effort.

^ permalink raw reply	[flat|nested] 43+ messages in thread
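The four sync triggers Artem lists can be condensed into a single decision helper. The structure and field names below are invented for this sketch; the real UBIFS write-buffer logic is considerably richer:

```c
#include <assert.h>
#include <stdbool.h>

/* The write-buffer sync triggers from the list above, modelled as a
 * small state snapshot (illustrative only, not UBIFS internals). */
struct wbuf_state {
    bool commit_requested; /* 1. journal commit */
    bool user_sync;        /* 2. fsync()/sync()/remount */
    long idle_secs;        /* 3. inactivity timer */
    bool gc_finished;      /* 4. end of GC and similar events */
};

bool wbuf_needs_sync(const struct wbuf_state *s, long idle_timeout)
{
    return s->commit_requested || s->user_sync ||
           s->idle_secs >= idle_timeout || s->gc_finished;
}
```

This also makes Artem's point visible: under a continuous write stream none of the four conditions fires between commits, so no pages would be skipped and there would be no double write.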
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-30 10:09 ` Artem Bityutskiy @ 2015-10-30 11:49 ` Michal Suchanek 2015-10-30 12:47 ` Artem Bityutskiy 0 siblings, 1 reply; 43+ messages in thread From: Michal Suchanek @ 2015-10-30 11:49 UTC (permalink / raw) To: Artem Bityutskiy Cc: Boris Brezillon, Iwo Mergler, Jeff Lauruhn (jlauruhn), Richard Weinberger, Andrea Scian, MTD Maling List, Brian Norris, David Woodhouse, Bean Huo 霍斌斌 (beanhuo) On 30 October 2015 at 11:09, Artem Bityutskiy <dedekind1@gmail.com> wrote: > On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote: >> On Fri, 30 Oct 2015 11:08:10 +0200 >> Artem Bityutskiy <dedekind1@gmail.com> wrote: >> >> > On Fri, 2015-10-30 at 09:15 +0100, Boris Brezillon wrote: >> > > Hi Artem, >> > > >> > > Don't take the following answer as a try to teach you how >> > > UBI/UBIFS >> > > work >> > > or should work with MLC NANDs. I still listen to your >> > > suggestions, >> > > but >> > > when I had a look at how this "skip pages on demand" approach >> > > could >> > > be implemented I realized it was not so simple. >> > >> > Sure. >> > >> > Could you verify my understanding please. >> > >> > You realized that "skip on demand" is not easy, and you suggest >> > that we >> > simply write all the data twice - first time we skip pages, and >> > then we >> > garbage collect everything. At the end, roughly speaking, we trade >> > off >> > half of the IO speed, power, and NAND lifetime. > > So I guess the answer is generally "yes", right? I just want to be > clear about the trade-off. > >> That will be pretty much the same with the "skip on demand" approach, >> because you'll probably loose a lot of space when syncing the wbuf. > > Write buffer is designed to optimized space usage. Instead of wasting > the rest of the NAND page, we wait for more data to arrive and put it > to the same NAND page with the previous piece of data. 
> > This suggests that we do not sync it too often, or at least that the > efforts were taken not to do this. > > Off the top of my head, we sync the write-buffer (AKA wbuf) in these > cases: > 1. Journal commit, which happens once in a while, depends on journal > size. > 2. User-initiated sync, like fsync(), sync(), remount, etc. > 3. Write-buffer timer, which fires when there were no writes withing > certain interval, like 5 seconds. The time can be tuned. > 4. Other situations like the end of GC, etc - these are related to meta > -data management. > > Now, imagine you writing a lot of data, like uncompressing a big > tarball, or compressing, or just backing up your /home. In this > situation you have a continuous flow of data from VFS to UBIFS. > > UBIFS will keep writing the data to the journal, and there won't be any > wbuf syncs. The syncs will happen only on journal commit. So you end up > with LEBs full of data and not requiring any GC. Actually, since there is no guarantee that the data ever gets written (or error reported in case it cannot) unless you fsync() every file before close() any sane uncompressor, backup program, etc. will fsync() every file written regardless of its size. So if your home has a lot of configuration files and sources this will not be nice continuous stream of data in many cases. IIRC scp(1) does not fsync() potentially resulting in large amounts of data getting silently lost. For that reason and others it should be renamed to iscp. Thanks Michal ^ permalink raw reply [flat|nested] 43+ messages in thread
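Michal's point about fsync() can be shown with a minimal userspace sketch: without the fsync() before close(), a write error discovered during writeback may never be reported to the application, and the data can be silently lost.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write a buffer to a file and fsync() it before closing, so that an
 * error is reported if the data cannot actually be persisted. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) { /* the durability step scp reportedly skips */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

On UBIFS this is what forces the write-buffer sync (case 2 in Artem's list), which is exactly why sync-heavy workloads interact badly with page skipping.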
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-30 11:49 ` Michal Suchanek @ 2015-10-30 12:47 ` Artem Bityutskiy 0 siblings, 0 replies; 43+ messages in thread From: Artem Bityutskiy @ 2015-10-30 12:47 UTC (permalink / raw) To: Michal Suchanek Cc: Boris Brezillon, Iwo Mergler, Jeff Lauruhn (jlauruhn), Richard Weinberger, Andrea Scian, MTD Maling List, Brian Norris, David Woodhouse, Bean Huo 霍斌斌 (beanhuo) On Fri, 2015-10-30 at 12:49 +0100, Michal Suchanek wrote: > Actually, since there is no guarantee that the data ever gets written > (or error reported in case it cannot) unless you fsync() every file > before close() any sane uncompressor, backup program, etc. will > fsync() every file written regardless of its size. So if your home > has > a lot of configuration files and sources this will not be nice > continuous stream of data in many cases. > > IIRC scp(1) does not fsync() potentially resulting in large amounts > of > data getting silently lost. For that reason and others it should be > renamed to iscp. This is true. If you get more or less steady stream of incoming writes without syncs in-between, you do not need to sync the write-buffer, you do not need to skip pages to cover the MLC paired pages. In these situations we do not have to end up with double writing. If there are no writes for some time, it is good idea to sync. VFS will write-back by timeout, UBIFS will flush write-buffer by timeout. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-30 9:45 ` Boris Brezillon 2015-10-30 10:09 ` Artem Bityutskiy @ 2015-10-30 11:43 ` Artem Bityutskiy 2015-10-30 11:59 ` Richard Weinberger 2015-10-30 12:30 ` Boris Brezillon 1 sibling, 2 replies; 43+ messages in thread From: Artem Bityutskiy @ 2015-10-30 11:43 UTC (permalink / raw) To: Boris Brezillon Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris, Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn), Bean Huo 霍斌斌 ""(beanhuo)"" On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote: > Moreover, the standard GC only takes place when you can't find a free > LEB anymore, which will probably happen when you reach something > close > to half the partition size in case of MLC chips (it may be a bit > higher if you managed to occupy more than half of each LEB capacity). > This means that your FS will become slower when you reach this limit, > though maybe this can be addressed by triggering the GC before we run > out of free LEBs. Right. I'd call it a detail. But the big picture is - if you have to GC all the data you write, you write twice. When exactly you do the second write is a detail - sometimes it is deferred, it is in background etc, sometimes right away - you have to GC older data before being able to write new data. Now, by no means I am criticizing you or your decisions, you are doing great job. I am more like summarizing and trying to give you some food for thoughts. :-) ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages 2015-10-30 11:43 ` Artem Bityutskiy @ 2015-10-30 11:59 ` Richard Weinberger 2015-10-30 12:29 ` Artem Bityutskiy 2015-10-30 12:30 ` Boris Brezillon 1 sibling, 1 reply; 43+ messages in thread From: Richard Weinberger @ 2015-10-30 11:59 UTC (permalink / raw) To: dedekind1, Boris Brezillon Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn), Bean Huo 霍斌斌 ""(beanhuo)"" Am 30.10.2015 um 12:43 schrieb Artem Bityutskiy: > On Fri, 2015-10-30 at 10:45 +0100, Boris Brezillon wrote: >> Moreover, the standard GC only takes place when you can't find a free >> LEB anymore, which will probably happen when you reach something >> close >> to half the partition size in case of MLC chips (it may be a bit >> higher if you managed to occupy more than half of each LEB capacity). >> This means that your FS will become slower when you reach this limit, >> though maybe this can be addressed by triggering the GC before we run >> out of free LEBs. > > Right. I'd call it a detail. But the big picture is - if you have to GC > all the data you write, you write twice. When exactly you do the second > write is a detail - sometimes it is deferred, it is in background etc, > sometimes right away - you have to GC older data before being able to > write new data. That is a valid concern. But to me the idea sounds promising and is worth a try. We will stress test it and figure how much the actual overhead is. Thanks, //richard ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Artem Bityutskiy @ 2015-10-30 12:29 UTC
To: Richard Weinberger, Boris Brezillon
Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
    Iwo Mergler, Jeff Lauruhn (jlauruhn), Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 12:59 +0100, Richard Weinberger wrote:
> That is a valid concern.
> But to me the idea sounds promising and is worth a try.
> We will stress test it and figure out how much the actual overhead is.

Well, for me the question of "do we double-write or try to do a better
job" is more of a fundamental question, not a concern. Right now I
personally do not share the opinion that doing a better job is hard and
double-writing is easy. I may be wrong, though.

So to me, "hey, we'll just write twice, it is worth a try" does not
look like a good starting point. And then "hey, we did not try to do a
better job, so we write twice, let's have this upstream" is a strong
position.
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Bityutskiy, Artem @ 2015-10-30 12:31 UTC
To: boris.brezillon@free-electrons.com, richard@nod.at
Cc: beanhuo@micron.com, computersforpeace@gmail.com,
    Iwo.Mergler@netcommwireless.com, rnd4@dave-tech.it,
    linux-mtd@lists.infradead.org, dwmw2@infradead.org,
    jlauruhn@micron.com

On Fri, 2015-10-30 at 14:29 +0200, Artem Bityutskiy wrote:
> And then "hey, we did not try to do a better job, so we write twice,
> let's have this upstream" is a strong position.

I meant NOT a strong position here, sorry.

--
Best Regards,
Artem Bityutskiy
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki
Business Identity Code: 0357606 - 4
Domiciled in Helsinki
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Boris Brezillon @ 2015-10-30 12:30 UTC
To: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
    Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
    Bean Huo 霍斌斌 (beanhuo)

On Fri, 30 Oct 2015 13:43:15 +0200, Artem Bityutskiy wrote:
> Right. I'd call it a detail. But the big picture is: if you have to GC
> all the data you write, you write twice. When exactly you do the second
> write is a detail - sometimes it is deferred and runs in the background,
> sometimes it happens right away - you have to GC older data before being
> able to write new data.

You're right, but it makes a big difference when all your writes take
longer because you need to run the GC to retrieve a free LEB, and this
is probably what's going to happen when your FS fills up to ~1/2 of its
maximum size. Doing it in the background (collecting a few valid nodes
on each GC step and letting user operations take place between each of
these steps) should help mitigate this problem.

> Now, by no means am I criticizing you or your decisions; you are doing
> a great job. I am more summarizing and trying to give you some food for
> thought. :-)

No problem, I don't take it personally. I actually think arguing over
technical stuff is a good way to find the best solution ;-).

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Artem Bityutskiy @ 2015-10-30 12:41 UTC
To: Boris Brezillon
Cc: Richard Weinberger, linux-mtd, David Woodhouse, Brian Norris,
    Andrea Scian, Iwo Mergler, Jeff Lauruhn (jlauruhn),
    Bean Huo 霍斌斌 (beanhuo)

On Fri, 2015-10-30 at 13:30 +0100, Boris Brezillon wrote:
> You're right, but it makes a big difference when all your writes take
> longer because you need to run the GC to retrieve a free LEB, and this
> is probably what's going to happen when your FS fills up to ~1/2 of its
> maximum size. Doing it in the background (collecting a few valid nodes
> on each GC step and letting user operations take place between each of
> these steps) should help mitigate this problem.

It makes a difference, yes. However, again, the worst-case scenario is
that whenever I need to write, I have to do GC, because I am "punished"
by previous writes. The worst case is twice-as-slow writes. Guaranteed
twice-as-fast wear is another implication. Increased power consumption
is yet another. Not every embedded system will find the "you have to do
a lot of work in the background" UBIFS feature attractive.

Anyway, could you spend a bit more time trying to provide convincing
arguments that doing "skip on demand" is hard, or does not gain
anything? You expressed this opinion, but so far it did not look 100%
convincing.

Thanks!
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Artem Bityutskiy @ 2015-10-28 12:06 UTC
To: Boris Brezillon, Richard Weinberger
Cc: linux-mtd, David Woodhouse, Brian Norris, Andrea Scian,
    Qi Wang 王起 (qiwang), Iwo Mergler, Jeff Lauruhn (jlauruhn)

On Thu, 2015-09-17 at 15:22 +0200, Boris Brezillon wrote:
> 1/ do not skip any pages until we are asked to secure the data, and
>    then skip as many pages as needed to ensure nobody can ever corrupt
>    the data. With this approach you can lose a non-negligible amount
>    of space. For example, with this paired-pages scheme [1], if you
>    have only written up to page 2 and want to secure your data, you'll
>    have to skip pages 3 to 8.

This sounds like the right way to go to me.
[parent not found: <A765B125120D1346A63912DDE6D8B6310BF4CAA8@NTXXIAMBX02.xacn.micron.com>]
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Boris Brezillon @ 2015-09-25 7:30 UTC
To: Bean Huo 霍斌斌 (beanhuo)
Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
    Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
    dedekind1@gmail.com, richard@nod.at, shuangshuo@gmail.com,
    rnd4@dave-tech.it, dwmw2@infradead.org

Hi,

On Fri, 25 Sep 2015 00:43:37 +0000, Bean Huo 霍斌斌 (beanhuo) wrote:
> > And 3:
> > Only NAND provides an OOB area. Other flash devices like parallel
> > or SPI NOR don't. And we definitely want to continue supporting
> > platforms with such flash devices and UBI (and UBIFS).
> >
> > Thanks,
> > Stefan
> >
> For the MLC NAND paired pages issue, we have developed two methods to
> solve it in the UBI layer. We hope that every expert on UBI/UBIFS can
> give more suggestions about how to improve and perfect it. I think I
> can submit the first solution patch this week; I am currently doing
> coding style cleanup.

Are you referring to the suggestion proposed by Karl (using the OOB
area to store UBI metadata), or is this something else?

Best Regards,

Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* RE: UBI/UBIFS: dealing with MLC's paired pages
From: Bean Huo 霍斌斌 (beanhuo) @ 2015-09-25 8:25 UTC
To: Boris Brezillon
Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
    Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
    dedekind1@gmail.com, richard@nod.at, shuangshuo@gmail.com,
    rnd4@dave-tech.it, dwmw2@infradead.org,
    Karl Zhang 张双锣 (karlzhang)

Boris Brezillon wrote:
> Are you referring to the suggestion proposed by Karl (using the OOB
> area to store UBI metadata), or is this something else?

Hi, Boris

The NAND OOB area is dedicated to ECC values, and the user-available
area in OOB is not covered by the ECC protection mechanism, so saving
EC or VID information in OOB is not a perfect solution. But if there
is additional space, and this space can be covered by ECC, we can try
it.

By the way, I want to allocate a new internal volume to solve the
paired pages issue. What do you think about this?
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Richard Weinberger @ 2015-09-25 8:35 UTC
To: Bean Huo 霍斌斌 (beanhuo), Boris Brezillon
Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
    Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
    dedekind1@gmail.com, shuangshuo@gmail.com, rnd4@dave-tech.it,
    dwmw2@infradead.org, Karl Zhang 张双锣 (karlzhang)

On 25.09.2015 at 10:25, Bean Huo 霍斌斌 (beanhuo) wrote:
> The NAND OOB area is dedicated to ECC values, and the user-available
> area in OOB is not covered by the ECC protection mechanism, so saving
> EC or VID information in OOB is not a perfect solution. But if there
> is additional space, and this space can be covered by ECC, we can try
> it.

As I said, I'm *really* against using OOB and see it as the absolute
last choice.

> By the way, I want to allocate a new internal volume to solve the
> paired pages issue. What do you think about this?

More details please. :)
Adding a new internal volume to UBI is not a big deal.

Thanks,
//richard
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Boris Brezillon @ 2015-09-25 8:48 UTC
To: Bean Huo 霍斌斌 (beanhuo)
Cc: linux-mtd@lists.infradead.org, computersforpeace@gmail.com,
    Stefan Roese, Iwo Mergler, Jeff Lauruhn (jlauruhn),
    dedekind1@gmail.com, richard@nod.at, shuangshuo@gmail.com,
    rnd4@dave-tech.it, dwmw2@infradead.org,
    Karl Zhang 张双锣 (karlzhang)

Hi Bean,

On Fri, 25 Sep 2015 08:25:44 +0000, Bean Huo 霍斌斌 (beanhuo) wrote:
> The NAND OOB area is dedicated to ECC values, and the user-available
> area in OOB is not covered by the ECC protection mechanism, so saving
> EC or VID information in OOB is not a perfect solution. But if there
> is additional space, and this space can be covered by ECC, we can try
> it.

The problem is, we want the solution to be as generic as possible, and
not all NAND/ECC controllers are able to protect OOB bytes :-/.

> By the way, I want to allocate a new internal volume to solve the
> paired pages issue. What do you think about this?

I actually had a similar idea, but instead of creating a new metadata
volume, I wanted to reuse the fastmap one. My idea was not to duplicate
data from already programmed pages in this volume (not sure this was
your idea either - could you tell us more about what you had in mind?),
but instead to use it to log UBI operations like:

- PEB erasure: save the EC counter so we can recover it if a power cut
  occurs after the erasure but before the EC header is written to the
  block. Doing that would also partly solve the 'unstable bits' issue,
  since we would be able to know which block was being erased before
  the power cut occurred.

- LEB map: save the lnum <-> pnum association so we can recover from
  VID header corruption.

- Last written block (still not happy with that one): log which block
  the last write operation took place on. This would help solve the
  'unstable bits' problem, but it would also add a non-negligible
  overhead if writes are not consecutive (not done in the same LEB).

Other ideas are welcome.

Best Regards,
Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* RE: UBI/UBIFS: dealing with MLC's paired pages
From: Karl Zhang 张双锣 (karlzhang) @ 2015-09-25 8:30 UTC
To: Boris Brezillon, Bean Huo 霍斌斌 (beanhuo)
Cc: Iwo Mergler, computersforpeace@gmail.com, dedekind1@gmail.com,
    richard@nod.at, shuangshuo@gmail.com, rnd4@dave-tech.it,
    Jeff Lauruhn (jlauruhn), linux-mtd@lists.infradead.org,
    Stefan Roese, dwmw2@infradead.org

Hi Boris,

Some more suggestions.

We know that using the OOB area to store UBI metadata (backup data) is
not a good idea; it was just a thought, and we also did a little work
to implement and verify it.

The paired page issue on MLC is troublesome, and we are trying our best
to deal with it, even if that means attempting some bad ideas.

> > Only NAND provides an OOB area.

Some devices (NOR) do not have OOB, but at the same time they do not
have the paired page issue, right? I think they do not need to store
redundant metadata anywhere. AFAIK, only MLC (and maybe TLC) NAND needs
to store another copy of some metadata and try to protect it with ECC.

But if the OOB has some unused area (at least 48 bytes) and the chip
has the paired page issue, could we take a little advantage of it to
reduce the UBI failure rate?

The above is just some thinking; we hope we can find some better
solutions. As Bean said, most customers do not want UBIFS to crash, and
neither do we.

Thanks for your patience in pointing out my wrong opinions.

BR
* Re: UBI/UBIFS: dealing with MLC's paired pages
From: Boris Brezillon @ 2015-09-25 8:56 UTC
To: Karl Zhang 张双锣 (karlzhang)
Cc: Bean Huo 霍斌斌 (beanhuo), Iwo Mergler,
    computersforpeace@gmail.com, dedekind1@gmail.com, richard@nod.at,
    shuangshuo@gmail.com, rnd4@dave-tech.it, Jeff Lauruhn (jlauruhn),
    linux-mtd@lists.infradead.org, Stefan Roese, dwmw2@infradead.org

On Fri, 25 Sep 2015 08:30:23 +0000, Karl Zhang 张双锣 (karlzhang) wrote:
> The above is just some thinking; we hope we can find some better
> solutions. As Bean said, most customers do not want UBIFS to crash,
> and neither do we.
>
> Thanks for your patience in pointing out my wrong opinions.

Don't be sorry, and there's no wrong opinion: the whole point of this
discussion is to share our ideas and argue until we find the best
solution.

Regarding the use of extra OOB bytes, I'll follow Richard's opinion:
let's keep it as a 'last resort' solution if we fail to find an
acceptable alternative.

Best Regards,
Boris

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
end of thread, other threads: [~2015-10-30 12:48 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-17 13:22 UBI/UBIFS: dealing with MLC's paired pages Boris Brezillon
2015-09-17 15:20 ` Artem Bityutskiy
2015-09-17 15:46 ` Boris Brezillon
2015-09-17 16:47 ` Richard Weinberger
2015-09-18 7:17 ` Andrea Scian
2015-09-18 7:41 ` Boris Brezillon
2015-09-18 7:54 ` Artem Bityutskiy
2015-09-18 7:57 ` Bityutskiy, Artem
2015-09-18 9:38 ` Andrea Scian
2015-09-24 1:57 ` Karl Zhang 张双锣 (karlzhang)
2015-09-24 6:31 ` Richard Weinberger
2015-09-24 7:43 ` Boris Brezillon
2015-09-24 9:44 ` Stefan Roese
2015-09-29 11:19 ` Richard Weinberger
2015-09-29 12:51 ` Boris Brezillon
2015-10-23 8:14 ` Boris Brezillon
2015-10-27 20:16 ` Richard Weinberger
2015-10-28 9:24 ` Boris Brezillon
2015-10-28 10:44 ` Michal Suchanek
2015-10-28 11:14 ` Boris Brezillon
2015-10-28 15:50 ` Michal Suchanek
2015-10-28 12:24 ` Artem Bityutskiy
2015-10-30 8:15 ` Boris Brezillon
2015-10-30 8:21 ` Boris Brezillon
2015-10-30 8:50 ` Bean Huo 霍斌斌 (beanhuo)
2015-10-30 9:08 ` Artem Bityutskiy
2015-10-30 9:45 ` Boris Brezillon
2015-10-30 10:09 ` Artem Bityutskiy
2015-10-30 11:49 ` Michal Suchanek
2015-10-30 12:47 ` Artem Bityutskiy
2015-10-30 11:43 ` Artem Bityutskiy
2015-10-30 11:59 ` Richard Weinberger
2015-10-30 12:29 ` Artem Bityutskiy
2015-10-30 12:31 ` Bityutskiy, Artem
2015-10-30 12:30 ` Boris Brezillon
2015-10-30 12:41 ` Artem Bityutskiy
2015-10-28 12:06 ` Artem Bityutskiy
[not found] <A765B125120D1346A63912DDE6D8B6310BF4CAA8@NTXXIAMBX02.xacn.micron.com>
2015-09-25 7:30 ` Boris Brezillon
2015-09-25 8:25 ` Bean Huo 霍斌斌 (beanhuo)
2015-09-25 8:35 ` Richard Weinberger
2015-09-25 8:48 ` Boris Brezillon
2015-09-25 8:30 ` Karl Zhang 张双锣 (karlzhang)
2015-09-25 8:56 ` Boris Brezillon