From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from down.free-electrons.com ([37.187.137.238] helo=mail.free-electrons.com)
	by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux))
	id 1ZcZ9T-0006Nl-L4 for linux-mtd@lists.infradead.org;
	Thu, 17 Sep 2015 13:23:05 +0000
Date: Thu, 17 Sep 2015 15:22:40 +0200
From: Boris Brezillon
To: Artem Bityutskiy, Richard Weinberger
Cc: linux-mtd@lists.infradead.org, David Woodhouse, Brian Norris, Andrea Scian, "Qi Wang =?UTF-8?B?546L6LW3?= (qiwang)", Iwo Mergler, "Jeff Lauruhn (jlauruhn)"
Subject: UBI/UBIFS: dealing with MLC's paired pages
Message-ID: <20150917152240.757c9e90@bbrezillon>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
List-Id: Linux MTD discussion mailing list

Hello,

I'm currently working on the paired pages problem we have on MLC chips.
I remember discussing it with Artem earlier this year when I was
preparing my talk for ELC.

I now have some time I can spend working on this problem and I started
looking at how this can be solved.

First let's take a look at the UBI layer.
There's one basic thing we have to care about: protecting UBI metadata.
There are two kinds of metadata:
1/ those stored at the beginning of each erase block (EC and VID
   headers)
2/ those stored in specific volumes (layout and fastmap volumes)

We don't have to worry about #2 since those are written using atomic
update, and atomic updates are immune to this paired page corruption
problem (either the whole write is valid, or none of it is valid).

This leaves problem #1.
For this case, Artem suggested duplicating the EC header in the VID
header so that if page 0 is corrupted we can recover the EC info from
page 1 (which will contain both VID and EC info).
Doing that is fine for dealing with EC header corruption, since, AFAIK,
none of the NAND vendors pair page 0 with page 1.
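To make the EC duplication idea concrete, here is a minimal Python
sketch of the scan-time fallback. It is a toy model under stated
assumptions, not UBI's on-flash format: the 12-byte record (64-bit erase
counter plus CRC32) and the names pack_ec, unpack_ec and recover_ec are
hypothetical, and it relies on page 0 and page 1 not being paired.

```python
import struct
import zlib

# Hypothetical toy record: 64-bit big-endian erase counter followed by a
# CRC32. This is NOT the real UBI EC header layout, only an illustration.
EC_FMT = ">Q"
EC_BODY_LEN = struct.calcsize(EC_FMT)
EC_LEN = EC_BODY_LEN + 4


def pack_ec(erase_counter):
    """Serialize an erase counter with a trailing CRC32."""
    body = struct.pack(EC_FMT, erase_counter)
    return body + struct.pack(">I", zlib.crc32(body))


def unpack_ec(raw):
    """Return the erase counter, or None if the record is missing or corrupted."""
    if raw is None or len(raw) < EC_LEN:
        return None
    body = raw[:EC_BODY_LEN]
    (crc,) = struct.unpack(">I", raw[EC_BODY_LEN:EC_LEN])
    if zlib.crc32(body) != crc:
        return None
    return struct.unpack(EC_FMT, body)[0]


def recover_ec(page0_ec, page1_ec_copy):
    """Prefer the EC record in page 0; fall back to the copy kept in page 1."""
    ec = unpack_ec(page0_ec)
    if ec is not None:
        return ec
    return unpack_ec(page1_ec_copy)
```

In this model the erase counter is lost only when both copies fail their
CRC check, which is the property the duplication is meant to provide.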
That still leaves the VID header corruption problem. To prevent it, we
have several options:

a/ Skip the page paired with the VID header. This is doable and can be
   hidden from UBI users, but it also means that we're losing another
   page to metadata (not a negligible overhead).

b/ Store the VID info (PEB <-> LEB association) somewhere else.
   Fastmap seems the right place to put it, since fastmap already
   stores that information for almost all blocks. Still, we would have
   to modify fastmap a bit to store information about all erase blocks
   and not only those that are not part of the fastmap pool.
   Also, updating that in real time would require using a log approach,
   instead of the atomic update currently used by fastmap when it runs
   out of PEBs in its free PEB pool. Note that the log approach does
   not have to be applied to all fastmap data (we just need it for the
   PEB <-> LEB info).
   Another off-topic note regarding the suggested log approach: we
   could also use it to log which PEB was last written/erased, and use
   that to handle the unstable bits issue.

c/ (also suggested by Artem) Delay the VID write until we have enough
   data to write on the LEB, and thus guarantee that it cannot be
   corrupted (at least by programming on the paired page ;-)) anymore.
   Doing that would also require logging the data to be written on
   those LEBs somewhere, not to mention the impact of copying the data
   twice (once in the log, and then, when we have enough data, in the
   real block).

I don't have any strong opinion about which solution is the best, and I
may be missing other aspects or better solutions, so feel free to
comment on that and share your thoughts.

That's all for the UBI layer. We will likely need new functions (and
new fields in existing structures) to help UBI users deal with MLC
NANDs: for example, a field exposing the storage type, or a function
helping users skip one (or several) blocks to secure the data they have
written so far.
Anyway, those are things we can discuss after deciding which approach
we want to take.

Now, let's talk about the UBIFS layer. We are facing pretty much the
same problem there: we need to protect the data we have already
written from time to time.
AFAIU (correct me if I'm wrong), data should be secure when we sync the
file system, or commit the UBIFS journal (feel free to correct me if
I'm not using the right terms in my explanation).
As explained earlier, the only way to secure data is to skip some pages
(those that are paired with the already written ones).

I see two approaches here (there might be more):

1/ Do not skip any pages until we are asked to secure the data, and
   then skip as many pages as needed to ensure nobody can ever corrupt
   the data. With this approach you can lose a non-negligible amount
   of space. For example, with this paired pages scheme [1], if you
   only write data on page 2 and want to secure your data, you'll have
   to skip pages 3 to 8.

2/ Use the NAND in 'SLC mode' (AKA only write on half the pages in a
   block). With this solution you always lose half the NAND capacity,
   but in case of small writes, it's still more efficient than #1.
   Of course, using that solution alone is not acceptable, because
   you'll only be able to use half the NAND capacity, but the plan is
   to use it in conjunction with the GC, so that from time to time
   UBIFS data chunks/nodes can be put in a single erase block without
   skipping half the pages.
   Note that currently the GC does not work this way: it tries to
   collect chunks one by one and write them to the journal to free a
   dirty LEB. What we would need here is a way to collect enough data
   to fill an entire block and after that release the LEBs that were
   previously using half the LEB capacity.

Of course, both of those solutions imply marking the skipped regions
as dirty so that the GC can account for the padded space.
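The space cost of the two approaches can be illustrated with a small
Python model. It is only a sketch: the pairing rule in paired_page() is
an assumption chosen to reproduce the page 2 example (page 2 paired with
page 8) and should be checked against the table in [1]; the 256-page
block size and the function names are hypothetical as well.

```python
PAGES_PER_BLOCK = 256  # assumed block geometry


def paired_page(page):
    """Return the high page paired with a low page, or None for a high page.

    Assumed pairing rule (verify against the datasheet): 0-4, 1-5, 2-8,
    3-9, 6-12, 7-13, ... with the last two low pages paired at distance 4.
    """
    if page >= PAGES_PER_BLOCK - 2:
        return None  # the last two pages of the block are high pages
    if page in (0, 1):
        return page + 4
    if page % 4 in (2, 3):
        high = page + 6
        return high if high < PAGES_PER_BLOCK else page + 4
    return None


def pages_to_skip(last_written):
    """Approach 1/: pages to skip so that pages 0..last_written are safe.

    Pages are programmed in order, so every page up to the highest page
    paired with an already written one has to be left unused.
    """
    highest = last_written
    for page in range(last_written + 1):
        high = paired_page(page)
        if high is not None:
            highest = max(highest, high)
    return list(range(last_written + 1, highest + 1))


def slc_mode_pages():
    """Approach 2/: pages usable when only the low page of each pair is written."""
    return [p for p in range(PAGES_PER_BLOCK) if paired_page(p) is not None]
```

Under these assumptions, pages_to_skip(2) returns pages 3 to 8, so six
pages are given up to secure three, while slc_mode_pages() always leaves
128 of the 256 pages usable and never needs an extra skip.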
For #1 we should probably also use padding nodes to reflect how much
space is lost on the media, though I'm not sure how this can be done.
For #2, we may have to differentiate 'full' and 'half' LEBs in the LPT.

Anyway, all of the above are just some ideas I had or suggestions I got
from other people and wanted to share.
I'm open to any new suggestions, because none of the proposed solutions
is easy to implement.

Best Regards,

Boris

P.S.: Note that I'm not discussing the WP solution on purpose: I'd like
to have a solution that is completely HW independent.

[1] https://www.olimex.com/Products/Components/IC/H27UBG8T2BTR/resources/H27UBG8T2BTR.pdf,
    chapter 6.1. Paired Page Address Information

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com