From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.nokia.com ([192.100.122.230] helo=mgw-mx03.nokia.com)
	by bombadil.infradead.org with esmtps (Exim 4.69 #1 (Red Hat Linux))
	id 1LujsG-0001KI-R3 for linux-mtd@lists.infradead.org;
	Fri, 17 Apr 2009 08:56:57 +0000
Subject: Re: UBIFS Corrupt during power failure
From: Artem Bityutskiy
To: Jamie Lokier
In-Reply-To: <20090416213400.GA10578@shareable.org>
References: <1239383500.3390.76.camel@localhost.localdomain>
	<1239689500.3390.82.camel@localhost.localdomain>
	<20090414180010.GC32311@shareable.org>
	<1239775237.3390.144.camel@localhost.localdomain>
	<20090415160921.GA4325@shareable.org>
	<1239860798.3390.205.camel@localhost.localdomain>
	<20090416213400.GA10578@shareable.org>
Content-Type: text/plain; charset="UTF-8"
Date: Fri, 17 Apr 2009 11:56:33 +0300
Message-Id: <1239958593.3390.293.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Cc: Urs Muff, Eric Holmberg, linux-mtd@lists.infradead.org, Adrian Hunter
Reply-To: dedekind@infradead.org
List-Id: Linux MTD discussion mailing list

Jamie, thanks for feedback!

On Thu, 2009-04-16 at 22:34 +0100, Jamie Lokier wrote:
> > 1. eraseblock
> > 2. Min. I/O unit size, which is mtd->writesize in MTD, and
> > ubi->min_io_size in UBI. This corresponds to NAND page, and 1 byte in
> > NOR.
>
> I guess 1 byte in NOR because you can overwrite a word to set the other byte?
> Logically min_io_size should be 1 bit :-)
>
> > 3. There are also sub-pages in case of NAND, but I consider them as a
> > kind of hack. UBI does not expose information about them, and UBIFS does
> > not use them.
>
> UBI FAQ (http://www.linux-mtd.infradead.org/faq/ubi.html#L_find_min_io_size)
>
>     UBI: physical eraseblock size:   131072 bytes (128 KiB)
>     UBI: logical eraseblock size:    129024 bytes
>     UBI: smallest flash I/O unit:    2048
>     UBI: sub-page size:              512
>
>     Note, if sub-page size was not printed, the flash is not NAND
>     flash or NAND flash which does not have sub-pages.
>
> UBI does not expose information about sub-pages?

It prints them, just for info. But the UBI "front-end" API does not
contain sub-page info.

> Googling for "NAND sub-page" didn't help explain them much. Can you
> recommend a URL, just so I can understand NAND sub-pages?

There is info at the MTD web site. But for your convenience, I've also
added this:
http://www.linux-mtd.infradead.org/faq/ubi.html#L_subpage

> > Now obviously, we need to extend this model. I would suggest to
> > introduce a notion of "max. I/O size". It would be:
> >
> > 1. 64-bytes in case of Eric's NOR. This would be taken from CFI info.
> > 2. If we ever have a striping layer, which can interleave between 2 or
> > more chips, then max. I/O size will be N * ubi->min_io_size.
> >
> > Thoughts?
>
> 0. It's more accurate to call it "max parallel write size".
>    That NOR chip has a read page size too, which is different :-)
>
> 1. Alignment, or can we assume alignment is the same as its size?

Yes, I think so.

> 2. If striping uses larger stripes (the same way as RAID uses 1MB
>    stripes instead of 1 sector stripes), then this value needs to be
>    max(N * strip_size, N * ubi->min_io_size), because the chip block
>    writes done in parallel are not contiguous in the combined MTD.

OK.

>
> 3. 2 assumes that striping works like this:
>
>       Start write at offset P to chip 0, chip 1, chip 2, chip 3.
>       Wait for _all_ chips to finish.
>       Start write at offset P+block_size to chip 0, chip 1, chip 2, chip 3.
>       Wait for _all_ chips to finish.
>       etc.

Right.
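
For the lock-step model quoted above, the bound from point 2 could be
sketched roughly as follows. This is only an illustration:
max_parallel_write_size() is a hypothetical helper, not an existing MTD
or UBI function, and stripe_size is an assumed parameter of a striping
layer that does not exist yet.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of MTD or UBI: upper bound on the byte
 * range of the combined device that may be in flight at once when
 * nchips chips are written in lock-step.
 *
 * max(N * stripe_size, N * min_io_size) == N * max(stripe_size, min_io_size)
 */
size_t max_parallel_write_size(size_t nchips, size_t stripe_size,
                               size_t min_io_size)
{
	size_t unit = stripe_size > min_io_size ? stripe_size : min_io_size;

	assert(nchips > 0 && min_io_size > 0);
	return nchips * unit;
}
```

With 4 chips and 2048-byte pages used as the stripe, this gives 8192
bytes; with 1 MiB stripes on 2 chips it gives 2 MiB.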

>    But if striping is implemented in a more relaxed way to get higher
>    performance, it will do this:
>
>       Start write at offset P to chip 0, chip 1, chip 2, chip 3.
>       Wait for any chip to finish.
>       Start write at offset P+block_size on the chip which finished.
>       Wait for any chip to finish.
>       Start write at next block on the chip which finished.
>       Wait for any chip to finish.
>       etc.

Yeah...

>    That makes the range of parallel writes, and so
>    corruption-on-power-loss, even larger than max(N * strip_size, N *
>    block_size). The range is as large as the whole write, if one chip
>    is writing much faster than the others, so it cannot be represented
>    by a small number.

Then I guess we should just introduce mtd->max_corruption? This would
mean the maximum number of bytes that corruption may span in case of
power cuts.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)
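
To illustrate why the relaxed scheme has no small bound, here is a toy
model, not MTD code. It assumes chip c owns stripes c, c + N, c + 2N, ...
of the combined device and needs write_time[c] ticks per stripe; all
names (inflight_span, write_time, stripes_per_chip) are made up for this
sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: every chip starts its next stripe as soon as it
 * has finished the previous one, independently of the other chips.
 *
 * Returns the number of stripes between the lowest and the highest
 * stripe being written at tick `now` (inclusive), or 0 if all chips
 * have already finished.
 */
size_t inflight_span(const unsigned *write_time, size_t nchips,
                     size_t stripes_per_chip, unsigned now)
{
	size_t lo = 0, hi = 0;
	int busy = 0;

	assert(write_time != NULL && nchips > 0);
	for (size_t c = 0; c < nchips; c++) {
		assert(write_time[c] > 0);
		/* Stripes this chip has completed so far. */
		size_t k = now / write_time[c];
		if (k >= stripes_per_chip)
			continue;	/* this chip is done */
		/* Index of its current stripe in the combined device. */
		size_t idx = k * nchips + c;
		if (!busy || idx < lo)
			lo = idx;
		if (!busy || idx > hi)
			hi = idx;
		busy = 1;
	}
	return busy ? hi - lo + 1 : 0;
}
```

With four equally fast chips (4 ticks per stripe) the span at tick 8 is
4 stripes, the lock-step value. If chip 0 needs only 1 tick per stripe,
it is already on stripe 32 at tick 8 while the others are on stripes 9
to 11, so the span is 24 stripes and keeps growing as the write goes on.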