From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from a.ns.miles-group.at ([95.130.255.143] helo=radon.swed.at)
	by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux))
	id 1ZrAIz-0004L0-4B
	for linux-mtd@lists.infradead.org; Tue, 27 Oct 2015 19:53:14 +0000
Subject: Re: UBIFS corruption after power cut - possibly unstable bits issue?
To: Tim Harvey
References: <562E8697.50207@nod.at> <562E9E0B.5030204@nod.at>
Cc: Artem Bityutskiy, Adrian Hunter, linux-mtd@lists.infradead.org
From: Richard Weinberger
Message-ID: <562FD60E.9020807@nod.at>
Date: Tue, 27 Oct 2015 20:52:46 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
List-Id: Linux MTD discussion mailing list

Tim,

On 27.10.2015 at 20:01, Tim Harvey wrote:
> I'm not understanding what is making you say that the issue I
> encountered is 'not' the unstable bits issue described at
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits? My
> understanding is that the 'unstable bit' issue refers to bits which
> are truly unstable and can read either way each and every read due to
> not getting properly erased/written.

You are right. I ruled out the unstable bits issue a bit too early.
I'm sorry. Let's double check.
Can you enable UBI verbose logging while testing, so that we can see
which blocks were being written/erased when the power cut happened?

> If I understand what you are saying you are thinking that my issue is
> instead the result of a never-used PEB that had bit-flips from the
> manufacturer in which case the bits would read the same every time?
> How can we know this PEB was never before used and isn't one that was
> being erased/written during a power cut?

I've seen bit flips on cheap SLC NANDs which appeared all of a sudden.
According to the FAE I was talking to, this is legitimate for NAND as
long as the flipping bits are fixable by the ECC engine.

> In my test scenario where the rootfs is mounted from the kernel
> read-only, but later mounted read-write by userspace (yet not being
> specifically written to by userspace) then power-cut should 'any' NAND
> writes would be occurring at all? And if not as I suspect, then how
> could a subsequent boot end up using a PEB that may have been never
> previously used and have bit-flips from the manufacturer?

UBIFS has a wandering journal. It may have moved during the remount.
But for a more meaningful analysis I'd need a nanddump to find out
which blocks are in which role.
Can you share the nanddump?

> Should we be doing an erase block on every NAND block during our board
> manufacturing process to avoid this?

Sorry, I don't understand this sentence. Do you mean a full erasure of
the whole NAND?
If so, it would not help, as the bit flips can come later (without
writing/erasing the block).
The root cause is that your NFC cannot correct bit flips on empty pages.

> It sounds like this 'unexpected bit-flips on erased pages from the
> mfg' issue is a ticking time-bomb for people using ubi/ubifs NAND.
> Shouldn't the http://www.linux-mtd.infradead.org/doc/ubifs.html page
> be updated to refer to this known issue as well as the unstable bit
> issue?

As I said, the root cause is that some NFCs cannot correct bit flips on
empty pages.
Instead of putting warnings on ubifs.html I'd love to see a solution in
the said drivers or in the MTD core.

> I can add some debugging to find out - what specifically would be
> helpful to add?

A hexdump of the buffer would be a good start.

> Thanks for the help!

Thanks for sharing your issues. This is the only way to address them.
That said, so far I have not been able to reproduce the unstable bits
issue on any board I had access to. It was always something else.

Thanks,
//richard
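
To illustrate the empty-page point above: an erased NAND page reads as
all 0xFF, so a driver that cannot run ECC on such a page could instead
count the 0 bits and still treat the page as erased while the count
stays within what ECC could have corrected. Below is a minimal sketch
of that idea, in Python for readability rather than as driver code.
The function names and the ecc_strength parameter are assumptions made
for illustration, not an existing kernel or mtd-utils interface.

```python
def count_zero_bits(buf: bytes) -> int:
    """Count 0 bits in buf. An erased page should contain none."""
    return sum(8 - bin(b).count("1") for b in buf)


def is_erased_chunk(buf: bytes, ecc_strength: int) -> bool:
    """Treat buf as erased if its bit flips are within ecc_strength.

    ecc_strength is a hypothetical per-chunk limit on correctable bits.
    """
    return count_zero_bits(buf) <= ecc_strength


def flipped_offsets(buf: bytes) -> list:
    """Return the byte offsets in buf that are not 0xFF."""
    return [i for i, b in enumerate(buf) if b != 0xFF]
```

Running flipped_offsets() over a page read from a nanddump image would
also show where the flips are, which gives the same information as a
hexdump of the buffer in a more compact form.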