Message-ID: <54B5693C.6020700@nod.at>
Date: Tue, 13 Jan 2015 19:51:40 +0100
From: Richard Weinberger
To: Brian Norris
Cc: Ricard Wanderlof, Steve deRosier, Josh Wu,
 "linux-mtd@lists.infradead.org", Ezequiel Garcia, Huang Shijie
Subject: Re: [PATCH] mtd: nand: default bitflip-reporting threshold to 75% of correction strength
References: <54B38745.70007@atmel.com>
 <1421095889-12717-1-git-send-email-computersforpeace@gmail.com>
 <54B51CCA.1090707@nod.at>
 <20150113184805.GS9759@ld-irv-0074>
In-Reply-To: <20150113184805.GS9759@ld-irv-0074>

Brian,

On 13.01.2015 at 19:48, Brian Norris wrote:
> Hi Richard,
>
> On Tue, Jan 13, 2015 at 02:25:30PM +0100, Richard Weinberger wrote:
>> On 12.01.2015 at 21:51, Brian Norris wrote:
>>> The MTD API reports -EUCLEAN only if the maximum number of bitflips
>>> found in any ECC block exceeds a certain threshold. This is done to
>>> avoid excessive -EUCLEAN reports to MTD users, which may induce
>>> additional scrubbing of data, even when the ECC algorithm in use is
>>> perfectly capable of handling the bitflips.
>>>
>>> This threshold can be controlled by user-space (via sysfs), to allow
>>> users to determine what they are willing to tolerate in their
>>> application. But it still helps to have sane defaults.
>>>
>>> In recent discussion [1], it was pointed out that our default threshold
>>> is equal to the correction strength. That means that we won't actually
>>> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
>>> are almost too many to handle. It was determined that 3/4 of the
>>> correction strength is probably a better default.
>>>
>>> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
>>
>> I like this change but I have one question.
>>
>> UBI will treat a block as bad if it shows bitflips (EUCLEAN) right
>> after erasure.
>
> Can you elaborate? When "after erasure"? The closest I see is that UBI
> will mark a block bad if it sees an -EIO failure from sync_erase() in
> erase_worker(). If you have extra debug checks on, then
> ubi_self_check_all_ff() could potentially give you bitflip problems
> after the erase, but that's an odd corner case anyway, which many
> drivers have been handling in hacked together ad-hoc ways anyway (search
> for "bitflips in erase pages").
>
> So I can't pinpoint what you're talking about, exactly.

See torture_peb()

out:
	mutex_unlock(&ubi->buf_mutex);
	if (err == UBI_IO_BITFLIPS || mtd_is_eccerr(err)) {
		/*
		 * If a bit-flip or data integrity error was detected, the test
		 * has not passed because it happened on a freshly erased
		 * physical eraseblock which means something is wrong with it.
		 */
		ubi_err(ubi, "read problems on freshly erased PEB %d, must be bad",
			pnum);
		err = -EIO;
	}

>> For SLC NAND this works very well.
>> Does this also hold for MLC NAND? If one or two bit flips are okay
>> even for a freshly erased MLC NAND this change could cause UBI to
>> mark good blocks as bad depending on the ECC strength.
>
> I would typically assume that MLC NAND users must be using significantly
> stronger ECC (e.g., 12-bit / 512-byte, at least), so "one or two
> bitflips" would still fall well under 75% of 12 bits.

Same here.
I just want to make sure that UBI does not assume a perfect NAND world. :)

Thanks,
//richard