From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752994AbcFNU3P (ORCPT ); Tue, 14 Jun 2016 16:29:15 -0400
Received: from down.free-electrons.com ([37.187.137.238]:50424 "EHLO mail.free-electrons.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752205AbcFNU3N (ORCPT ); Tue, 14 Jun 2016 16:29:13 -0400
Date: Tue, 14 Jun 2016 22:29:10 +0200
From: Boris Brezillon
To: "George Spelvin"
Cc: beanhuo@micron.com, computersforpeace@gmail.com, linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org, richard@nod.at
Subject: Re: [PATCH 2/4] mtd: nand: implement two pairing scheme
Message-ID: <20160614222910.0f9ff7c2@bbrezillon>
In-Reply-To: <20160614090726.20977.qmail@ns.sciencehorizons.net>
References: <20160612231314.15d06854@bbrezillon> <20160614090726.20977.qmail@ns.sciencehorizons.net>
X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 14 Jun 2016 05:07:26 -0400
"George Spelvin" wrote:

> Boris Brezillon wrote:
> > On 12 Jun 2016 16:24:53 George Spelvin wrote:
> >> Boris Brezillon wrote:
> >> My problem is that I don't really understand MLC programming.
>
> > I came to the same conclusion: we really have these 2 cases in the
> > wild, which makes it even more complicated to define a standard
> > behavior.
>
> I did find a useful study of the issue: "Program Interference in MLC NAND
> Flash Memory: Characterization, Modeling, and Mitigation"
>
> https://users.ece.cmu.edu/~omutlu/pub/flash-programming-interference_iccd13.pdf
>
> It describes the write-disturb-precompensation technique, and also
> shows how the two-stage programming works.
> (Although the fact that the
> "least significant bit" is the *largest* voltage difference and is shown
> on the *left* makes no sense at all.)

I think I read this document back when I started to look at how MLC
NANDs were working, but I didn't have the background to understand what
all this meant. Reading it again makes a lot more sense, and actually I
now understand why those NANDs require data scrambling/randomization
(their program disturb modeling was done with random data, which makes
it irrelevant when repeated patterns are programmed).

>
> Looking at the demonstrated programming sequence, it looks like
> it should be possible to probe for the bit assignment. If you have
> a half-programmed page, then any bits programmed to "0" are actually
> sitting close to the threshold between the two middle voltage levels.
>
> So you'll get a lot of errors reading them as "1", but the interesting
> part is the read-back of the unprogrammed bit.
>
> If the chip is using the binary sequence, you'll read either 10 or 01.
> If the chip is using the Gray-code sequence, you'll read 10 or 00.
>
> Basically, you read both pages and see which bit combination never
> appears. That is the combination that corresponds to the highest voltage
> level.
>
> Another interesting paper is "Read Disturb Errors in MLC NAND Flash
> Memory: Characterization, Mitigation, and Recovery"
> https://users.ece.cmu.edu/~omutlu/pub/flash-read-disturb-errors_dsn15.pdf
>
> That talks about tricks that do as you observe: increase read error to start.
> (In order to decrease read disturb, and thus read errors later.)
>
> >> It's more considering it to have 16K pages that can be accessed in half-pages.
>
> > Yes, I know, but it's not really easy to fake that at the NAND level,
> > because programming 2 pages still requires 2 page program operations.
> > The MTD user could detect that the pairing scheme always exposes 2
> > consecutive non-paired pages, but as you've seen, this condition does
> > not necessarily imply the 'pair coupling' constraint, and we don't want
> > to increase the min_io_size value if it's not really necessary.
>
> Ideally, it would be nice to separate the "SLC hack" from the "later
> write failures can corrupt earlier data" workaround.
>
> First, you get the latter working on SLC flash. Then you add MLC, and
> make MLC another reason why it can happen.
>
> But I'm not certain this is actually necessary. Could listing 4 pages
> rather than 2 as in other data sheets just be an editing or translation
> error? Maybe someone got confused about "in the same row" when they
> wrote that clarifying example.
>
> > I'm just realizing this is actually a non-issue for the solution we
> > developed with Richard. As I said, it's unsafe to partially write a
> > block in MLC mode, so the only sane way is either to write a block in
> > SLC mode, or atomically write a block in MLC mode, and that's what
> > we're doing with our 'UBI LEB consolidation' approach. I'm pretty sure
> > the problem described in the Hynix datasheet does not happen when only
> > writing in SLC mode. So, even if the pairing scheme does not account
> > for this extra 'coupling' constraint, we should be safe.
>
> I can't see any reason why it would affect MLC and not SLC.

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com