From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.newsguy.com ([74.209.136.69]) by canuck.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux)) id 1RM0Zx-0004V1-Ve for linux-mtd@lists.infradead.org; Thu, 03 Nov 2011 16:55:50 +0000 Message-ID: <4EB2C760.3030705@newsguy.com> Date: Thu, 03 Nov 2011 09:54:56 -0700 From: Mike Dunn MIME-Version: 1.0 To: Ivan Djelic Subject: Re: [PATCH v2] mtd: nand: Add driver for M-sys / Sandisk diskonchip G4 References: <1320274859-3270-1-git-send-email-mikedunn@newsguy.com> <20111103085824.GA28465@parrot.com> In-Reply-To: <20111103085824.GA28465@parrot.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: "linux-mtd@lists.infradead.org" List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On 11/03/2011 01:58 AM, Ivan Djelic wrote: > Hi Mike, > A few comments below: > >> + >> +/* value generated by the HW ecc generator upon reading blank page */ >> +static uint8_t blank_read_hwecc[7] = { >> + 0xcf, 0x72, 0xfc, 0x1b, 0xa9, 0xc7, 0xb9 }; > Using this blank ecc value to detect blank pages after reading them is not that > reliable, because if a bitflip appears in an erased page, the HW ecc generator > will generate a completely different sequence of bytes. Your driver will think > the page is not blank, it will try to correct errors and fail. And UBIFS will > not appreciate that. > > I can see two cleaner alternatives to solve this issue: > > 1. When you program a page, before writing hwecc to oob, adjust it like this: > > hwecc[i] ^= blank_read_hwecc[i]^0xff; > > The effect of this is that you now get 0xffs as ecc for blank pages, and bitflip > correction on erased pages for free. This is transparent to your controller, > because this "adjustment" cancels itself upon reading when calc_ecc^recv_ecc is > computed. This is neat! Will try this today. > 2. Use unprotected spare oob byte 15 as a programmming marker: remove it from > the oob_free list, and force it to 0 when you program a page. Now, you can > easily detect if a page is blank or has been programmed by checking this byte. > You can for instance count the number of bits set to 1 in the byte, and decide > it is blank if that number is greater than 4; this ensures you are robust to > bitflips in the marker byte itself. > > My preference would go to option 2 in your case. Hmm, another good idea. I haven't given much thought to robustness until now. Heck, I never expected to get this driver working right at all. Once again, I'm in your debt, Ivan. >> + >> + /* fix the errors */ >> + fixederrs = 0; >> + for (i = 0; i < numerrs; i++) { >> + if (errpos[i] > DOCG4_USERDATA_LEN * 8) >> + continue; /* error in ecc oob bytes; ignore */ >> + change_bit(errpos[i], (unsigned long *)buf); >> + fixederrs++; >> + } > Here, your check on errorpos[i] to skip ecc oob locations is correct; but a > bitflip (in ecc bytes or elsewhere) is an indication that the nand page needs > scrubbing (you don't want bitflips to accumulate), so you should increase > fixederrs in that case also. Thanks. Will make this change. > If you see a _lot_ of scrubbing operations from UBI, you may also consider > concealing bitflip corrections; i.e. report them only if the number of corrected > bits is above a certain threshold (e.g. >= 3 in your case). Something to keep in mind when I start stress testing with ubifs. I have noticed some scrubbing in the testing so far. At one point, it marked a block as bad, so this may be an issue. The error rate seems erratic. Sometimes there are many, and even the occasional uncorrectable error. Lately, there have just a few one-bit errors when I run nandtest over the whole device. I noticed this inconsistency while observing operation under the PalmOS code as well, so it's not a problem in my driver. Maybe due to solar activity :-) Thanks again, Mike