From: Norbert van Bolhuis
Date: Thu, 27 Aug 2009 16:54:36 +0200
To: Mtd List
Subject: Interest in making jffs2 physically correct NAND bit-flips
Message-ID: <4A969E2C.7080302@aimvalley.nl>
List-Id: Linux MTD discussion mailing list

First some background:

Unfortunately we have deployed products with a NAND device that suffers from bit-flips. Occasionally uncorrectable bit-flips occur (-EBADMSG), causing data loss.

We use JFFS2. As is known, JFFS2 detects and corrects single bit-flips (per 256-byte subpage), but it does not physically correct them on the NAND device itself. To prevent further trouble (2 bit-flips in one subpage) we have made JFFS2 correct NAND bit-flips on the NAND device itself (and we are trying to upgrade the deployed products as soon as possible). By the way, in our case the bit-flips are caused by charge loss in a single bit, which turns a programmed 0 into a 1.

I know UBI(FS) already corrects bit-flips on NAND, but upgrading deployed products to another flash filesystem is not an option for us. Besides, we are using the good old linux-2.4 kernel.

My question is: is there any interest in an mtd-2.6.git patch to make jffs2 correct NAND bit-flips?

I'm asking because it is quite some work to forward-port (and test) the fix to mtd-2.6.git.
Also note that reviews and corrections (code style issues, bugs, etc.) will most likely be needed (after all, I'm not a kernel/mtd developer). Of course I'm willing to process any corrections, comments or questions that anyone may have.

We make jffs2 correct the NAND bit-flips in a couple of (simple) steps:

-1- detect the bit-flip and post the JEB (JFFS2 eraseblock) to a worker_list
-2- process the worker_list (in a separate thread) by moving the JEB from its regular jffs2 list (e.g. dirty_list) to the bad_used list
-3- trigger JFFS2 GC to process the JEB on the bad_used list

JFFS2 GC does the rest (moving the valid data to another block and erasing the JEB).

The fix can be turned on/off with a kernel config #define.

Since we have quite a lot of static data on the NAND (cached by the kernel, so rarely re-read from flash), a separate thread also regularly reads the NAND blocks to detect bit-flips. We also have a lot of free space (and not a lot of dirty nodes), so originally GC never got triggered on its own.

Any feedback is welcome.
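
For readers who want to see the flow of the three steps at a glance, here is a small user-space sketch in Python. It is only a toy model and not the actual patch or kernel code: the class and method names (EraseBlock, ToyFs, report_bitflip, process_worker_list, gc_pass) are made up for illustration, the worker thread is replaced by a plain function call, and "erasing" a block simply means emptying it and putting it on a free list. Only the list names (worker_list, dirty_list, bad_used list) come from the description above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare blocks by identity, not by contents
class EraseBlock:
    """Toy stand-in for a JFFS2 eraseblock (JEB); hypothetical type."""
    number: int
    valid_nodes: list = field(default_factory=list)


class ToyFs:
    """Toy model of the block lists involved; not the real jffs2_sb_info."""

    def __init__(self, clean=(), dirty=(), free=()):
        self.clean_list = list(clean)
        self.dirty_list = list(dirty)
        self.free_list = list(free)
        self.bad_used_list = []
        self.worker_list = deque()

    def report_bitflip(self, jeb):
        """Step 1: a corrected bit-flip was seen in jeb; queue it once."""
        if jeb not in self.worker_list and jeb not in self.bad_used_list:
            self.worker_list.append(jeb)

    def process_worker_list(self):
        """Step 2: move queued blocks from their regular list to bad_used."""
        while self.worker_list:
            jeb = self.worker_list.popleft()
            for lst in (self.clean_list, self.dirty_list):
                if jeb in lst:
                    lst.remove(jeb)
                    self.bad_used_list.append(jeb)
                    break

    def gc_pass(self):
        """Step 3: copy valid data to another block, then 'erase' the JEB.

        Returns the number of blocks that were collected.
        """
        if not self.bad_used_list:
            return 0
        if not self.free_list:
            raise RuntimeError("no free block to move valid data into")
        dest = self.free_list.pop(0)
        collected = 0
        while self.bad_used_list:
            jeb = self.bad_used_list.pop(0)
            dest.valid_nodes.extend(jeb.valid_nodes)  # move valid data
            jeb.valid_nodes.clear()                   # simulated erase
            self.free_list.append(jeb)
            collected += 1
        self.clean_list.append(dest)
        return collected
```

In this model a caller would invoke report_bitflip() from the read path, then process_worker_list() and gc_pass() from the background thread. The real fix of course has to deal with locking, with the block currently being written, and with erase failures, none of which is modelled here.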