Date: Wed, 23 Jan 2008 11:19:15 +0100
From: Jörn Engel
To: Ricard Wanderlof
Cc: linux-mtd@lists.infradead.org, Jörn Engel, David Woodhouse, Josh Boyer, Matthieu CASTET
Subject: Re: Jffs2 and big file = very slow jffs2_garbage_collect_pass
Message-ID: <20080123101915.GA24953@lazybastard.org>

On Wed, 23 January 2008 10:23:55 +0100, Ricard Wanderlof wrote:
> On Tue, 22 Jan 2008, Jörn Engel wrote:
>
> >- Moderate: one block continuously spews -EUCLEAN, then becomes
> >  terminally bad.
> >  If those are just random bitflips, garbage collection will move the
> >  data sooner or later.  Logfs does not force GC to happen soon when
> >  encountering -EUCLEAN, which it should.  Are correctable errors an
> >  indication of block going bad in the near future?  If yes, I should do
> >  something about it.
>
> I would say that correctable errors occurring "soon" after writing are an
> indication that the block is going bad. My experience has been that
> extensive reading can cause bitflips (and it probably happens over time
> too), but that for fresh blocks, billions of read operations need to be
> done before a bit flips.
> For blocks that are nearing their best before date, a couple of hundred
> thousand reads can cause a bit to flip. So if I was implementing some
> sort of 'when is this block considered bad'-algorithm, I'd try to keep
> tabs on how often the block has been (read-) accessed in relation to
> when it was last written. If this number is "low", the block should be
> considered bad and not used again.

That sounds like an impossible strategy.  Causing a write for every read
will significantly increase write pressure, thereby reducing flash
lifetime, performance etc.

What would be possible is a counter for soft/hard errors per physical
block.  On a soft error, move the data elsewhere and reuse the block, but
increment the error counter.  If the counter increases beyond 17 (or any
other arbitrary number), mark the block as bad.  The limit can be an mkfs
option.

> I also think that when (if) logfs decides a block is bad, it should mark
> it bad using mtd->block_markbad(). That way, if the flash is rewritten by
> something other than logfs (say during a firmware upgrade), bad blocks can
> be handled in a consistent and standard way.

Maybe I should revive the old patch then.  I don't think it matters much
either way.

> We ran some tests here on a particular flash chip type to try and
> determine at least some of the failure modes that are related to block
> wear (due to write/erase) and bit decay (due to reading). The end result
> was basically what I tried to describe above, but I can go into more
> detail if you're interested.

I do remember your mail describing the test.  One of the interesting
conclusions is that even an awfully worn-out block is still good enough
to store short-lived information.  It appears to be a surprisingly robust
strategy to have high wear, as long as you keep the wear constantly high
and replace block contents at a high rate.

Jörn

-- 
You can't tell where a program is going to spend its time.
Bottlenecks occur in surprising places, so don't try to second guess and
put in a speed hack until you've proven that's where the bottleneck is.
-- Rob Pike
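
For illustration only, the per-block error counter proposed above could
look roughly like the following.  This is a minimal userspace sketch, not
logfs code: the names block_state, block_table, block_table_init and
note_soft_error, and the fixed NUM_BLOCKS, are all hypothetical.  It also
assumes the counters live in memory, whereas a real filesystem would have
to persist them somewhere, and it only handles soft errors since the mail
does not say how hard errors should be counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 8u /* hypothetical device size, in erase blocks */

struct block_state {
	uint32_t soft_errors; /* correctable errors seen on this block so far */
	bool bad;             /* set once the block has been retired */
};

struct block_table {
	struct block_state blocks[NUM_BLOCKS];
	uint32_t err_limit;   /* assumed to come from an mkfs option, e.g. 17 */
};

static void block_table_init(struct block_table *t, uint32_t err_limit)
{
	memset(t, 0, sizeof(*t));
	t->err_limit = err_limit;
}

/*
 * Record one soft (correctable) error on block blk.  The caller is
 * assumed to move the data elsewhere in every case.  Returns true if the
 * block may be erased and reused, false if it is (now) marked bad.
 */
static bool note_soft_error(struct block_table *t, unsigned int blk)
{
	struct block_state *b;

	assert(blk < NUM_BLOCKS);
	b = &t->blocks[blk];
	if (b->bad)
		return false;

	b->soft_errors++;
	if (b->soft_errors > t->err_limit)
		b->bad = true; /* counter went beyond the limit: retire block */

	return !b->bad;
}
```

With err_limit set to 17, a block survives its first 17 soft errors and
is retired on the 18th.  No write is needed on ordinary reads, only when
an error is actually reported.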