From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Chinner
Subject: Re: Fix(es) for ext2 fsync bug
Date: Thu, 15 Feb 2007 10:32:44 +1100
Message-ID: <20070214233244.GW44411608@melbourne.sgi.com>
References: <20070214195453.GB7521@nifty>
 <20070214203101.GQ44411608@melbourne.sgi.com>
 <1171488382.13092.5.camel@kleikamp.austin.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: David Chinner, Valerie Henson, linux-fsdevel@vger.kernel.org,
 Can Sar, Junfeng Yang, Dawson Engler, "Theodore Ts'o"
To: Dave Kleikamp
Return-path:
Received: from omx2-ext.sgi.com ([192.48.171.19]:52906 "EHLO omx2.sgi.com"
 rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
 id S1751384AbXBNXdh (ORCPT ); Wed, 14 Feb 2007 18:33:37 -0500
Content-Disposition: inline
In-Reply-To: <1171488382.13092.5.camel@kleikamp.austin.ibm.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Wed, Feb 14, 2007 at 03:26:22PM -0600, Dave Kleikamp wrote:
> On Thu, 2007-02-15 at 07:31 +1100, David Chinner wrote:
> > On Wed, Feb 14, 2007 at 11:54:54AM -0800, Valerie Henson wrote:
> > > Just some quick notes on possible ways to fix the ext2 fsync bug that
> > > eXplode found. Whether or not anyone will bother to implement it is
> > > another matter.
> > >
> > > Background: The eXplode file system checker found a bug in ext2 fsync
> > > behavior. Do the following: truncate file A, create file B which
> > > reallocates one of A's old indirect blocks, fsync file B. If you then
> > > crash before file A's metadata is all written out, fsck will complete
> > > the truncate for file A... thereby deleting file B's data. So fsync
> > > file B doesn't guarantee data is on disk after a crash. Details:
> > >
> > > http://www.stanford.edu/~engler/explode-osdi06.pdf
> > >
> > > Two possible solutions I can think of:
> > >
> > > * Rearrange order of duplicate block checking and fixing file size in
> > >   fsck. Not sure how hard this is. (Ted?)
> > >
> > > * Keep a set of "still allocated on disk" block bitmaps that gets
> > >   flushed whenever a sync happens. Don't allocate these blocks.
> > >   Journaling file systems already have to do this.
> >
> > You don't need anything on disk or to fsck to fix this problem - just
> > avoid it completely by keeping a list of recently truncated blocks in
> > memory and don't reuse them until the old owner inode is sync'd to disk.
>
> I think that's pretty much what Val is suggesting. She suggests bitmaps
> rather than a list though. Maybe she should have used a better term than
> "flushed", as this list only needs to be cleared, rather than written to
> disk.

Yeah, probably was - I misparsed the still allocated on disk block
bitmaps phrase differently to what may have been intended...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
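
For illustration only, here is a rough user-space sketch of the in-memory approach discussed above: blocks freed by a truncate are held back from reuse until the old owner inode has been synced, at which point the held-back set is simply cleared rather than written anywhere. This is not ext2 code. The class and method names (BlockAllocator, truncate, inode_synced) are hypothetical, and a Python set stands in for whichever of a list or a bitmap a real implementation would use.

```python
from collections import defaultdict


class BlockAllocator:
    """Toy allocator that quarantines truncated blocks (hypothetical model)."""

    def __init__(self, nblocks):
        # Blocks that are free both in memory and on disk.
        self.free = set(range(nblocks))
        # inode number -> blocks freed in memory but possibly still
        # referenced by that inode's on-disk metadata.
        self.pending = defaultdict(set)

    def allocate(self):
        """Return a free block number, or None if nothing is reusable."""
        if not self.free:
            return None
        block = min(self.free)  # deterministic choice for the example
        self.free.remove(block)
        return block

    def truncate(self, ino, blocks):
        """Record blocks released by a truncate without making them reusable."""
        self.pending[ino].update(blocks)

    def inode_synced(self, ino):
        """The old owner's metadata is on disk: release its quarantined blocks."""
        self.free |= self.pending.pop(ino, set())
```

In this model, a file B created after file A is truncated cannot be handed one of A's old blocks until inode_synced() has been called for A, so a crash in between leaves no block claimed by both files on disk. The cost is that an allocation can fail, or would have to wait for a sync, while blocks sit in the pending set.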