From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265005AbUELDmJ (ORCPT ); Tue, 11 May 2004 23:42:09 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265006AbUELDmJ (ORCPT ); Tue, 11 May 2004 23:42:09 -0400 Received: from thunk.org ([140.239.227.29]:53158 "EHLO thunker.thunk.org") by vger.kernel.org with ESMTP id S265005AbUELDmE (ORCPT ); Tue, 11 May 2004 23:42:04 -0400 Date: Tue, 11 May 2004 23:41:23 -0400 From: "Theodore Ts'o" To: Valdis.Kletnieks@vt.edu Cc: root@chaos.analogic.com, Junfeng Yang , Linux Kernel Mailing List , ext2-devel@lists.sourceforge.net, mc@cs.Stanford.EDU, madan@cs.Stanford.EDU, "David L. Dill" Subject: Re: [Ext2-devel] Re: [CHECKER] e2fsck writes out blocks out of order, causing root dir to be corrupted (ext3, linux 2.4.19, e2fsprogs 1.34) Message-ID: <20040512034123.GA6469@thunk.org> Mail-Followup-To: Theodore Ts'o , Valdis.Kletnieks@vt.edu, root@chaos.analogic.com, Junfeng Yang , Linux Kernel Mailing List , ext2-devel@lists.sourceforge.net, mc@cs.Stanford.EDU, madan@cs.Stanford.EDU, "David L. Dill" References: <200405120309.i4C39jUZ017062@turing-police.cc.vt.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200405120309.i4C39jUZ017062@turing-police.cc.vt.edu> User-Agent: Mutt/1.5.5.1+cvs20040105i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 11, 2004 at 11:09:45PM -0400, Valdis.Kletnieks@vt.edu wrote: > > I suspect this bug is merely a special case of "your filesystem can > get scrogged if something's doing caching behind your back" - the > same sort of issues that prompted recent "flush the IDE cache on > shutdown" fixes, and the well-known issues with using journalling > file systems on a file-backed loopback device. No, not really. This is the case of "we moved code that was original kernel-space to user space", and we failed to simulate certain functions, such as sync_blockdev(). > Having said that, I admit being surprised that their demonstration > test case is *that* simple - that's a truly small number of I/Os to > get it into a repeatably corruptible state. I'm sure many of us > have a mental image of these class of failures as being heisenbugs, > dependent on the cache contents. Well, the demonstration test case *wasn't* that simple. It required the system crashing twice at very specific points. Once to create the filesystem requiring a journal replay, and then a second crash at exactly the right time in the middle of e2fsck's journal replaying code. This failure would be fairly hard to replicate in real-life conditions, since it would require to crashes in quick succession, at very carefully chosen points, although if you had really flaky AC mains, I suppose it might be considered a more likely failure case. > Hardly - the class of errors is one that does (or should) concern > the kernel community - and I don't consider identifying a "your > filesystem *will* be toast if you get into this repeatable scenario" > a troll. At the very least, we can consider what additional > hardening we can do to either the kernel or userspace to make sure > that we don't re-order the blocks - note the key phrase here: > > "Neither of these pay attention to the journaling constraints of > EXT3 and JBD." Well, actually it was the e2fsprogs user space code that wasn't paying attention t the journalling constraints, mainly because we had been we weren't faithfully implementing sync_blockdev()/fsync_no_super(). The patch to fix this in e2fsprogs was fairly simple. - Ted