From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Kara
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
Date: Tue, 6 Jan 2015 09:53:47 +0100
Message-ID: <20150106085347.GA15729@quack.suse.cz>
References: <5486221D.6000006@fb.com> <20141210112759.GC25671@quack.suse.cz>
 <54886242.6050704@fb.com> <20150105214755.GA31508@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Sage Weil, Josef Bacik, Jan Kara, lsf-pc@lists.linux-foundation.org,
 linux-fsdevel@vger.kernel.org
To: Dave Chinner
Received: from cantor2.suse.de ([195.135.220.15]:34239 "EHLO mx2.suse.de"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1754604AbbAFIxx (ORCPT ); Tue, 6 Jan 2015 03:53:53 -0500
Content-Disposition: inline
In-Reply-To: <20150105214755.GA31508@dastard>
Sender: linux-fsdevel-owner@vger.kernel.org

On Tue 06-01-15 08:47:55, Dave Chinner wrote:
> > As things stand now the other devs are loathe to touch any remotely exotic
> > fs call, but that hardly seems ideal. Hopefully a common framework for
> > powerfail testing can improve on this. Perhaps there are other ways we
> > make it easier to tell what is (well) tested, and conversely ensure that
> > those tests are well-aligned with what real users are doing...
>
> We don't actually need power failure (or even device failure)
> infrastructure to test data integrity on failure. Filesystems just
> need a shutdown method that stops any IO from being issued once the
> shutdown flag is set. XFS has this and it's used by xfstests via the
> "godown" utility to shut the fileystem down in various
> circumstances. We've been using this for data integrity and log
> recovery testing in xfstests for many years.
>
> Hence we know if the device behaves correctly w.r.t cache flushes
> and FUA then the filesystem will behave correctly on power loss.
> We don't need a device power fail simulator to tell us violating
> fundamental architectural assumptions will corrupt filesystems....

I think that an fs ioctl cannot easily simulate the situation where
on-device volatile caches aren't properly flushed in all the necessary
cases (we had bugs like this in ext3/4 in the past which were hit by
real users). I also think that simulating the device failure in a
different layer is simpler than checking for a superblock flag in all
the places where the filesystem submits IO (e.g. ext4 doesn't have a
dedicated buffer layer like xfs has and we rely on the flusher thread to
flush committed metadata to its final location on disk, so that
writeback path completely avoids ext4 code - it's a generic writeback of
the block device mapping). So I like the solution with the dm target
more than an fs ioctl, although I agree that it's clumsier from the
xfstests perspective.

								Honza
-- 
Jan Kara
SUSE Labs, CR