From mboxrd@z Thu Jan  1 00:00:00 1970
From: Josef Bacik
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Working towards better power fail testing
Date: Tue, 6 Jan 2015 11:39:27 -0500
Message-ID: <54AC0FBF.3020104@fb.com>
References: <5486221D.6000006@fb.com> <20141210112759.GC25671@quack.suse.cz> <54886242.6050704@fb.com> <20150105214755.GA31508@dastard> <20150106085347.GA15729@quack.suse.cz>
In-Reply-To: <20150106085347.GA15729@quack.suse.cz>
To: Jan Kara, Dave Chinner
Cc: Sage Weil
Sender: linux-fsdevel-owner@vger.kernel.org

On 01/06/2015 03:53 AM, Jan Kara wrote:
> On Tue 06-01-15 08:47:55, Dave Chinner wrote:
>>> As things stand now the other devs are loathe to touch any remotely exotic
>>> fs call, but that hardly seems ideal. Hopefully a common framework for
>>> powerfail testing can improve on this. Perhaps there are other ways we
>>> make it easier to tell what is (well) tested, and conversely ensure that
>>> those tests are well-aligned with what real users are doing...
>>
>> We don't actually need power failure (or even device failure)
>> infrastructure to test data integrity on failure. Filesystems just
>> need a shutdown method that stops any IO from being issued once the
>> shutdown flag is set. XFS has this and it's used by xfstests via the
>> "godown" utility to shut the fileystem down in various
>> circumstances. We've been using this for data integrity and log
>> recovery testing in xfstests for many years.
>>
>> Hence we know if the device behaves correctly w.r.t cache flushes
>> and FUA then the filesystem will behave correctly on power loss.
>> We don't need a device power fail simulator to tell us violating
>> fundamental architectural assumptions will corrupt filesystems....
>    I think that fs ioctl cannot easily simulate the situation where
> on-device volatile caches aren't properly flushed in all the necessary
> cases (we had a bugs like this in ext3/4 in the past which were hit by real
> users).
>

Agreed, my dm thing was meant to expose problems where we do not wait on
IO properly before writing our super, a problem we've had at least twice
so far. I wanted something that was nice and simple and would quickly
expose these kinds of bugs.

>    I also think that simulating the device failure in a different layer is
> simpler than checking for superblock flag in all the places where the
> filesystem submits IO (e.g. ext4 doesn't have dedicated buffer layer like
> xfs has and we rely on flusher thread to flush committed metadata to final
> location on disk so that writeback path completely avoids ext4 code - it's
> a generic writeback of the block device mapping). So I like the solution
> with the dm target more than a fs ioctl although I agree that it's more
> clumsy from the xfstests perspective.
>

So I'm working on support in xfstests' fsx to emit the proper dm messages
when it does an fsync, so we can easily build a test to stress test fsync
in all the horrible ways that fsx works. Building tests around the dm
target I've written is pretty simple, you just do something like

create device
mkfs device
mark the mkfs in the log
mount device
do your operations
unmount
replay log in whichever way you want and verify the contents

The replay thing is accomplished by the library and some helper functions
in xfstests, so it's no more awkward than what we do with dm flakey, and it
gives us a bit more reproducibility and lets us check more esoteric
failure conditions.
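
To make the steps above a bit more concrete, here is a rough sketch of the
replay-and-verify idea as a toy in-memory model. It is not the dm target or
the xfstests helpers; WriteLog, consistent() and run() are names made up for
this illustration, and the "filesystem" is just a superblock at block 0 that
may point at a data block. It only models writes landing in log order, not
volatile cache reordering.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    kind: str            # "write" or "mark"
    block: int = -1
    data: bytes = b""
    label: str = ""


@dataclass
class WriteLog:
    """Hypothetical in-memory stand-in for a device that logs every write."""
    entries: list = field(default_factory=list)

    def write(self, block, data):
        self.entries.append(Entry("write", block, data))

    def mark(self, label):
        self.entries.append(Entry("mark", label=label))

    def find_mark(self, label):
        for i, e in enumerate(self.entries):
            if e.kind == "mark" and e.label == label:
                return i
        raise KeyError(label)

    def replay(self, upto):
        """Rebuild the device contents from the first `upto` log entries."""
        device = {}
        for e in self.entries[:upto]:
            if e.kind == "write":
                device[e.block] = e.data
        return device


SUPER, DATA = 0, 1


def consistent(device):
    """Toy fsck: a superblock that claims data must find it on the device."""
    if device.get(SUPER) == b"points-to-1":
        return device.get(DATA) == b"payload"
    return True


def run(ops):
    """mkfs, mark, do operations, then replay to each crash point and verify."""
    log = WriteLog()
    log.write(SUPER, b"empty")          # "mkfs"
    log.mark("mkfs")
    for block, data in ops:             # "do your operations"
        log.write(block, data)
    start = log.find_mark("mkfs") + 1
    # Return every replay point after mkfs whose image fails the check.
    return [n for n in range(start, len(log.entries) + 1)
            if not consistent(log.replay(n))]


good = [(DATA, b"payload"), (SUPER, b"points-to-1")]   # data before super
bad = [(SUPER, b"points-to-1"), (DATA, b"payload")]    # super written too early

print(run(good))
print(run(bad))
```

The ordering in `bad` is the kind of "did not wait on IO before writing the
super" bug mentioned above: replaying up to the entry just after the early
superblock write gives an image the check rejects, while `good` passes at
every replay point.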

Like Jan says, we all do things differently; we are all our own little
snowflakes. I feel like a dm target is a nice solution where we can impose
a certain set of rules in very little code and all agree that it's correct,
and then build tests around that. Then our current fs'es will be well
tested and any new fs'es will be equally well tested, all without having to
add fs specific code that could be buggy. Thanks,

Josef