From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1756647AbbESQhs (ORCPT ); Tue, 19 May 2015 12:37:48 -0400
Received: from relay6-d.mail.gandi.net ([217.70.183.198]:40610 "EHLO
    relay6-d.mail.gandi.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1754311AbbESQhq (ORCPT ); Tue, 19 May 2015 12:37:46 -0400
X-Originating-IP: 50.43.43.179
Date: Tue, 19 May 2015 09:37:40 -0700
From: Josh Triplett
To: "Theodore Ts'o", Lukas Czerner, linux-kernel@vger.kernel.org
Subject: Re: Nature of ext4 corruption fixed by recent patch?
Message-ID: <20150519163739.GC2598@x>
References: <20150518225824.GA21502@cloud> <20150519134005.GB20421@thunk.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150519134005.GB20421@thunk.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 19, 2015 at 09:40:05AM -0400, Theodore Ts'o wrote:
> On Mon, May 18, 2015 at 03:58:24PM -0700, josh@joshtriplett.org wrote:
> >
> > I recently had my server's filesystem implode, and I'm currently in the
> > process of cleaning it up. It had widespread corruption in files and
> > directories scattered across the filesystem, though all vaguely recently
> > changed. Directories appeared corrupted or truncated, various files
> > showed up as piles of NULs, and 5000+ files and directories ended up in
> > lost+found. I observed this corruption shortly after a reboot into
> > 4.0.2 (from a previous kernel of 3.16), with ext4 noticing an
> > inconsistency and mounting the filesystem read-only. The underlying
> > disks had no errors.
> >
> > Reading about the corruption issue fixed by
> > d2dc317d564a46dfc683978a2e5a4f91434e9711 ("ext4: fix data corruption
> > caused by unwritten and delayed extents"), it sounds plausible.
> > Can that strike both file data and directory data, assuming all of that
> > data ended up grouped with a delayed extent? Would that bug manifest as
> > corrupted directories and files filled with NULs? The system is a
> > 72-way server on which I was doing piles of parallel git pulls and
> > builds, so hitting a race seems plausible.
>
> Unfortunately, I don't think you can blame all of your problems on the
> bug fixed by this particular patch. First of all, it doesn't apply to
> directories at all; secondly, it's been around for a long time. I'd
> have to check and see whether or not 3.16 had the problem, but it
> wouldn't surprise me at all. Finally, git pulls and builds are not
> at all likely to hit the problem.
>
> It requires the combination of (a) writing to a portion of a file that
> was not previously allocated using buffered I/O, (b) an fallocate of a
> region of the file which is a superset of the region written in (a)
> before it has a chance to be written to disk, (c) waiting for the file
> data in (a) to be written out to disk (either via fsync or via the
> writeback daemons), and then (d) before the extent status cache gets
> pushed out of memory, another random write to a portion of the file
> covered by (a) -- in which case that specific portion of (a) could be
> replaced by all zeros.
>
> Even most database or torrent downloads are not likely to hit this
> pattern, since it requires an fallocate of a region of a file that was
> previously (and very recently) allocated using a buffered write.
> Torrent downloads will tend to fallocate the whole file in advance,
> and while Oracle or DB2 might intermix writes and fallocates, they
> don't fallocate previously written regions of the file, and they use
> direct I/O in any case.

Ah, thanks for the clarification. :(

In particular, I didn't realize this was *only* the data of the
delayed-extent-based files. The bug here seems to have struck various
recently-written files and directories.
(Recent in days, not seconds, as far as I can tell; and it isn't
universal based on age.) The initial symptom was ext4 noticing that a
directory was corrupt (truncated, IIRC) and immediately marking the
whole filesystem read-only.

> So it's pretty hard to hit this bug by accident, unless you happen to
> be using fsx, and even then, the only files that would get corrupted
> would be the files being written using fsx. So I'm afraid you'll have
> to look farther afield, and consider other bugs as well as potential
> hardware problems before trusting the system again.

I'm quite skeptical of hardware problems. The system is a few months
old, well past infant-mortality and too young for burnout. And I've
tested the disks carefully.

Are there any other known bugs that seem likely to fit the symptoms and
circumstances? Note that since I saw this after rebooting from 3.16
into 4.0.2, I don't know whether the corruption was more likely caused
by 3.16 or 4.0.2.

> P.S. It's bugs like these which is why I'm always amused by people
> who think that just because a file system is safely being used by
> their developers, that it's safe to throw production workloads on
> them.

Heh. Yeah, I like exciting new software in most areas, but not in
filesystems. In filesystems I prefer boring. :)

> These sorts of subtle data corruptors tend to be highly timing
> dependent, and very hard to find. Sometimes these bugs can hang around
> for years before they are found and fixed. The flip side is that
> fortunately, they tend to strike very rarely.

...lucky me.

> It's also why I'm very
> grateful for developers like Jan and Lukas. :-)

Indeed.

- Josh Triplett
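
For reference, the (a) through (d) sequence quoted above can be written
out as a short script. This is only a sketch under assumptions that are
not in the thread: the function name replay_pattern, the 4 KiB block
size, the offsets, and the use of os.posix_fallocate to stand in for the
fallocate call are all illustrative. It is not a reliable reproducer;
the final read is served from the page cache, so it would likely look
correct even on an affected kernel.

```python
import os
import tempfile

BLOCK = 4096  # assumed filesystem block size (illustrative)


def replay_pattern(path):
    """Replay the (a)-(d) I/O ordering on `path`.

    Returns the bytes read back from the region written in step (a).
    All offsets and sizes here are hypothetical choices for illustration.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # (a) buffered write into a part of the file with no blocks
        #     allocated yet (the file is empty, so block 4 is a hole)
        os.pwrite(fd, b"A" * BLOCK, 4 * BLOCK)

        # (b) preallocate a range that is a superset of region (a),
        #     before the data from (a) has been written back
        os.posix_fallocate(fd, 0, 16 * BLOCK)

        # (c) force the data from (a) out to disk
        os.fsync(fd)

        # (d) soon afterwards, another write inside region (a)
        os.pwrite(fd, b"B" * 512, 4 * BLOCK + 1024)
        os.fsync(fd)

        # Read region (a) back; on a correct filesystem it holds the
        # original data with only the bytes from step (d) replaced.
        return os.pread(fd, BLOCK, 4 * BLOCK)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        data = replay_pattern(os.path.join(tmpdir, "testfile"))
        zeros = data.count(b"\0")
        print(f"read {len(data)} bytes, {zeros} unexpected zero bytes")
```

Checking the on-disk contents rather than the cached ones would need the
page cache dropped (or the filesystem remounted) before reading back,
which this sketch deliberately leaves out.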