From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756647AbbESQhs (ORCPT <rfc822;w@1wt.eu>);
	Tue, 19 May 2015 12:37:48 -0400
Received: from relay6-d.mail.gandi.net ([217.70.183.198]:40610 "EHLO
	relay6-d.mail.gandi.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754311AbbESQhq (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 19 May 2015 12:37:46 -0400
X-Originating-IP: 50.43.43.179
Date: Tue, 19 May 2015 09:37:40 -0700
From: Josh Triplett <josh@joshtriplett.org>
To: "Theodore Ts'o" <tytso@mit.edu>, Lukas Czerner <lczerner@redhat.com>,
        linux-kernel@vger.kernel.org
Subject: Re: Nature of ext4 corruption fixed by recent patch?
Message-ID: <20150519163739.GC2598@x>
References: <20150518225824.GA21502@cloud>
 <20150519134005.GB20421@thunk.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150519134005.GB20421@thunk.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 19, 2015 at 09:40:05AM -0400, Theodore Ts'o wrote:
> On Mon, May 18, 2015 at 03:58:24PM -0700, josh@joshtriplett.org wrote:
> > 
> > I recently had my server's filesystem implode, and I'm currently in the
> > process of cleaning it up.  It had widespread corruption in files and
> > directories scattered across the filesystem, though all vaguely recently
> > changed.  Directories appeared corrupted or truncated, various files
> > showed up as piles of NULs, and 5000+ files and directories ended up in
> > lost+found.  I observed this corruption shortly after a reboot into
> > 4.0.2 (from a previous kernel of 3.16), with ext4 noticing an
> > inconsistency and mounting the filesystem read-only.  The underlying
> > disks had no errors.
> > 
> > Reading about the corruption issue fixed by
> > d2dc317d564a46dfc683978a2e5a4f91434e9711 ("ext4: fix data corruption
> > caused by unwritten and delayed extents"), it sounds plausible.  Can
> > that strike both file data and directory data, assuming all of that data
> > ended up grouped with a delayed extent?  Would that bug manifest as
> > corrupted directories and files filled with NULs?  The system is a
> > 72-way server on which I was doing piles of parallel git pulls and
> > builds, so hitting a race seems plausible.
> 
> Unfortunately, I don't think you can blame all of your problems on the
> bug fixed by this particular patch.  First of all, it doesn't apply to
> directories at all; secondly, it's been around for a long time.  I'd
> have to check and see whether or not 3.16 had the problem, but it
> wouldn't surprise me at all.  Finally, git pulls and builds are not
> at all likely to hit the problem.
>
> It requires the combination of (a) writing to a portion of a file that
> was not previously allocated using buffered I/O, (b) an fallocate of a
> region of the file which is a superset of the region written in (a)
> before it has a chance to be written to disk, (c) waiting for the file
> data in (a) to be written out to disk (either via fsync or via the
> writeback daemons), and then (d) before the extent status cache gets
> pushed out of memory, another random write to a portion of the file
> covered by (a) -- in which case that specific portion of (a) could be
> replaced by all zeros.
> 
> Even most database workloads or torrent downloads are not likely to hit
> this pattern, since it requires an fallocate of a region of a file that
> was previously (and very recently) allocated using a buffered write.
> Torrent downloads will tend to fallocate the whole file in advance,
> and while Oracle or DB2 might intermix writes and fallocates, they
> don't fallocate previously written regions of the file, and they use
> direct I/O in any case.

Ah, thanks for the clarification. :(

In particular, I didn't realize this affected *only* the data of files
with delayed extents.  The bug here seems to have struck various
recently-written files and directories.  (Recent in days, not seconds,
as far as I can tell; and it isn't universal based on age.) The initial
symptom was ext4 noticing that a directory was corrupt (truncated, IIRC)
and immediately marking the whole filesystem read-only.
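
For anyone following along, here is a minimal sketch of the (a)-(d) write
pattern described above, in Python.  It is only an illustration under
assumptions: the block size, the offsets, and the helper name
replay_pattern are made up for the example, os.posix_fallocate stands in
for whatever fallocate call a real workload would make, and a read-back
served from the page cache may not reflect what is actually on disk.  It
is not a confirmed reproducer for the bug.

```python
import os
import tempfile

BLOCK = 4096  # assumed filesystem block size (illustrative only)


def replay_pattern(directory):
    """Replay steps (a)-(d) on a scratch file in `directory`.

    Returns True if the data written in (a), as modified by (d),
    reads back intact; False if any of it was replaced.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        first = b"A" * BLOCK
        second = b"B" * 512

        # (a) buffered write into a region with no blocks allocated yet
        os.pwrite(fd, first, 4 * BLOCK)

        # (b) fallocate a superset of that region before writeback runs
        os.posix_fallocate(fd, 0, 16 * BLOCK)

        # (c) force the data from (a) out to disk
        os.fsync(fd)

        # (d) another write inside the region written in (a)
        os.pwrite(fd, second, 4 * BLOCK + 1024)
        os.fsync(fd)

        # Basic read-back check: everything from (a) outside the small
        # overwrite in (d) should still be there, not zeros.
        expected = first[:1024] + second + first[1024 + len(second):]
        return os.pread(fd, BLOCK, 4 * BLOCK) == expected
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling replay_pattern("/some/scratch/dir") on a healthy filesystem should
return True.  Neither git pulls nor builds do anything resembling steps
(b) through (d), which matches the point that my workload was unlikely to
trigger this.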

> So it's pretty hard to hit this bug by accident, unless you happen to
> be using fsx, and even then, the only files that would get corrupted
> would be the files being written using fsx.  So I'm afraid you'll have
> to look farther afield, and consider other bugs as well as potential
> hardware problems before trusting the system again.

I'm quite skeptical of hardware problems.  The system is a few months
old, well past infant-mortality and too young for burnout.  And I've
tested the disks carefully.

Are there any other known bugs that seem likely to fit the symptoms and
circumstances?

Note that since I saw this after rebooting from 3.16 into 4.0.2, I don't
know whether the corruption was more likely caused by 3.16 or 4.0.2.

> P.S.  Bugs like these are why I'm always amused by people who think
> that just because a file system is safely being used by its
> developers, it's safe to throw production workloads on it.

Heh.  Yeah, I like exciting new software in most areas, but not in
filesystems.  In filesystems I prefer boring. :)

> These sorts of subtle data corruptors tend to be highly timing
> dependent, and very hard to find.  Sometimes these bugs can hang around
> for years before they are found and fixed.  The flip side is that
> fortunately, they tend to strike very rarely.

...lucky me.

> It's also why I'm very
> grateful for developers like Jan and Lukas.  :-)

Indeed.

- Josh Triplett