linux-nfs.vger.kernel.org archive mirror
From: "Theodore Ts'o" <tytso@mit.edu>
To: Nix <nix@esperi.org.uk>
Cc: Eric Sandeen <sandeen@redhat.com>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Bryan Schumaker <bjschuma@netapp.com>,
	Peng Tao <bergwolf@gmail.com>,
	Trond.Myklebust@netapp.com, gregkh@linuxfoundation.org,
	linux-nfs@vger.kernel.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)
Date: Fri, 26 Oct 2012 16:56:18 -0400
Message-ID: <20121026205618.GC8614@thunk.org>
In-Reply-To: <87wqydx957.fsf@spindle.srvr.nix>

On Fri, Oct 26, 2012 at 09:37:08PM +0100, Nix wrote:
> 
> I can reproduce this on a small filesystem and stick the image somewhere
> if that would be of any use to anyone. (If I'm very lucky, merely making
> this offer will make the problem go away. :} )

I'm not sure the image is going to be that useful.  What we really
need to do is to get a reliable reproduction of what _you_ are seeing.

It's clear from Eric's experiments that journal_checksum is dangerous.
In fact, I will likely put it under an #ifdef EXT4_EXPERIMENTAL to try
to discourage people from using it in the future.  There are things
I've been planning on doing to make it safer, but there's a very
good *reason* that both journal_checksum and journal_async_commit are
not on by default.

That's why one of the things I asked you to do when you had time was
to see if you could reproduce the problem you are seeing w/o
nobarrier,journal_checksum,journal_async_commit.
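
Before rerunning the test, it may help to confirm that none of these
options is still in effect. The following is a minimal sketch, not part
of any ext4 tool: the function names are hypothetical, and it assumes
the options appear in /proc/mounts either by name or, for nobarrier, as
barrier=0. The exact spelling may vary by kernel version.

```python
# Hypothetical helper; the names are illustrative, not from e2fsprogs or the kernel.
RISKY_OPTIONS = {"nobarrier", "journal_checksum", "journal_async_commit"}


def risky_mount_options(options):
    """Return the sorted risky options found in a comma-separated option string."""
    present = {opt for opt in options.split(",") if opt}
    # Assumption: barrier=0 is the other spelling of nobarrier.
    if "barrier=0" in present:
        present.add("nobarrier")
    return sorted(present & RISKY_OPTIONS)


def scan_mounts(mounts_text, fstype="ext4"):
    """Map mount point -> risky options, given text in /proc/mounts format."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != fstype:
            continue
        risky = risky_mount_options(fields[3])
        if risky:
            found[fields[1]] = risky
    return found
```

Passing the contents of /proc/mounts to scan_mounts() should give an
empty dict once the filesystem has been remounted without the three
options.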

The other experiment that would be really useful if you could do is to
try to apply these two patches which I sent earlier this week:

[PATCH 1/2] ext4: revert "jbd2: don't write superblock when if its empty"
[PATCH 2/2] ext4: fix I/O error when unmounting an ro file system

... and see if they make a difference.

If they don't make a difference, I don't want to apply patches just
for placebo/PR reasons.  And for Eric at least, he can reproduce the
journal checksum error followed by fairly significant corruption
reported by e2fsck with journal_checksum, and the presence or absence
of these patches makes no difference for him.  So I really don't want
to push these patches to Linus until I get confirmation that they make
a difference to *somebody*.

Regards,

						- Ted

