From: Eric Sandeen <sandeen@redhat.com>
To: Nix <nix@esperi.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
gregkh@linuxfoundation.org
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)
Date: Sat, 27 Oct 2012 16:19:33 -0500 [thread overview]
Message-ID: <508C4FE5.1030102@redhat.com> (raw)
In-Reply-To: <87fw4zzra3.fsf@spindle.srvr.nix>
On 10/27/12 1:47 PM, Nix wrote:
> On 27 Oct 2012, Theodore Ts'o said:
>
>> On Sat, Oct 27, 2012 at 01:45:25PM +0100, Nix wrote:
>>> Ah! it's turned on by journal_async_commit. OK, that alone argues
>>> against use of journal_async_commit, tested or not, and I'd not have
>>> turned it on if I'd noticed that.
>>>
>>> (So, the combinations I'll be trying for effect on this bug are:
>>>
>>> journal_async_commit (as now)
>>> journal_checksum
>>> none
>>
>> Can you also check and see whether the presence or absence of
>> "nobarrier" makes a difference?
>
> Done. (Also checked the effect of your patches posted earlier this week:
> no effect, I'm afraid, certainly not under the fails-even-on-3.6.1 test
> I was carrying out, umount -l'ing /var as the very last thing I did
> before /sbin/reboot -f.)
>
> nobarrier makes a difference that I, at least, did not expect:
>
> [no options] No corruption
>
> nobarrier No corruption
>
> journal_checksum Corruption
> Corrupted transaction, journal aborted
>
> nobarrier,journal_checksum Corruption
> Corrupted transaction, journal aborted
>
> journal_async_commit Corruption
> Corrupted transaction, journal aborted
>
> nobarrier,journal_async_commit Corruption
> No corrupted transaction or aborted journal
That's what we needed. Woulda been great a few days ago ;)
In my testing journal_checksum is broken, and my bisection seems to
implicate
commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31
Author: Theodore Ts'o <tytso@mit.edu>
Date: Mon Feb 6 20:12:03 2012 -0500
ext4: fold ext4_claim_inode into ext4_new_inode
as the culprit. I haven't had time to look into why, yet.
-Eric
> I didn't expect the last case at all, and it adequately explains why you
> are mostly seeing corrupted journal messages in your tests but I was
> not. It also explains why when I saw this for the first time I was able
> to mount the resulting corrupted filesystem read-write and corrupt it
> further before I noticed that anything was wrong.
>
> It is also clear that journal_checksum and all that relies on it is
> worse than useless right now, as Eric reported while I was testing this.
> It should probably be marked CONFIG_BROKEN in future 3.[346].* stable
> kernels, if CONFIG_BROKEN existed anymore, which it doesn't.
>
> It's a shame journal_async_commit depends on a broken feature: it might
> be notionally unsafe but on some of my systems (without nobarrier or
> flashy caching controllers) it was associated with a noticeable speedup
> of metadata-heavy workloads -- though that was way back in 2009...
> however, "safety first" definitely applies in this case.
>
next prev parent reply other threads:[~2012-10-27 21:19 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <87objupjlr.fsf@spindle.srvr.nix>
[not found] ` <20121023013343.GB6370@fieldses.org>
[not found] ` <87mwzdnuww.fsf@spindle.srvr.nix>
[not found] ` <20121023143019.GA3040@fieldses.org>
[not found] ` <874nllxi7e.fsf_-_@spindle.srvr.nix>
[not found] ` <874nllxi7e.fsf_-_-AdTWujXS48Mg67Zj9sPl2A@public.gmane.org>
2012-10-23 20:57 ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Nix
2012-10-23 22:19 ` Theodore Ts'o
2012-10-23 22:47 ` Nix
2012-10-23 23:16 ` Theodore Ts'o
2012-10-23 23:06 ` Nix
2012-10-23 23:28 ` Theodore Ts'o
2012-10-23 23:34 ` Nix
2012-10-24 0:57 ` Eric Sandeen
2012-10-24 20:17 ` Jan Kara
2012-10-26 15:25 ` Eric Sandeen
2012-10-24 19:13 ` Jannis Achstetter
2012-10-24 21:31 ` Theodore Ts'o
2012-10-24 22:05 ` Jannis Achstetter
2012-10-24 23:47 ` Nix
2012-10-25 17:02 ` Felipe Contreras
[not found] ` <87pq48nbyz.fsf_-_-AdTWujXS48Mg67Zj9sPl2A@public.gmane.org>
2012-10-24 1:13 ` Eric Sandeen
2012-10-24 4:15 ` Nix
2012-10-24 4:27 ` Eric Sandeen
2012-10-24 5:23 ` Theodore Ts'o
2012-10-24 7:00 ` Hugh Dickins
2012-10-24 11:46 ` Nix
2012-10-24 11:45 ` Nix
2012-10-24 17:22 ` Eric Sandeen
2012-10-24 19:49 ` Nix
2012-10-24 19:54 ` Nix
2012-10-24 20:30 ` Eric Sandeen
2012-10-24 20:34 ` Nix
2012-10-24 20:45 ` Nix
2012-10-24 21:08 ` Theodore Ts'o
2012-10-24 23:27 ` Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount) Nix
2012-10-24 23:42 ` Nix
2012-10-25 1:10 ` Theodore Ts'o
2012-10-25 1:45 ` Nix
2012-10-25 14:12 ` Theodore Ts'o
2012-10-25 14:15 ` Nix
2012-10-25 17:39 ` Nix
2012-10-25 11:06 ` Nix
2012-10-26 0:22 ` Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount) (possibly blockdev / arcmsr at fault??) Nix
2012-10-26 20:35 ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Eric Sandeen
2012-10-26 20:37 ` Nix
[not found] ` <87wqydx957.fsf-AdTWujXS48Mg67Zj9sPl2A@public.gmane.org>
2012-10-26 20:56 ` Theodore Ts'o
[not found] ` <20121026205618.GC8614-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2012-10-26 20:59 ` Nix
[not found] ` <87objpx84k.fsf-AdTWujXS48Mg67Zj9sPl2A@public.gmane.org>
2012-10-26 21:15 ` Theodore Ts'o
2012-10-26 21:19 ` Nix
[not found] ` <87haphx76u.fsf-AdTWujXS48Mg67Zj9sPl2A@public.gmane.org>
2012-10-27 0:22 ` Theodore Ts'o
2012-10-27 12:45 ` Nix
2012-10-27 17:55 ` Theodore Ts'o
2012-10-27 18:47 ` Nix
2012-10-27 21:19 ` Eric Sandeen [this message]
2012-10-27 21:21 ` Nix
2012-10-27 21:23 ` Eric Sandeen
2012-10-27 21:29 ` Nix
2012-10-27 21:34 ` Eric Sandeen
2012-10-27 21:40 ` Nix
[not found] ` <09758CEA-74B5-48D0-8075-BB723A2CABBB@dilger.ca>
2012-10-29 2:09 ` Eric Sandeen
2012-10-27 22:42 ` Eric Sandeen
2012-10-29 1:00 ` Theodore Ts'o
2012-10-29 1:04 ` Nix
2012-10-29 2:24 ` Eric Sandeen
2012-10-29 2:34 ` Theodore Ts'o
2012-10-29 2:35 ` Eric Sandeen
2012-10-29 2:42 ` Theodore Ts'o
2012-10-27 18:30 ` Eric Sandeen
[not found] ` <20121026211542.GE8614-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2012-10-27 3:11 ` Jim Rees
2012-10-27 8:01 ` Testing ext4's journal via simulating a reboot via KVM Theodore Ts'o
2012-10-28 4:23 ` [PATCH] ext4: fix unjournaled inode bitmap modification Eric Sandeen
2012-10-28 13:59 ` Nix
2012-10-29 2:30 ` [PATCH -v3] " Theodore Ts'o
2012-10-29 3:24 ` Eric Sandeen
2012-10-29 5:07 ` Andreas Dilger
2012-10-29 17:08 ` Darrick J. Wong
[not found] <jXsTo-5lW-13@gated-at.bofh.it>
[not found] ` <jXBDk-7vn-13@gated-at.bofh.it>
[not found] ` <jXNl8-5m5-13@gated-at.bofh.it>
[not found] ` <jXNOa-5MR-23@gated-at.bofh.it>
[not found] ` <jXPGh-87s-5@gated-at.bofh.it>
[not found] ` <jXTJW-4CH-55@gated-at.bofh.it>
[not found] ` <jXUZj-6mo-13@gated-at.bofh.it>
[not found] ` <jXVLH-7kO-5@gated-at.bofh.it>
[not found] ` <jXW53-7CC-5@gated-at.bofh.it>
[not found] ` <jXWeJ-7Lk-1@gated-at.bofh.it>
2012-10-24 17:38 ` Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?) Martin
2012-10-26 20:13 ` Martin
2012-10-26 20:24 ` Nix
2012-10-26 20:44 ` Martin
2012-10-26 20:47 ` Nix
2012-10-26 21:10 ` Theodore Ts'o
2012-10-26 23:15 ` Martin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=508C4FE5.1030102@redhat.com \
--to=sandeen@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nix@esperi.org.uk \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).