From: Dmitry Monakhov <dmonakhov@openvz.org>
To: Theodore Ts'o <tytso@mit.edu>, David Jander <david@protonic.nl>
Cc: Matteo Croce <technoboy85@gmail.com>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Fri, 04 Jul 2014 16:38:50 +0400
Message-ID: <87oax5z4bp.fsf@openvz.org>
In-Reply-To: <20140704122022.GC10514@thunk.org>
On Fri, 4 Jul 2014 08:20:22 -0400, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Jul 04, 2014 at 01:28:02PM +0200, David Jander wrote:
> >
> > Here is the output I am getting... AFAICS no problems on the raw device. Is
> > this sufficient testing, Ted?
>
> I'm not sure what theory Dmitry was trying to pursue when he requested
> that you run the fio test. Dmitry?
Because at this moment we have some complex storage+fs interaction,
my idea was simply to isolate the raw device case and run an integrity test on that storage.
fio/libaio is a trivial and easy way to do that (except that it does not issue
a flush command). Unfortunately, according to David, the test finished without any
errors, so my theory about a broken storage driver was not confirmed.
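For illustration only, here is a minimal sketch of the kind of write-then-verify integrity check described above. It is not the fio job that was actually run. The names write_pattern and verify_pattern, the 4096-byte block size, and the header layout are all assumptions made for this example. Unlike the fio/libaio run, it calls fsync() after writing, but it reads back through the page cache, so by itself it cannot prove that the data reached the device.

```python
import os
import struct
import zlib

BLOCK_SIZE = 4096                  # assumed block size for this sketch
HEADER = struct.Struct("<QI")      # block index, CRC32 of the payload


def make_block(index, seed):
    """Build one block: header (index, crc) followed by a deterministic payload."""
    payload_len = BLOCK_SIZE - HEADER.size
    unit = struct.pack("<QQ", seed, index)
    payload = (unit * (payload_len // len(unit) + 1))[:payload_len]
    return HEADER.pack(index, zlib.crc32(payload)) + payload


def write_pattern(path, nblocks, seed=0):
    """Write nblocks self-describing blocks to path, then request a flush."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for i in range(nblocks):
            os.pwrite(fd, make_block(i, seed), i * BLOCK_SIZE)
        # fsync() asks the kernel to push the data to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)


def verify_pattern(path, nblocks):
    """Return the indices of blocks whose header or checksum does not match."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for i in range(nblocks):
            data = os.pread(fd, BLOCK_SIZE, i * BLOCK_SIZE)
            if len(data) != BLOCK_SIZE:
                bad.append(i)      # short read: block is missing or truncated
                continue
            index, crc = HEADER.unpack_from(data)
            if index != i or zlib.crc32(data[HEADER.size:]) != crc:
                bad.append(i)
    finally:
        os.close(fd)
    return bad
```

Pointing path at a scratch file is harmless. Pointing it at a raw device overwrites whatever is stored there. An empty list from verify_pattern() means every block that was read back matched its own header.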
>
>
> Please note that at this point there may be multiple causes with
> similar symptoms that are showing up. So just because one person
> reports one set of data points, such as someone claiming they've seen
> this without a power drop to the storage device, it does not follow
> that all of the problems were caused by flaky I/O to the device.
>
> Right now, there are multiple theories floating around --- and it may
> be that more than one of them are true (i.e., there may be multiple
> bugs here). Some of the possibilities, which again, may not be
> mutually exclusive:
>
> 1) Some kind of eMMC driver bug, which is possibly causing the CACHE
> FLUSH command not to be sent.
>
> 2) Some kind of hardware problem involving flash translation layers
> not having durable transactions of their flash metadata across power
> failures.
>
> 3) Some kind of ext4/jbd2 bug, recently introduced, where we are
> modifying some ext4 metadata (either the block allocation bitmap or
> block group summary statistics) outside of a valid transaction handle.
>
> 4) Some other kind of hard-to-reproduce race or wild pointer which is
> sometimes corrupting fs data structures.
>
>
> If someone has an easy-to-reproduce failure case, the first step is to
> do a very rough bisection test. Does the easy-to-reproduce failure go
> away if you use 3.14? 3.12? Also, if you can describe in great
> detail your hardware and software configuration, and under what
> circumstances the problem reproduces, and when it doesn't, that would
> also be critical. Whether you are just doing a reset or a power cycle,
> if an unclean shutdown is involved, might also be important.
>
> And at this point, because I'm getting very suspicious that there may
> be more than one root cause, we should try to keep the debugging of
> one person's reproduction, such as David's, separate from another's,
> such as Matteo's. It may be that they ultimately have the same root
> cause, and so if one person is able to get an interesting reproduction
> result, it would be great for the other person to try running the same
> experiment on their hardware/software configuration. But what we must
> not do is assume that one person's experiment is automatically
> applicable to other circumstances.
>
> Cheers,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html