From: Theodore Ts'o <tytso@mit.edu>
To: David Jander <david@protonic.nl>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>,
	Matteo Croce <technoboy85@gmail.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Fri, 4 Jul 2014 08:20:22 -0400
Message-ID: <20140704122022.GC10514@thunk.org>
In-Reply-To: <20140704132802.0d43b1fc@archvile>

On Fri, Jul 04, 2014 at 01:28:02PM +0200, David Jander wrote:
> 
> Here is the output I am getting... AFAICS no problems on the raw device. Is
> this sufficient testing, Ted?

I'm not sure what theory Dmitry was trying to pursue when he requested
that you run the fio test.  Dmitry?


Please note that at this point there may be multiple causes with
similar symptoms that are showing up.  So just because one person
reports one set of data points, such as someone claiming they've seen
this without a power drop to the storage device, it does not follow
that all of the problems were caused by flaky I/O to the device.

Right now, there are multiple theories floating around --- and it may
be that more than one of them is true (i.e., there may be multiple
bugs here).  Some of the possibilities, which again, may not be
mutually exclusive:

1) Some kind of eMMC driver bug, which is possibly causing the CACHE
FLUSH command not to be sent.

2) Some kind of hardware problem involving flash translation layers
not having durable transactions of their flash metadata across power
failures.

3) Some kind of ext4/jbd2 bug, recently introduced, where we are
modifying some ext4 metadata (either the block allocation bitmap or
block group summary statistics) outside of a valid transaction handle.

4) Some other kind of hard-to-reproduce race or wild pointer which is
sometimes corrupting fs data structures.


If someone has an easy-to-reproduce failure case, the first step is to
do a very rough bisection test.  Does the easy-to-reproduce failure go
away if you use 3.14?  3.12?  Also, if you can describe in great
detail your hardware and software configuration, and under what
circumstances the problem reproduces, and when it doesn't, that would
also be critical.  If an unclean shutdown is involved, whether you are
just doing a reset or a power cycle might also be important.
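
As a rough illustration of the bisection idea above, here is a minimal
sketch in Python.  It assumes the failure, once introduced, shows up in
every later kernel, and that a single test run gives a reliable yes/no
answer.  The function names, the version list, and the "first bad"
kernel in the usage example are hypothetical placeholders, not results
from this thread.

```python
from typing import Callable, Optional, Sequence


def first_bad_version(
    versions: Sequence[str],
    reproduces: Callable[[str], bool],
) -> Optional[str]:
    """Return the oldest version for which reproduces() is True.

    `versions` must be ordered oldest to newest.  Returns None if the
    newest version does not reproduce the failure.  If the oldest
    version is returned, the problem may predate the tested range.
    """
    if not versions or not reproduces(versions[-1]):
        return None
    lo, hi = 0, len(versions) - 1  # invariant: versions[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(versions[mid]):
            hi = mid       # failure already present at mid
        else:
            lo = mid + 1   # failure introduced after mid
    return versions[lo]


if __name__ == "__main__":
    # Hypothetical data: pretend the failure first appears in "3.15".
    kernels = ["3.12", "3.13", "3.14", "3.15"]
    known_bad = {"3.15"}
    print(first_bad_version(kernels, lambda v: v in known_bad))
```

In practice reproduces() would mean booting the kernel in question and
running the reproduction workload, ideally several times, since a
single clean run does not prove that a kernel is good.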

And at this point, because I'm getting very suspicious that there may
be more than one root cause, we should try to keep the debugging of
one person's reproduction, such as David's, separate from another's,
such as Matteo's.  It may be that they ultimately have the same root
cause, and so if one person is able to get an interesting reproduction
result, it would be great for the other person to try running the same
experiment on their hardware/software configuration.  But what we must
not do is assume that one person's experiment is automatically
applicable to other circumstances.

Cheers,

						- Ted

