Re: ext4: journal has aborted

From: David Jander <david@protonic.nl>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>,
	Matteo Croce <technoboy85@gmail.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Fri, 4 Jul 2014 15:45:59 +0200
Message-ID: <20140704154559.026331ec@archvile>
In-Reply-To: <20140704122022.GC10514@thunk.org>

Hi Ted, Dmitry,

On Fri, 4 Jul 2014 08:20:22 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Fri, Jul 04, 2014 at 01:28:02PM +0200, David Jander wrote:
> > 
> > Here is the output I am getting... AFAICS no problems on the raw device. Is
> > this sufficient testing, Ted?
> 
> I'm not sure what theory Dmitry was trying to pursue when he requested
> that you run the fio test.  Dmitry?
> 
> 
> Please note that at this point there may be multiple causes with
> similar symptoms that are showing up.  So just because one person
> reports one set of data points, such as someone claiming they've seen
> this without a power drop to the storage device, it does not follow
> that all of the problems were caused by flaky I/O to the device.
> 
> Right now, there are multiple theories floating around --- and it may
> be that more than one of them are true (i.e., there may be multiple
> bugs here).  Some of the possibilities, which again, may not be
> mutually exclusive:
> 
> 1) Some kind of eMMC driver bug, which is possibly causing the CACHE
> FLUSH command not to be sent.

How can I investigate this? From the fio tests I ran and the explanation
Dmitry gave, I conclude that incorrect sending of CACHE FLUSH commands is the
only thing left to rule out on the eMMC driver front, right?

> 2) Some kind of hardware problem involving flash translation layers
> not having durable transactions of their flash metadata across power
> failures.

That would be like blaming Micron (the eMMC part manufacturer) for faulty
firmware... could be, but how can we test this?

> 3) Some kind of ext4/jbd2 bug, recently introduced, where we are
> modifying some ext4 metadata (either the block allocation bitmap or
> block group summary statistics) outside of a valid transaction handle.

I think I have some more evidence to support this case:

Up to now, I had never run fsck at all! I know that this is not a good idea
in a production environment, but I am only testing right now, and in theory
it should not be necessary, right?

What I did this time was to run fsck.ext3 or fsck.ext4 (depending on the FS
format, of course) once every one or two power cycles.

So effectively, what I did amounts to this:

CASE 1: fsck on every power-cycle:

1.- Boot from clean filesystem
2.- Run the following command line:
$ cp -a /usr . & bonnie++ -r 32 -u 100:100 & bonnie++ -r 32 -u 102:102

3.- Hit CTRL+Z (to stop the second bonnie++ process)
4.- Execute "sync"
5.- While "sync" is running, cut off the power supply.
6.- Turn on power and boot from external medium
7.- Run fsck.ext3/4 on eMMC device
8.- Repeat

In this case, there was a minor difference between the fsck output of the two
filesystems:

With EXT4, the output was always something like this:

# fsck.ext4 /dev/mmcblk1p2
e2fsck 1.42.5 (29-Jul-2012)
rootfs: recovering journal
Setting free inodes count to 37692 (was 37695)
Setting free blocks count to 136285 (was 136291)
rootfs: clean, 7140/44832 files, 42915/179200 blocks
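
As a sanity check on the output above, the corrected free counts can be
compared with the summary line. This is only a minimal sketch, assuming the
summary line reports used/total counts; the helper name free_counts is made up
for illustration:

```python
import re

# e2fsck output quoted above (CASE 1, ext4).
FSCK_OUTPUT = """\
rootfs: recovering journal
Setting free inodes count to 37692 (was 37695)
Setting free blocks count to 136285 (was 136291)
rootfs: clean, 7140/44832 files, 42915/179200 blocks
"""


def free_counts(output):
    """Return (counts set by fsck, counts implied by the summary line)."""
    # kind -> (new value, old value) from the "Setting free ..." lines.
    set_to = {
        kind: (int(new), int(old))
        for kind, new, old in re.findall(
            r"Setting free (inodes|blocks) count to (\d+) \(was (\d+)\)", output
        )
    }
    # Assumption: the summary line is "used/total files, used/total blocks".
    used_i, total_i, used_b, total_b = map(
        int,
        re.search(r"(\d+)/(\d+) files, (\d+)/(\d+) blocks", output).groups(),
    )
    implied = {"inodes": total_i - used_i, "blocks": total_b - used_b}
    return set_to, implied


set_to, implied = free_counts(FSCK_OUTPUT)
for kind in ("inodes", "blocks"):
    new, old = set_to[kind]
    print(f"{kind}: was {old}, set to {new} (delta {old - new}), "
          f"summary implies {implied[kind]}")
```

For the output quoted above, the on-disk free counts were too high by 3 inodes
and 6 blocks, and the corrected values agree with the summary line.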

While for EXT3 the output did not contain the "Setting free * count..."
messages:

# fsck.ext3 -p /dev/mmcblk1p2
rootfs: recovering journal
rootfs: clean, 4895/44832 files, 36473/179200 blocks

CASE 2: fsck on every other power-cycle:

Same as CASE 1 steps 1...5 and then:
6.- Turn on power and boot again from dirty internal eMMC without running fsck.
7.- Repeat steps 2...5 one more time
8.- Perform steps 6...8 from CASE 1.
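
The two cases differ only in how often fsck runs between power cuts. A minimal
sketch of that schedule, with shorthand step labels and a hypothetical helper
power_cycle_plan (it only lists the steps; it does not touch any device):

```python
# Step labels are shorthand for the procedure described above.
LOAD_AND_CUT = "run cp + bonnie++ load, sync, cut power during sync"
BOOT_AND_FSCK = "boot from external medium, run fsck on eMMC"
BOOT_DIRTY = "boot from dirty eMMC without fsck"


def power_cycle_plan(cycles, fsck_every):
    """List the steps for `cycles` power cuts, with fsck after every `fsck_every`-th cut."""
    plan = []
    for n in range(1, cycles + 1):
        plan.append(LOAD_AND_CUT)
        plan.append(BOOT_AND_FSCK if n % fsck_every == 0 else BOOT_DIRTY)
    return plan


case1 = power_cycle_plan(cycles=2, fsck_every=1)  # CASE 1: fsck on every power cycle
case2 = power_cycle_plan(cycles=2, fsck_every=2)  # CASE 2: fsck on every other power cycle
for name, plan in (("CASE 1", case1), ("CASE 2", case2)):
    print(name)
    for step in plan:
        print("  -", step)
```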

With this test, the following difference became apparent:

With EXT3: fsck.ext3 did the same as in CASE 1

With EXT4: I get a long list of errors that are being fixed.
It starts like this:

# fsck.ext4 /dev/mmcblk1p2
e2fsck 1.42.5 (29-Jul-2012)
rootfs: recovering journal
rootfs contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 4591, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4594, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4595, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4596, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4597, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4598, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4599, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4600, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4601, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4602, i_blocks is 16, should be 8.  Fix<y>? yes
Inode 4603, i_blocks is 16, should be 8.  Fix<y>? yes
...
...
Eventually I pressed CTRL+C and restarted fsck with the option "-p", because
this list was getting a little long.
...
...

# fsck.ext4 -p /dev/mmcblk1p2
rootfs contains a file system with errors, check forced.
rootfs: Inode 5391, i_blocks is 32, should be 16.  FIXED.
rootfs: Inode 5392, i_blocks is 16, should be 8.  FIXED.
rootfs: Inode 5393, i_blocks is 48, should be 24.  FIXED.
rootfs: Inode 5394, i_blocks is 32, should be 16.  FIXED.
rootfs: Inode 5395, i_blocks is 16, should be 8.  FIXED.
...
...
rootfs: Inode 5854, i_blocks is 240, should be 120.  FIXED.
rootfs: Inode 5857, i_blocks is 576, should be 288.  FIXED.
rootfs: Inode 5860, i_blocks is 512, should be 256.  FIXED.
rootfs: Inode 5863, i_blocks is 656, should be 328.  FIXED.
rootfs: Inode 5866, i_blocks is 480, should be 240.  FIXED.
rootfs: Inode 5869, i_blocks is 176, should be 88.  FIXED.
rootfs: Inode 5872, i_blocks is 336, should be 168.  FIXED.
rootfs: 11379/44832 files (0.1% non-contiguous), 70010/179200 blocks
#
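
One pattern in the lines quoted above: every reported i_blocks value is exactly
twice the value e2fsck expects. A minimal sketch to check this over a saved
log; only a handful of the quoted lines are embedded here, the elided ones are
not reproduced, and i_blocks_ratios is a hypothetical helper name:

```python
import re

# A few of the "fsck.ext4 -p" lines quoted above.
FSCK_LINES = """\
rootfs: Inode 5391, i_blocks is 32, should be 16.  FIXED.
rootfs: Inode 5392, i_blocks is 16, should be 8.  FIXED.
rootfs: Inode 5393, i_blocks is 48, should be 24.  FIXED.
rootfs: Inode 5854, i_blocks is 240, should be 120.  FIXED.
rootfs: Inode 5863, i_blocks is 656, should be 328.  FIXED.
"""

PATTERN = re.compile(r"Inode (\d+), i_blocks is (\d+), should be (\d+)")


def i_blocks_ratios(text):
    """Map inode number -> (found, expected, found / expected)."""
    return {
        int(ino): (int(found), int(expected), int(found) / int(expected))
        for ino, found, expected in PATTERN.findall(text)
    }


ratios = i_blocks_ratios(FSCK_LINES)
for ino, (found, expected, ratio) in sorted(ratios.items()):
    print(f"inode {ino}: i_blocks {found}, expected {expected}, ratio {ratio:g}")
```

This only summarizes the symptom; it says nothing by itself about which of the
theories above is responsible.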

> 4) Some other kind of hard-to-reproduce race or wild pointer which is
> sometimes corrupting fs data structures.

I don't have such a hard time reproducing it... but it does take quite some
time (booting several times, re-installing, testing, etc...)

> If someone has an easy-to-reproduce failure case, the first step is to
> do a very rough bisection test.  Does the easy-to-reproduce failure go
> away if you use 3.14?  3.12?  Also, if you can describe in great
> detail your hardware and software configuration, and under what
> circumstances the problem reproduces, and when it doesn't, that would
> also be critical.  Whether you are doing just a reset or a full power
> cycle, if an unclean shutdown is involved, might also be important.

Until now, I have always done a power-cycle, but I can try to check if I am
able to reproduce the problem with just a "shutdown -f" (AFAIK, this does NOT
sync filesystems, right?)

I will try to check 3.14 and 3.12 (if 3.14 still seems buggy). It could take
quite a while until I have results... certainly not before Monday.

> And at this point, because I'm getting very suspicious that there may
> be more than one root cause, we should try to keep the debugging of
> one person's reproduction, such as David's, separate from another's,
> such as Matteo's.  It may be that they ultimately have the same root
> cause, and so if one person is able to get an interesting reproduction
> result, it would be great for the other person to try running the same
> experiment on their hardware/software configuration.  But what we must
> not do is assume that one person's experiment is automatically
> applicable to other circumstances.

I agree.

Best regards,

-- 
David Jander
Protonic Holland.
