public inbox for linux-ext4@vger.kernel.org
From: Eric Whitney <enwlinux@gmail.com>
To: Matteo Croce <technoboy85@gmail.com>
Cc: David Jander <david@protonic.nl>,
	Eric Whitney <enwlinux@gmail.com>, Theodore Ts'o <tytso@mit.edu>,
	Jaehoon Chung <jh80.chung@samsung.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Thu, 3 Jul 2014 13:14:34 -0400
Message-ID: <20140703171434.GA15790@wallace>
In-Reply-To: <CAFnufp0Fm8F=qu87R5LiOAunc_2axk4+g9J4euOjaCcNak0MnQ@mail.gmail.com>

* Matteo Croce <technoboy85@gmail.com>:
> 2014-07-02 12:17 GMT+02:00 David Jander <david@protonic.nl>:
> >
> > Hi Eric,
> >
> > On Tue, 1 Jul 2014 12:36:46 -0400
> > Eric Whitney <enwlinux@gmail.com> wrote:
> >
> >> * Theodore Ts'o <tytso@mit.edu>:
> >> > On Tue, Jul 01, 2014 at 09:07:27PM +0900, Jaehoon Chung wrote:
> >> > > Hi,
> >> > >
> >> > > I'm interested in this problem, because I have also run into it.
> >> > > Is it a journal problem?
> >> > >
> >> > > I used the Linux version 3.16.0-rc3.
> >> > >
> >> > > [    3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt.
> >> > > [    3.877937] Aborting journal on device mmcblk0p13-8.
> >> > > [    3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error
> >> >
> >> > This message means that the file system has detected an inconsistency
> >> > --- specifically, that the number of blocks marked as in use in the
> >> > allocation bitmap is different from what is in the block group
> >> > descriptors.
> >> >
> >> > The file system has been marked to force a panic after an error, at
> >> > which point e2fsck will be able to repair the inconsistency.
> >> >
> >> > What's not clear is *how* or why this happened.  It can happen simply
> >> > because of a hardware problem.  (In particular, not all mmc flash
> >> > devices handle power failures gracefully.)  Or it could be a cosmic
> >> > ray, or it might be a kernel bug.
> >> >
> >> > Normally I would chalk this up to a hardware bug, but it's possible
> >> > that it is a kernel bug.  If people can reliably reproduce the problem
> >> > where no power failures or other unclean shutdowns were involved
> >> > (since the last time the file system has been checked using e2fsck) then
> >> > that would be really interesting.
> >>
> >> Hi Ted:
> >>
> >> I saw a similar failure during 3.16-rc3 (plus ext4 stable fixes plus msync
> >> patch) regression on the Pandaboard this morning.  A generic/068 hang
> >> on data_journal required a reboot for recovery (old bug, though rarer lately).
> >> On reboot, the root filesystem - default 4K, and on an SD card - went ro
> >> after the same sort of bad block bitmap / journal abort sequence.  Rebooting
> >> forced a fsck that cleared up the problem.  The target test filesystem was on
> >> a USB-attached disk, and it did not exhibit the same problems on recovery.
> >
> > Please be careful about drawing conclusions from regular SD cards and USB
> > sticks used as mass storage. Unlike hardened eMMC (4.41+), these COTS
> > mass-storage devices are not meant for intensive use and can quite easily
> > corrupt data all by themselves. I've seen it happen many times already.
> >
> >> So, it looks like there might be more than just hardware involved here,
> >> although eMMC/flash might be a common denominator.  I'll see if I can come up
> >> with a reliable reproducer once the regression pass is finished if someone
> >> doesn't beat me to it.

I've not found a reproducer that doesn't involve an unclean shutdown, which
is what Ted's looking for.

However, I've noted a behavioral change that might be of interest with
the failure scenario described above using xfstests generic/068 that
occurred between 3.14 and 3.15-rc3.  It's possible that this change would
make filesystem damage caused by an unclean shutdown more likely or more
noticeable, and perhaps it's in play for the power fail/cycle cases
described in this thread.

FWIW, I've also been able to reproduce that failure scenario on an x86_64 KVM
with raw virtio disks alone.  It's just a lot harder to get there with that
configuration - many more trials required.

The change is that the root filesystem sustains damage reported as -

EXT4-fs error (device mmcblk0p3): ext4_mb_generate_buddy:757: group 65, 1243 clusters in bitmap, 1244 in gd; block bitmap corrupt.
Aborting journal on device mmcblk0p3-8.
EXT4-fs error (device mmcblk0p3): ext4_journal_check_start:56: Detected aborted journal
EXT4-fs (mmcblk0p3): Remounting filesystem read-only

- when generic/068 is run on a separate test filesystem that forces an ext4
failure requiring a power cycle / reset to recover from a hung reboot attempt.
This doesn't happen in 3.14 on either my x86_64 or ARM test systems.
Generally, the root filesystem doesn't appear to be affected at all or is
minimally affected (does not require fsck to fully recover) in 3.14, whereas
a fsck is usually required to recover the root in 3.15-rc3.
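
For anyone comparing logs across kernels or boards, here is a minimal sketch
of how the ext4_mb_generate_buddy lines quoted in this thread could be picked
apart to see which group is affected and by how many clusters the two counts
disagree.  The names parse_mb_error, MB_ERROR and SAMPLES are hypothetical,
and the pattern is only assumed to match the message format as it appears in
this thread; other kernel versions may word the message differently.

```python
import re

# Assumed message layout, taken from the log lines quoted in this thread.
MB_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"ext4_mb_generate_buddy:\d+: group (?P<group>\d+), "
    r"(?P<bitmap>\d+) clusters in bitmap, (?P<gd>\d+) in gd"
)


def parse_mb_error(line):
    """Return the fields of a bitmap/gd mismatch report, or None if no match."""
    match = MB_ERROR.search(line)
    if match is None:
        return None
    bitmap = int(match.group("bitmap"))
    gd = int(match.group("gd"))
    return {
        "device": match.group("device"),
        "group": int(match.group("group")),
        "bitmap": bitmap,
        "gd": gd,
        # Positive: the bitmap count exceeds the group descriptor count.
        "delta": bitmap - gd,
    }


# Sample lines copied from the reports in this thread.
SAMPLES = [
    "EXT4-fs error (device mmcblk0p3): ext4_mb_generate_buddy:757: "
    "group 65, 1243 clusters in bitmap, 1244 in gd; block bitmap corrupt.",
    "[    3.866449] EXT4-fs error (device mmcblk0p13): "
    "ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, "
    "20488 in gd; block bitmap corrupt.",
    "Aborting journal on device mmcblk0p3-8.",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(parse_mb_error(line))
```

Run over a saved dmesg capture, this makes it easy to see that the two
reports here differ in both the size and the sign of the discrepancy (one
cluster fewer in the bitmap in my case, two more in Jaehoon's).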

My attempts to bisect further into 3.15-rc1 to 3.15-rc3 haven't gone well as
yet - other kernel problems are making it difficult to work in there.

Eric


> >
> > I agree that there is a strong correlation towards flash-based storage, but I
> > cannot explain why this factor would make a difference. How are flash-based
> > block-devices different to ext4 than spinning-disk media (besides trim
> > support)?
> 
> maybe the zero access time can trigger some race condition?
> 
> > Best regards,
> >
> > --
> > David Jander
> > Protonic Holland.
> 
> 
> 
> -- 
> Matteo Croce
> OpenWrt Developer

