From: Eric Whitney <enwlinux@gmail.com>
To: Matteo Croce <technoboy85@gmail.com>
Cc: David Jander <david@protonic.nl>,
	Eric Whitney <enwlinux@gmail.com>, Theodore Ts'o <tytso@mit.edu>,
	Jaehoon Chung <jh80.chung@samsung.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-ext4@vger.kernel.org
Subject: Re: ext4: journal has aborted
Date: Thu, 3 Jul 2014 13:14:34 -0400	[thread overview]
Message-ID: <20140703171434.GA15790@wallace> (raw)
In-Reply-To: <CAFnufp0Fm8F=qu87R5LiOAunc_2axk4+g9J4euOjaCcNak0MnQ@mail.gmail.com>

* Matteo Croce <technoboy85@gmail.com>:
> 2014-07-02 12:17 GMT+02:00 David Jander <david@protonic.nl>:
> >
> > Hi Eric,
> >
> > On Tue, 1 Jul 2014 12:36:46 -0400
> > Eric Whitney <enwlinux@gmail.com> wrote:
> >
> >> * Theodore Ts'o <tytso@mit.edu>:
> >> > On Tue, Jul 01, 2014 at 09:07:27PM +0900, Jaehoon Chung wrote:
> >> > > Hi,
> >> > >
> >> > > I am interested in this problem because I have also hit it.
> >> > > Is it a journal problem?
> >> > >
> >> > > I used the Linux version 3.16.0-rc3.
> >> > >
> >> > > [    3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt.
> >> > > [    3.877937] Aborting journal on device mmcblk0p13-8.
> >> > > [    3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error
> >> >
> >> > This message means that the file system has detected an inconsistency
> >> > --- specifically, that the number of blocks marked as in use in the
> >> > allocation bitmap is different from what is in the block group
> >> > descriptors.
> >> >
> >> > The file system has been marked to force a panic after an error, at
> >> > which point e2fsck will be able to repair the inconsistency.
> >> >
> >> > What's not clear is *how* or *why* this happened.  It can happen simply
> >> > because of a hardware problem.  (In particular, not all mmc flash
> >> > devices handle power failures gracefully.)  Or it could be a cosmic
> >> > ray, or it might be a kernel bug.
> >> >
> >> > Normally I would chalk this up to a hardware bug, but it's possible
> >> > that it is a kernel bug.  If people can reliably reproduce the problem
> >> > where no power failures or other unclean shutdowns were involved
> >> > (since the last time the file system was checked using e2fsck) then
> >> > that would be really interesting.
> >>
> >> Hi Ted:
> >>
> >> I saw a similar failure during 3.16-rc3 (plus ext4 stable fixes plus msync
> >> patch) regression on the Pandaboard this morning.  A generic/068 hang
> >> on data_journal required a reboot for recovery (old bug, though rarer lately).
> >> On reboot, the root filesystem - default 4K, and on an SD card - went ro
> >> after the same sort of bad block bitmap / journal abort sequence.  Rebooting
> >> forced a fsck that cleared up the problem.  The target test filesystem was on
> >> a USB-attached disk, and it did not exhibit the same problems on recovery.
> >
> > Please be careful about conclusions from regular SD cards and USB sticks for
> > mass-storage. Unlike hardened eMMC (4.41+), these COTS mass-storage devices
> > are not meant for intensive use and can quite easily corrupt data on
> > their own. I've seen it happen many times already.
> >
> >> So, it looks like there might be more than just hardware involved here,
> >> although eMMC/flash might be a common denominator.  I'll see if I can come up
> >> with a reliable reproducer once the regression pass is finished if someone
> >> doesn't beat me to it.

I've not found a reproducer that doesn't involve an unclean shutdown, which
is what Ted's looking for.

However, I've noted a behavioral change that might be of interest with
the failure scenario described above using xfstests generic/068 that
occurred between 3.14 and 3.15-rc3.  It's possible that this change would
make filesystem damage caused by an unclean shutdown more likely or more
noticeable, and perhaps it's in play for the power fail/cycle cases
described in this thread.

FWIW, I've also been able to reproduce that failure scenario on an x86_64 KVM
with raw virtio disks alone.  It's just a lot harder to get there with that
configuration - many more trials required.

The change is that the root filesystem sustains damage reported as -

EXT4-fs error (device mmcblk0p3): ext4_mb_generate_buddy:757: group 65, 1243 clusters in bitmap, 1244 in gd; block bitmap corrupt.
Aborting journal on device mmcblk0p3-8.
EXT4-fs error (device mmcblk0p3): ext4_journal_check_start:56: Detected aborted journal
EXT4-fs (mmcblk0p3): Remounting filesystem read-only

- when generic/068 is run on a separate test filesystem that forces an ext4
failure requiring a power cycle / reset to recover from a hung reboot attempt.
This doesn't happen in 3.14 on either my x86_64 or ARM test systems.
Generally, the root filesystem doesn't appear to be affected at all or is
minimally affected (does not require fsck to fully recover) in 3.14, whereas
a fsck is usually required to recover the root in 3.15-rc3.

My attempts to bisect further into 3.15-rc1 to 3.15-rc3 haven't gone well as
yet - other kernel problems are making it difficult to work in there.

Eric


> >
> > I agree that there is a strong correlation towards flash-based storage, but I
> > cannot explain why this factor would make a difference. From ext4's point
> > of view, how do flash-based block devices differ from spinning-disk media
> > (besides trim support)?
> 
> Maybe the zero access time can trigger some race condition?
> 
> > Best regards,
> >
> > --
> > David Jander
> > Protonic Holland.
> 
> 
> 
> -- 
> Matteo Croce
> OpenWrt Developer

