From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Carlos E. R." <carlos.e.r@opensuse.org>,
XFS mailing list <xfs@oss.sgi.com>
Subject: Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
Date: Fri, 4 Jul 2014 08:40:49 -0400
Message-ID: <20140704124049.GB12151@bfoster.bfoster>
In-Reply-To: <20140704014008.GI9508@dastard>

On Fri, Jul 04, 2014 at 11:40:08AM +1000, Dave Chinner wrote:
> On Fri, Jul 04, 2014 at 03:29:31AM +0200, Carlos E. R. wrote:
> > On Friday, 2014-07-04 at 10:04 +1000, Dave Chinner wrote:
> > >On Fri, Jul 04, 2014 at 01:34:52AM +0200, Carlos E. R. wrote:
> > >>Ok, true, there is no formal "Oops".
> > >>
> > >>But no, the system does not remains fine, I had to hit the hardware
> > >>reset or power off button to get out.
> > >
> > >That usually only happens when the root filesystem is shut down and
> > >you can't access any of the binaries needed to run the system. Is
> > >the filesystem that is shutting down the root?
> >
> > No, it is not. Root is separate and using ext4. The problematic one
> > is /home.
> >
> >
> > What I did, as far I remember, was, when I noticed that home had
> > failed and was read only, to switch to runlevel 1, umount /home
> > (killing the apps that were still using it), then tried to mount it
> > again to replay the log, prior to using xfs-repair on it. Mount
> > hung. ctrl-alt-supr failed, or appeared to fail. So reset button...
>
> That's a completely different issue to having a shutdown filesystem
> hang your system. That's a mount problem, and likely a known issue.
> You need to be specific when describing a problem, otherwise we
> waste time going down the wrong paths.
>
> > >>No, the on disk filesystem is not healthy. If I continue using it,
> > >>after reboot and using "xfs_repair" several times, it fails again
> > >>within a day.
> > >
> > >After at least one hibernation and thaw cycle, right?
> >
> > Yes. 3, I think.
>
> Then hibernation has caused the corruption. It may take some time
> for the corruption to be detected, but there isn't any doubt in my
> mind that hibernation is the cause of your problems.
>
> So, until we have kernel fixes, you'd do best to turn off
> hibernation. If you can't live with leaving your machine powered up
> or switching it off, then use suspend-to-ram rather than
> suspend-to-disk to avoid the problematic snapshot/restore
> situation....
>
FWIW, I ran through a bunch of hibernation tests yesterday and couldn't
seem to reproduce anything interesting. I ran a preallocating workload
while constantly hibernating and waking a vm. I also tried using a hack
to avoid the eofblocks trim on release to make the test more effective,
and another to invoke the hibernation from the eofblocks background
scanner to "improve" the chances of conflict. I also ran a truncate test
to stress xfs_itruncate_extents() during hibernation cycles (there's
actually an instance of this in Carlos' reported output that doesn't
seem to involve a workqueue, attributed to thunderbird iirc). I ran
similar tests on kernels going back to v3.11.0 as well as on the latest
3.16.0-rc2.
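
For anyone who wants to try something similar, a truncate-heavy
workload of the kind described above can be sketched roughly as
follows. This is only an illustration, not the test that was actually
run: the function name, file name and sizes are all made up, and the
hibernate/resume cycles have to be triggered separately while it runs.

```python
import os


def append_truncate_cycle(directory, iterations=100,
                          chunk_bytes=1 << 20, keep_bytes=4096):
    """Grow a scratch file with appends, then truncate it back, repeatedly.

    Hypothetical helper for illustration only. Returns the file size
    observed after the last truncate.
    """
    path = os.path.join(directory, "hibernate-stress.dat")  # made-up name
    chunk = b"\0" * chunk_bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(iterations):
            # Appending writes extend the file past its current EOF.
            view = memoryview(chunk)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # Shrink the file again so the blocks beyond keep_bytes
            # have to be freed by the filesystem's truncate path.
            os.ftruncate(fd, keep_bytes)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
```

Point it at a directory on the filesystem under test, e.g.
append_truncate_cycle("/home/test", iterations=100000), and leave it
running across several hibernation cycles. The path here is just an
example.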

None of this really means anything other than that there isn't quite
enough information to reproduce the problem. It looks simple enough to
enable freezing on the eofblocks (or other xfs) workqueues by setting a
flag, so we could go and do that, but that still isn't a definite fix.
E.g., that thunderbird truncate instance of failure stands out a bit to
me.

Carlos,

You've indicated in your previous replies that you have reproduced this
repeatedly, or more easily, after you hit the problem and before you run
a reformat and restore sequence, enough to give you the impression at
least that the reformat is necessary. If you have the time, could you
run some of your typical activities through some hibernation cycles in
an attempt to narrow down what might contribute to this? E.g., perhaps
this only occurs with thunderbird or some other particular application
running, etc. If you have the ability to try a more recent kernel for a
period of time, that could be interesting as well.

Brian

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs