From: Dave Chinner <david@fromorbit.com>
To: "Carlos E. R." <carlos.e.r@opensuse.org>
Cc: XFS mailing list <xfs@oss.sgi.com>
Subject: Re: Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.
Date: Fri, 4 Jul 2014 10:04:26 +1000
Message-ID: <20140704000426.GX4453@dastard>
In-Reply-To: <alpine.LSU.2.11.1407040113340.9881@Telcontar.valinor>

On Fri, Jul 04, 2014 at 01:34:52AM +0200, Carlos E. R. wrote:
> On Thursday, 2014-07-03 at 19:43 +1000, Dave Chinner wrote:
> >On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> >>On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> >>>On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
> >>
> >>...
> 
> >>hibernated at least once a day, perhaps three times if I have to go
> >>out several times. It makes no sense to me to leave the machine
> >>powered doing nothing, if hibernating is so easy and reliable - till
> >>now. If I have to leave for more than a week, I tend to do a full
> >>"halt".
> >
> >Hibernation has always been suspect w.r.t. flushing filesystem
> >metadata. It does not guarantee that the filesystem is quiesced
> >and idle, it just does a sync() and hopes that is sufficient to get
> >the filesystem into a consistent state. The mess that this leaves is
> >then left to filesystem developers to play whack-a-mole with when
> >users have problems.
> 
> 
> Ah, but my problem would then not happen always on the same
> partition. It would affect others, would it not?

It needs a busy/dirty filesystem. If the other filesystems are
mostly idle, then they are unlikely to trip over the problem.

> >>But soon after, it oopses:
> >
> >Point of note: there is no oops or crash occurring. XFS dumps the
> >stack when a corruption occurs to tell us where it was detected
> >and then shuts down the filesystem. Your system is still just fine
> >apart from not being able to access that filesystem until you
> >unmount it, repair it and mount it again.
> 
> Ok, true, there is no formal "Oops".
> 
> But no, the system does not remain fine, I had to hit the hardware
> reset or power off button to get out.

That usually only happens when the root filesystem is shut down and
you can't access any of the binaries needed to run the system. Is
the filesystem that is shutting down the root?

> >>Question.
> >>
> >>As this always happens on recovery from hibernation, and seeing the message
> >>"Corruption of in-memory data detected", could it be that thawing does a bad
> >>memory recovery from the swap?  I thought that the procedure includes some
> >>checksum, but I don't know for sure.
> >
> >It's the fact that the filesystem is still running and modifying
> >state when the snapshot is being taken that results in the snapshot
> >image containing an inconsistent snapshot. That then gets loaded
> >on thaw and it goes boom.
> 
> But it only happens on the /home partition, not on the email
> partition, for instance, also in the same hard disk.

/home is typically where all the applications have open files and are
writing data to.

Email partitions are unlikely to have problems because email
programs are pretty good about using fsync() to ensure your email
doesn't go missing, so those partitions are rarely dirty at the time
of a hibernation.
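
To illustrate the fsync() pattern referred to above, here is a minimal
sketch in Python. It is not taken from any particular mail program;
durable_write() is a hypothetical helper name, and the write-to-temp,
fsync, rename, fsync-the-directory sequence is an assumption about how
a careful application might persist a message on a Linux system.

```python
import os
import tempfile


def durable_write(path, data):
    """Hypothetical helper: write bytes to path, returning only after
    the kernel has been asked to flush them to stable storage."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to flush file data to disk
        os.replace(tmp_path, path)  # atomically move the file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is flushed as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A file written this way is no longer dirty in the page cache once the
call returns, which is why a filesystem used mostly by such programs
has little in flight when the hibernation image is taken.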

> Unless... there are probably more things writing on the home
> partition than on the mail partition any time.

*nod*

> >>To me, there are two problems:
> >>
> >> 1) The corruption itself.
> >> 2) That xfs_repair fails to repair the filesystem. In fact, I believe
> >>    it does not detect it!
> >
> >That's because the filesystem is likely to be consistent on disk.
> >The issue is in-memory corruption, not on-disk corruption, like
> >the messages are telling us:
> 
> No, the on disk filesystem is not healthy. If I continue using it,
> after reboot and using "xfs_repair" several times, it fails again
> within a day.

After at least one hibernation and thaw cycle, right?

FWIW, to rule out other issues with repair, you should probably
upgrade to the 3.2.0 xfsprogs release...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

