From: Eric Sandeen <sandeen@sandeen.net>
To: Shrinath M <shrinath.m@webyog.com>
Cc: Sabyasachi Ruj <sabyasachi.ruj@webyog.com>,
	Vivek Goel <vivek.goel@webyog.com>,
	Supratik Goswami <supratik.goswami@webyog.com>,
	Ric Wheeler <rwheeler@redhat.com>,
	xfs@oss.sgi.com
Subject: Re: XFS filesystem on EC2 instance corrupts and shuts down
Date: Wed, 13 Mar 2013 13:56:35 -0500
Message-ID: <5140CBE3.80705@sandeen.net>
In-Reply-To: <CAOdS1hngSuHn_HiremLyUS7Qd9eZ68=8arfBuHnEpwXQaBw9Wg@mail.gmail.com>

On 3/13/13 1:07 PM, Shrinath M wrote:
> Sorry to be asking in dev thread, but Amazon seems to be clueless in this case :(
> Can someone tell me where can we find the logs/output of xfs repair
> after this runs? We just reboot the machine when we see this and the
> /var/log/messages or dmesg seems to know nothing about what it
> repaired.

xfs_repair does not run automatically at boot on any OS I know of; XFS simply
replays the log at mount time.  But then I don't know what OS you are running;
it looks like an Amazon special?  It's a pity they can't support the OS they
provide you, because on an older kernel like this, upstream developers will be
less interested unless the problem persists in upstream kernels.  This sort
of support is usually best left to an OS vendor.

But all that aside, you list this as the first error:

    Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248619] XFS (md0): Corruption detected. Unmount and run xfs_repair

but I am wondering if there might be more information before this which is not in your trimmed logs.

The text above is from xfs_corruption_error() which calls xfs_error_report() before
the above message, and which should normally tell us a lot more about what went wrong, for 
example something like "Internal error %s at line %d of file %s.  Caller 0x%"
and possibly a hexdump or stack trace.

One of the things in
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
is:

" dmesg output showing all error messages and stack traces "
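
A minimal sketch of one way to pull out whatever was logged just before each "Corruption detected" line, so the report is not trimmed too aggressively. The function name `context_before` and the 40-line window are assumptions made for illustration, not part of any XFS tooling; the marker text is taken from the message quoted above.

```python
import re
from collections import deque

# Marker taken from the log line quoted in this thread; the device name
# in parentheses (md0 there) is matched loosely.
MARKER = re.compile(r"XFS \([^)]+\): Corruption detected")


def context_before(lines, before=40):
    """Yield (preceding_lines, hit_line) for every line matching MARKER.

    `before` is the maximum number of preceding lines kept as context.
    """
    window = deque(maxlen=before)
    for raw in lines:
        line = raw.rstrip("\n")
        if MARKER.search(line):
            yield list(window), line
        window.append(line)
```

Feeding it the lines of /var/log/messages or saved dmesg output (for example `context_before(open("messages.txt"))`, where the file name is hypothetical) gives the context that should go into the report along with the hit itself.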

If you really didn't get anything else before this, try:

    echo 11 > /proc/sys/fs/xfs/error_level

to raise the error reporting threshold to its maximum, so that reports issued
at a level above the current threshold are no longer suppressed.  That might
actually be what you hit.
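
For anyone scripting this, a small sketch that reads the current threshold and sets a new one, returning the old value so it can be restored afterwards. The helper names are hypothetical; the 0 to 11 range is the one given for fs.xfs.error_level in the kernel's XFS documentation, and writing to the real file needs root.

```python
from pathlib import Path

# Same file the echo command above writes to.
ERROR_LEVEL = Path("/proc/sys/fs/xfs/error_level")


def get_error_level(path=ERROR_LEVEL):
    """Return the current XFS error reporting threshold as an int."""
    return int(Path(path).read_text().strip())


def set_error_level(level, path=ERROR_LEVEL):
    """Set the threshold and return the previous value.

    The documented range for fs.xfs.error_level is 0..11.
    """
    if not 0 <= level <= 11:
        raise ValueError("error_level must be between 0 and 11")
    previous = get_error_level(path)
    Path(path).write_text(f"{level}\n")
    return previous
```

The `path` argument exists only so the helpers can be tried against an ordinary file before touching /proc.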

You also get:

Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014259] XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.

117 is EUCLEAN ("Structure needs cleaning") on Linux, which is what XFS's
EFSCORRUPTED is defined as there (the old value was 990).  So this is a
corruption report as well, this time from reading the inode buffer in
xfs_iunlink_remove().
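
A quick way to check what an errno value maps to on a given system is to ask the C library through Python's standard `errno` and `os` modules. This is only a sketch; `describe_errno` is a made-up helper name, and the result reflects the platform it runs on, so it should be run on the Linux box in question.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for an errno value on this platform.

    Values with no symbolic name on this platform are reported as "unknown".
    """
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 117 is the value from the xfs_iunlink_remove message quoted above.
    print(describe_errno(117))
```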

-Eric

> 
> On Wed, Mar 6, 2013 at 7:55 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
> 
>     I would suggest contacting Amazon's customer support channel (or the vendor you paid for the Linux instance you are running).
> 
>     XFS developer list is probably not the correct forum to help you debug this :)
> 
>     Good luck!
> 
>     Ric
> 
> 
> 
>     On 03/06/2013 08:12 AM, Supratik Goswami wrote:
> 
>         Have we created a ticket with AWS ?
> 
>         It could be an EBS issue who knows, we need to confirm that first.
> 
>         --
>         Warm Regards
> 
>         Supratik
> 
> 
>         On Wed, Mar 6, 2013 at 6:38 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
> 
>             On 03/06/2013 08:03 AM, Shrinath M wrote:
> 
> 
>                 On Wed, Mar 6, 2013 at 6:29 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
> 
>                     I think that you would need to verify that the Amazon storage is not
>                     throwing errors - do your logs show IO errors or issues before XFS
>                     hits an issue?
> 
> 
>                 No IO errors in /var/log/messages.
>                 Where else should I be looking?
> 
> 
> 
>             Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.023638] XFS (md0): I/O
>             Error Detected. Shutting down filesystem
> 
>             Is an IO error from MD.
> 
>             I would suggest trying to reproduce without MD in the picture first -
>             always best to try to reproduce with the simplest setup first and work
>             your way up the complexity ladder,
> 
>             Ric
> 
> 
> 
> 
>         _________________________________________________
>         xfs mailing list
>         xfs@oss.sgi.com
>         http://oss.sgi.com/mailman/listinfo/xfs
> 
> 
> 
> 
> 
> -- 
> Regards
> *Shrinath.M*
> 
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 
