From: Eric Sandeen <sandeen@sandeen.net>
To: Ric Wheeler <rwheeler@redhat.com>,
ceph-users@lists.ceph.com, Linux fs XFS <xfs@oss.sgi.com>
Subject: Re: [ceph-users] xfs corruption, data disaster!
Date: Mon, 11 May 2015 09:54:18 -0500
Message-ID: <5550C29A.2070206@sandeen.net>
In-Reply-To: <5550C11F.9090807@redhat.com>
On 5/11/15 9:47 AM, Ric Wheeler wrote:
> On 05/05/2015 04:13 AM, Yujian Peng wrote:
>> Emmanuel Florac <eflorac@...> writes:
>>
>>> Le Mon, 4 May 2015 07:00:32 +0000 (UTC)
>>> Yujian Peng <pengyujian5201314 <at> 126.com> écrivait:
>>>
>>>> I'm encountering a data disaster. I have a ceph cluster with 145 osd.
>>>> The data center had a power problem yesterday, and all of the ceph
>>>> nodes were down. But now I find that 6 disks(xfs) in 4 nodes have
>>>> data corruption. Some disks are unable to mount, and some disks have
>>>> IO errors in syslog. mount: Structure needs cleaning
>>>> xfs_log_force: error 5 returned
>>>> I tried to repair one with xfs_repair -L /dev/sdx1, but the ceph-osd
>>>> reported a leveldb error:
>>>> Error initializing leveldb: Corruption: checksum mismatch
>>>> I cannot start the 6 osds and 22 pgs are down.
>>>> This is really a tragedy for me. Can you give me some ideas for
>>>> recovering the xfs filesystems? Thanks very much!
>>> For XFS problems, ask the XFS ML: xfs <at> oss.sgi.com
>>>
>>> You didn't give enough details, by far. What version of kernel and
>>> distro are you running? If there were errors, please post extensive
>>> logs. If you have IO errors on some disks, you probably MUST replace
>>> them before going any further.
>>>
>>> Why did you run xfs_repair -L ? Did you try xfs_repair without options
>>> first? Were you running the very very latest version of xfs_repair
>>> (3.2.2) ?
>>>
>> The OS is ubuntu 12.04.5 with kernel 3.13.0
>> uname -a
>> Linux ceph19 3.13.0-32-generic #57~precise1-Ubuntu SMP Tue Jul 15 03:51:20
>> UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> cat /etc/issue
>> Ubuntu 12.04.5 LTS \n \l
>> xfs_repair -V
>> xfs_repair version 3.1.7
>> I've tried xfs_repair without options, but it showed me some errors, so I
>> used the -L option.
>> Thanks for your reply!
>>
>
> Responding quickly to a couple of things:
>
> * xfs_repair -L wipes out the XFS log, not normally a good thing to do
And if it is required because the log cannot be replayed, that often
indicates a problem with the storage system. For example, a volatile write
cache that was not flushed when needed gets lost in a power failure, leaving
a corrupted and unreplayable XFS log.
> * replacing disks with IO errors is not a great idea if you still
> need that data. You might want to copy the data from that disk to a
> new disk (same or greater size) and then try to repair that new disk.
> A lot depends on the type of IO error you see - you might have cable
> issues, HBA issues, or fairly normal read issues (which are not worth
> replacing a disk for).
Just a note that XFS sometimes starts saying "IO error" when the filesystem
has shut down; this isn't the same as a block-device-level IO error, but you
haven't posted logs or anything, so I'm just guessing here.
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
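
As a rough illustration of that distinction (a sketch only, not something
posted in this thread): the patterns below are assumptions based on commonly
seen kernel log lines, they vary between kernel versions, and the function
names are hypothetical. Real logs should still be posted to the list.

```python
import re

# ASSUMPTION: typical block-layer error lines; exact wording depends on the kernel.
BLOCK_PATTERNS = [
    re.compile(r"end_request: I/O error, dev \w+"),
    re.compile(r"blk_update_request: I/O error"),
    re.compile(r"Buffer I/O error on dev"),
]

# ASSUMPTION: lines XFS emits once the filesystem has shut down.
XFS_PATTERNS = [
    re.compile(r"xfs_do_force_shutdown"),
    re.compile(r"xfs_log_force: error 5 returned"),
]


def classify(line):
    """Label one syslog line as 'block-device', 'xfs-shutdown', or 'other'."""
    if any(p.search(line) for p in BLOCK_PATTERNS):
        return "block-device"
    if any(p.search(line) for p in XFS_PATTERNS):
        return "xfs-shutdown"
    return "other"


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = {"block-device": 0, "xfs-shutdown": 0, "other": 0}
    for line in lines:
        counts[classify(line)] += 1
    return counts
```

If summarize() over the relevant syslog window reports only "xfs-shutdown"
lines, the "IO error" messages may just be the filesystem refusing work after
a shutdown; any "block-device" lines point at the disk, cable, or HBA instead.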
-Eric
> You should work with your vendor's support team if you have a support
> contract or post to the XFS devel list (copied above) for help.
>
> Good luck!
>
> Ric
>
>
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs