public inbox for linux-xfs@vger.kernel.org
* Re: [ceph-users] xfs corruption, data disaster!
       [not found]   ` <loom.20150505T030824-422@post.gmane.org>
@ 2015-05-11 14:47     ` Ric Wheeler
  2015-05-11 14:54       ` Eric Sandeen
  0 siblings, 1 reply; 2+ messages in thread
From: Ric Wheeler @ 2015-05-11 14:47 UTC (permalink / raw)
  To: ceph-users, Linux fs XFS

On 05/05/2015 04:13 AM, Yujian Peng wrote:
> Emmanuel Florac <eflorac@...> writes:
>
>> On Mon, 4 May 2015 07:00:32 +0000 (UTC)
>> Yujian Peng <pengyujian5201314 <at> 126.com> wrote:
>>
>>> I'm encountering a data disaster. I have a ceph cluster with 145 osd.
>>> The data center had a power problem yesterday, and all of the ceph
>>> nodes were down. But now I find that 6 disks(xfs) in 4 nodes have
>>> data corruption. Some disks are unable to mount, and some disks have
>>> IO errors in syslog. mount: Structure needs cleaning
>>> 	xfs_log_force: error 5 returned
>>> I tried to repair one with xfs_repair -L /dev/sdx1, but the ceph-osd
>>> reported a leveldb error:
>>> 	Error initializing leveldb: Corruption: checksum mismatch
>>> I cannot start the 6 osds and 22 pgs are down.
>>> This is really a tragedy for me. Can you give me some idea of how to
>>> recover the xfs? Thanks very much!
>> For XFS problems, ask the XFS ML: xfs <at> oss.sgi.com
>>
>> You didn't give enough details, by far. What version of kernel and
>> distro are you running? If there were errors, please post extensive
>> logs. If you have IO errors on some disks, you probably MUST replace
>> them before going any further.
>>
>> Why did you run xfs_repair -L ? Did you try xfs_repair without options
>> first? Were you running the very very latest version of xfs_repair
>> (3.2.2) ?
>>
> The OS is ubuntu 12.04.5 with kernel 3.13.0
> uname -a
> Linux ceph19 3.13.0-32-generic #57~precise1-Ubuntu SMP Tue Jul 15 03:51:20
> UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> cat /etc/issue
> Ubuntu 12.04.5 LTS \n \l
> xfs_repair -V
> xfs_repair version 3.1.7
> I've tried xfs_repair without options, but it showed me some errors, so I
> used the -L option.
> Thanks for your reply!
>

Responding quickly to a couple of things:

* xfs_repair -L wipes out the XFS log, not normally a good thing to do

* replacing disks with IO errors is not a great idea if you still need that 
data. You might want to copy the data from that disk to a new disk (same or 
greater size) and then try to repair that new disk.  A lot depends on the type 
of IO error you see - you might have cable issues, HBA issues, or fairly normal 
read issues (which are not worth replacing a disk for).
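
As a rough sketch of the copy-then-repair approach described above (not a
tested procedure): the device names are placeholders, and the `run` and
`copy_then_check` helpers are hypothetical names introduced here for
illustration. By default the commands are only printed; they execute only
when RUN=1 is set.

```shell
# Print a command, and execute it only when RUN=1 is set (default: dry run).
run() {
    printf '+ %s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Copy a suspect partition to a new one, then check the copy read-only.
# Both arguments are placeholders chosen by the caller, not values from this thread.
copy_then_check() {
    src=$1   # partition on the disk that reports errors
    dst=$2   # partition on the new disk (same or greater size)

    # conv=noerror,sync: keep going after a read error and pad the failed block with zeros
    run dd if="$src" of="$dst" bs=4096 conv=noerror,sync &&
    # -n: no-modify mode, only report what xfs_repair would change
    run xfs_repair -n "$dst"
}
```

For example, `RUN=0 copy_then_check /dev/sdX1 /dev/sdY1` prints the two
commands for review without touching either device. A dedicated tool such
as GNU ddrescue may cope better with failing media than plain dd.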

You should work with your vendor's support team if you have a support contract, 
or post to the XFS devel list (copied above) for help.

Good luck!

Ric



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [ceph-users] xfs corruption, data disaster!
  2015-05-11 14:47     ` [ceph-users] xfs corruption, data disaster! Ric Wheeler
@ 2015-05-11 14:54       ` Eric Sandeen
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Sandeen @ 2015-05-11 14:54 UTC (permalink / raw)
  To: Ric Wheeler, ceph-users, Linux fs XFS

On 5/11/15 9:47 AM, Ric Wheeler wrote:
> Responding quickly to a couple of things:
> 
> * xfs_repair -L wipes out the XFS log, not normally a good thing to do

And if -L is required because the log cannot be replayed, that often
indicates a problem with the storage system. For example, a volatile write
cache that was not flushed when needed is lost in a power failure, leaving
a corrupted and unreplayable XFS log.
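
One way to see whether a drive's own volatile write cache is enabled is
`hdparm -W` with no value, which only queries the setting. This is a
minimal sketch assuming ATA/SATA drives and that hdparm is installed; the
`check_write_cache` helper and the device names are hypothetical, and a
RAID controller's cache would not show up here.

```shell
# Print the write-cache query for each device given; run it only when RUN=1.
# Device names are supplied by the caller (placeholders, not from this thread).
check_write_cache() {
    for dev in "$@"; do
        printf '+ hdparm -W %s\n' "$dev"
        # hdparm -W without a value reports the current write-caching setting
        if [ "${RUN:-0}" = "1" ]; then hdparm -W "$dev"; fi
    done
}
```

For example, `RUN=1 check_write_cache /dev/sdX /dev/sdY` (as root) would
report the setting for each drive.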

> * replacing disks with IO errors is not a great idea if you still
> need that data. You might want to copy the data from that disk to a
> new disk (same or greater size) and then try to repair that new disk.
> A lot depends on the type of IO error you see - you might have cable
> issues, HBA issues, or fairly normal read issues (which are not worth
> replacing a disk for).

Just a note that XFS sometimes starts saying "IO error" when the filesystem
has shut down; this isn't the same as a block-device-level IO error, but you
haven't posted logs or anything, so I'm just guessing here.
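
To get a first impression of which kind of error dominates, the kernel log
can be split roughly as follows. This is only a sketch: the
`classify_io_errors` name is hypothetical, and the patterns are my
assumptions about typical message formats, so check them against your own
logs.

```shell
# Read kernel log lines on stdin and count two kinds of messages.
# The patterns are assumptions about common formats, not taken from this thread's logs.
classify_io_errors() {
    awk '
        /XFS \(/                             { xfs++;   next }  # messages logged by XFS itself
        /I\/O error, dev |Buffer I\/O error/ { block++; next }  # messages from the block layer
        END { printf "xfs=%d block=%d\n", xfs, block }
    '
}
```

Typical use would be `dmesg | classify_io_errors`. Lines that mention XFS
are counted first, so an XFS message containing "I/O error" lands in the
xfs group rather than the block group.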

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

-Eric

> You should work with your vendor's support team if you have a support
> contract, or post to the XFS devel list (copied above) for help.
> 
> Good luck!
> 
> Ric
