public inbox for linux-xfs@vger.kernel.org
* xfs data loss
@ 2009-09-06  9:00 Passerone, Daniele
  2009-09-06  9:30 ` Michael Monnerie
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Passerone, Daniele @ 2009-09-06  9:00 UTC (permalink / raw)
  To: xfs@oss.sgi.com

> [ ... ]




Hi Peter, thank you for your long message. Some of the things you suppose,
though, may not be accurate. I'll try to give you some new elements.



>But there was apparently a power "event" of some sort, and IIRC
>the system stopped working, and there were other signs that the
>block layer had suffered damage

DP> 2) /dev/md5, a 19+1 RAID 5, that could not mount
DP> anymore...lost superblock.

PG> The fact that there was apparent difficulty means that the
PG> automatic "resync" that RAID5 implementations do if only 1 drive
PG> has been lost did not work, which is ominous.



PG> With a 19+1 RAID5 with 2 devices dead you have lost around 5-6%
PG> of the data; regrettably this is not 5-6% of the files, but most
PG> likely 5-6% of most files (and probably quite a bit of XFS metadata).

Up to now I have found no damage in any file on md5 after recovery with
mdadm --assemble --assume-clean.
Just an example: an MB-sized tar.gz file, a compressed PostScript file,
decompressed perfectly and was displayed correctly by ghostview.
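(Editorial note: a successful decompression is actually meaningful evidence here, because gzip stores a CRC32 of the uncompressed data; silent corruption would almost certainly break the check. A minimal sketch, with example filenames, of how such a spot check works:)

```shell
# gzip stores a CRC32 of the uncompressed data, so a recovered .tar.gz
# that passes "gzip -t" (or decompresses without error) very likely has
# intact bytes. Demo with a freshly made archive (filenames are examples):
workdir=$(mktemp -d)
printf 'hello world\n' > "$workdir/sample.txt"
tar -C "$workdir" -czf "$workdir/sample.tar.gz" sample.txt
# -t tests integrity (decompress + CRC check) without writing output
gzip -t "$workdir/sample.tar.gz" && echo "archive OK"
```

The same check can be looped over every .gz file on a recovered filesystem to locate damaged stripes.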

Moreover, a device died (a different one) yesterday, and in the messages I have:

Sep 4 11:00:44 ipazia-sun kernel: Badness in mv_start_dma at drivers/ata/sata_mv.c:651
Sep 4 11:00:44 ipazia-sun kernel:
Sep 4 11:00:44 ipazia-sun kernel: Call Trace: <ffffffff88099f96>{:sata_mv:mv_qc_issue+292}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff88035600>{:scsi_mod:scsi_done+0} <ffffffff8807b214>{:libata:ata_scsi_rw_xlat+0}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff8807727b>{:libata:ata_qc_issue+1037} <ffffffff88035600>{:scsi_mod:scsi_done+0}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff8807b214>{:libata:ata_scsi_rw_xlat+0} <ffffffff8807b4a9>{:libata:ata_scsi_translate+286}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff88035600>{:scsi_mod:scsi_done+0} <ffffffff8807d549>{:libata:ata_scsi_queuecmd+315}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff88035a6d>{:scsi_mod:scsi_dispatch_cmd+546}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff8803b06d>{:scsi_mod:scsi_request_fn+760} <ffffffff801e8aff>{elv_insert+230}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff801ed890>{__make_request+987} <ffffffff80164059>{mempool_alloc+49}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff801eaa13>{generic_make_request+538} <ffffffff8018b629>{__bio_clone+116}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff80147d5d>{keventd_create_kthread+0} <ffffffff801ec844>{submit_bio+186}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff80275ae8>{md_update_sb+270} <ffffffff802780bb>{md_check_recovery+371}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff80147d5d>{keventd_create_kthread+0} <ffffffff880f6f61>{:raid5:raid5d+21}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff80279990>{md_thread+267} <ffffffff80148166>{autoremove_wake_function+0}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff80147d5d>{keventd_create_kthread+0} <ffffffff80279885>{md_thread+0}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff80148025>{kthread+236} <ffffffff8010bea6>{child_rip+8}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff80147d5d>{keventd_create_kthread+0} <ffffffff80147f39>{kthread+0}
Sep 4 11:00:44 ipazia-sun kernel: <ffffffff8010be9e>{child_rip+0}
Sep 4 11:01:44 ipazia-sun kernel: ata42: Entering mv_eng_timeout
Sep 4 11:01:44 ipazia-sun kernel: mmio_base ffffc20001000000 ap ffff8103f8b4c488 qc ffff8103f8b4cf68 scsi_cmnd ffff8101f7e556c0 &cmnd ffff8101f7e5571c
Sep 4 11:01:44 ipazia-sun kernel: ata42: no sense translation for status: 0x40
Sep 4 11:01:44 ipazia-sun kernel: ata42: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Sep 4 11:01:44 ipazia-sun kernel: ata42: status=0x40 { DriveReady }
Sep 4 11:01:44 ipazia-sun kernel: end_request: I/O error, dev sdap, sector 976767935
Sep 4 11:01:44 ipazia-sun kernel: RAID5 conf printout:
(...)




DP> The resync of the /dev/md5 was performed, the raid was again
DP> with 20 working devices,

PG> The original 20 devices, or did you put in 2 new blank hard drives?
PG> I feel like 2 blank drives went in, but then later I read
PG> that all [original] 20 drives could be read for a few MB at the
PG> beginning.



No. No blank drives went in, and I always used the original 20 devices.
I therefore suspect that the "broken devices" indication has to do with the
RAID controller and not with a specific device failure, since it has been
found repeatedly in the last weeks, and always for different devices/filesystems.




PG>Well, I can try to explain the bits that maybe are missing.

PG>* Almost all your problems are block layer problems. Since XFS
PG>  assumes an error-free block layer, it is your task to ensure that
PG>  the block layer is error free. Which means that almost all the
PG>  work that you should have done was to first ensure that the
PG>  block layer is error free, by testing each drive fully and
PG>  then putting together the array. It is quite likely that none
PG>  of the issues that you have reported has much to do with XFS.


Could it have to do with the RAID controller layer?



PG>* This makes it look like the *filesystem* is fine, even if
PG>  quite a bit of data in each file has been replaced. XFS wisely
PG>  does nothing for the data (other than avoiding deliberately
PG>  damaging it) -- if your application does not add redundancy or
PG>  checksums to the data, you have no way to reconstruct it or even
PG>  check whether it is damaged in case of partial loss.

Well, a binary file with 5% data loss would simply not work.

But I have executables on this filesystem, and they run!

PG> * If 2 or more drives in each of the 20-disk arrays are damaged at the
PG> same offsets, full data recovery is not possible.



PG>* Somehow 'xfs_repair' managed to rebuild the metadata of
PG>  '/dev/md5' despite a loss of 5-6% of it, so it looks
PG>  "consistent" as far as XFS is concerned, but up to 5-6% of
PG>  each file is essentially random, and it is very difficult to
PG>  know where the random parts are.

I see no evidence to support this, at present.

PG>* With '/dev/md4', the 5-6% of metadata lost was in
PG>  more critical parts of the filesystem, so the metadata for
PG>  half of the files is gone. Of the remaining files, up to
PG>  5-6% of their data is random.

Half of the files were gone already before the repair, and they remain gone
after it; for the remaining files, I see no sign of randomness.



Summarizing, it may well be that the devices are broken, but I suspect, again, a failure in the controller.

Could it be?

I contacted Sun and they asked me for the output of Siga, ipmi, etc.

Daniele





_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: xfs data loss
@ 2009-09-04 11:45 Passerone, Daniele
  0 siblings, 0 replies; 29+ messages in thread
From: Passerone, Daniele @ 2009-09-04 11:45 UTC (permalink / raw)
  To: xfs@oss.sgi.com

Commenting further on my preceding message, I would just like to stress that everybody here has tried to help - xfs and non-xfs people. So I have seen no emollient answers here, at least not to my query.


Mr. Peter Grandi was harsh - very harsh at the beginning - but I think he also spent time thinking about my problem. For that I am grateful.

I am less grateful for being called "outrageously ridiculous". But I can let that go in times of trouble...


Daniele


* RE: xfs data loss
@ 2009-09-03 15:31 Passerone, Daniele
  2009-09-05 18:29 ` Peter Grandi
  0 siblings, 1 reply; 29+ messages in thread
From: Passerone, Daniele @ 2009-09-03 15:31 UTC (permalink / raw)
  To: xfs@oss.sgi.com

Dear Peter, 

Thank you very much for the time you spent writing this long and
interesting answer. Now I agree with you that harsh and useful is better
than emollient and lying :-)


> When you write to a mailing list asking for free help and support,
> it is rather rude to not have done some preliminary work, such as
> figuring out the characteristics of RAID5 in case of failure. It
> is also somewhat rude (but amazingly common) to make confused and
> partial reports, such as not checking and reporting what has
> actually failed.

That is true. Unfortunately I am not the person who assembled the RAID5
and configured the machine, and I had to act mostly alone to figure out
what to do. That is why I eventually preferred to make a partial report.



> But a soft but more open assessment of how outrageous some queries
> are is help too as it makes it easier to assess the gravity of the
> situation. The smooth, emollient sell-side people will let you dig
> your own grave. Just consider your statement below about "assume
> clean" that to me sounds very dangerous (big euphemism), and that
> did not elicit any warning from the sell-side:


At the beginning of this week I was confronted with the following 
situation:

1) /dev/md4, a 19+1 RAID 5, with the corresponding xfs /raidmd4 filesystem,
which had lost half of its directories
on the 24th of August, for NO PARTICULAR APPARENT REASON (and this still drives me crazy).
No logs, nothing.

2) /dev/md5, a 19+1 RAID 5, that could not mount anymore...lost superblock.

3) /dev/md6, a 4+1 RAID5, that was not mounting anymore because 2 devices were lost.
My colleague zapped the filesystem (which was almost empty) and rebuilt the RAID5.
Unfortunately I cannot say exactly what he did.


For 2) it was clear what happened:
within a few days of each other, two devices of /dev/md5 died.
The information about the death of one device is issued in /var/log/warn.
We did not check it during those days, so when the second device died, it was too late.

BUT: I followed the advice to run a read test on all devices (using dd), and all were ok.
So it seemed to be a RAID controller problem, of the same kind described here:

http://maillists.uci.edu/mailman/public/uci-linux/2007-December/002225.html

where a solution is proposed that includes reassembling the raid using mdadm with the
"assume-clean" option. This is where "assume-clean" comes from: from a read test, followed by
a study of the above mailing list post.


The resync of /dev/md5 was performed, and the raid was again running with 20 working devices,
but at the end of the day the filesystem still would not mount.
So, I was eventually forced to run xfs_repair -L /dev/md5, which was a nightmare:
an incredible amount of forking, inodes cleared... but eventually... successful.
In the meantime I had aged 10 years and all my hair had suddenly greyed, but...

RESULT: /dev/md5 is again up and running, with all data.

BUT at the same time, /dev/md4 was no longer able to mount: superblock error.

So, at that point we bought another big drive (7 TB), performed a backup of /dev/md5,
and then ran the same procedure on /dev/md4.

RESULT: /dev/md4 is again up and running, but the data that disappeared on August 24 were still missing.


Since the structure included all devices, at this point I ran xfs_repair -L /dev/md4. But nothing happened:
no errors, and half of the data still missing.

So at this point I don't understand.

THERE IS ONE IMPORTANT THING THAT I DID NOT MENTION, BECAUSE IT WAS NOT EVIDENT BY LOOKING AT /etc/raidtab,
/proc/mdstat, etc., and it was done by my collaborator:

All the structure of the raids, partitioning, etc. was done using YaST2 with LVM.
The use of LVM is a mystery to me, even more than the basics of RAID ( :-) )
The /etc/lvm/backup and archive directories are empty.
In yast2 the LVM panel is now empty, and I have forbidden my collaborator to try to go through LVM now...


Coming to other specific questions:

>Sure you can reassemble the RAID, but what do you mean by "still
>ok"? Have you read-tested those 2 drives? Have you tested the
>*other* 18 drives? How do you know none of the other 18 drives got
>damaged? Have you verified that only the host adapter electronics
>failed or whatever it was that made those 2 drives drop out?

Tested all drives, but not the host adapter electronics.


>Why do you *need* to assume clean? If the 2 "lost" drives are
>really ok, you just resync the array. 

Well, following the post above, after checking that the lost drives are ok,
first I stop the raid, then I create the raid with all 20 drives assuming them clean,
then I stop it again, and then assemble it with resyncing.
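(Editorial note: a sketch of the sequence described above, with placeholder device names and parameters. This is NOT a recommended procedure; recreating an array with --assume-clean only works if the level, chunk size, metadata version, and exact member order match the original array, otherwise it scrambles the data.)

```shell
# Hypothetical sketch of the stop / create --assume-clean / reassemble
# sequence. Device names, level and member order are placeholders and
# MUST match the original array exactly.
mdadm --stop /dev/md5

# Recreate the array metadata without rebuilding parity: --assume-clean
# tells md to trust the existing on-disk contents as already in sync.
mdadm --create /dev/md5 --assume-clean --level=5 --raid-devices=20 \
      /dev/sd[a-t]1

mdadm --stop /dev/md5

# Reassemble normally from the new superblocks.
mdadm --assemble /dev/md5 /dev/sd[a-t]1
```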

>If you *need* to assume
>clean, it is likely that you have lost something like 5% of data
>in (every stripe and thus) most files and directories (and
>internal metadata) and will be replacing it with random
>bytes. That will very likely cause XFS problems (the least of the
>problems of course).


On /raidmd5, fortunately, this was not the case.



>Especially in a place where part of the everyday
>activity is earthquake simulation...

LOL you are right.


> But apart from that, it is not as easy to backup 20 TB,

>Or to 'fsck' several TB as you also discovered. Anyhow my opinion
>is that the best way to backup large storage servers is another
>large storage server (or more than one). When I buy a hard drive I
>buy 3 backup drives for each "live" drive I use -- at *home*.

At least now we did that right.


>Not at all absurd -- if those users *really* accept that. But you
>are trying to recover the arrays instead of scratching them and
>restarting. That suggests to me that the users did not actually
>accept that. If the real agreement with the users is "you have to
>keep backups, but if something happens you will behave as if you
>cannot or don't want to restore them" it is quite different.


Well. You would be surprised to know how stupid scientists can be when
they ignore the worst-case scenario.
Including myself.
I knew exactly what the situation was, but if I had not succeeded in recovering
/raid/md5, it would have been a hard moment for me and my research group.
And we ALL knew that there were no backups.



>That's not so clear. One problem with trying to provide some
>opinions on your issue and whether the filesystems are recoverable
>is that you haven't made clear what failed and how you tested each
>component of each array to make sure that what is still working is
>known (and talk of "assume clean" is very suspicious).

Just to clarify: assume-clean was an option to the mdadm --create command,
used after I discovered that my 20 devices were there and running: I ran a dd command
reading the first megabytes of each device.
Was this wrong?
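(Editorial note: the read test described above can be sketched as follows; the device list is a placeholder. Note that reading only the first megabytes exercises a tiny fraction of each drive, so it proves much less than a full-surface read.)

```shell
#!/bin/sh
# Sketch of the dd read test described above. The device list is a
# placeholder -- substitute the actual array members. A full-surface
# check would drop count= and read each whole device instead.
read_test() {
    # Read the first 10 MiB of the given device (or file) and discard it.
    dd if="$1" of=/dev/null bs=1M count=10 2>/dev/null
}

for dev in /dev/sda /dev/sdb; do    # ...one entry per array member
    if read_test "$dev"; then
        echo "$dev: readable"
    else
        echo "$dev: READ ERROR"
    fi
done
```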

>That you have tried to run repair tools on a filesystem with an
>incomplete storage layer may have made things rather worse, so
>knowing *exactly* what has failed may help you a lot.

I will contact the Sun service and ask them to check the whole storage/controller part.
In the meantime I am almost convinced that the 4-5 TB lost on /dev/md4 are lost for good.
I sent the metadata to the mailing list one week ago. Do you think this could help in examining
the famous 20 drives?

I hope I can catch up. I am trying to learn quickly.

Thanks,

Daniele


* xfs data loss
@ 2009-08-27  7:22 Passerone, Daniele
  2009-08-27  9:41 ` Christian Kujau
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Passerone, Daniele @ 2009-08-27  7:22 UTC (permalink / raw)
  To: xfs@oss.sgi.com

Dear xfs developers

We have a Sun X4500 with 48 500-GB drives, which we configured under SUSE SLES 10.

Among others, we have 3 RAID5 xfs filesystems: /dev/md4 with 20 units (9.27 TB),
/dev/md5 with 20 units (9.27 TB), and /dev/md6 with 5 units (1.95 TB).

These units are not backed up.

Due to a power surge, suddenly and without any log messages, about half (5 TB) of the user
directories on /dev/md4 disappeared.
Upon reboot, /dev/md6 showed only 3 units; after an xfs_repair it was ok again.
/dev/md4 mounted immediately, but always with half of the directories missing.

1) xfs_check reports no problems on /dev/md4,

but 

2) xfs_logprint

xfs_logprint:
    data device: 0x904
    log device: 0x904 daddr: 9279295544 length: 262144

Header 0x1c8b wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=7307        block=182240                       *
**********************************************************************
Bad log record header

and the same happens on /dev/md5.


3) xfs_ncheck eats a lot of memory and freezes after 6-7 hours without giving output

4) I have an xfs_metadump in a file (1.8 GB) but I don't know what to do with it.
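(Editorial note: one thing a metadump is good for, sketched below with hypothetical filenames: it can be restored into a sparse image with xfs_mdrestore, and the repair tools can then be exercised offline against the image without touching the real array.)

```shell
# Hypothetical sketch: restore the metadump into a sparse image file and
# examine it offline. Filenames are placeholders.
xfs_mdrestore md4.metadump md4.img

# Dry-run repair against the image: reports problems without writing.
# -f tells xfs_repair the target is a regular file, not a device.
xfs_repair -n -f md4.img
```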

What can I do? Any help would be appreciated; I would really be happy to recover those files...
:-)

Thank you in advance

Daniele Passerone
Empa, Switzerland




end of thread, other threads:[~2009-09-09 16:48 UTC | newest]

Thread overview: 29+ messages
2009-09-06  9:00 xfs data loss Passerone, Daniele
2009-09-06  9:30 ` Michael Monnerie
2009-09-06 10:43 ` R: " Passerone, Daniele
2009-09-06 21:00 ` Peter Grandi
  -- strict thread matches above, loose matches on Subject: below --
2009-09-04 11:45 Passerone, Daniele
2009-09-03 15:31 Passerone, Daniele
2009-09-05 18:29 ` Peter Grandi
     [not found]   ` <4AA3261E.1000005@sandeen.net>
2009-09-06 20:30     ` Peter Grandi
2009-08-27  7:22 Passerone, Daniele
2009-08-27  9:41 ` Christian Kujau
2009-08-27  9:47   ` Passerone, Daniele
2009-08-27 10:09     ` Christian Kujau
2009-08-27  9:54   ` Passerone, Daniele
2009-08-28  4:16 ` Eric Sandeen
2009-08-28  9:19   ` Passerone, Daniele
2009-08-28 17:17     ` Eric Sandeen
2009-08-28 19:42       ` Passerone, Daniele
2009-08-29  6:08       ` Passerone, Daniele
2009-08-29  7:45         ` Ralf Gross
2009-08-29  7:11       ` Passerone, Daniele
2009-08-29 20:03       ` Passerone, Daniele
2009-08-29 22:14         ` Michael Monnerie
2009-08-29 22:52       ` Passerone, Daniele
2009-08-30  1:24         ` Eric Sandeen
2009-08-30  8:17           ` Michael Monnerie
2009-09-01 12:45         ` Peter Grandi
2009-09-01 22:16           ` Michael Monnerie
2009-09-04 11:08           ` Andi Kleen
2009-08-29 14:08 ` Peter Grandi
