* Partially corrupted raid array beneath xfs
@ 2012-01-24 17:59 Christopher Evans
From: Christopher Evans @ 2012-01-24 17:59 UTC (permalink / raw)
To: xfs
I made a mistake by recreating a raid 6 array instead of taking the proper
steps to rebuild it. Is there a way I can find out which directories and
files are (or might be) corrupted, given that one 64k block of data is bad
every 21 chunks, for an unknown count? Unfortunately I've already mounted
the raid array and have gotten xfs errors because of the corrupted data
beneath it.
OS: Centos 5.5 64bit 2.6.18-194.el5
kmod-xfs: 0.4-2
xfsprogs: 2.9.4-1.el5.centos
I ran mdadm --create /dev/md0 --level=6 --raid-devices=23 /dev/sda /dev/sdb
/dev/sdc /dev/sdd /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
/dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds
/dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx, and then 5 minutes later set
the drive I had replaced as faulty with mdadm --manage --set-faulty /dev/md0
/dev/sdm. This should result in one random 64k chunk every 21 chunks, over 5
minutes' worth of raid rebuild (the full rebuild took ~20-30 hours).
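If it helps to bound the region that needs checking, the suspect byte ranges
on /dev/md0 can be estimated from the geometry: 23 devices in raid 6 gives 21
data chunks per stripe, so roughly one 64k chunk in every 21 * 64k of array
data is suspect, out to wherever the resync had got after 5 minutes. A rough
sketch of that arithmetic (not from the original message; the 64k chunk size
and the rebuilt-bytes figure are assumptions you would have to supply):

#!/bin/bash
# Sketch only: assumes the default 64k chunk and 21 data chunks per stripe
# (23 devices minus 2 parity).  It does not model which chunk within each
# stripe sat on /dev/sdm (the layout rotates it), so it just prints the
# stripe-sized windows that each contain one suspect 64k chunk.
REBUILT_BYTES=${1:?usage: $0 <bytes resynced before --set-faulty>}
CHUNK=65536
DATA_DISKS=21
STRIPE=$((CHUNK * DATA_DISKS))
i=0
while [ $((i * STRIPE)) -lt "$REBUILT_BYTES" ]; do
    echo "stripe $i: array bytes $((i * STRIPE))..$(((i + 1) * STRIPE - 1))"
    i=$((i + 1))
done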
In my testing with a vm with 4 drives in raid 6, I believe I only corrupted
the first 5 minutes of raid rebuild. After I created a raid 6 array, I would
dd if=/dev/zero of=/dev/md0. Then I would set a data drive to faulty and
remove it; running hexdump on it would show all zeros. To give it different
data I would dd if=/dev/urandom of=/dev/removed_drive. When I recreated the
array, mdadm would recognize that three of the drives had been in an array
already and ask if I wanted to continue. Since I said yes, it seems to have
used the existing data on those drives. If I then set the drive with the
randomized data to faulty during the rebuild, the rebuild would continue as
if the drive were failed/missing. When I added the drive back, it would
rebuild the array again. The beginning of the corrupted drive would still
show random data, but data further down the disk would show zeros.
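Roughly, the test sequence was the following (a reconstruction, not a
transcript; the member device names are placeholders for whatever the vm
used):

# 4-drive raid 6 test, reconstructed from the description above.
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/vdb /dev/vdc /dev/vdd /dev/vde
dd if=/dev/zero of=/dev/md0 bs=1M              # fill the array with zeros
mdadm --manage /dev/md0 --set-faulty /dev/vdd
mdadm --manage /dev/md0 --remove /dev/vdd
hexdump -C /dev/vdd | head                     # removed member still reads as zeros
dd if=/dev/urandom of=/dev/vdd bs=1M           # scribble random data over it
mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/vdb /dev/vdc /dev/vdd /dev/vde
               # mdadm warns that 3 members look like part of an array; answer yes
mdadm --manage /dev/md0 --set-faulty /dev/vdd  # during the rebuild
mdadm --manage /dev/md0 --remove /dev/vdd
mdadm --manage /dev/md0 --add /dev/vdd         # later: add it back, rebuild again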
* Re: Partially corrupted raid array beneath xfs
From: Dave Chinner @ 2012-01-25 0:49 UTC (permalink / raw)
To: Christopher Evans; +Cc: xfs
On Tue, Jan 24, 2012 at 09:59:09AM -0800, Christopher Evans wrote:
> I made a mistake by recreating a raid 6 array instead of taking the proper
> steps to rebuild it. Is there a way I can find out which directories and
> files are (or might be) corrupted, given that one 64k block of data is bad
> every 21 chunks, for an unknown count? Unfortunately I've already mounted
> the raid array and have gotten xfs errors because of the corrupted data
> beneath it.
Write a script that walks the filesystem, runs xfs_bmap on every file
and directory, and works out which ones have extents that fall into the
bad range. If you walk into a corrupted directory, you're likely to see
errors in dmesg, too.
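A minimal sketch of such a walk (not part of the original reply; the mount
point is a placeholder and is_bad() has to be filled in with the suspect
block ranges worked out from the raid geometry):

#!/bin/bash
# Sketch only: /mnt/array is a placeholder mount point.  xfs_bmap -v prints
# each extent's on-device range in 512-byte blocks in column 3; is_bad()
# must compare that range against the suspect ranges and is left as a stub.
find /mnt/array -xdev \( -type f -o -type d \) -print0 |
while IFS= read -r -d '' path; do
    xfs_bmap -v "$path" 2>/dev/null | awk -v p="$path" '
        function is_bad(s, e) { return 0 }     # TODO: test s..e against bad ranges
        /^ *[0-9]+:/ && $3 != "hole" {
            split($3, r, "[.][.]")
            if (is_bad(r[1] + 0, r[2] + 0)) { print p; exit }
        }'
done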
In future we'll have a reverse mapping tree that will enable us to
avoid the tree walk to find the owners of corrupted regions like
this. I wrote half the code for it while I was at LCA last week ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com