* not enough operational mirrors
@ 2014-09-22  5:32 Ian Young
  2014-09-22  5:47 ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread

From: Ian Young @ 2014-09-22  5:32 UTC (permalink / raw)
  To: linux-raid

My 6-drive software RAID 10 array failed. The individual drives failed one
at a time over the past few months, but it's been an extremely busy summer
and I didn't have the free time to RMA the drives and rebuild the array.
Now I'm wishing I had acted sooner, because three of the drives are marked
as removed and the array doesn't have enough mirrors to start. I followed
the recovery instructions at raid.wiki.kernel.org and, before making things
any worse, saved the status using mdadm --examine and consulted this
mailing list. Here's the status:

http://pastebin.com/KkV8e8Gq

I can see that the event counts on sdd2 and sdf2 are significantly behind,
so we can consider that data too old. sdc2 is only behind by two events, so
any data loss there should be minimal. If I can make the array start with
sd[abce]2, I think that will be enough to mount the filesystem, back up my
data, and start replacing drives. How do I do that?
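(The event-count comparison above comes from the per-member metadata. A
minimal sketch of how to pull just those fields for every member -- assuming
the partitions are /dev/sd[a-f]2 as in the pastebin -- looks like this:

  for d in /dev/sd[a-f]2; do
      echo "== $d =="
      mdadm --examine "$d" | grep -E 'Events|Update Time|Array State'
  done

The members whose "Events" counts sit closest to the highest value have the
freshest view of the array.)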
* Re: not enough operational mirrors
  2014-09-22  5:32 not enough operational mirrors Ian Young
@ 2014-09-22  5:47 ` NeilBrown
  2014-09-22 17:17   ` Ian Young
       [not found]   ` <CANs+QMwUWZ+z0Kk-voHRLZaherOh8K8o_gCXVvw7nnXYT_goUg@mail.gmail.com>
  0 siblings, 2 replies; 8+ messages in thread

From: NeilBrown @ 2014-09-22  5:47 UTC (permalink / raw)
  To: Ian Young; +Cc: linux-raid

On Sun, 21 Sep 2014 22:32:19 -0700 Ian Young <ian@duffrecords.com> wrote:

> [...] If I can make the array start with sd[abce]2 I think that will be
> enough to mount the filesystem, back up my data, and start replacing
> drives. How do I do that?

Use the "--force" option with "--assemble".

NeilBrown
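(A hedged sketch of what that forced assembly looks like for this
particular array, using the device names from the original post; if the
kernel has already half-assembled an inactive md0, stop it first:

  mdadm --stop /dev/md0    # only if an inactive/partial md0 already exists
  mdadm --assemble --force /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sde2
  cat /proc/mdstat         # the array should come up degraded, 4 of 6 members

--force tells mdadm to accept the small event-count mismatch on sdc2 and
include it anyway.)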
* Re: not enough operational mirrors
  2014-09-22  5:47 ` NeilBrown
@ 2014-09-22 17:17   ` Ian Young
  2014-09-22 23:53     ` NeilBrown
       [not found]   ` <CANs+QMwUWZ+z0Kk-voHRLZaherOh8K8o_gCXVvw7nnXYT_goUg@mail.gmail.com>
  1 sibling, 1 reply; 8+ messages in thread

From: Ian Young @ 2014-09-22 17:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

I forced the three good disks and the one that was behind by two events to
assemble:

  mdadm --assemble --force /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sde2

Then I added the other two disks and let it sync overnight:

  mdadm --add --force /dev/md0 /dev/sdd2
  mdadm --add --force /dev/md0 /dev/sdf2

I rebooted the system in recovery mode and the root filesystem is back!
However, / is read-only and my /srv partition, which is the largest and has
most of my data, can't mount. When I try to examine the array, it says "no
md superblock detected on /dev/md0." On top of the software RAID, I have
four logical volumes. Here is the full LVM configuration:

http://pastebin.com/gzdZq5DL

How do I recover the superblock?

On Sun, Sep 21, 2014 at 10:47 PM, NeilBrown <neilb@suse.de> wrote:
> [...]
> Use the "--force" option with "--assemble".
>
> NeilBrown
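(A note on the "no md superblock detected on /dev/md0" message: mdadm
--examine reads the member superblock stored on a component device, so it
is expected to find nothing when pointed at the assembled array device
itself. The array-level view comes from --detail; a minimal sketch,
assuming the array is /dev/md0 as above:

  mdadm --detail /dev/md0      # array state, member roles, sync progress
  mdadm --examine /dev/sda2    # per-member superblock on a component device
  cat /proc/mdstat             # quick kernel-side summary

The superblock the failed mount complains about is a different thing again:
the filesystem superblock stored inside the logical volume.)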
* Re: not enough operational mirrors
  2014-09-22 17:17   ` Ian Young
@ 2014-09-22 23:53     ` NeilBrown
  2014-09-23  0:55       ` Ian Young
  0 siblings, 1 reply; 8+ messages in thread

From: NeilBrown @ 2014-09-22 23:53 UTC (permalink / raw)
  To: Ian Young; +Cc: linux-raid

On Mon, 22 Sep 2014 10:17:46 -0700 Ian Young <ian@duffrecords.com> wrote:

> [...] However, / is read-only and my /srv partition, which is the largest
> and has most of my data, can't mount. When I try to examine the array, it
> says "no md superblock detected on /dev/md0." On top of the software
> RAID, I have four logical volumes. Here is the full LVM configuration:
>
> http://pastebin.com/gzdZq5DL
>
> How do I recover the superblock?

What sort of filesystem is it?  ext4??

Try "fsck -n" and see if it finds anything.

The fact that LVM found everything suggests that the array is mostly
working. Maybe just one superblock got corrupted somehow. If 'fsck' doesn't
get you anywhere you might need to ask on a forum dedicated to the
particular filesystem.

NeilBrown
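(The non-destructive check Neil suggests would look roughly like this,
using the volume path from the LVM config in the thread:

  fsck -n /dev/mapper/vg_raid10-srv    # report only, change nothing

If the filesystem turns out to be ext4, fsck dispatches to e2fsck; for XFS,
fsck is essentially a no-op and the equivalent dry run is xfs_repair -n,
which is what the next message uses.)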
* Re: not enough operational mirrors
  2014-09-22 23:53     ` NeilBrown
@ 2014-09-23  0:55       ` Ian Young
  2014-09-23 17:07         ` Ian Young
  0 siblings, 1 reply; 8+ messages in thread

From: Ian Young @ 2014-09-23  0:55 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

It's XFS. I'm running:

  xfs_repair -n /dev/mapper/vg_raid10-srv

I expect it will take hours or days as this volume is 8.15 TiB.

On Mon, Sep 22, 2014 at 4:53 PM, NeilBrown <neilb@suse.de> wrote:
> What sort of filesystem is it?  ext4??
>
> Try "fsck -n" and see if it finds anything.
>
> The fact that LVM found everything suggests that the array is mostly
> working. Maybe just one superblock got corrupted somehow. If 'fsck'
> doesn't get you anywhere you might need to ask on a forum dedicated to
> the particular filesystem.
>
> NeilBrown
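(xfs_repair -n only reports what it would fix and never writes. One detail
worth keeping in mind, offered here as general XFS behaviour rather than
anything from the thread: if the log is dirty, xfs_repair will normally
refuse to proceed and ask for the filesystem to be mounted once so the log
can be replayed, and zeroing the log is the last resort:

  xfs_repair -n /dev/mapper/vg_raid10-srv     # dry run, report only
  # last resort if the log can neither be replayed nor read:
  # xfs_repair -L /dev/mapper/vg_raid10-srv   # zeroes the log

Zeroing the log discards any metadata updates that existed only in the log,
so a plain mount attempt is worth trying first.)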
* Re: not enough operational mirrors
  2014-09-23  0:55       ` Ian Young
@ 2014-09-23 17:07         ` Ian Young
  2014-10-05 21:43           ` Ian Young
  0 siblings, 1 reply; 8+ messages in thread

From: Ian Young @ 2014-09-23 17:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

I booted from a live CD so I could use version 3.1.10 of xfs_repair
(versions < 3.1.8 reportedly have a bug when using ag_stride), then ran the
following command:

  xfs_repair -P -o bhash=16384 -o ihash=16384 -o ag_stride=16 /dev/mapper/vg_raid10-srv

It stopped after a few seconds, saying:

  xfs_repair: read failed: Input/output error
  XFS: failed to find log head
  zero_log: cannot find log head/tail (xlog_find_tail=5), zeroing it anyway
  xfs_repair: libxfs_device_zero write failed: Input/output error

However, I was able to mount the volume after that and my data was still
there! Thanks for pointing me in the right direction with the RAID.

On Mon, Sep 22, 2014 at 5:55 PM, Ian Young <ian@duffrecords.com> wrote:
> It's XFS. I'm running:
>
>   xfs_repair -n /dev/mapper/vg_raid10-srv
>
> I expect it will take hours or days as this volume is 8.15 TiB.
> [...]
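(The Input/output errors reported by xfs_repair usually come from the block
layer underneath rather than from XFS itself, which fits the drive trouble
that shows up in the next message. A hedged sketch of how to see which
device is actually failing -- generic commands, not taken from the thread:

  dmesg | grep -iE 'I/O error|ata[0-9]|sd[a-f]'   # kernel-side read errors
  smartctl -A /dev/sde    # Current_Pending_Sector / Reallocated_Sector_Ct

A climbing pending-sector count on one member is a strong hint that repair
or resync is tripping over unreadable blocks on that drive.)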
* Re: not enough operational mirrors
  2014-09-23 17:07         ` Ian Young
@ 2014-10-05 21:43           ` Ian Young
  0 siblings, 0 replies; 8+ messages in thread

From: Ian Young @ 2014-10-05 21:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

I've received two replacement drives and added them to the array. One of
them finished synchronizing and became an active member. The other, sdf,
has been treated as a spare. After running a smartctl test on each of the
drives, I found that sde has errors, preventing the sync process from
making sdf an active member. I have tried a couple of recommendations I
read on various sites, such as stopping the array and recreating it with
the "--assume-clean" option (not possible because a process is using the
array) and growing the array one disk larger (not possible because this is
RAID 10). Should I try to repair the bad blocks, or is there a way to force
sde and sdf to sync first?

  [root@localhost ~]# smartctl -l selftest /dev/sde
  smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.6.10-4.fc18.x86_64] (local build)
  Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

  === START OF READ SMART DATA SECTION ===
  SMART Self-test log structure revision number 1
  Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
  # 1  Short offline     Completed: read failure        90%            11822          1187144704
  # 2  Short offline     Completed: read failure        90%            11814          1187144704

On Tue, Sep 23, 2014 at 10:07 AM, Ian Young <ian@duffrecords.com> wrote:
> I booted from a live CD so I could use version 3.1.10 of xfs_repair
> (versions < 3.1.8 reportedly have a bug when using ag_stride), then ran
> the following command:
>
>   xfs_repair -P -o bhash=16384 -o ihash=16384 -o ag_stride=16 /dev/mapper/vg_raid10-srv
>
> It stopped after a few seconds, saying:
>
>   xfs_repair: read failed: Input/output error
>   XFS: failed to find log head
>   zero_log: cannot find log head/tail (xlog_find_tail=5), zeroing it anyway
>   xfs_repair: libxfs_device_zero write failed: Input/output error
>
> However, I was able to mount the volume after that and my data was still
> there! Thanks for pointing me in the right direction with the RAID.
> [...]
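(The array-side view of who is active, who is spare, and whether the resync
is hitting read errors can be checked with standard tools -- a minimal
sketch, with nothing specific to this thread beyond the device names:

  mdadm --detail /dev/md0          # member roles: active / spare / faulty
  cat /proc/mdstat                 # recovery progress, if a resync is running
  dmesg | grep -i raid10           # read errors hit during recovery
  smartctl -l selftest /dev/sdf    # confirm the new spare itself tests clean

If sde cannot be read at the sectors the resync needs, the rebuild onto sdf
will keep failing until sde is repaired or replaced.)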
[parent not found: <CANs+QMwUWZ+z0Kk-voHRLZaherOh8K8o_gCXVvw7nnXYT_goUg@mail.gmail.com>]
* Re: not enough operational mirrors
       [not found]   ` <CANs+QMwUWZ+z0Kk-voHRLZaherOh8K8o_gCXVvw7nnXYT_goUg@mail.gmail.com>
@ 2014-09-22 23:09     ` Ian Young
  0 siblings, 0 replies; 8+ messages in thread

From: Ian Young @ 2014-09-22 23:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Oops, I meant to say the error I get when trying to mount /srv is this:

  root@dtla:~# mount /srv
  mount: /dev/mapper/vg_raid10-srv: can't read superblock

Aren't there other copies of the superblock? I'm not sure how it works with
LVM.

On Mon, Sep 22, 2014 at 10:17 AM, Ian Young <ian@duffrecords.com> wrote:
> I forced the three good disks and the one that was behind by two events
> to assemble:
>
>   mdadm --assemble --force /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sde2
>
> Then I added the other two disks and let it sync overnight:
>
>   mdadm --add --force /dev/md0 /dev/sdd2
>   mdadm --add --force /dev/md0 /dev/sdf2
>
> [...] When I try to examine the array, it says "no md superblock detected
> on /dev/md0." On top of the software RAID, I have four logical volumes.
> Here is the full LVM configuration:
>
> http://pastebin.com/gzdZq5DL
>
> How do I recover the superblock?
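(On the question of superblock copies: LVM is transparent here and keeps no
copy of the filesystem superblock -- backups are a property of the
filesystem itself. ext2/3/4 store backup superblocks at fixed offsets,
while XFS keeps a secondary superblock in every allocation group, and
xfs_repair searches for those automatically. A hedged sketch of a read-only
peek at one XFS secondary superblock, assuming xfs_db is available:

  xfs_db -r -c 'sb 1' -c 'print' /dev/mapper/vg_raid10-srv   # AG 1's copy

This is inspection only; actual recovery from secondary superblocks is what
xfs_repair does, as in the later messages of this thread.)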
Thread overview: 8+ messages
2014-09-22  5:32 not enough operational mirrors Ian Young
2014-09-22  5:47 ` NeilBrown
2014-09-22 17:17   ` Ian Young
2014-09-22 23:53     ` NeilBrown
2014-09-23  0:55       ` Ian Young
2014-09-23 17:07         ` Ian Young
2014-10-05 21:43           ` Ian Young
     [not found]   ` <CANs+QMwUWZ+z0Kk-voHRLZaherOh8K8o_gCXVvw7nnXYT_goUg@mail.gmail.com>
2014-09-22 23:09     ` Ian Young