* Need Help for Raid Recovery
From: Xuesong LU @ 2017-09-25 7:53 UTC (permalink / raw)
To: linux-raid
Hi,
One of the RAID arrays on my server could not be assembled after a
reboot. It is a RAID 6 array built from partitions sdg1 through sdp1, 10
disks in total. Below is what I have done about the issue so far.
1) mdadm --examine /dev/sd[g-p]1. All event counts are the same.
2) mdadm --assemble /dev/md0 /dev/sd[g-p]1. It reported 'md0 is started',
then the server rebooted itself and got stuck for an hour at the
CentOS 7 loading-bar screen.
3) I forced a shutdown and restarted the server. This time --examine said
'couldn't find superblocks on sdg1'. Also, the partition device files
sdk1, sdo1 and sdp1 disappeared. I guess this is because I forced the
shutdown?
More details:
cat /proc/mdstat shows only md1, which is another RAID device. However,
what I want to recover is /dev/md0.
mdadm --assemble /dev/md0 /dev/sd[g-p]1 says 'only six drives found', so
it cannot assemble the array.
mdadm --examine /dev/sd[g-p] prints raid partition information for
sd[h,i,j,l,m,n]. No superblock detected on sd[g,k,o,p].
mdadm --examine /dev/sd[g-p]1 prints raid partition information for
sd[h,i,j,l,m,n]1. The event counts are the same. It also prints 'No
superblock detected' for sdg1, and 'no such files' for sd[k,o,p]1.
(Gone? Can still recover?)
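
For readers comparing event counts across many members, here is a minimal sketch, not part of the original message. It assumes the usual layout of `mdadm --examine` output: a "/dev/...:" header line per device followed by an "Events : N" line. The function name event_summary is made up for illustration.

```shell
#!/bin/sh
# Summarise the event counter per device from `mdadm --examine` output
# read on stdin. Assumption: each device section starts with a
# "/dev/...:" header line and contains a line of the form "Events : N".
event_summary() {
    awk '
        /^\/dev\/.*:$/ { dev = $1; sub(/:$/, "", dev) }  # remember section header
        $1 == "Events" && $2 == ":" { print dev, $3 }     # print device and count
    '
}

# Typical use (needs root and the real devices):
#   mdadm --examine /dev/sd[g-p]1 2>&1 | event_summary
```

Lines such as "No md superblock detected" do not match either pattern, so devices without a readable superblock simply do not appear in the summary.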
I have UUID for md0 in the mdadm.conf file.
Is there still hope to recover my raid? Thank you!
--
Best,
Xuesong
________________________________
Important: This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately; you should not copy or use it for any purpose, nor disclose its contents to any other person. Thank you.
* Re: Need Help for Raid Recovery
From: Wols Lists @ 2017-09-25 17:57 UTC (permalink / raw)
To: Xuesong LU, linux-raid
On 25/09/17 08:53, Xuesong LU wrote:
> mdadm --examine /dev/sd[g-p] prints raid partition information for
> sd[h,i,j,l,m,n]. No superblock detected on sd[g,k,o,p].
>
> mdadm --examine /dev/sd[g-p]1 prints raid partition information for
> sd[h,i,j,l,m,n]1. The event counts are the same. It also prints 'No
> superblock detected' for sdg1, and 'no such files' for sd[k,o,p]1.
> (Gone? Can still recover?)
>
> I have UUID for md0 in the mdadm.conf file.
>
> Is there still hope to recover my raid? Thank you!
Go to the raid wiki, and post the requested info to the list.
https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
Make sure you're using the latest mdadm.
This looks a little familiar to me, but so long as nothing has written
to the disk, and/or the only thing that has been damaged is the
superblocks, this looks rather hopeful.
Bear in mind, though, if the superblocks ARE gone, then it's going to be
a hassle to get it back. A fair bit of trial and error is on the cards
(read up on overlays). But as long as nothing has actually stomped on
the data itself, you should be able to get it back.
Do an "fdisk -l" on k, o, p. I remember a problem a while back when
something stamped on the partition table ...
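
A rough sketch of that check, added for illustration and not taken from Wol's message: the helper name missing_nodes is hypothetical, and the drive letters are the ones mentioned in this thread. It lists which expected partition nodes are absent; the fdisk calls are left as comments because they need root.

```shell
#!/bin/sh
# Print every path given as an argument that does not exist.
# Paths are passed explicitly, because a glob such as /dev/sd[g-p]1
# only expands to nodes that are present and would hide the missing ones.
missing_nodes() {
    for node in "$@"; do
        [ -e "$node" ] || printf '%s\n' "$node"
    done
}

# Typical use (as root), with the drive letters from this thread:
#   for d in k o p; do fdisk -l "/dev/sd$d"; done
#   missing_nodes /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 \
#                 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1
```

If a whole disk is present but its partition node is reported missing, the partition table is the first thing to look at.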
Cheers,
Wol
* Re: Need Help for Raid Recovery
From: Xuesong LU @ 2017-09-28 5:31 UTC (permalink / raw)
To: Wols Lists, linux-raid
Hi,
I got the problem solved with some luck.
I am not sure how the problem came about. The server suffered a sudden
power-off and went into emergency mode (CentOS 7) after power was
restored. In emergency mode, I could not assemble md0 and also could not
see some of the physical volumes (which actually made me panic), and the
system could not enter the default mode after a reboot.
Then, after a few tries, I simply ran 'mount -a' as suggested on the web.
It said /dev/sdd1 could not be found, though I could see sdd1 by running
'ls /dev'. So I commented out sdd1 in /etc/fstab, and the system entered
the default mode after rebooting twice. Surprisingly, and unlike what I
saw in emergency mode, all the filesystems including md0 were just fine,
even though I had deleted the line for the md0 array from the mdadm.conf
file.
I am not quite sure what actually happened, but what I learned is that
the filesystems one can see in emergency mode are different from those
in default mode (probably because only a minimal set of things is
loaded). So don't panic if some PVs, LVs or even RAID arrays disappear
for no apparent reason in emergency mode. Try to find which filesystem
is actually missing (e.g., run 'mount -a') and comment it out in the
fstab file. Then try to boot into the default mode, and most probably
all files will be just as good as they were.
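
The fstab step can be sketched roughly as follows. This is an illustration added here, not the exact commands I ran: the function names missing_fstab_devices and comment_out_fstab_entry are made up, and only entries given as /dev/... paths are handled (UUID= or LABEL= entries are ignored).

```shell
#!/bin/sh
# List fstab entries whose first field is a /dev/ path that does not exist.
# $1 = path to an fstab-style file. Commented-out lines are skipped
# because their first field starts with "#".
missing_fstab_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }' "$1" | while read -r dev; do
        [ -e "$dev" ] || printf '%s\n' "$dev"
    done
}

# Comment out the entry for one device, keeping a .bak copy of the file.
# $1 = path to the fstab-style file, $2 = device path to disable.
comment_out_fstab_entry() {
    fstab=$1
    dev=$2
    cp "$fstab" "$fstab.bak" &&
    awk -v dev="$dev" '$1 == dev { print "#" $0; next } { print }' \
        "$fstab.bak" > "$fstab"
}

# Typical use (as root), with the device from this thread:
#   missing_fstab_devices /etc/fstab
#   comment_out_fstab_entry /etc/fstab /dev/sdd1
```

The backup copy makes it easy to restore the entry once the system boots normally and the device is visible again.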
Thank Wol and Hank anyway for the reply:)
Best,
Xuesong
* Re: Need Help for Raid Recovery
From: Wols Lists @ 2017-09-28 15:08 UTC (permalink / raw)
To: linux-raid
On 28/09/17 06:31, Xuesong LU wrote:
> Thank Wol and Hank anyway for the reply:)
This has now been written up on the raid wiki :-)
In the "When things go wrogn" section, I've added a new page "Easy
Fixes". It's meant for all those little things where the experienced
sysadmin will go "not again, click click click, fixed", while the newbie
will panic "My array's disappeared, help, the world's imploded".
At the end of the day, when things go wrong it usually is just a little
glitch that experience can solve in seconds, and this page is meant for
passing that knowledge on.
I've also added a section about making sure your controller card is
compatible with your drives :-) (ServeRAID-BR10i (LSI SAS1068E) strangeness)
Cheers,
Wol