* Raid1 recovery problem - need help!
From: Tomi Orava @ 2003-09-26 5:57 UTC
To: linux-raid
Hi everybody,
I have a slight problem here with my RAID 1+0 setup.
One of the discs just failed and I can't figure
out why mdadm (for example) is unable to start the
affected RAID1 (the /dev/md4 submirror; the failed disc is /dev/sdd).
The setup is this:
------- /etc/raidtab -----------------------------------
##
## Data-Disk - Part 1
##
raiddev /dev/md3
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
## device /dev/hdk2
device /dev/hde2
raid-disk 0
## device /dev/hdm2
device /dev/hdg2
raid-disk 1
## failed-disk 1
##
## Data-Disk - Part 2
##
raiddev /dev/md4
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
## device /dev/hdi2
device /dev/sdd2
raid-disk 0
## failed-disk 1
## device /dev/hdg2
device /dev/sdc2
raid-disk 1
## failed-disk 1
##
## Data-Disk - Raid 1+0
##
raiddev /dev/md5
raid-level 0
nr-raid-disks 2
nr-spare-disks 0
persistent-superblock 1
chunk-size 32
device /dev/md3
raid-disk 0
device /dev/md4
raid-disk 1
--------------------------------------------------------------
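For reference, the /dev/md5 entry above stripes /dev/md3 and /dev/md4 with `chunk-size 32`, i.e. 32 KiB chunks. The sketch below is a simplified model of RAID 0 address mapping, assuming equal-sized members and plain round-robin placement of chunks (the md driver also copes with members of different sizes, which this ignores). It shows why md5 is unusable without md4: every other 32 KiB chunk of the striped data lives there. `raid0_map` is a hypothetical helper, not part of any md tool.

```python
CHUNK_KIB = 32                        # chunk-size from the raidtab entry for /dev/md5
MEMBERS = ["/dev/md3", "/dev/md4"]    # raid-disk 0 and raid-disk 1


def raid0_map(offset, chunk_kib=CHUNK_KIB, members=MEMBERS):
    """Return (member, offset_on_member) for a logical byte offset.

    Simplified model: equal-sized members, chunks assigned round-robin.
    """
    chunk_bytes = chunk_kib * 1024
    chunk_index, within = divmod(offset, chunk_bytes)       # which chunk, and where inside it
    stripe, member_index = divmod(chunk_index, len(members))  # which stripe row, which member
    return members[member_index], stripe * chunk_bytes + within


if __name__ == "__main__":
    for off in (0, 32 * 1024, 64 * 1024, 100_000):
        print(off, raid0_map(off))
```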
mdadm --detail /dev/md3
/dev/md3:
Version : 00.90.00
Creation Time : Mon Sep 16 23:33:02 2002
Raid Level : raid1
Array Size : 58616640 (55.90 GiB 60.02 GB)
Device Size : 58616640 (55.90 GiB 60.02 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Fri Sep 26 08:06:35 2003
State : dirty, no-errors
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Number Major Minor RaidDevice State
0 0 0 0 faulty removed /dev/swap
1 33 2 1 active sync /dev/hde2
2 34 2 2 spare /dev/hdg2
UUID : fb702fa9:a4c55dcf:87cccecb:31f1e9ba
Events : 0.465
mdadm --examine /dev/sdc2
/dev/sdc2:
Magic : a92b4efc
Version : 00.90.00
UUID : d5c2d1f7:cf7cb245:2d9e7057:097ec133
Creation Time : Mon Sep 16 22:05:35 2002
Raid Level : raid1
Device Size : 60034880 (57.25 GiB 61.48 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 4
Update Time : Thu Mar 13 05:03:45 2003
State : dirty, no-errors
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Checksum : 512475a - correct
Events : 0.423
Number Major Minor RaidDevice State
this 2 8 50 2 spare /dev/sdd2
0 0 0 0 0 faulty removed /dev/swap
1 1 8 34 1 active sync /dev/sdc2
2 2 8 50 2 spare /dev/sdd2
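The sizes above are counted in 1 KiB blocks. This small check reproduces the figures in parentheses, assuming GiB means 2^30 bytes and GB means 10^9 bytes; it also makes clear that the two RAID1 halves of md5 are not the same size. `describe_kib` is a hypothetical helper.

```python
def describe_kib(blocks):
    """Format a size given in 1 KiB blocks the way mdadm prints it above."""
    size_bytes = blocks * 1024
    gib = size_bytes / 2**30   # binary gigabytes
    gb = size_bytes / 10**9    # decimal gigabytes
    return f"{blocks} ({gib:.2f} GiB {gb:.2f} GB)"


if __name__ == "__main__":
    print(describe_kib(58616640))  # /dev/md3 array size
    print(describe_kib(60034880))  # /dev/sdc2 device size
```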
-------------------------------------------------------
>mdadm -v -R /dev/md4
mdadm: failed to run array /dev/md4: No such device
The problem occurred while resyncing the /dev/md4 RAID1 mirror,
and of course the "master" device was the faulty one.
The question now is: can someone describe how to get
this /dev/md4 mirror started so that I can continue copying data
out of the RAID1+0 (/dev/md5) array?
Regards,
Tomi Orava
* Re: Raid1 recovery problem - need help!
From: Neil Brown @ 2003-09-26 6:13 UTC
To: Tomi Orava; +Cc: linux-raid
On Friday September 26, tomimo+linux-raid@ncircle.nullnet.fi wrote:
>
> Hi everybody,
>
> I have a slight problem here with my RAID 1+0 setup.
> One of the discs just failed and I'm unable to figure
> out why (for example) mdadm is unable to start the
> particular RAID1 (/dev/md4-submirror, failed disc is /dev/sdd).
...
>
> The problem occurred while resyncing the /dev/md4 RAID1 mirror,
> and of course the "master" device was the faulty one.
> The question now is: can someone describe how to get
> this /dev/md4 mirror started so that I can continue copying data
> out of the RAID1+0 (/dev/md5) array?
You need to assemble the array before it can run, so maybe:
mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sd*
then
mdadm --assemble /dev/md5 /dev/md3 /dev/md4
If not, what errors do you get, and what does "cat /proc/mdstat" show?
NeilBrown
* Re: Raid1 recovery problem - need help!
From: Tomi Orava @ 2003-09-26 6:25 UTC
To: Neil Brown; +Cc: linux-raid
Neil,
Thank you very much for your fast reply.
>> The problem occurred while resyncing the /dev/md4 RAID1 mirror,
>> and of course the "master" device was the faulty one.
>> The question now is: can someone describe how to get
>> this /dev/md4 mirror started so that I can continue copying data
>> out of the RAID1+0 (/dev/md5) array?
>
> You need to assemble the array before it can run, so maybe:
>
> mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sd*
The command: mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sdc2
Results:
mdadm: /dev/md4 assembled from 0 drives and 1 spare - not enough to start the array.
/proc/mdstat:
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md4 : inactive sdc2[2]
0 blocks
The thing is that I have no spare discs in my system, and
that "spare" disc _is_ the one that has always been the second mirror
of /dev/md4.
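The "0 drives and 1 spare" result can be read as a simple counting rule: a member whose superblock records it as a spare is not counted as holding valid data, so a mirror with no active in-sync member cannot start, while a RAID 0 needs every member. The sketch below is a simplified model of that rule as observed in this thread, not mdadm's actual logic; `Member` and `can_start` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Member:
    device: str
    state: str  # "active" (in sync) or "spare", as recorded in the superblock


def can_start(level, raid_disks, members):
    """Simplified model: can an array of this level start with these members?"""
    active = sum(1 for m in members if m.state == "active")
    spares = sum(1 for m in members if m.state == "spare")
    if level == 1:
        ok = active >= 1            # any single in-sync mirror half is enough
    elif level == 0:
        ok = active == raid_disks   # striping needs every member
    else:
        raise ValueError("only levels 0 and 1 are modelled here")
    return ok, f"assembled from {active} drive(s) and {spares} spare(s)"


if __name__ == "__main__":
    # The superblock on /dev/sdc2 lists it as a spare (slot 2), so md4 cannot start.
    print(can_start(1, 2, [Member("/dev/sdc2", "spare")]))
```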
> then
> mdadm --assemble /dev/md5 /dev/md3 /dev/md4
Regards,
Tomi Orava
* Re: Raid1 recovery problem - need help!
From: Neil Brown @ 2003-09-26 6:45 UTC
To: Tomi Orava; +Cc: linux-raid
On Friday September 26, tomimo+linux-raid@ncircle.nullnet.fi wrote:
>
> Neil,
>
> Thank you very much for your fast reply.
>
> >> The problem occurred while resyncing the /dev/md4 RAID1 mirror,
> >> and of course the "master" device was the faulty one.
> >> The question now is: can someone describe how to get
> >> this /dev/md4 mirror started so that I can continue copying data
> >> out of the RAID1+0 (/dev/md5) array?
> >
> > You need to assemble the array before it can run, so maybe:
> >
> > mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sd*
>
> The command: mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sdc2
>
> Results:
> mdadm: /dev/md4 assembled from 0 drives and 1 spare - not enough to start the array.
>
> /proc/mdstat:
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md4 : inactive sdc2[2]
> 0 blocks
>
> The thing is that I have no spare discs in my system, and
> that "spare" disc _is_ the one that has always been the second mirror
> of /dev/md4.
>
It looks like maybe it was /dev/sdc that failed and when you rebooted
without it, the old sdd was renamed to sdc.
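A simplified model of the renaming described here, assuming SCSI disks get sdX names in the order they are detected at boot, so that removing one disk shifts every later name down by one letter. The disk labels and the `assign_sd_names` helper are hypothetical placeholders.

```python
import string


def assign_sd_names(detected):
    """Map disks to sdX names in detection order (simplified model, up to 26 disks)."""
    return {disk: "sd" + string.ascii_lowercase[i] for i, disk in enumerate(detected)}


if __name__ == "__main__":
    # Hypothetical labels for four SCSI disks, listed in detection order.
    before = ["disk_A", "disk_B", "disk_C", "disk_D"]
    after = [d for d in before if d != "disk_C"]  # the disk that was sdc is removed
    print(assign_sd_names(before)["disk_D"])  # sdd
    print(assign_sd_names(after)["disk_D"])   # sdc
```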
In any case, if the failed drive is really dead, the best you can hope
for is
mdadm -C /dev/md4 -l 1 -n 2 /dev/sdc2 missing
and hope the data on it is recent enough.
Note: this will not change the data on sdc2. It will only change the
superblock and allow you to access what is there.
NeilBrown
* Re: Raid1 recovery problem - need help!
From: Tomi Orava @ 2003-09-26 7:10 UTC
To: Neil Brown; +Cc: linux-raid
>> /proc/mdstat:
>> Personalities : [linear] [raid0] [raid1] [raid5]
>> read_ahead 1024 sectors
>> md4 : inactive sdc2[2]
>> 0 blocks
>>
>> The thing is that I have no spare discs in my system, and
>> that "spare" disc _is_ the one that has always been the second mirror
>> of /dev/md4.
>>
>
> It looks like maybe it was /dev/sdc that failed and when you rebooted
> without it, the old sdd was renamed to sdc.
>
> In any case, if the failed drive is really dead, the best you can hope
> for is
> mdadm -C /dev/md4 -l 1 -n 2 /dev/sdc2 missing
> and hope the data on it is recent enough.
> Note: this will not change the data on sdc2. It will only change the
> superblock and allow you to access what is there.
OK, here is the status after the following command:
>mdadm -v --assemble /dev/md5 /dev/md3 /dev/md4
mdadm: looking for devices for /dev/md5
mdadm: /dev/md3 is identified as a member of /dev/md5, slot 0.
mdadm: /dev/md4 is identified as a member of /dev/md5, slot 1.
mdadm: added /dev/md4 to /dev/md5 as 1
mdadm: added /dev/md3 to /dev/md5 as 0
mdadm: /dev/md5 assembled from 1 drive - not enough to start the array.
/proc/mdstat:
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md5 : inactive md3[0] md4[1]
0 blocks
md4 : active raid1 sdc2[0]
60034880 blocks [2/1] [U_]
md3 : active raid1 hdg2[2] hde2[1]
58616640 blocks [2/1] [_U]
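In the /proc/mdstat lines above, `[2/1]` reads as 2 configured devices with 1 working, and in `[U_]` each character is one slot, `U` for a member that is up and `_` for one that is missing. A small parser for that field, written against the lines shown here; the regular expression is an assumption about the format and may need adjusting for other kernel versions, and `parse_status` is a hypothetical helper.

```python
import re

# Matches e.g. "[2/1] [U_]": configured/working counts, then one flag per slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_status(line):
    """Return (total, working, missing_slots) for an mdstat status line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    total, working, flags = int(match.group(1)), int(match.group(2)), match.group(3)
    missing = [slot for slot, flag in enumerate(flags) if flag == "_"]
    return total, working, missing


if __name__ == "__main__":
    print(parse_status("60034880 blocks [2/1] [U_]"))  # md4: slot 1 missing
    print(parse_status("58616640 blocks [2/1] [_U]"))  # md3: slot 0 missing
```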
Any ideas? :)
Regards,
Tomi Orava
* Re: Raid1 recovery problem - need help!
From: Neil Brown @ 2003-09-26 7:45 UTC
To: Tomi Orava; +Cc: linux-raid
On Friday September 26, tomimo+linux-raid@ncircle.nullnet.fi wrote:
> > superblock and allow you to access what is there.
>
> Ok, now the status is after the following command:
>
> >mdadm -v --assemble /dev/md5 /dev/md3 /dev/md4
>
> mdadm: looking for devices for /dev/md5
> mdadm: /dev/md3 is identified as a member of /dev/md5, slot 0.
> mdadm: /dev/md4 is identified as a member of /dev/md5, slot 1.
> mdadm: added /dev/md4 to /dev/md5 as 1
> mdadm: added /dev/md3 to /dev/md5 as 0
> mdadm: /dev/md5 assembled from 1 drive - not enough to start the array.
>
Hmm. My guess is that if you "mdadm -E" /dev/md3 and /dev/md4, you
will find that the event numbers are different, i.e. it thinks /dev/md4
is too out-of-date.
Presumably the mirror halves fell out of sync some time ago and for
some reason never rebuilt properly.
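The event-count comparison suggested above can be scripted. This sketch assumes the `Events : hi.lo` form shown by `mdadm --examine` and `--detail` earlier in this thread, where the two numbers are taken to be the high and low 32-bit halves of one counter; they must therefore be compared as integers rather than as a decimal fraction, since 0.1000 would be newer than 0.465. `parse_events`, `most_recent` and the sample values are hypothetical.

```python
import re

# Assumed format: "Events : <hi>.<lo>" on its own line of mdadm output.
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\.(\d+)\s*$", re.MULTILINE)


def parse_events(examine_text):
    """Extract the event counter from mdadm output as a single integer."""
    match = EVENTS_RE.search(examine_text)
    if match is None:
        raise ValueError("no Events line found")
    hi, lo = int(match.group(1)), int(match.group(2))
    return (hi << 32) | lo  # combine the two assumed 32-bit halves


def most_recent(outputs):
    """Given {device: mdadm output text}, return devices sorted newest first."""
    counts = {dev: parse_events(text) for dev, text in outputs.items()}
    return sorted(counts, key=counts.get, reverse=True)


if __name__ == "__main__":
    sample = {  # hypothetical values, not taken from this thread
        "/dev/md3": "  Events : 0.1000\n",
        "/dev/md4": "  Events : 0.465\n",
    }
    print(most_recent(sample))  # ['/dev/md3', '/dev/md4']
```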
You could try
mdadm --assemble --force /dev/md5 /dev/md3 /dev/md4
It should assemble successfully, but I make no promises about what
sort of data will be on them.
NeilBrown
>
> /proc/mdstat:
>
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md5 : inactive md3[0] md4[1]
> 0 blocks
>
> md4 : active raid1 sdc2[0]
> 60034880 blocks [2/1] [U_]
>
> md3 : active raid1 hdg2[2] hde2[1]
> 58616640 blocks [2/1] [_U]
>
>
> Any ideas? :)
>
> Regards,
> Tomi Orava
* Re: Raid1 recovery problem - need help!
From: Tomi Orava @ 2003-09-26 8:14 UTC
To: Neil Brown; +Cc: linux-raid
Neil,
Thank you, the last command got the RAID1+0 running
as needed. Unfortunately, the re-assembled disc is not functional.
However, I still wonder whether I identified the wrong disc as the
failed one (based on your comments)... Therefore, I'll have
to swap the discs as soon as I get back home and check whether I can
get anything else off the raid device with your instructions
(neither of the discs is completely dead; one of them just
hangs the whole machine sooner or later, i.e. within 5 minutes to 1 hour).
I'm also wondering how a single IDE disc (in this
case an IBM 60GXP or 75GXP, I don't know which one for sure) is able to
halt the whole machine... I have 6 discs connected to that server,
and each one is the only drive on its own channel (no slaves).
Four of the drives are connected to an HPT374 chip and two more to a
Sil680 PCI card. I don't really expect a solution to this question;
I'm more or less just wondering whether anyone else has had similar
problems with their hardware.
> It should assemble successfully, but I make no promises about what
> sort of data will be on them.
Thank you for your time Neil!
Sincerely,
Tomi Orava
PS. I wonder whether any of the docs (HOWTO, man page, etc.) contain
problem-solving examples from real-life cases.
Perhaps it might help someone if I gathered up
the mails related to this case and put them on the web, for example.
Every problem has a slightly different environment and
setup, but it might still help in deciding which commands to run
and which ones to avoid (so as not to lose data by mistake).
* raid resync
From: Bo Moon @ 2003-10-09 1:36 UTC
To: Neil Brown; +Cc: linux-raid
Neil,
Judging from md.c, there must have been some changes or improvements to the RAID1 & 5 resync code
that enabled request-based resynchronization.
Does this mean we can enable or disable this resync?
If so, how would we do that (where in the code)?
Why do we have this feature? Will the disk lose integrity if we
disable this resync?
disable this resync?
Thanks in advance,
Bo
* Re: raid resync
From: Neil Brown @ 2003-10-13 1:31 UTC
To: Bo Moon; +Cc: linux-raid
On Wednesday October 8, bo@anthologysolutions.com wrote:
> Neil,
>
> Judging from md.c, there must have been some changes or improvements to the RAID1 & 5 resync code
> that enabled request-based resynchronization.
I think you are referring to the comment at the top of md.c, and I
think you are misunderstanding what it says. Just read it as "I
changed some stuff so it worked better", or simply ignore it.
Does that help?
NeilBrown
>
> Does this mean we can enable or disable this resync?
> If so, how would we do that (where in the code)?
>
> Why do we have this feature? Will the disk lose integrity if we
> disable this resync?
>
> Thanks in advance,
>
> Bo