Raid1 recovery problem

linux-raid.vger.kernel.org archive mirror

* Raid1 recovery problem - need help!
@ 2003-09-26  5:57 Tomi Orava
  2003-09-26  6:13 ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Tomi Orava @ 2003-09-26  5:57 UTC
  To: linux-raid

Hi everybody,

I have a slight problem here with my RAID 1+0 setup.
One of the discs just failed and I'm unable to figure
out why (for example) mdadm is unable to start the
particular RAID1 (/dev/md4-submirror, failed disc is /dev/sdd).

The setup is this:
------- /etc/raidtab -----------------------------------

##
## Data-Disk - Part 1
##
raiddev /dev/md3
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1

        ## device               /dev/hdk2
        device                  /dev/hde2
        raid-disk               0

        ## device               /dev/hdm2
        device                  /dev/hdg2
        raid-disk               1
        ## failed-disk          1


##
## Data-Disk - Part 2
##
raiddev /dev/md4
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1

        ## device               /dev/hdi2
        device                  /dev/sdd2
        raid-disk               0
        ## failed-disk          1

        ## device               /dev/hdg2
        device                  /dev/sdc2
        raid-disk               1
        ## failed-disk          1

##
## Data-Disk - Raid 1+0
##
raiddev /dev/md5
        raid-level              0
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              32

        device                  /dev/md3
        raid-disk               0

        device                  /dev/md4
        raid-disk               1
--------------------------------------------------------------

mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90.00
  Creation Time : Mon Sep 16 23:33:02 2002
     Raid Level : raid1
     Array Size : 58616640 (55.90 GiB 60.02 GB)
    Device Size : 58616640 (55.90 GiB 60.02 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Fri Sep 26 08:06:35 2003
          State : dirty, no-errors
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1


    Number   Major   Minor   RaidDevice State
       0       0        0        0      faulty removed   /dev/swap
       1      33        2        1      active sync   /dev/hde2
       2      34        2        2      spare   /dev/hdg2
           UUID : fb702fa9:a4c55dcf:87cccecb:31f1e9ba
         Events : 0.465

mdadm --examine /dev/sdc2
/dev/sdc2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : d5c2d1f7:cf7cb245:2d9e7057:097ec133
  Creation Time : Mon Sep 16 22:05:35 2002
     Raid Level : raid1
    Device Size : 60034880 (57.25 GiB 61.48 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 4

    Update Time : Thu Mar 13 05:03:45 2003
          State : dirty, no-errors
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 512475a - correct
         Events : 0.423


      Number   Major   Minor   RaidDevice State
this     2       8       50        2      spare   /dev/sdd2
   0     0       0        0        0      faulty removed   /dev/swap
   1     1       8       34        1      active sync   /dev/sdc2
   2     2       8       50        2      spare   /dev/sdd2

-------------------------------------------------------
>mdadm -v -R /dev/md4
mdadm: failed to run array /dev/md4: No such device

The problem occurred while resyncing the /dev/md4 RAID1-mirror
and of course the "master" device was the faulty one.
The question now is, is someone capable of describing how to get
this /dev/md4-mirror started so that I can continue copying data
out of RAID1+0 (/dev/md5) mirror?

Regards,
Tomi Orava

* Re: Raid1 recovery problem - need help!
  2003-09-26  5:57 Raid1 recovery problem - need help! Tomi Orava
@ 2003-09-26  6:13 ` Neil Brown
  2003-09-26  6:25   ` Tomi Orava
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2003-09-26  6:13 UTC
  To: Tomi Orava; +Cc: linux-raid

On Friday September 26, tomimo+linux-raid@ncircle.nullnet.fi wrote:
> 
> Hi everybody,
> 
> I have a slight problem here with my RAID 1+0 setup.
> One of the discs just failed and I'm unable to figure
> out why (for example) mdadm is unable to start the
> particular RAID1 (/dev/md4-submirror, failed disc is /dev/sdd).
...
> 
> The problem occurred while resyncing the /dev/md4 RAID1-mirror
> and of course the "master" device was the faulty one.
> The question now is, is someone capable of describing how to get
> this /dev/md4-mirror started so that I can continue copying data
> out of RAID1+0 (/dev/md5) mirror?

You need to assemble the array before it can run, so maybe:

   mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133  /dev/sd*

then
   mdadm --assemble /dev/md5 /dev/md3 /dev/md4

If not, what errors do you get, and what does "cat /proc/mdstat" show?

NeilBrown

* Re: Raid1 recovery problem - need help!
  2003-09-26  6:13 ` Neil Brown
@ 2003-09-26  6:25   ` Tomi Orava
  2003-09-26  6:45     ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Tomi Orava @ 2003-09-26  6:25 UTC
  To: Neil Brown; +Cc: linux-raid

Neil,

Thank you very much for your fast reply.

>> The problem occurred while resyncing the /dev/md4 RAID1-mirror
>> and of course the "master" device was the faulty one.
>> The question now is, is someone capable of describing how to get
>> this /dev/md4-mirror started so that I can continue copying data
>> out of RAID1+0 (/dev/md5) mirror?
>
> You need to assemble the array before it can run, so maybe:
>
>    mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sd*

The command:
   mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sdc2

Results:
mdadm: /dev/md4 assembled from 0 drives and 1 spare - not enough to start the array.

/proc/mdstat:
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md4 : inactive sdc2[2]
      0 blocks

The thing is that I have no spare-discs available in my system and
that "spare" disc _is_ the one that has always been the second mirror
of /dev/md4.


> then
>    mdadm --assemble /dev/md5 /dev/md3 /dev/md4

Regards,
Tomi Orava

* Re: Raid1 recovery problem - need help!
  2003-09-26  6:25   ` Tomi Orava
@ 2003-09-26  6:45     ` Neil Brown
  2003-09-26  7:10       ` Tomi Orava
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2003-09-26  6:45 UTC
  To: Tomi Orava; +Cc: linux-raid

On Friday September 26, tomimo+linux-raid@ncircle.nullnet.fi wrote:
> 
> Neil,
> 
> Thank you very much for your fast reply.
> 
> >> The problem occurred while resyncing the /dev/md4 RAID1-mirror
> >> and of course the "master" device was the faulty one.
> >> The question now is, is someone capable of describing how to get
> >> this /dev/md4-mirror started so that I can continue copying data
> >> out of RAID1+0 (/dev/md5) mirror?
> >
> > You need to assemble the array before it can run, so maybe:
> >
> >    mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sd*
> 
> The command:
>    mdadm --assemble /dev/md4 --uuid=d5c2d1f7:cf7cb245:2d9e7057:097ec133 /dev/sdc2
> 
> Results:
> mdadm: /dev/md4 assembled from 0 drives and 1 spare - not enough to start the array.
> 
> /proc/mdstat:
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md4 : inactive sdc2[2]
>       0 blocks
> 
> The thing is that I have no spare-discs available in my system and
> that "spare" disc _is_ the one that has always been the second mirror
> of /dev/md4.
> 

It looks like maybe it was /dev/sdc that failed and when you rebooted
without it, the old sdd was renamed to sdc.
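
Archive note: Tomi's earlier `mdadm --examine /dev/sdc2` listing is consistent with this guess. In that 0.90-era listing, the `this` row records the slot, major:minor and state the examined device had when its superblock was last written. It reads `8 50 ... spare /dev/sdd2`, while the same listing shows /dev/sdc2 as 8:34, so the disc now named sdc2 apparently carried sdd2's device number last time and recorded itself as a spare (which would also fit the "0 drives and 1 spare" message). The minimal sketch below pulls that row out of saved `--examine` text; the function name `recorded_self` is hypothetical, and the column layout is assumed from the output shown in this thread.

```sh
#!/bin/sh
# Hypothetical helper: print what a member's own 0.90 superblock says about itself,
# taken from the "this" row of `mdadm --examine` output read on stdin.
# Assumed row layout: this <number> <major> <minor> <raid-device> <state...> <name>
recorded_self() {
    awk '$1 == "this" {
        state = $6                                     # first word of the state
        for (i = 7; i < NF; i++) state = state " " $i  # multi-word states, e.g. "active sync"
        printf "slot=%s dev=%s:%s state=%s name=%s\n", $5, $3, $4, state, $NF
    }'
}
```

For example, `mdadm --examine /dev/sdc2 | recorded_self` on the listing above would print `slot=2 dev=8:50 state=spare name=/dev/sdd2`.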

In any case, if the failed drive is really dead, the best you can hope
for is
  mdadm -C /dev/md4 -l 1 -n 2 /dev/sdc2 missing
and hope the data on it is recent enough.
Note: this will not change the data on sdc2.  It will only change the
superblock and allow you to access what is there.
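
Before re-creating a superblock this way, one cautious habit (not something suggested in this thread) is to confirm that the level and device count passed to `-C` match what the old superblock recorded, for instance `Raid Level : raid1` and `Raid Devices : 2` in the `--examine` listing above. A minimal sketch, assuming the `Key : value` layout of `mdadm --examine` output shown in this thread; `examine_field` and `geometry_matches` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helpers: sanity-check an old superblock before re-creating it.

# Print the value of one "Key : value" field from `mdadm --examine` output on stdin.
examine_field() {
    awk -v key="$1" -F' : ' '{ k = $1; sub(/^ +/, "", k) } k == key { print $2; exit }'
}

# Succeed only if the examined superblock reports RAID level $1 and $2 member slots,
# i.e. the values that would be passed as `-l $1 -n $2`.
geometry_matches() {
    ex=$(cat)
    lvl=$(printf '%s\n' "$ex" | examine_field "Raid Level")
    n=$(printf '%s\n' "$ex" | examine_field "Raid Devices")
    [ "$lvl" = "raid$1" ] && [ "$n" = "$2" ]
}
```

Usage: `mdadm --examine /dev/sdc2 | geometry_matches 1 2 && echo "geometry matches -l 1 -n 2"`.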

NeilBrown

* Re: Raid1 recovery problem - need help!
  2003-09-26  6:45     ` Neil Brown
@ 2003-09-26  7:10       ` Tomi Orava
  2003-09-26  7:45         ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Tomi Orava @ 2003-09-26  7:10 UTC
  To: Neil Brown; +Cc: linux-raid

>> /proc/mdstat:
>> Personalities : [linear] [raid0] [raid1] [raid5]
>> read_ahead 1024 sectors
>> md4 : inactive sdc2[2]
>>       0 blocks
>>
>> The thing is that I have no spare-discs available in my system and
>> that "spare" disc _is_ the one that has always been the second mirror
>> of /dev/md4.
>>
>
> It looks like maybe it was /dev/sdc that failed and when you rebooted
> without it, the old sdd was renamed to sdc.
>
> In any case, if the failed drive is really dead, the best you can hope
> for is
>   mdadm -C /dev/md4 -l 1 -n 2 /dev/sdc2 missing
> and hope the data on it is recent enough.
> Note: this will not change the data on sdc2.  It will only change the
> superblock and allow you to access what is there.

OK, this is the status after the following command:

>mdadm -v --assemble /dev/md5 /dev/md3 /dev/md4

mdadm: looking for devices for /dev/md5
mdadm: /dev/md3 is identified as a member of /dev/md5, slot 0.
mdadm: /dev/md4 is identified as a member of /dev/md5, slot 1.
mdadm: added /dev/md4 to /dev/md5 as 1
mdadm: added /dev/md3 to /dev/md5 as 0
mdadm: /dev/md5 assembled from 1 drive - not enough to start the array.


/proc/mdstat:

Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md5 : inactive md3[0] md4[1]
      0 blocks

md4 : active raid1 sdc2[0]
      60034880 blocks [2/1] [U_]

md3 : active raid1 hdg2[2] hde2[1]
      58616640 blocks [2/1] [_U]
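
For readers of the archive: in each /proc/mdstat status line, `[2/1]` means two member slots are configured and one is currently working, and in `[U_]` each `U` is a working member and each `_` a missing one. So both halves of the RAID0 here are degraded mirrors, and md5 itself is inactive. A minimal awk sketch that flags such arrays from saved mdstat text follows; `mdstat_problems` is a hypothetical name, and only the line formats visible in this thread are assumed.

```sh
#!/bin/sh
# Hypothetical helper: report inactive or degraded arrays from /proc/mdstat text on stdin.
# Assumes "mdN : <state> ..." header lines, with active arrays followed by a line
# containing a status such as "[2/1] [U_]".
mdstat_problems() {
    awk '
        /^md[0-9]+ :/ {
            dev = $1                                   # remember which array we are in
            if ($3 == "inactive") print dev, "inactive"
        }
        match($0, /\[[0-9]+\/[0-9]+] \[[U_]+]/) {
            status = substr($0, RSTART, RLENGTH)       # e.g. "[2/1] [U_]"
            total = status
            sub(/^\[/, "", total); sub(/\/.*/, "", total)           # configured slots
            active = status
            sub(/^\[[0-9]+\//, "", active); sub(/].*/, "", active)  # working members
            if (active + 0 < total + 0) print dev, "degraded", status
        }'
}
```

Usage: `mdstat_problems < /proc/mdstat`. On the listing above it would report md5 as inactive and md4 and md3 as degraded `[2/1]`.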


Any ideas ? :)

Regards,
Tomi Orava

* Re: Raid1 recovery problem - need help!
  2003-09-26  7:10       ` Tomi Orava
@ 2003-09-26  7:45         ` Neil Brown
  2003-09-26  8:14           ` Tomi Orava
  2003-10-09  1:36           ` raid resync Bo Moon
  0 siblings, 2 replies; 9+ messages in thread
From: Neil Brown @ 2003-09-26  7:45 UTC
  To: Tomi Orava; +Cc: linux-raid

On Friday September 26, tomimo+linux-raid@ncircle.nullnet.fi wrote:
> > superblock and allow you to access what is there.
> 
> Ok, now the status is after the following command:
> 
> >mdadm -v --assemble /dev/md5 /dev/md3 /dev/md4
> 
> mdadm: looking for devices for /dev/md5
> mdadm: /dev/md3 is identified as a member of /dev/md5, slot 0.
> mdadm: /dev/md4 is identified as a member of /dev/md5, slot 1.
> mdadm: added /dev/md4 to /dev/md5 as 1
> mdadm: added /dev/md3 to /dev/md5 as 0
> mdadm: /dev/md5 assembled from 1 drive - not enough to start the array.
> 

Hmm. My guess is that if you "mdadm -E" /dev/md3 and /dev/md4, you
will find that the event numbers are different, i.e. it thinks /dev/md4
is too out-of-date.
Presumably the mirror halves fell out of sync some time ago and for
some reason never rebuilt properly.
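
To make that comparison concrete: the `Events` counter appears in `mdadm -E` output (for example `0.465` for /dev/md3 and `0.423` for /dev/sdc2 earlier in this thread), and, roughly speaking, assembly treats the member with the lower count as the stale one unless `--force` is given. A minimal sketch that extracts and compares the counters, assuming the 0.90-era `high.low` form shown here (a bare integer is treated as `0.N`); the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for comparing md superblock event counters.

# Print the "Events" value from `mdadm --examine` output read on stdin.
events_of() {
    awk -F' : ' '/^ *Events :/ { v = $2; gsub(/ /, "", v); print v; exit }'
}

# Compare two counters in "high.low" form (a bare integer counts as "0.N");
# prints whether $1 is older, equal or newer than $2.
cmp_events() {
    case $1 in *.*) a_hi=${1%%.*} a_lo=${1#*.} ;; *) a_hi=0 a_lo=$1 ;; esac
    case $2 in *.*) b_hi=${2%%.*} b_lo=${2#*.} ;; *) b_hi=0 b_lo=$2 ;; esac
    if [ "$a_hi" -eq "$b_hi" ] && [ "$a_lo" -eq "$b_lo" ]; then
        echo equal
    elif [ "$a_hi" -lt "$b_hi" ] || { [ "$a_hi" -eq "$b_hi" ] && [ "$a_lo" -lt "$b_lo" ]; }; then
        echo older
    else
        echo newer
    fi
}
```

Usage: `cmp_events "$(mdadm -E /dev/md4 | events_of)" "$(mdadm -E /dev/md3 | events_of)"` prints `older` if /dev/md4's superblock lags behind /dev/md3's.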

You could try
   mdadm --assemble --force /dev/md5 /dev/md3 /dev/md4

It should assemble successfully, but I make no promises about what
sort of data will be on them.

NeilBrown


> 
> /proc/mdstat:
> 
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md5 : inactive md3[0] md4[1]
>       0 blocks
> 
> md4 : active raid1 sdc2[0]
>       60034880 blocks [2/1] [U_]
> 
> md3 : active raid1 hdg2[2] hde2[1]
>       58616640 blocks [2/1] [_U]
> 
> 
> Any ideas ? :)
> 
> Regards,
> Tomi Orava

* Re: Raid1 recovery problem - need help!
  2003-09-26  7:45         ` Neil Brown
@ 2003-09-26  8:14           ` Tomi Orava
  2003-10-09  1:36           ` raid resync Bo Moon
  1 sibling, 0 replies; 9+ messages in thread
From: Tomi Orava @ 2003-09-26  8:14 UTC
  To: Neil Brown; +Cc: linux-raid

Neil,

Thank you, the last command was able to get the RAID1+0 running
as needed. Unfortunately, the re-assembled disc is not functional.
However, I still wonder whether I made a mistake in identifying which
disc actually failed (based on your comments) ... Therefore, I'll have
to swap the discs as soon as I get back home and check whether I can
get anything else out of the raid device with your instructions
(as neither of the discs is completely dead; one of them just
hangs the machine sooner or later, i.e. within 5 minutes or an hour).

I'm also wondering how a single IDE disc (in this case an IBM 60GXP
or 75GXP, I don't know which for sure) is able to halt the whole
machine ... I have 6 discs connected to that server, and each is the
only drive on its own channel (no slaves). Four of the drives are
connected to an HPT374 chip and two more to a Sil680 PCI card. I don't
really expect a solution to this question; I'm more or less just
wondering whether anyone else has had similar problems with their hardware.

> It should assemble successfully, but I make no promises about what
> sort of data will be on them.

Thank you for your time Neil!

Sincerely,
Tomi Orava

PS. I wonder whether any of the docs (HOWTO, man page etc.) contain
    problem-solving examples from real-life cases.
    Perhaps it might help someone if I just gathered up
    the mails related to this case and put them on the web ...
    Every problem has a slightly different environment and
    setup, but it might still help in deciding which commands to run
    and which ones to avoid (in order not to lose data by mistake).

* raid resync
  2003-09-26  7:45         ` Neil Brown
  2003-09-26  8:14           ` Tomi Orava
@ 2003-10-09  1:36           ` Bo Moon
  2003-10-13  1:31             ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: Bo Moon @ 2003-10-09  1:36 UTC
  To: Neil Brown; +Cc: linux-raid

Neil,

Judging from md.c, there must have been some changes or improvements to the RAID1 and RAID5
resync code which enabled request-based resynchronization.

Does that mean we can enable or disable this resync?
If so, how would we do it (where in the code)?

Why do we have this feature? Will the disk lose integrity if we
disable this resync?

Thanks in advance,

Bo

* Re: raid resync
  2003-10-09  1:36           ` raid resync Bo Moon
@ 2003-10-13  1:31             ` Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2003-10-13  1:31 UTC
  To: Bo Moon; +Cc: linux-raid

On Wednesday October 8, bo@anthologysolutions.com wrote:
> Neil,
> 
> Judging from md.c, there must have been some changes or improvements to the RAID1 and RAID5
> resync code which enabled request-based resynchronization.

I think you are referring to the comment at the top of md.c, and I
think you are misunderstanding what it says.  Just read it as "I
changed some stuff so it worked better", or simply ignore it.

Does that help?

NeilBrown

> 
> Does that mean we can enable or disable this resync?
> If so, how would we do it (where in the code)?
> 
> Why do we have this feature? Will the disk lose integrity if we
> disable this resync?
> 
> Thanks in advance,
> 
> Bo