* RAID reconstruction problems
@ 2003-09-02 3:04 Michael Welsh Duggan
2003-09-02 16:04 ` Bernd Schubert
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Michael Welsh Duggan @ 2003-09-02 3:04 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]
I currently have two small Software RAIDs, a RAID 1 for my root
partition, and a RAID 5 for my usr partition. One of the disks in the
arrays died, and I threw in a new disk with the intention of
rebuilding the arrays.
The rebuilds failed, but in an extremely strange fashion. Monitoring
/proc/mdstat, it seems that the rebuilds are going just fine. When
they finish however, /proc/mdstat includes the new disk, but also
declares it invalid. The system continues running in degraded mode.
When I run the rebuild from the root console, I get some messages from the
RAID subsystem, including full debugging output. I have not yet
figured out how to capture this output in order to include it in this
message, but I did write down part of one attempt (this was by hand,
so there may be small inconsistencies):
RAID5 conf printout
--- rd:3 wd:2 fd:1
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
disk 1, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
md: bug in file raid5.c, line 1901
Here is some output from my system. If any more information would be
useful, or anyone thinks I should try something else, please let me
know. I would like to get out of my currently degraded state!
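(A hedged aside, not part of the original mail: one minimal way to capture
those kernel messages, assuming a stock klogd/syslogd setup, is sketched
below; the output path is a made-up example.)
    # Dump the kernel ring buffer right after a failed rebuild attempt;
    # the RAID5 conf printout and the "md: bug in file raid5.c" line
    # land here as well as on the console.
    dmesg > /tmp/raid-debug.txt
    # Or pull the md/raid5 lines from the kernel log, if syslogd routes
    # kern.* there (the path is distribution-dependent).
    grep -E 'raid5|md:' /var/log/kern.log > /tmp/raid-debug.txt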
[-- Attachment #2: Output from my system --]
[-- Type: text/plain, Size: 7655 bytes --]
maru:/# uname -a
Linux maru 2.4.21 #3 Fri Aug 29 13:14:01 EDT 2003 i686 GNU/Linux
maru:/# cat ~md5i/dmesg-raid
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
8regs : 1841.200 MB/sec
32regs : 935.600 MB/sec
pIII_sse : 2052.000 MB/sec
pII_mmx : 2247.600 MB/sec
p5_mmx : 2383.200 MB/sec
raid5: using function: pIII_sse (2052.000 MB/sec)
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 00000198]
[events: 00000008]
[events: 00000196]
[events: 000000f3]
[events: 00000086]
[events: 00000008]
md: autorun ...
md: considering ide/host2/bus1/target0/lun0/part3 ...
md: adding ide/host2/bus1/target0/lun0/part3 ...
md: adding ide/host0/bus0/target1/lun0/part3 ...
md: created md0
md: bind<ide/host0/bus0/target1/lun0/part3,1>
md: bind<ide/host2/bus1/target0/lun0/part3,2>
md: running: <ide/host2/bus1/target0/lun0/part3><ide/host0/bus0/target1/lun0/part3>
md: ide/host2/bus1/target0/lun0/part3's event counter: 00000008
md: ide/host0/bus0/target1/lun0/part3's event counter: 00000008
md0: max total readahead window set to 496k
md0: 2 data-disks, max readahead per data-disk: 248k
raid5: device ide/host2/bus1/target0/lun0/part3 operational as raid disk 2
raid5: device ide/host0/bus0/target1/lun0/part3 operational as raid disk 0
raid5: md0, not all disks are operational -- trying to recover array
raid5: allocated 3284kB for md0
raid5: raid level 5 set md0 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
--- rd:3 wd:2 fd:1
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
RAID5 conf printout:
--- rd:3 wd:2 fd:1
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
md: updating md0 RAID superblock on device
md: ide/host2/bus1/target0/lun0/part3 [events: 00000009]<6>(write) ide/host2/bus1/target0/lun0/part3's sb offset: 53640960
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: ide/host0/bus0/target1/lun0/part3 [events: 00000009]<6>(write) ide/host0/bus0/target1/lun0/part3's sb offset: 53616832
md: considering ide/host2/bus1/target0/lun0/part1 ...
md: adding ide/host2/bus1/target0/lun0/part1 ...
md: adding ide/host0/bus1/target0/lun0/part1 ...
md: adding ide/host0/bus0/target1/lun0/part1 ...
md: created md1
md: bind<ide/host0/bus0/target1/lun0/part1,1>
md: bind<ide/host0/bus1/target0/lun0/part1,2>
md: bind<ide/host2/bus1/target0/lun0/part1,3>
md: running: <ide/host2/bus1/target0/lun0/part1><ide/host0/bus1/target0/lun0/part1><ide/host0/bus0/target1/lun0/part1>
md: ide/host2/bus1/target0/lun0/part1's event counter: 00000086
md: ide/host0/bus1/target0/lun0/part1's event counter: 00000196
md: ide/host0/bus0/target1/lun0/part1's event counter: 00000198
md: superblock update time inconsistency -- using the most recent one
md: freshest: ide/host0/bus0/target1/lun0/part1
md: kicking non-fresh ide/host2/bus1/target0/lun0/part1 from array!
md: unbind<ide/host2/bus1/target0/lun0/part1,2>
md: export_rdev(ide/host2/bus1/target0/lun0/part1)
md: kicking non-fresh ide/host0/bus1/target0/lun0/part1 from array!
md: unbind<ide/host0/bus1/target0/lun0/part1,1>
md: export_rdev(ide/host0/bus1/target0/lun0/part1)
md1: removing former faulty ide/host0/bus1/target0/lun0/part1!
md: RAID level 1 does not need chunksize! Continuing anyway.
md1: max total readahead window set to 124k
md1: 1 data-disks, max readahead per data-disk: 124k
raid1: device ide/host0/bus0/target1/lun0/part1 operational as mirror 0
raid1: md1, not all disks are operational -- trying to recover array
raid1: raid set md1 active with 1 out of 2 mirrors
md: updating md1 RAID superblock on device
md: ide/host0/bus0/target1/lun0/part1 [events: 00000199]<6>(write) ide/host0/bus0/target1/lun0/part1's sb offset: 6144704
md: recovery thread got woken up ...
md1: no spare disk to reconstruct array! -- continuing in degraded mode
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: considering ide/host0/bus1/target0/lun0/part3 ...
md: adding ide/host0/bus1/target0/lun0/part3 ...
md: md0 already running, cannot run ide/host0/bus1/target0/lun0/part3
md: export_rdev(ide/host0/bus1/target0/lun0/part3)
md: (ide/host0/bus1/target0/lun0/part3 was pending)
md: ... autorun DONE.
maru:/# cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid1 ide/host0/bus0/target1/lun0/part1[0]
6144704 blocks [2/1] [U_]
md0 : active raid5 ide/host2/bus1/target0/lun0/part3[2] ide/host0/bus0/target1/lun0/part3[0]
107233664 blocks level 5, 32k chunk, algorithm 2 [3/2] [U_U]
unused devices: <none>
maru:/# lsraid -A -a /dev/md0
[dev 9, 0] /dev/md0 94BF0D82.2B9C1BFB.89401B38.92B8F93B online
[dev 3, 67] /dev/ide/host0/bus0/target1/lun0/part3 94BF0D82.2B9C1BFB.89401B38.92B8F93B good
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing
[dev 34, 3] /dev/ide/host2/bus1/target0/lun0/part3 94BF0D82.2B9C1BFB.89401B38.92B8F93B good
maru:/# lsraid -A -a /dev/md1
[dev 9, 1] /dev/md1 0E953226.03C91D46.CD00D52F.83A1334E online
[dev 3, 65] /dev/ide/host0/bus0/target1/lun0/part1 0E953226.03C91D46.CD00D52F.83A1334E good
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing
maru:/# cat /etc/raidtab
raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
chunk-size 32
device /dev/hdb3
raid-disk 0
device /dev/hdc3
raid-disk 1
device /dev/hdg3
raid-disk 2
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 1
persistent-superblock 1
chunk-size 4
device /dev/hdb1
raid-disk 0
device /dev/hdc1
raid-disk 1
device /dev/hdg1
spare-disk 0
maru:/# ls -l /dev/hdb1
lr-xr-xr-x 1 root root 33 Sep 1 18:29 /dev/hdb1 -> ide/host0/bus0/target1/lun0/part1
maru:/# ls -l /dev/hdc1
lr-xr-xr-x 1 root root 33 Sep 1 18:29 /dev/hdc1 -> ide/host0/bus1/target0/lun0/part1
maru:/# ls -l /dev/hdg1
lr-xr-xr-x 1 root root 33 Sep 1 18:29 /dev/hdg1 -> ide/host2/bus1/target0/lun0/part1
maru:/# ls -l /dev/hdb3
lr-xr-xr-x 1 root root 33 Sep 1 18:29 /dev/hdb3 -> ide/host0/bus0/target1/lun0/part3
maru:/# ls -l /dev/hdc3
lr-xr-xr-x 1 root root 33 Sep 1 18:29 /dev/hdc3 -> ide/host0/bus1/target0/lun0/part3
maru:/# ls -l /dev/hdg3
lr-xr-xr-x 1 root root 33 Sep 1 18:29 /dev/hdg3 -> ide/host2/bus1/target0/lun0/part3
maru:/# raidhotadd /dev/md1 /dev/hdc1
maru:/# echo Waited for some time...
Waited for some time...
maru:/# cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid1 ide/host0/bus1/target0/lun0/part1[2] ide/host0/bus0/target1/lun0/part1[0]
6144704 blocks [2/1] [U_]
md0 : active raid5 ide/host2/bus1/target0/lun0/part3[2] ide/host0/bus0/target1/lun0/part3[0]
107233664 blocks level 5, 32k chunk, algorithm 2 [3/2] [U_U]
unused devices: <none>
maru:/#
[-- Attachment #3: Type: text/plain, Size: 44 bytes --]
--
Michael Welsh Duggan
(md5i@cs.cmu.edu)
* Re: RAID reconstruction problems
2003-09-02 3:04 RAID reconstruction problems Michael Welsh Duggan
@ 2003-09-02 16:04 ` Bernd Schubert
2003-09-02 18:29 ` Donghui Wen
2003-09-03 1:51 ` Michael Welsh Duggan
2 siblings, 0 replies; 5+ messages in thread
From: Bernd Schubert @ 2003-09-02 16:04 UTC (permalink / raw)
To: Michael Duggan; +Cc: linux-raid
On Tuesday 02 September 2003 05:04, Michael Welsh Duggan wrote:
> I currently have two small Software RAIDs, a RAID 1 for my root
> partition, and a RAID 5 for my usr partition. One of the disks in the
> arrays died, and I threw in a new disk with the intention of
> rebuilding the arrays.
>
> The rebuilds failed, but in an extremely strange fashion. Monitoring
> /proc/mdstat, it seems that the rebuilds are going just fine. When
> they finish however, /proc/mdstat includes the new disk, but also
> declares it invalid. The system continues running in degraded mode.
>
> When I run the rebuild from the root console, I get some messages from the
> RAID subsystem, including full debugging output. I have not yet
> figured out how to capture this output in order to include it in this
> message, but I did write down part of one attempt (this was by hand,
> so there may be small inconsistencies):
>
> RAID5 conf printout
> --- rd:3 wd:2 fd:1
> disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
> disk 1, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
> md: bug in file raid5.c, line 1901
>
> Here is some output from my system. If any more information would be
> useful, or anyone thinks I should try something else, please let me
> know. I would like to get out of my currently degraded state!
Hi,
Neil has posted several mdadm commands for problems similar to this one, so
I think you should install mdadm, search the list archive for similar
problems, and try to get a working array using mdadm.
Regards,
Bernd
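(A hedged sketch, not from Bernd's mail: the kind of mdadm inspection one
might start with, using device names assumed from the raidtab quoted in the
first message.)
    # Examine the on-disk md superblocks of the member partitions
    # (device names are assumptions taken from the raidtab above).
    mdadm --examine /dev/hdb1 /dev/hdc1 /dev/hdg1
    mdadm --examine /dev/hdb3 /dev/hdc3 /dev/hdg3
    # Show what the running arrays think their member states are.
    mdadm --detail /dev/md0
    mdadm --detail /dev/md1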
* Re: RAID reconstruction problems
2003-09-02 3:04 RAID reconstruction problems Michael Welsh Duggan
2003-09-02 16:04 ` Bernd Schubert
@ 2003-09-02 18:29 ` Donghui Wen
2003-09-03 1:51 ` Michael Welsh Duggan
2 siblings, 0 replies; 5+ messages in thread
From: Donghui Wen @ 2003-09-02 18:29 UTC (permalink / raw)
To: Michael Duggan, linux-raid
Did you partition the new disk before rebuilding?
Donghui
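(An aside, not from Donghui's mail: one common way to give a replacement
IDE disk the same layout as a surviving member is sketched below; the
device names are guesses, not taken from the thread.)
    # Copy the partition table from a surviving disk to the replacement.
    # WARNING: this overwrites the partition table on /dev/hdc; /dev/hdb
    # and /dev/hdc are assumed names here.
    sfdisk -d /dev/hdb | sfdisk /dev/hdc
    # Verify that the RAID partitions carry type fd (Linux raid autodetect).
    fdisk -l /dev/hdc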
----- Original Message -----
From: "Michael Welsh Duggan" <md5i@cs.cmu.edu>
To: <linux-raid@vger.kernel.org>
Sent: Monday, September 01, 2003 8:04 PM
Subject: RAID reconstruction problems
> I currently have two small Software RAIDs, a RAID 1 for my root
> partition, and a RAID 5 for my usr partition. One of the disks in the
> arrays died, and I threw in a new disk with the intention of
> rebuilding the arrays.
>
> The rebuilds failed, but in an extremely strange fashion. Monitoring
> /proc/mdstat, it seems that the rebuilds are going just fine. When
> they finish however, /proc/mdstat includes the new disk, but also
> declares it invalid. The system continues running in degraded mode.
>
> When I run the rebuild from the root console, I get some messages from the
> RAID subsystem, including full debugging output. I have not yet
> figured out how to capture this output in order to include it in this
> message, but I did write down part of one attempt (this was by hand,
> so there may be small inconsistencies):
>
> RAID5 conf printout
> --- rd:3 wd:2 fd:1
> disk 0, s:0, o:1, n:0 rd:0 us:1 dev:ide/host0/bus0/target1/lun0/part3
> disk 1, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> disk 2, s:0, o:1, n:2 rd:2 us:1 dev:ide/host2/bus1/target0/lun0/part3
> md: bug in file raid5.c, line 1901
>
> Here is some output from my system. If any more information would be
> useful, or anyone thinks I should try something else, please let me
> know. I would like to get out of my currently degraded state!
>
>
>
> --
> Michael Welsh Duggan
> (md5i@cs.cmu.edu)
>
* Re: RAID reconstruction problems
2003-09-02 3:04 RAID reconstruction problems Michael Welsh Duggan
2003-09-02 16:04 ` Bernd Schubert
2003-09-02 18:29 ` Donghui Wen
@ 2003-09-03 1:51 ` Michael Welsh Duggan
2003-09-04 17:07 ` Bernd Schubert
2 siblings, 1 reply; 5+ messages in thread
From: Michael Welsh Duggan @ 2003-09-03 1:51 UTC (permalink / raw)
To: linux-raid
Sorry about putting the wrong email address on my initial email.
In reply to Donghui Wen: yes, it was partitioned correctly.
In reply to Bernd Schubert: I have installed mdadm and am looking
around for some pointers. I am still unsure what is going on. mdadm
reports the following for /dev/md1:
/dev/md1:
Version : 00.90.00
Creation Time : Sun Mar 16 21:42:44 2003
Raid Level : raid1
Array Size : 6144704 (5.86 GiB 6.29 GB)
Device Size : 6144704 (5.86 GiB 6.29 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Sep 1 19:48:11 2003
State : dirty, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
   Number   Major   Minor   RaidDevice   State
      0       3      65         0        active sync   /dev/ide/host0/bus0/target1/lun0/part1
      1       0       0         0        sync
      2      22       1         2        active         /dev/ide/host0/bus1/target0/lun0/part1
UUID : 0e953226:03c91d46:cd00d52f:83a1334e
Events : 0.413
/proc/mdstat reports the following:
md1 : active raid1 ide/host0/bus1/target0/lun0/part1[2]
ide/host0/bus0/target1/lun0/part1[0]
6144704 blocks [2/2] [U_]
I'll play around with it some more, but if anyone recognizes these
symptoms, please reply.
--
Michael Welsh Duggan
(mwd@cert.org)
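(A hedged sketch, not part of the original mail: the usual mdadm
manage-mode cycle for a member that never becomes fully active, assuming
/dev/hdc1 is the partition in question, would look roughly like this.)
    # Drop the half-added member, re-add it, and watch the resync.
    # /dev/hdc1 is an assumption based on the earlier raidhotadd call.
    mdadm /dev/md1 --fail /dev/hdc1 --remove /dev/hdc1
    mdadm /dev/md1 --add /dev/hdc1
    cat /proc/mdstat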
* Re: RAID reconstruction problems
2003-09-03 1:51 ` Michael Welsh Duggan
@ 2003-09-04 17:07 ` Bernd Schubert
0 siblings, 0 replies; 5+ messages in thread
From: Bernd Schubert @ 2003-09-04 17:07 UTC (permalink / raw)
To: linux-raid
On Wednesday 03 September 2003 03:51, Michael Welsh Duggan wrote:
> Sorry about putting the wrong email address on my initial email.
>
> In reply to Donghui Wen: yes, it was partitioned correctly.
>
> In reply to Bernd Schubert: I have installed mdadm and am looking
> around for some pointers. I am still unsure what is going on. mdadm
> reports the following for /dev/md1:
>
> /dev/md1:
> Version : 00.90.00
> Creation Time : Sun Mar 16 21:42:44 2003
> Raid Level : raid1
> Array Size : 6144704 (5.86 GiB 6.29 GB)
> Device Size : 6144704 (5.86 GiB 6.29 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 1
> Persistence : Superblock is persistent
>
> Update Time : Mon Sep 1 19:48:11 2003
> State : dirty, no-errors
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
>
>    Number   Major   Minor   RaidDevice   State
>       0       3      65         0        active sync   /dev/ide/host0/bus0/target1/lun0/part1
>       1       0       0         0        sync
>       2      22       1         2        active         /dev/ide/host0/bus1/target0/lun0/part1
> UUID : 0e953226:03c91d46:cd00d52f:83a1334e
> Events : 0.413
>
>
> /proc/mdstat reports the following:
>
> md1 : active raid1 ide/host0/bus1/target0/lun0/part1[2]
> ide/host0/bus0/target1/lun0/part1[0]
> 6144704 blocks [2/2] [U_]
>
> I'll play around with it some more, but if anyone recognizes these
> symptoms, please reply.
Hello Michael,
perhaps this is the same issue that David Chow had a few weeks ago? See the
attached mail; you will also find it in the archives.
Bernd
PS: I removed the attachment for the mailing list, as it is in the archives
anyway.