* Recovering a raid5 array with strange event count
@ 2007-04-13 10:14 Chris Allen
From: Chris Allen @ 2007-04-13 10:14 UTC
  To: linux-raid

Dear All,

I have an 8-drive raid-5 array running under 2.6.11. This morning it
bombed out, and when I brought it up again, two drives had mismatched
event counts:


sda1: 0.8258715
sdb1: 0.8258715
sdc1: 0.8258715
sdd1: 0.8258715
sde1: 0.8258715
sdf1: 0.8258715
sdg1: 0.8258708
sdh1: 0.8258716
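
These are the Events fields from each superblock, collected with
something along these lines and then trimmed down by hand:

  for d in /dev/sd[a-h]1; do echo -n "$d: "; mdadm -E $d | grep Events; done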


sdg1 is out of date (expected), but sdh1 has received an extra event.

Any attempt to restart with mdadm --assemble --force results in an
un-startable array with an event count of 0.8258715.

Can anybody advise on the correct command to use to get it started again?
I'm assuming I'll need to use mdadm --create --assume-clean - but I'm
not sure which drives should be included/excluded when I do this.
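
If it comes to that, I imagine it would be something roughly like the
following, with the geometry copied from the superblocks and "missing"
in the slot of whichever drive is left out - but I don't want to run
anything until someone confirms it:

  mdadm --create /dev/md0 --assume-clean --metadata=0.90 \
        --level=5 --raid-devices=8 --chunk=64 --layout=left-symmetric \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 \
        missing /dev/sdh1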


Many thanks!


Chris Allen.





* Re: Recovering a raid5 array with strange event count
From: Neil Brown @ 2007-04-13 12:13 UTC
  To: Chris Allen; +Cc: linux-raid

On Friday April 13, chris@cjx.com wrote:
> Dear All,
> 
> I have an 8-drive raid-5 array running under 2.6.11. This morning it
> bombed out, and when I brought it up again, two drives had mismatched
> event counts:
> 
> 
> sda1: 0.8258715
> sdb1: 0.8258715
> sdc1: 0.8258715
> sdd1: 0.8258715
> sde1: 0.8258715
> sdf1: 0.8258715
> sdg1: 0.8258708
> sdh1: 0.8258716
> 
> 
> sdg1 is out of date (expected), but sdh1 has received an extra event.
> 
> Any attempt to restart with mdadm --assemble --force results in an
> un-startable array with an event count of 0.8258715.
> 
> Can anybody advise on the correct command to use to get it started again?
> I'm assuming I'll need to use mdadm --create --assume-clean - but I'm
> not sure which drives should be included/excluded when I do this.

A difference of 1 in event counts is not supposed to cause a problem.
Have you tried simply assembling the array without including sdg1?
e.g.
  mdadm -A /dev/md0 /dev/sd[abcdefh]1

NeilBrown


* Re: Recovering a raid5 array with strange event count
From: Chris Allen @ 2007-04-13 13:07 UTC
  To: Neil Brown; +Cc: linux-raid


Neil Brown wrote:
> On Friday April 13, chris@cjx.com wrote:
>> [...]
>
> A difference of 1 in event counts is not supposed to cause a problem.
> Have you tried simply assembling the array without including sdg1?
> e.g.
>   mdadm -A /dev/md0 /dev/sd[abcdefh]1

# mdadm -A /dev/md0 /dev/sd[abcdefh]1
mdadm: /dev/md0 assembled from 7 drives - need all 8 to start it (use --run to insist)

# mdadm -D /dev/md0
mdadm: md device /dev/md0 does not appear to be active.

# mdadm --run /dev/md0
mdadm: failed to run array /dev/md0: invalid argument



I've attached the syslog, the dump for the assembled array, the dump for
each drive, and the contents of /proc/mdstat. Using --force makes no
difference.
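
(That is, variations along the lines of

# mdadm -A --force /dev/md0 /dev/sd[abcdefh]1
# mdadm --run /dev/md0

fail in exactly the same way.)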




Apr 13 13:59:45 snap29 kernel: md: bind<sdb1>
Apr 13 13:59:45 snap29 kernel: md: bind<sdc1>
Apr 13 13:59:45 snap29 kernel: md: bind<sdd1>
Apr 13 13:59:45 snap29 kernel: md: bind<sde1>
Apr 13 13:59:45 snap29 kernel: md: bind<sdf1>
Apr 13 13:59:45 snap29 kernel: md: bind<sdh1>
Apr 13 13:59:45 snap29 kernel: md: bind<sda1>
Apr 13 14:00:01 snap29 kernel: md: md0: raid array is not clean -- starting background reconstruction
Apr 13 14:00:01 snap29 kernel: raid5: device sda1 operational as raid disk 0
Apr 13 14:00:01 snap29 kernel: raid5: device sdh1 operational as raid disk 7
Apr 13 14:00:01 snap29 kernel: raid5: device sdf1 operational as raid disk 5
Apr 13 14:00:01 snap29 kernel: raid5: device sde1 operational as raid disk 4
Apr 13 14:00:01 snap29 kernel: raid5: device sdd1 operational as raid disk 3
Apr 13 14:00:01 snap29 kernel: raid5: device sdc1 operational as raid disk 2
Apr 13 14:00:01 snap29 kernel: raid5: device sdb1 operational as raid disk 1
Apr 13 14:00:01 snap29 kernel: raid5: cannot start dirty degraded array for md0
Apr 13 14:00:01 snap29 kernel: RAID5 conf printout:
Apr 13 14:00:01 snap29 kernel:  --- rd:8 wd:7 fd:1
Apr 13 14:00:01 snap29 kernel:  disk 0, o:1, dev:sda1
Apr 13 14:00:01 snap29 kernel:  disk 1, o:1, dev:sdb1
Apr 13 14:00:01 snap29 kernel:  disk 2, o:1, dev:sdc1
Apr 13 14:00:01 snap29 kernel:  disk 3, o:1, dev:sdd1
Apr 13 14:00:01 snap29 kernel:  disk 4, o:1, dev:sde1
Apr 13 14:00:01 snap29 kernel:  disk 5, o:1, dev:sdf1
Apr 13 14:00:01 snap29 kernel:  disk 7, o:1, dev:sdh1
Apr 13 14:00:01 snap29 kernel: raid5: failed to run raid set md0
Apr 13 14:00:01 snap29 kernel: md: pers->run() failed ...

/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 8
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Apr 13 10:11:15 2007
          State : active, degraded, Not Started
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
         Events : 0.8258715

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
       4       8       65        4      active sync   /dev/sde1
       5       8       81        5      active sync   /dev/sdf1
       6       0        0        6      removed
       7       8      113        7      active sync   /dev/sdh1



/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 3418687552 (3260.31 GiB 3500.74 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Apr 13 10:11:12 2007
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : a469bd5a - correct
         Events : 0.8258715

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 3418687552 (3260.31 GiB 3500.74 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Apr 13 10:11:12 2007
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : a469bd6b - correct
         Events : 0.8258715

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 3418687552 (3260.31 GiB 3500.74 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Apr 13 10:11:12 2007
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : a469bd7d - correct
         Events : 0.8258715

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 3418687552 (3260.31 GiB 3500.74 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Apr 13 10:11:12 2007
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : a469bd8f - correct
         Events : 0.8258715

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 3418687552 (3260.31 GiB 3500.74 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Apr 13 10:11:12 2007
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : a469bda1 - correct
         Events : 0.8258715

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       65        4      active sync   /dev/sde1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 3418687552 (3260.31 GiB 3500.74 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Apr 13 10:11:12 2007
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : a469bdb3 - correct
         Events : 0.8258715

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       81        5      active sync   /dev/sdf1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
/dev/sdh1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 31b253f9:02049908:aa4bb1ab:753b8fda
  Creation Time : Wed Apr 19 06:23:21 2006
     Raid Level : raid5
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 3418687552 (3260.31 GiB 3500.74 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Apr 13 10:11:15 2007
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : a469bddb - correct
         Events : 0.8258716

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     7       8      113        7      active sync   /dev/sdh1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1



Personalities : [raid5]
md0 : inactive sda1[0] sdh1[7] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
      3418687552 blocks
unused devices: <none>
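
Presumably the important line is "raid5: cannot start dirty degraded
array": md is refusing to start an array that is both degraded (sdg1
missing) and dirty (unclean shutdown), since the parity can't be fully
trusted. I gather more recent kernels have an override for exactly this
case, though I haven't dared try it yet - something like:

  echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded

or md-mod.start_dirty_degraded=1 on the kernel command line.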







* Re: Recovering a raid5 array with strange event count
From: Chris Allen @ 2007-04-16 13:55 UTC
  To: Neil Brown; +Cc: Chris Allen, linux-raid



Neil Brown wrote:
> On Friday April 13, chris@cjx.com wrote:
>> [...]
>
> A difference of 1 in event counts is not supposed to cause a problem.
> Have you tried simply assembling the array without including sdg1?
> e.g.
>   mdadm -A /dev/md0 /dev/sd[abcdefh]1

Further to this, I have tried upgrading the kernel to 2.6.17; I get the
same errors.

Don't know if it is any use, but here is the tail of an strace of the
assemble command on both the bad system and a similar good system:


STRACE FROM ASSEMBLE - BAD ARRAY:

_llseek(4, 500105150464, [500105150464], SEEK_SET) = 0
read(4, "\374N+\251\0\0\0\0Z\0\0\0\1\0\0\0\0\0\0\0\371S\2621I\311"..., 4096) = 4096
close(4)                                = 0
stat64("/dev/sdi1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 129), 
...}) = 0
open("/dev/sdb1", O_RDONLY|O_EXCL)      = 4
ioctl(4, BLKGETSIZE64, 0xbffdf150)      = 0
ioctl(4, BLKFLSBUF, 0)                  = 0
_llseek(4, 500105150464, [500105150464], SEEK_SET) = 0
read(4, "\374N+\251\0\0\0\0Z\0\0\0\1\0\0\0\0\0\0\0\371S\2621I\311"..., 4096) = 4096
close(4)                                = 0
ioctl(3, 0x40480923, 0xbffdf2c0)        = 0
ioctl(3, 0x40140921, 0xbffdf324)        = 0
ioctl(3, 0x40140921, 0xbffdf324)        = 0
ioctl(3, 0x40140921, 0xbffdf324)        = 0
ioctl(3, 0x40140921, 0xbffdf324)        = 0
ioctl(3, 0x40140921, 0xbffdf324)        = 0
ioctl(3, 0x40140921, 0xbffdf324)        = 0
ioctl(3, 0x40140921, 0xbffdf324)        = 0
ioctl(3, 0x400c0930, 0)                 = -1 EIO (Input/output error)
write(2, "mdadm: failed to RUN_ARRAY /dev/"..., 56mdadm: failed to 
RUN_ARRAY /dev/md0: Input/output error
) = 56
exit_group(1)                           = ?



SAME COMMAND, GOOD ARRAY:

_llseek(4, 500105150464, [500105150464], SEEK_SET) = 0
read(4, "\374N+\251\0\0\0\0Z\0\0\0\0\0\0\0\0\0\0\0\316\360\34;:"..., 4096) = 4096
close(4)                                = 0
stat64("/dev/sdh1", {st_mode=S_IFBLK|0640, st_rdev=makedev(8, 113), 
...}) = 0
open("/dev/sda1", O_RDONLY|O_EXCL)      = 4
ioctl(4, BLKGETSIZE64, 0xbfcae6d8)      = 0
ioctl(4, BLKFLSBUF, 0)                  = 0
_llseek(4, 500105150464, [500105150464], SEEK_SET) = 0
read(4, "\374N+\251\0\0\0\0Z\0\0\0\0\0\0\0\0\0\0\0\316\360\34;:"..., 4096) = 4096
close(4)                                = 0
ioctl(3, 0x40480923, 0xbfcae800)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x40140921, 0xbfcae85c)        = 0
ioctl(3, 0x400c0930, 0)                 = 0
write(2, "mdadm: /dev/md0 has been started"..., 46mdadm: /dev/md0 has 
been started with 8 drives) = 46
write(2, ".\n", 2.
)                      = 2
exit_group(0)                           = ?
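
Decoding those ioctl numbers against include/linux/raid/md_u.h (my own
reading of the _IOW encoding, so worth double-checking):

  SET_ARRAY_INFO = _IOW(MD_MAJOR, 0x23, mdu_array_info_t)   /* 0x40480923 */
  ADD_NEW_DISK   = _IOW(MD_MAJOR, 0x21, mdu_disk_info_t)    /* 0x40140921 */
  RUN_ARRAY      = _IOW(MD_MAJOR, 0x30, mdu_param_t)        /* 0x400c0930 */

So on the bad array the seven ADD_NEW_DISK calls all succeed and it is
RUN_ARRAY itself that comes back EIO - matching the "failed to run raid
set" line in the syslog - while on the good array the same RUN_ARRAY
call returns 0.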




