* a couple of mdadm questions
From: seth vidal @ 2003-09-06 20:24 UTC (permalink / raw)
To: linux-raid
Hi,
I'm running Red Hat Linux 7.3 with the 2.4.20-20.7 kernel.
I have a RAID array of seven 73GB U160 SCSI disks in a Dell PowerVault
220S drive array. I had a failure on one disk.
I ran:
mdadm /dev/md1 --remove /dev/sdd1
I removed the failed disk, inserted the new disk, partitioned it, and
ran:
mdadm /dev/md1 --add /dev/sdd1
The drive reconstructed and everything seems happy.
This has happened on 2 separate disks at different times and it has
recovered both times.
When I run mdadm -D /dev/md1, it lists out very oddly:
mdadm -D /dev/md1
/dev/md1:
Version : 00.90.00
Creation Time : Wed Nov 6 11:09:01 2002
Raid Level : raid5
Array Size : 430091520 (410.17 GiB 440.41 GB)
Device Size : 71681920 (68.36 GiB 73.40 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Sep 6 14:53:58 2003
State : dirty, no-errors
Active Devices : 7
Working Devices : 5
Failed Devices : 2
Spare Devices : 0
Layout : left-asymmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
5 8 97 5 active sync /dev/sdg1
6 8 113 6 active sync /dev/sdh1
UUID : 3b48fd52:94bb97fd:89437dea:126fd0fc
Events : 0.82
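One thing in that output that does check out is the geometry: for RAID5, usable capacity is (devices - 1) times the per-device size. A quick shell check using the figures above:

```shell
# Figures from the -D output above, in 1K blocks.
DEVICE_SIZE=71681920
RAID_DEVICES=7
# RAID5 spends one disk's worth of space on parity.
ARRAY_SIZE=$(( (RAID_DEVICES - 1) * DEVICE_SIZE ))
echo "$ARRAY_SIZE"   # prints 430091520, matching "Array Size" above
```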
So why does this say 5 working devices, 2 failed devices, and 7 active
devices?
It seems like it should read:
7 active devices and 7 working devices.
In addition, I can't get "State : dirty, no-errors" to go away.
I considered recreating this array with:
mdadm -C /dev/md1 -l 5 -n 7 -c 64 /dev/sdb1 /dev/sdc1 /dev/sdd1 \
/dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
but I was a little leery that I might screw something up. There is a lot
of important data on this array.
The only other thing that is very odd is that on boot the system always
claims that it failed to start the array because there are too few
drives. But then it starts, mounts, and the data all looks good. I've
compared big chunks of the data with md5sum and it's valid. So I think
it has something to do with the Working Devices count.
Is that the case?
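For reference, the kind of spot check described is md5sum's checklist mode; a self-contained sketch using a scratch directory rather than the real array (paths hypothetical):

```shell
set -eu
tmp=$(mktemp -d)
echo "important data" > "$tmp/file"
cd "$tmp"
# Record checksums now; re-run with -c later (or on a copy) to verify.
md5sum file > sums
md5sum -c sums    # prints "file: OK" while the data is intact
```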
This is on mdadm 1.2.0.
Thanks
-sv
* Re: a couple of mdadm questions
From: Neil Brown @ 2003-09-08 7:06 UTC (permalink / raw)
To: seth vidal; +Cc: linux-raid
On September 6, skvidal@phy.duke.edu wrote:
>
> When I do a mdadm -D /dev/md1 it lists out very oddly:
>
....
>
> So why does this say - 5 working devices, 2 failed devices and 7 active
> devices?
>
Because the code in the kernel for keeping these counters up-to-date
is rather fragile and probably broken, but as the counters aren't
actually used for anything (much) I have never bothered fixing it.
>
> It seems like it should read:
> 7 active devices and 7 working devices.
> In addition, I can't get State: dirty, no-errors to go away.
>
> I considered recreating this array with:
>
> mdadm -C /dev/md1 -l 5 -n 7 -c 64 /dev/sdb1 /dev/sdc1 /dev/sdd1 \
> /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
>
> but I was a little leery that I might screw something up. There is a lot
> of important data on this array.
>
You could get mdadm 1.3.0, add some patches from
http://cgi.cse.unsw.edu.au/~neilb/source/mdadm/patch/applied/
and try
--assemble --update=summaries
It should fix these counts for you.
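A sketch of what that might look like, with the device list assumed from the -D output earlier in the thread; the commands are echoed rather than executed here, since they act on real devices, and the exact spelling should be checked against the mdadm 1.3.0 man page:

```shell
set -eu
MD=/dev/md1
PARTS="/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1"
# Stop the array, then reassemble it, rewriting the superblock summaries.
echo "mdadm --stop $MD"
echo "mdadm --assemble $MD --update=summaries $PARTS"
```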
>
> The only other thing that is very odd is that on boot the system always
> claims to fail to start the array, that there are too few drives. But
> then it starts, mounts and the data all looks good. I've compared big
> chunks of the data with md5sum and it's valid. So I think it has
> something to do with the Working Device counts.
>
> Is that the case?
Probably. What is the actual message?
NeilBrown
* Re: a couple of mdadm questions
From: seth vidal @ 2003-09-08 13:53 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
> You could get mdadm 1.3.0, add some patches from
> http://cgi.cse.unsw.edu.au/~neilb/source/mdadm/patch/applied/
> and try
> --assemble --update=summaries
>
> it should fix these counts for you.
hmmm
good call.
Thanks
> Probably. What is the actual message?
All of the output is attached below:
raid5: using function: p5_mmx (2653.200 MB/sec)
md: raid5 personality registered as nr 4
Journalled Block Device driver loaded
md: Autodetecting RAID arrays.
[events: 00000053]
[events: 00000053]
[events: 00000053]
md: autorun ...
md: considering sdf1 ...
md: adding sdf1 ...
md: adding sde1 ...
md: adding sdc1 ...
md: created md1
md: bind<sdc1,1>
md: bind<sde1,2>
md: bind<sdf1,3>
md: running: <sdf1><sde1><sdc1>
md: sdf1's event counter: 00000053
md: sde1's event counter: 00000053
md: sdc1's event counter: 00000053
md1: former device sdb1 is unavailable, removing from array!
md1: former device sdd1 is unavailable, removing from array!
md1: former device sdg1 is unavailable, removing from array!
md1: former device sdh1 is unavailable, removing from array!
md1: max total readahead window set to 1536k
md1: 6 data-disks, max readahead per data-disk: 256k
raid5: device sdf1 operational as raid disk 4
raid5: device sde1 operational as raid disk 3
raid5: device sdc1 operational as raid disk 1
raid5: not enough operational devices for md1 (4/7 failed)
RAID5 conf printout:
--- rd:7 wd:3 fd:4
disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc1
disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sde1
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:sdf1
disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
disk 6, s:0, o:0, n:6 rd:6 us:1 dev:[dev 00:00]
raid5: failed to run raid set md1
md: pers->run() failed ...
md :do_md_run() returned -22
md: md1 stopped.
md: unbind<sdf1,2>
md: export_rdev(sdf1)
md: unbind<sde1,1>
md: export_rdev(sde1)
md: unbind<sdc1,0>
md: export_rdev(sdc1)
md: ... autorun DONE.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 116k freed
Adding Swap: 2096472k swap-space (priority -1)
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,1), internal journal
[events: 00000053]
[events: 00000053]
[events: 00000053]
[events: 00000053]
[events: 00000053]
[events: 00000053]
[events: 00000053]
md: autorun ...
md: considering sdh1 ...
md: adding sdh1 ...
md: adding sdg1 ...
md: adding sdf1 ...
md: adding sde1 ...
md: adding sdd1 ...
md: adding sdb1 ...
md: adding sdc1 ...
md: created md1
md: bind<sdc1,1>
md: bind<sdb1,2>
md: bind<sdd1,3>
md: bind<sde1,4>
md: bind<sdf1,5>
md: bind<sdg1,6>
md: bind<sdh1,7>
md: running: <sdh1><sdg1><sdf1><sde1><sdd1><sdb1><sdc1>
md: sdh1's event counter: 00000053
md: sdg1's event counter: 00000053
md: sdf1's event counter: 00000053
md: sde1's event counter: 00000053
md: sdd1's event counter: 00000053
md: sdb1's event counter: 00000053
md: sdc1's event counter: 00000053
md1: max total readahead window set to 1536k
md1: 6 data-disks, max readahead per data-disk: 256k
raid5: device sdh1 operational as raid disk 6
raid5: device sdg1 operational as raid disk 5
raid5: device sdf1 operational as raid disk 4
raid5: device sde1 operational as raid disk 3
raid5: device sdd1 operational as raid disk 2
raid5: device sdb1 operational as raid disk 0
raid5: device sdc1 operational as raid disk 1
raid5: allocated 7475kB for md1
raid5: raid level 5 set md1 active with 7 out of 7 devices, algorithm 0
RAID5 conf printout:
--- rd:7 wd:7 fd:0
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdb1
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc1
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd1
disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sde1
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:sdf1
disk 5, s:0, o:1, n:5 rd:5 us:1 dev:sdg1
disk 6, s:0, o:1, n:6 rd:6 us:1 dev:sdh1
RAID5 conf printout:
--- rd:7 wd:7 fd:0
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdb1
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc1
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd1
disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sde1
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:sdf1
disk 5, s:0, o:1, n:5 rd:5 us:1 dev:sdg1
disk 6, s:0, o:1, n:6 rd:6 us:1 dev:sdh1
md: updating md1 RAID superblock on device
md: sdh1 [events: 00000054]<6>(write) sdh1's sb offset: 71681920
md: sdg1 [events: 00000054]<6>(write) sdg1's sb offset: 71681920
md: sdf1 [events: 00000054]<6>(write) sdf1's sb offset: 71681920
md: sde1 [events: 00000054]<6>(write) sde1's sb offset: 71681920
md: sdd1 [events: 00000054]<6>(write) sdd1's sb offset: 71681920
md: sdb1 [events: 00000054]<6>(write) sdb1's sb offset: 71681920
md: sdc1 [events: 00000054]<6>(write) sdc1's sb offset: 71681920
md: ... autorun DONE.
raid5: switching cache buffer size, 4096 --> 1024
raid5: switching cache buffer size, 1024 --> 4096
Thanks
-sv
* Re: a couple of mdadm questions
From: Luca Berra @ 2003-09-08 20:09 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Mon, Sep 08, 2003 at 05:06:53PM +1000, Neil Brown wrote:
>You could get mdadm 1.3.0, add some patches from
> http://cgi.cse.unsw.edu.au/~neilb/source/mdadm/patch/applied/
just a silly question:
003MdSuperFix
Status: ok
Make sure unused superblock descriptor entries aren't failed.
This confuses 2.4 kernel code.
Do you mean that without this patch the 2.4 kernel gets confused,
or that this patch is harmful for 2.4?
regards,
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: a couple of mdadm questions
From: Neil Brown @ 2003-09-08 22:35 UTC (permalink / raw)
To: Luca Berra; +Cc: linux-raid
On Monday September 8, bluca@comedia.it wrote:
> On Mon, Sep 08, 2003 at 05:06:53PM +1000, Neil Brown wrote:
> >You could get mdadm 1.3.0, add some patches from
> > http://cgi.cse.unsw.edu.au/~neilb/source/mdadm/patch/applied/
>
> just a silly question:
>
> 003MdSuperFix
> Status: ok
> Make sure unused superblock descriptor entries aren't failed.
> This confuses 2.4 kernel code.
>
> do you mean that without this patch 2.4 kernel gets confused
>
> or do you mean that this patch is harmful for 2.4
>
The patch modifies --update=summaries to make some further
normalisations of the superblock. At least one superblock has been
seen in the wild that was not correct (i.e. had unused descriptors
that were marked 'failed') and this superblock confused 2.4 badly.
I.e., with this patch it is possible to fix a rare condition that
confused 2.4.
I hope that makes it reasonably clear :-)
NeilBrown
> regards,
> L.
>
> --
> Luca Berra -- bluca@comedia.it
> Communication Media & Services S.r.l.
> /"\
> \ / ASCII RIBBON CAMPAIGN
> X AGAINST HTML MAIL
> / \
* Re: a couple of mdadm questions
From: Neil Brown @ 2003-09-12 2:58 UTC (permalink / raw)
To: seth vidal; +Cc: linux-raid
On September 8, skvidal@phy.duke.edu wrote:
>
> > Probably. What is the actual message?
>
> All of the output attached below:
>
> raid5: using function: p5_mmx (2653.200 MB/sec)
> md: raid5 personality registered as nr 4
> Journalled Block Device driver loaded
> md: Autodetecting RAID arrays.
> [events: 00000053]
> [events: 00000053]
> [events: 00000053]
> md: autorun ...
> md: considering sdf1 ...
> md: adding sdf1 ...
> md: adding sde1 ...
> md: adding sdc1 ...
> md: created md1
> md: bind<sdc1,1>
> md: bind<sde1,2>
> md: bind<sdf1,3>
> md: running: <sdf1><sde1><sdc1>
> md: sdf1's event counter: 00000053
> md: sde1's event counter: 00000053
> md: sdc1's event counter: 00000053
> md1: former device sdb1 is unavailable, removing from array!
> md1: former device sdd1 is unavailable, removing from array!
> md1: former device sdg1 is unavailable, removing from array!
> md1: former device sdh1 is unavailable, removing from array!
Looks like sd[bdgh]1 are *not* marked as auto-detect-raid in the
partition table. Use fdisk/cfdisk to check.
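One non-interactive way to make that check (assumed approach; sfdisk -d dumps partition types as Id=…) is to filter the dump for anything that is not type fd, Linux raid autodetect. The dump lines below are illustrative, not from this machine:

```shell
# On the real machine: sfdisk -d /dev/sdb (and so on for each disk).
# Here we use a canned dump so the filtering logic can be shown.
DUMP='/dev/sdb1 : start=63, size=143363997, Id=fd
/dev/sdd1 : start=63, size=143363997, Id=83'
# Any partition line without Id=fd will be skipped by RAID autodetect.
echo "$DUMP" | grep -v 'Id=fd'   # prints only the /dev/sdd1 line
```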
NeilBrown