linux-raid.vger.kernel.org archive mirror
* RAID SCSI binding
@ 2003-02-28 12:19 Mike Black
  2003-02-28 20:49 ` Neil Brown
  0 siblings, 1 reply; 3+ messages in thread
From: Mike Black @ 2003-02-28 12:19 UTC (permalink / raw)
  To: linux-kernel, linux-raid

Linux 2.4.20 (but not unique to this kernel -- been this way for over a year):
I have a RAID5 array that doesn't start up properly during boot (I have to stop it and restart it after the system is up).  I've had
this problem forever and have been trying to fix it.
Here's what it looks like when it's up and running:
md4 : active raid5 sdn1[0] sdt1[6] sds1[5] sdr1[4] sdq1[3] sdp1[2] sdo1[1]
     216490752 blocks level 5, 128k chunk, algorithm 0 [7/7] [UUUUUUU]
The partition types are set to 0x83 and the array is started manually from rc.local instead of using auto-start.

I think I finally have an idea of what's happening.  The major device number may be confusing the RAID bootup process.

During boot the devices are listed properly but then they don't all get bound.

Feb 28 05:42:48 yeti kernel:  sda: sda1
Feb 28 05:42:48 yeti kernel:  sdb: sdb1
Feb 28 05:42:48 yeti kernel:  sdc: sdc1
Feb 28 05:42:48 yeti kernel:  sdd: sdd1
Feb 28 05:42:48 yeti kernel:  sde: sde1
Feb 28 05:42:48 yeti kernel:  sdf: sdf1
Feb 28 05:42:48 yeti kernel:  sdg: sdg1
Feb 28 05:42:48 yeti kernel:  sdh: sdh1
Feb 28 05:42:48 yeti kernel:  sdi: sdi1
Feb 28 05:42:48 yeti kernel:  sdj: sdj1
Feb 28 05:42:48 yeti kernel:  sdk: sdk1
Feb 28 05:42:48 yeti kernel:  sdl: sdl1
Feb 28 05:42:48 yeti kernel:  sdm: sdm1
Feb 28 05:42:48 yeti kernel:  sdn: sdn1
Feb 28 05:42:48 yeti kernel:  sdo: sdo1
Feb 28 05:42:48 yeti kernel:  sdp: sdp1
Feb 28 05:42:48 yeti kernel:  sdq: sdq1
Feb 28 05:42:48 yeti kernel:  sdr: sdr1
Feb 28 05:42:48 yeti kernel:  sds: sds1
Feb 28 05:42:48 yeti kernel:  sdt: sdt1
Feb 28 05:42:49 yeti kernel:  [events: 000000bc]
Feb 28 05:42:49 yeti kernel: md: bind<sdo1,1>
Feb 28 05:42:49 yeti kernel:  [events: 000000bc]
Feb 28 05:42:49 yeti kernel: md: bind<sdp1,2>
Feb 28 05:42:49 yeti kernel:  [events: 000000bc]
Feb 28 05:42:49 yeti kernel: md: bind<sdn1,3>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdb1,1>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdc1,2>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sda1,3>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sde1,4>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdf1,5>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdg1,6>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdh1,7>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdi1,8>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdj1,9>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdk1,10>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdl1,11>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdd1,12>
Feb 28 05:42:49 yeti kernel:  [events: 000000ab]
Feb 28 05:42:49 yeti kernel: md: bind<sdm1,13>

Note that sdq, sdr, sds, and sdt do not get an md: bind entry.
The array ends up in /proc/mdstat with just sdn1, sdo1, and sdp1 in it.
I then stop and restart the array and everything is OK.
Feb 28 05:53:30 yeti kernel: md: md4 stopped.
Feb 28 05:53:30 yeti kernel: md: unbind<sdn1,2>
Feb 28 05:53:30 yeti kernel: md: export_rdev(sdn1)
Feb 28 05:53:30 yeti kernel: md: unbind<sdp1,1>
Feb 28 05:53:30 yeti kernel: md: export_rdev(sdp1)
Feb 28 05:53:30 yeti kernel: md: unbind<sdo1,0>
Feb 28 05:53:30 yeti kernel: md: export_rdev(sdo1)
Feb 28 05:53:34 yeti kernel:  [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdo1,1>
Feb 28 05:53:34 yeti kernel:  [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdp1,2>
Feb 28 05:53:34 yeti kernel:  [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdq1,3>
Feb 28 05:53:34 yeti kernel:  [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdr1,4>
Feb 28 05:53:34 yeti kernel:  [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sds1,5>
Feb 28 05:53:34 yeti kernel:  [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdt1,6>
Feb 28 05:53:34 yeti kernel:  [events: 000000bc]
Feb 28 05:53:34 yeti kernel: md: bind<sdn1,7>
Feb 28 05:53:34 yeti kernel: md: sdn1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdt1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdn1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdt1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sds1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdr1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdq1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdp1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md: sdo1's event counter: 000000bc
Feb 28 05:53:34 yeti kernel: md4: max total readahead window set to 3072k
Feb 28 05:53:34 yeti kernel: md4: 6 data-disks, max readahead per data-disk: 512k
Feb 28 05:53:34 yeti kernel: raid5: device sdn1 operational as raid disk 0
Feb 28 05:53:34 yeti kernel: raid5: device sdt1 operational as raid disk 6
Feb 28 05:53:34 yeti kernel: raid5: device sds1 operational as raid disk 5
Feb 28 05:53:34 yeti kernel: raid5: device sdr1 operational as raid disk 4
Feb 28 05:53:34 yeti kernel: raid5: device sdq1 operational as raid disk 3
Feb 28 05:53:34 yeti kernel: raid5: device sdp1 operational as raid disk 2
Feb 28 05:53:34 yeti kernel: raid5: device sdo1 operational as raid disk 1
Feb 28 05:53:34 yeti kernel: raid5: allocated 7483kB for md4
Feb 28 05:53:34 yeti kernel: md: updating md4 RAID superblock on device
Feb 28 05:53:34 yeti kernel: md: sdn1 [events: 000000bd]<6>(write) sdn1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdt1 [events: 000000bd]<6>(write) sdt1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sds1 [events: 000000bd]<6>(write) sds1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdr1 [events: 000000bd]<6>(write) sdr1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdq1 [events: 000000bd]<6>(write) sdq1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdp1 [events: 000000bd]<6>(write) sdp1's sb offset: 36081856
Feb 28 05:53:34 yeti kernel: md: sdo1 [events: 000000bd]<6>(write) sdo1's sb offset: 36081856


I noticed that the major device number is different starting at sdq.

brw-rw----    1 root     disk       8, 208 Feb 22  2002 /dev/sdn
brw-rw----    1 root     disk       8, 224 Feb 22  2002 /dev/sdo
brw-rw----    1 root     disk       8, 240 Feb 22  2002 /dev/sdp
brw-rw----    1 root     disk      65,   0 Feb 22  2002 /dev/sdq
brw-rw----    1 root     disk      65,  16 Feb 22  2002 /dev/sdr
brw-rw----    1 root     disk      65,  32 Feb 22  2002 /dev/sds
brw-rw----    1 root     disk      65,  48 Feb 22  2002 /dev/sdt
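
(A side note on the numbering itself, not taken from the logs above: each SCSI disk gets 16 minors, so major 8 only holds sda
through sdp (16 disks x 16 minors = 256 minors), and the 17th disk, sdq, rolls over to the next SCSI-disk major, 65, starting
again at minor 0 -- which is exactly what the ls output shows.)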

So it looks to me like the md driver doesn't like the array crossing the major device number boundary, for some odd reason,
although I'm not sure why I can stop and restart the array after the system is fully up but can't start it from rc.local using the
same commands.

We're not hitting the 27-disk maximum, so that isn't it.

Looking through the code in md.c, I don't quite see where the problem is (or might be).

Michael D. Black mblack@csi-inc.com
http://www.csi-inc.com/
http://www.csi-inc.com/~mike
321-676-2923, x203
Melbourne FL


* Re: RAID SCSI binding
  2003-02-28 12:19 RAID SCSI binding Mike Black
@ 2003-02-28 20:49 ` Neil Brown
  2003-03-03 13:32   ` Mike Black
  0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2003-02-28 20:49 UTC (permalink / raw)
  To: Mike Black; +Cc: linux-kernel, linux-raid

On Friday February 28, mblack@csi-inc.com wrote:
> Linux 2.4.20 (but not unique to this kernel -- been this way for over a year):
> I have a RAID5 array that doesn't start up properly during boot (I have to stop it and restart it after the system is up).  I've had
> this problem forever and have been trying to fix it.
> Here's what it looks like when it's up and running:
> md4 : active raid5 sdn1[0] sdt1[6] sds1[5] sdr1[4] sdq1[3] sdp1[2] sdo1[1]
>      216490752 blocks level 5, 128k chunk, algorithm 0 [7/7] [UUUUUUU]
> The partition types are set to 0x83 and the array is started manually
> from rc.local instead of using auto-start.

What tool are you using to start the array? raidstart or mdadm?
There doesn't seem to be enough noise in the logs for it to be
raidstart (no "md: autorun ..."), so I assume mdadm.

What do you get if you add "-v" to the mdadm command run from rc.local?
Can you show me the output of
  mdadm -E /dev/sda1 /dev/sdo1
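(For the -v run I mean roughly "mdadm --assemble -v /dev/md4 <your device list>" -- just a sketch of adding the flag to
whatever rc.local runs now, not your exact command.)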

NeilBrown

> 
> So it looks to me like the md driver doesn't like the array crossing the major device number boundary, for some odd reason,
> although I'm not sure why I can stop and restart the array after the system is fully up but can't start it from rc.local using the
> same commands.

I suspect this is just a coincidence.  I cannot imagine how the major
device number could affect things at all.

NeilBrown


* Re: RAID SCSI binding
  2003-02-28 20:49 ` Neil Brown
@ 2003-03-03 13:32   ` Mike Black
  0 siblings, 0 replies; 3+ messages in thread
From: Mike Black @ 2003-03-03 13:32 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel, linux-raid

Using mdadm v1.0.0
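
Roughly, the start from rc.local and the manual stop/restart afterwards are (a sketch -- not the exact invocations):

    mdadm --assemble /dev/md4 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1 /dev/sdr1 /dev/sds1 /dev/sdt1
    mdadm --stop /dev/md4
    (then the same assemble command again by hand)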

/dev/sdn1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 05084a65:52ffddf9:e971cafd:3aca3445
  Creation Time : Wed Nov  7 07:19:15 2001
     Raid Level : raid5
    Device Size : 36081792 (34.41 GiB 36.94 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 4

    Update Time : Fri Feb 28 05:53:34 2003
          State : dirty, no-errors
 Active Devices : 7
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a1e07486 - correct
         Events : 0.189

         Layout : left-asymmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8      209        0      active sync   /dev/sdn1
   0     0       8      209        0      active sync   /dev/sdn1
   1     1       8      225        1      active sync   /dev/sdo1
   2     2       8      241        2      active sync   /dev/sdp1
   3     3      65        1        3      active sync   /dev/sdq1
   4     4      65       17        4      active sync   /dev/sdr1
   5     5      65       33        5      active sync   /dev/sds1
   6     6      65       49        6      active sync   /dev/sdt1
/dev/sdo1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 05084a65:52ffddf9:e971cafd:3aca3445
  Creation Time : Wed Nov  7 07:19:15 2001
     Raid Level : raid5
    Device Size : 36081792 (34.41 GiB 36.94 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 4

    Update Time : Fri Feb 28 05:53:34 2003
          State : dirty, no-errors
 Active Devices : 7
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a1e07498 - correct
         Events : 0.189

         Layout : left-asymmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     1       8      225        1      active sync   /dev/sdo1
   0     0       8      209        0      active sync   /dev/sdn1
   1     1       8      225        1      active sync   /dev/sdo1
   2     2       8      241        2      active sync   /dev/sdp1
   3     3      65        1        3      active sync   /dev/sdq1
   4     4      65       17        4      active sync   /dev/sdr1
   5     5      65       33        5      active sync   /dev/sds1
   6     6      65       49        6      active sync   /dev/sdt1

----- Original Message -----
From: "Neil Brown" <neilb@cse.unsw.edu.au>
To: "Mike Black" <mblack@csi-inc.com>
Cc: "linux-kernel" <linux-kernel@vger.kernel.org>; "linux-raid" <linux-raid@vger.kernel.org>
Sent: Friday, February 28, 2003 3:49 PM
Subject: Re: RAID SCSI binding


> On Friday February 28, mblack@csi-inc.com wrote:
> > Linux 2.4.20 (but not unique to this kernel -- been this way for over a year):
> > I have a RAID5 array that doesn't start up properly during boot (I have to stop it and restart it after the system is up).  I've had
> > this problem forever and have been trying to fix it.
> > Here's what it looks like when it's up and running:
> > md4 : active raid5 sdn1[0] sdt1[6] sds1[5] sdr1[4] sdq1[3] sdp1[2] sdo1[1]
> >      216490752 blocks level 5, 128k chunk, algorithm 0 [7/7] [UUUUUUU]
> > The partition types are set to 0x83 and the array is started manually
> > from rc.local instead of using auto-start.
>
> What tool are you using to start the array? raidstart or mdadm?
> There doesn't seem to be enough noise in the logs for it to be
> raidstart (no "md: autorun ..."), so I assume mdadm.
>
> What do you get if you add "-v" to the mdadm command run from rc.local?
> Can you show me the output of
>   mdadm -E /dev/sda1 /dev/sdo1
>
> NeilBrown
>
> >
> > So it looks to me like the md driver doesn't like the array crossing the major device number boundary, for some odd reason,
> > although I'm not sure why I can stop and restart the array after the system is fully up but can't start it from rc.local using the
> > same commands.
>
> I suspect this is just a coincidence.  I cannot imagine how the major
> device number could affect things at all.
>
> NeilBrown


