linux-raid.vger.kernel.org archive mirror
* Re: Raid5 Failure
@ 2005-07-17 15:44 David M. Strang
  2005-07-17 22:05 ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: David M. Strang @ 2005-07-17 15:44 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

-(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument

Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with version-0 superblocks.

What, if anything, can I do since I'm using a version 1.0 superblock?

-- David M. Strang

----- Original Message ----- 
From: David M. Strang
To: Neil Brown
Cc: linux-raid@vger.kernel.org
Sent: Friday, July 15, 2005 4:25 PM
Subject: Re: Raid5 Failure


Okay, the array rebuilt - but I had 2 failed devices this morning.

/dev/sdj & /dev/sdaa were marked faulty.

Jul 15 01:47:53 abyss kernel: qla2200 0000:00:0d.0: LOOP DOWN detected.
Jul 15 01:47:53 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f7f7).
Jul 15 01:47:53 abyss kernel: qla2200 0000:00:0d.0: LOOP UP detected (1 Gbps).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured (f7c3).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f7c3).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured (f775).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f775).
Jul 15 01:59:50 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured (f7c3).
Jul 15 01:59:50 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f7c3).
Jul 15 02:00:42 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured (f776).
Jul 15 02:00:42 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f776).
Jul 15 02:01:17 abyss kernel:  rport-2:0-9: blocked FC remote port time out: removing target
Jul 15 02:01:17 abyss kernel:  rport-2:0-26: blocked FC remote port time out: removing target
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector 53238272
Jul 15 02:01:17 abyss kernel: raid5: Disk failure on sdj, disabling device. Operation continuing on 27 devices
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector 53239552
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector 53239296
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector 53239040
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector 53238784
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector 53238528
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 26 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdaa, sector 53238016
Jul 15 02:01:17 abyss kernel: raid5: Disk failure on sdaa, disabling device. Operation continuing on 26 devices

I have switched out the QLA2200 controller for a different one, and used:

mdadm -A /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn
/dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv
/dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab -f

to start the array; it is now running.

Jul 15 08:55:51 abyss kernel: md: bind<sdb>
Jul 15 08:55:51 abyss kernel: md: bind<sdc>
Jul 15 08:55:51 abyss kernel: md: bind<sdd>
Jul 15 08:55:51 abyss kernel: md: bind<sde>
Jul 15 08:55:51 abyss kernel: md: bind<sdf>
Jul 15 08:55:51 abyss kernel: md: bind<sdg>
Jul 15 08:55:51 abyss kernel: md: bind<sdh>
Jul 15 08:55:51 abyss kernel: md: bind<sdi>
Jul 15 08:55:51 abyss kernel: md: bind<sdj>
Jul 15 08:55:51 abyss kernel: md: bind<sdk>
Jul 15 08:55:51 abyss kernel: md: bind<sdl>
Jul 15 08:55:51 abyss kernel: md: bind<sdm>
Jul 15 08:55:51 abyss kernel: md: bind<sdn>
Jul 15 08:55:51 abyss kernel: md: bind<sdo>
Jul 15 08:55:51 abyss kernel: md: bind<sdp>
Jul 15 08:55:51 abyss kernel: md: bind<sdq>
Jul 15 08:55:51 abyss kernel: md: bind<sdr>
Jul 15 08:55:51 abyss kernel: md: bind<sds>
Jul 15 08:55:51 abyss kernel: md: bind<sdt>
Jul 15 08:55:51 abyss kernel: md: bind<sdu>
Jul 15 08:55:51 abyss kernel: md: bind<sdv>
Jul 15 08:55:51 abyss kernel: md: bind<sdw>
Jul 15 08:55:51 abyss kernel: md: bind<sdx>
Jul 15 08:55:51 abyss kernel: md: bind<sdy>
Jul 15 08:55:51 abyss kernel: md: bind<sdz>
Jul 15 08:55:51 abyss kernel: md: bind<sdaa>
Jul 15 08:55:51 abyss kernel: md: bind<sdab>
Jul 15 08:55:51 abyss kernel: md: bind<sda>
Jul 15 08:55:51 abyss kernel: md: kicking non-fresh sdaa from array!
Jul 15 08:55:51 abyss kernel: md: unbind<sdaa>
Jul 15 08:55:51 abyss kernel: md: export_rdev(sdaa)
Jul 15 08:55:51 abyss kernel: raid5: device sda operational as raid disk 0
Jul 15 08:55:51 abyss kernel: raid5: device sdab operational as raid disk 27
Jul 15 08:55:51 abyss kernel: raid5: device sdz operational as raid disk 25
Jul 15 08:55:51 abyss kernel: raid5: device sdy operational as raid disk 24
Jul 15 08:55:51 abyss kernel: raid5: device sdx operational as raid disk 23
Jul 15 08:55:51 abyss kernel: raid5: device sdw operational as raid disk 22
Jul 15 08:55:51 abyss kernel: raid5: device sdv operational as raid disk 21
Jul 15 08:55:51 abyss kernel: raid5: device sdu operational as raid disk 20
Jul 15 08:55:51 abyss kernel: raid5: device sdt operational as raid disk 19
Jul 15 08:55:51 abyss kernel: raid5: device sds operational as raid disk 18
Jul 15 08:55:51 abyss kernel: raid5: device sdr operational as raid disk 17
Jul 15 08:55:51 abyss kernel: raid5: device sdq operational as raid disk 16
Jul 15 08:55:51 abyss kernel: raid5: device sdp operational as raid disk 15
Jul 15 08:55:51 abyss kernel: raid5: device sdo operational as raid disk 14
Jul 15 08:55:51 abyss kernel: raid5: device sdn operational as raid disk 13
Jul 15 08:55:51 abyss kernel: raid5: device sdm operational as raid disk 12
Jul 15 08:55:51 abyss kernel: raid5: device sdl operational as raid disk 11
Jul 15 08:55:51 abyss kernel: raid5: device sdk operational as raid disk 10
Jul 15 08:55:51 abyss kernel: raid5: device sdj operational as raid disk 9
Jul 15 08:55:51 abyss kernel: raid5: device sdi operational as raid disk 8
Jul 15 08:55:51 abyss kernel: raid5: device sdh operational as raid disk 7
Jul 15 08:55:51 abyss kernel: raid5: device sdg operational as raid disk 6
Jul 15 08:55:51 abyss kernel: raid5: device sdf operational as raid disk 5
Jul 15 08:55:51 abyss kernel: raid5: device sde operational as raid disk 4
Jul 15 08:55:51 abyss kernel: raid5: device sdd operational as raid disk 3
Jul 15 08:55:51 abyss kernel: raid5: device sdc operational as raid disk 2
Jul 15 08:55:51 abyss kernel: raid5: device sdb operational as raid disk 1
Jul 15 08:55:51 abyss kernel: raid5: allocated 29215kB for md0
Jul 15 08:55:51 abyss kernel: raid5: raid level 5 set md0 active with 27 out of 28 devices, algorithm 0
Jul 15 08:55:51 abyss kernel: RAID5 conf printout:
Jul 15 08:55:51 abyss kernel:  --- rd:28 wd:27 fd:1
Jul 15 08:55:51 abyss kernel:  disk 0, o:1, dev:sda
Jul 15 08:55:51 abyss kernel:  disk 1, o:1, dev:sdb
Jul 15 08:55:51 abyss kernel:  disk 2, o:1, dev:sdc
Jul 15 08:55:51 abyss kernel:  disk 3, o:1, dev:sdd
Jul 15 08:55:51 abyss kernel:  disk 4, o:1, dev:sde
Jul 15 08:55:51 abyss kernel:  disk 5, o:1, dev:sdf
Jul 15 08:55:51 abyss kernel:  disk 6, o:1, dev:sdg
Jul 15 08:55:51 abyss kernel:  disk 7, o:1, dev:sdh
Jul 15 08:55:51 abyss kernel:  disk 8, o:1, dev:sdi
Jul 15 08:55:51 abyss kernel:  disk 9, o:1, dev:sdj
Jul 15 08:55:51 abyss kernel:  disk 10, o:1, dev:sdk
Jul 15 08:55:51 abyss kernel:  disk 11, o:1, dev:sdl
Jul 15 08:55:51 abyss kernel:  disk 12, o:1, dev:sdm
Jul 15 08:55:51 abyss kernel:  disk 13, o:1, dev:sdn
Jul 15 08:55:51 abyss kernel:  disk 14, o:1, dev:sdo
Jul 15 08:55:51 abyss kernel:  disk 15, o:1, dev:sdp
Jul 15 08:55:51 abyss kernel:  disk 16, o:1, dev:sdq
Jul 15 08:55:51 abyss kernel:  disk 17, o:1, dev:sdr
Jul 15 08:55:51 abyss kernel:  disk 18, o:1, dev:sds
Jul 15 08:55:51 abyss kernel:  disk 19, o:1, dev:sdt
Jul 15 08:55:51 abyss kernel:  disk 20, o:1, dev:sdu
Jul 15 08:55:51 abyss kernel:  disk 21, o:1, dev:sdv
Jul 15 08:55:51 abyss kernel:  disk 22, o:1, dev:sdw
Jul 15 08:55:51 abyss kernel:  disk 23, o:1, dev:sdx
Jul 15 08:55:51 abyss kernel:  disk 24, o:1, dev:sdy
Jul 15 08:55:51 abyss kernel:  disk 25, o:1, dev:sdz
Jul 15 08:55:51 abyss kernel:  disk 27, o:1, dev:sdab
Jul 15 08:56:22 abyss kernel: ReiserFS: md0: found reiserfs format "3.6" with standard journal
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: using ordered data mode
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: checking transaction log (md0)
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: replayed 1 transactions in 0 seconds
Jul 15 08:56:27 abyss kernel: ReiserFS: md0: Using r5 hash to sort names

/dev/md0:
        Version : 01.00.01
  Creation Time : Wed Dec 31 19:00:00 1969
     Raid Level : raid5
     Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
    Device Size : 71687296 (68.37 GiB 73.41 GB)
   Raid Devices : 28
  Total Devices : 27
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jul 15 16:11:05 2005
          State : clean, degraded
 Active Devices : 27
Working Devices : 27
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 128K

           UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
         Events : 173177

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/evms/.nodes/sda
       1       8       16        1      active sync   /dev/evms/.nodes/sdb
       2       8       32        2      active sync   /dev/evms/.nodes/sdc
       3       8       48        3      active sync   /dev/evms/.nodes/sdd
       4       8       64        4      active sync   /dev/evms/.nodes/sde
       5       8       80        5      active sync   /dev/evms/.nodes/sdf
       6       8       96        6      active sync   /dev/evms/.nodes/sdg
       7       8      112        7      active sync   /dev/evms/.nodes/sdh
       8       8      128        8      active sync   /dev/evms/.nodes/sdi
       9       8      144        9      active sync   /dev/evms/.nodes/sdj
      10       8      160       10      active sync   /dev/evms/.nodes/sdk
      11       8      176       11      active sync   /dev/evms/.nodes/sdl
      12       8      192       12      active sync   /dev/evms/.nodes/sdm
      13       8      208       13      active sync   /dev/evms/.nodes/sdn
      14       8      224       14      active sync   /dev/evms/.nodes/sdo
      15       8      240       15      active sync   /dev/evms/.nodes/sdp
      16      65        0       16      active sync   /dev/evms/.nodes/sdq
      17      65       16       17      active sync   /dev/evms/.nodes/sdr
      18      65       32       18      active sync   /dev/evms/.nodes/sds
      19      65       48       19      active sync   /dev/evms/.nodes/sdt
      20      65       64       20      active sync   /dev/evms/.nodes/sdu
      21      65       80       21      active sync   /dev/evms/.nodes/sdv
      22      65       96       22      active sync   /dev/evms/.nodes/sdw
      23      65      112       23      active sync   /dev/evms/.nodes/sdx
      24      65      128       24      active sync   /dev/evms/.nodes/sdy
      25      65      144       25      active sync   /dev/evms/.nodes/sdz
      26       0        0        -      removed
      27      65      176       27      active sync   /dev/evms/.nodes/sdab

It's been running most of the day - with no problems, just in a degraded
state. How do I get /dev/sdaa back into the array?

-- David M. Strang

----- Original Message ----- 
From: David M. Strang
To: Neil Brown
Cc: linux-raid@vger.kernel.org
Sent: Thursday, July 14, 2005 10:16 PM
Subject: Re: Raid5 Failure


Neil -

You are the man; the array went w/o force - and is rebuilding now!

-(root@abyss)-(/)- # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.01
  Creation Time : Wed Dec 31 19:00:00 1969
     Raid Level : raid5
     Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
    Device Size : 71687296 (68.37 GiB 73.41 GB)
   Raid Devices : 28
  Total Devices : 28
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jul 14 22:07:18 2005
          State : active, resyncing
 Active Devices : 28
Working Devices : 28
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 128K

 Rebuild Status : 0% complete

           UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
         Events : 172760

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/evms/.nodes/sda
       1       8       16        1      active sync   /dev/evms/.nodes/sdb
       2       8       32        2      active sync   /dev/evms/.nodes/sdc
       3       8       48        3      active sync   /dev/evms/.nodes/sdd
       4       8       64        4      active sync   /dev/evms/.nodes/sde
       5       8       80        5      active sync   /dev/evms/.nodes/sdf
       6       8       96        6      active sync   /dev/evms/.nodes/sdg
       7       8      112        7      active sync   /dev/evms/.nodes/sdh
       8       8      128        8      active sync   /dev/evms/.nodes/sdi
       9       8      144        9      active sync   /dev/evms/.nodes/sdj
      10       8      160       10      active sync   /dev/evms/.nodes/sdk
      11       8      176       11      active sync   /dev/evms/.nodes/sdl
      12       8      192       12      active sync   /dev/evms/.nodes/sdm
      13       8      208       13      active sync   /dev/evms/.nodes/sdn
      14       8      224       14      active sync   /dev/evms/.nodes/sdo
      15       8      240       15      active sync   /dev/evms/.nodes/sdp
      16      65        0       16      active sync   /dev/evms/.nodes/sdq
      17      65       16       17      active sync   /dev/evms/.nodes/sdr
      18      65       32       18      active sync   /dev/evms/.nodes/sds
      19      65       48       19      active sync   /dev/evms/.nodes/sdt
      20      65       64       20      active sync   /dev/evms/.nodes/sdu
      21      65       80       21      active sync   /dev/evms/.nodes/sdv
      22      65       96       22      active sync   /dev/evms/.nodes/sdw
      23      65      112       23      active sync   /dev/evms/.nodes/sdx
      24      65      128       24      active sync   /dev/evms/.nodes/sdy
      25      65      144       25      active sync   /dev/evms/.nodes/sdz
      26      65      160       26      active sync   /dev/evms/.nodes/sdaa
      27      65      176       27      active sync   /dev/evms/.nodes/sdab


-- David M. Strang

----- Original Message ----- 
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Thursday, July 14, 2005 9:43 PM
Subject: Re: Raid5 Failure


On Thursday July 14, dstrang@shellpower.net wrote:
>
> It looks like the first 'segment of discs' sda->sdm are all marked clean;
> while sdn->sdab are marked active.
>
> What can I do to resolve this issue? Any assistance would be greatly
> appreciated.


Apply the following patch to mdadm-2.0-devel2 (it fixes a few bugs and
particularly makes --assemble work), then try:

mdadm -A /dev/md0 /dev/sd[a-z] /dev/sd....

Just list all 28 SCSI devices; I'm not sure what their names are.

This will quite probably fail.
If it does, try again with
   --force
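
(Spelled out, that fallback is something like

   mdadm -A --force /dev/md0 /dev/sd[a-z] /dev/sda[ab]

where the globs are only illustrative shorthand; the array in this thread spans
/dev/sda through /dev/sdab, and the names must match whatever the system
actually uses.)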

NeilBrown



Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./Assemble.c |   13 ++++++++++++-
 ./Query.c    |   33 +++++++++++++++++++--------------
 ./mdadm.h    |    2 +-
 ./super0.c   |    1 +
 ./super1.c   |    4 ++--
 5 files changed, 35 insertions(+), 18 deletions(-)

diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~ 2005-07-15 10:13:04.000000000 +1000
+++ ./Assemble.c 2005-07-15 10:37:59.000000000 +1000
@@ -473,6 +473,7 @@ int Assemble(struct supertype *st, char
  if (!devices[j].uptodate)
  continue;
  info.disk.number = i;
+ info.disk.raid_disk = i;
  info.disk.state = desired_state;

  if (devices[j].uptodate &&
@@ -526,7 +527,17 @@ int Assemble(struct supertype *st, char

  /* Almost ready to actually *do* something */
  if (!old_linux) {
- if (ioctl(mdfd, SET_ARRAY_INFO, NULL) != 0) {
+ int rv;
+ if ((vers % 100) >= 1) { /* can use different versions */
+ mdu_array_info_t inf;
+ memset(&inf, 0, sizeof(inf));
+ inf.major_version = st->ss->major;
+ inf.minor_version = st->minor_version;
+ rv = ioctl(mdfd, SET_ARRAY_INFO, &inf);
+ } else
+ rv = ioctl(mdfd, SET_ARRAY_INFO, NULL);
+
+ if (rv) {
  fprintf(stderr, Name ": SET_ARRAY_INFO failed for %s: %s\n",
  mddev, strerror(errno));
  return 1;

diff ./Query.c~current~ ./Query.c
--- ./Query.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./Query.c 2005-07-15 11:38:18.000000000 +1000
@@ -105,26 +105,31 @@ int Query(char *dev)
  if (superror == 0) {
  /* array might be active... */
  st->ss->getinfo_super(&info, super);
- mddev = get_md_name(info.array.md_minor);
- disc.number = info.disk.number;
- activity = "undetected";
- if (mddev && (fd = open(mddev, O_RDONLY))>=0) {
- if (md_get_version(fd) >= 9000 &&
-     ioctl(fd, GET_ARRAY_INFO, &array)>= 0) {
- if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
-     makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
- activity = "active";
- else
- activity = "mismatch";
+ if (st->ss->major == 0) {
+ mddev = get_md_name(info.array.md_minor);
+ disc.number = info.disk.number;
+ activity = "undetected";
+ if (mddev && (fd = open(mddev, O_RDONLY))>=0) {
+ if (md_get_version(fd) >= 9000 &&
+     ioctl(fd, GET_ARRAY_INFO, &array)>= 0) {
+ if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
+     makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
+ activity = "active";
+ else
+ activity = "mismatch";
+ }
+ close(fd);
  }
- close(fd);
+ } else {
+ activity = "unknown";
+ mddev = "array";
  }
- printf("%s: device %d in %d device %s %s md%d.  Use mdadm --examine for more detail.\n",
+ printf("%s: device %d in %d device %s %s %s.  Use mdadm --examine for more detail.\n",
         dev,
         info.disk.number, info.array.raid_disks,
         activity,
         map_num(pers, info.array.level),
-        info.array.md_minor);
+        mddev);
  }
  return 0;
 }

diff ./mdadm.h~current~ ./mdadm.h
--- ./mdadm.h~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./mdadm.h 2005-07-15 10:15:51.000000000 +1000
@@ -73,7 +73,7 @@ struct mdinfo {
  mdu_array_info_t array;
  mdu_disk_info_t disk;
  __u64 events;
- unsigned int uuid[4];
+ int uuid[4];
 };

 #define Name "mdadm"

diff ./super0.c~current~ ./super0.c
--- ./super0.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./super0.c 2005-07-15 11:27:12.000000000 +1000
@@ -205,6 +205,7 @@ static void getinfo_super0(struct mdinfo
  info->disk.major = sb->this_disk.major;
  info->disk.minor = sb->this_disk.minor;
  info->disk.raid_disk = sb->this_disk.raid_disk;
+ info->disk.number = sb->this_disk.number;

  info->events = md_event(sb);


diff ./super1.c~current~ ./super1.c
--- ./super1.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./super1.c 2005-07-15 11:25:04.000000000 +1000
@@ -278,7 +278,7 @@ static void getinfo_super1(struct mdinfo

  info->disk.major = 0;
  info->disk.minor = 0;
-
+ info->disk.number = __le32_to_cpu(sb->dev_number);
  if (__le32_to_cpu(sb->dev_number) >= __le32_to_cpu(sb->max_dev) ||
      __le32_to_cpu(sb->max_dev) > 512)
  role = 0xfffe;
@@ -303,7 +303,7 @@ static void getinfo_super1(struct mdinfo

  for (i=0; i< __le32_to_cpu(sb->max_dev); i++) {
  role = __le16_to_cpu(sb->dev_roles[i]);
- if (role == 0xFFFF || role < info->array.raid_disks)
+ if (/*role == 0xFFFF || */role < info->array.raid_disks)
  working++;
  }




* Re: Raid5 Failure
  2005-07-17 15:44 Raid5 Failure David M. Strang
@ 2005-07-17 22:05 ` Neil Brown
  2005-07-17 23:15   ` David M. Strang
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-17 22:05 UTC (permalink / raw)
  To: David M. Strang; +Cc: linux-raid

On Sunday July 17, dstrang@shellpower.net wrote:
> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> 
> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with version-0 
> superblocks.
> 
> What, if anything, can I do since I'm using a version 1.0
> superblock?

Use a newer mdadm.  This works with v2.0-devel-2 (I just checked).

NeilBrown


* Re: Raid5 Failure
  2005-07-17 22:05 ` Neil Brown
@ 2005-07-17 23:15   ` David M. Strang
  2005-07-18  0:05     ` Tyler
  2005-07-18  0:06     ` Neil Brown
  0 siblings, 2 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-17 23:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil -

-(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument
-(root@abyss)-(/)- # mdadm --version
mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
-(root@abyss)-(/)- #

Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0 superblocks.

I'm using the devel-2 version, with the patch you posted previously.

-- David M. Strang

----- Original Message ----- 
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 6:05 PM
Subject: Re: Raid5 Failure


On Sunday July 17, dstrang@shellpower.net wrote:
> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
>
> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with version-0
> superblocks.
>
> What, if anything, can I do since I'm using a version 1.0
> superblock?

Use a newer mdadm.  This works with v2.0-devel-2 (I just checked).

NeilBrown



* Re: Raid5 Failure
  2005-07-17 23:15   ` David M. Strang
@ 2005-07-18  0:05     ` Tyler
  2005-07-18  0:23       ` David M. Strang
  2005-07-18  0:06     ` Neil Brown
  1 sibling, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-18  0:05 UTC (permalink / raw)
  To: David M. Strang; +Cc: Neil Brown, linux-raid

Try it with -a or --add, not -add. Also, you don't need the --manage bit.

Regards,
Tyler.

David M. Strang wrote:

> Neil -
>
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 
> July 2005
> -(root@abyss)-(/)- #
>
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with 
> version-0 superblocks.
>
> I'm using the devel-2 version, with the patch you posted previously.
>
> -- David M. Strang
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 6:05 PM
> Subject: Re: Raid5 Failure
>
>
> On Sunday July 17, dstrang@shellpower.net wrote:
>
>> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
>> mdadm: hot add failed for /dev/sdaa: Invalid argument
>>
>> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with 
>> version-0
>> superblocks.
>>
>> What, if anything, can I do since I'm using a version 1.0
>> superblock?
>
>
> Use a newer mdadm.  This works with v2.0-devel-2 (I just checked).
>
> NeilBrown
>


* Re: Raid5 Failure
  2005-07-17 23:15   ` David M. Strang
  2005-07-18  0:05     ` Tyler
@ 2005-07-18  0:06     ` Neil Brown
  2005-07-18  0:52       ` David M. Strang
  1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-18  0:06 UTC (permalink / raw)
  To: David M. Strang; +Cc: linux-raid

On Sunday July 17, dstrang@shellpower.net wrote:
> Neil -
> 
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
> -(root@abyss)-(/)- #
> 
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0 
> superblocks.
> 
> I'm using the devel-2 version, with the patch you posted previously.

That's really odd, because the only time mdadm-2 uses HOT_ADD_DISK is
inside an
  if (array.major_version == 0)
statement.
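
(Paraphrasing the structure he means, not the literal Manage.c source:

   if (array.major_version == 0) {
           /* old-style 0.90 superblock: ask the kernel to hot-add the disk */
           ioctl(fd, HOT_ADD_DISK, (unsigned long)stb.st_rdev);
   } else {
           /* version-1 superblocks are meant to take a different path */
   }

The patch further down the thread shows what the shipped devel-2 code actually
tested, which turns out to be the bug.)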

Are you any good with 'gdb'?
Could you try running mdadm under gdb, put a break point at
'Manage_subdevs', then step through from there and see what happens?

Print the value of 'array' after the GET_ARRAY_INFO ioctl, and then
keep stepping through until the error occurs..

NeilBrown


* Re: Raid5 Failure
  2005-07-18  0:05     ` Tyler
@ 2005-07-18  0:23       ` David M. Strang
  0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18  0:23 UTC (permalink / raw)
  To: Tyler; +Cc: Neil Brown, linux-raid

-(root@abyss)-(/)- # mdadm -a /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument
-(root@abyss)-(/)- # mdadm --add /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument


Jul 17 20:21:42 abyss kernel: md0: HOT_ADD may only be used with version-0 superblocks.
Jul 17 20:22:05 abyss kernel: md0: HOT_ADD may only be used with version-0 superblocks.



Still no go with -a or --add.

-- David M. Strang
----- Original Message ----- 
From: Tyler
To: David M. Strang
Cc: Neil Brown ; linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 8:05 PM
Subject: Re: Raid5 Failure


Try it with -a or --add, not -add. Also, you don't need the --manage bit.

Regards,
Tyler.

David M. Strang wrote:

> Neil -
>
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 
> 2005
> -(root@abyss)-(/)- #
>
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0 
> superblocks.
>
> I'm using the devel-2 version, with the patch you posted previously.
>
> -- David M. Strang
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 6:05 PM
> Subject: Re: Raid5 Failure
>
>
> On Sunday July 17, dstrang@shellpower.net wrote:
>
>> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
>> mdadm: hot add failed for /dev/sdaa: Invalid argument
>>
>> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with 
>> version-0
>> superblocks.
>>
>> What, if anything, can I do since I'm using a version 1.0
>> superblock?
>
>
> Use a newer mdadm.  This works with v2.0-devel-2 (I just checked).
>
> NeilBrown
>



* Re: Raid5 Failure
  2005-07-18  0:06     ` Neil Brown
@ 2005-07-18  0:52       ` David M. Strang
  2005-07-18  1:06         ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: David M. Strang @ 2005-07-18  0:52 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

I'm not real good with GDB... but I'm giving it a shot.

(gdb) run -a /dev/md0 /dev/sdaa
Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7, 
devlist=0x8067018) at Manage.c:174
174             void *dsuper = NULL;
(gdb) c
Continuing.
mdadm: hot add failed for /dev/sdaa: Invalid argument

Program exited with code 01.
(gdb)


-- David M. Strang

----- Original Message ----- 
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 8:06 PM
Subject: Re: Raid5 Failure


On Sunday July 17, dstrang@shellpower.net wrote:
> Neil -
>
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 
> 2005
> -(root@abyss)-(/)- #
>
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0
> superblocks.
>
> I'm using the devel-2 version, with the patch you posted previously.

That's really odd, because the only time mdadm-2 uses HOT_ADD_DISK is
inside an
  if (array.major_version == 0)
statement.

Are you any good with 'gdb'?
Could you try running mdadm under gdb, put a break point at
'Manage_subdevs', then step through from there and see what happens?

Print the value of 'array' after the GET_ARRAY_INFO ioctl, and then
keep stepping through until the error occurs..

NeilBrown



* Re: Raid5 Failure
  2005-07-18  0:52       ` David M. Strang
@ 2005-07-18  1:06         ` Neil Brown
  2005-07-18  1:26           ` David M. Strang
       [not found]           ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
  0 siblings, 2 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-18  1:06 UTC (permalink / raw)
  To: David M. Strang; +Cc: linux-raid

On Sunday July 17, dstrang@shellpower.net wrote:
> I'm not real good with GDB... but I'm giving it a shot.
> 
> (gdb) run -a /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
> 
> Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7, 
> devlist=0x8067018) at Manage.c:174
> 174             void *dsuper = NULL;
> (gdb) c

At this point you need to use 'n' for 'next', to step through the code
one statement at a time.
When you see:
 176          	  if (ioctl(fd, GET_ARRAY_INFO, &array)) {
enter 'n' again, to execute that, then
  print array
to print the 'array' structure.
Then continue with 'n' repeatedly.

NeilBrown


* Re: Raid5 Failure
  2005-07-18  1:06         ` Neil Brown
@ 2005-07-18  1:26           ` David M. Strang
  2005-07-18  1:31             ` David M. Strang
       [not found]           ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
  1 sibling, 1 reply; 22+ messages in thread
From: David M. Strang @ 2005-07-18  1:26 UTC (permalink / raw)
  Cc: linux-raid

-(root@abyss)-(~)- # gdb mdadm
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db 
library "/lib/libthread_db.so.1".

(gdb) b 'Manage_subdevs'
Breakpoint 1 at 0x804fcb6: file Manage.c, line 174.
(gdb) run --manage --add /dev/md0 /dev/sdaa
Starting program: /sbin/mdadm --manage --add /dev/md0 /dev/sdaa
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Breakpoint 1, Manage_subdevs (devname=0xbfe0de75 "/dev/md0", fd=7, 
devlist=0x8067018) at Manage.c:174
174             void *dsuper = NULL;
(gdb) n
176             if (ioctl(fd, GET_ARRAY_INFO, &array)) {
(gdb) n
181             for (dv = devlist ; dv; dv=dv->next) {
(gdb) n
182                     if (stat(dv->devname, &stb)) {
(gdb) n
187                     if ((stb.st_mode & S_IFMT) != S_IFBLK) {
(gdb) n
192                     switch(dv->disposition){
(gdb) n
200                             tfd = open(dv->devname, O_RDONLY|O_EXCL);
(gdb) n
201                             if (tfd < 0) {
(gdb) n
206                             close(tfd);
(gdb) n
210                                     if (md_get_version(fd)%100 < 2) {
(gdb) n
212                                     if (ioctl(fd, HOT_ADD_DISK,
(gdb) n
219                                     fprintf(stderr, Name ": hot add 
failed for %s: %s\n",
(gdb) n
mdadm: hot add failed for /dev/sdaa: Invalid argument
221                                     return 1;
(gdb) n
307     }
(gdb) n
main (argc=5, argv=0xbfe0cb14) at mdadm.c:810
810                     if (!rv && readonly < 0)
(gdb) n
812                     if (!rv && runstop)
(gdb) n
1072            exit(rv);
(gdb) n

Program exited with code 01.

-- David M. Strang
----- Original Message ----- 
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:06 PM
Subject: Re: Raid5 Failure


On Sunday July 17, dstrang@shellpower.net wrote:
> I'm not real good with GDB... but I'm giving it a shot.
>
> (gdb) run -a /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
>
> Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7,
> devlist=0x8067018) at Manage.c:174
> 174             void *dsuper = NULL;
> (gdb) c

At this point you need to use 'n' for 'next', to step through the code
one statement at a time.
When you see:
 176            if (ioctl(fd, GET_ARRAY_INFO, &array)) {
enter 'n' again, to execute that, then
  print array
to print the 'array' structure.
Then continue with 'n' repeatedly.

NeilBrown



* Re: Raid5 Failure
  2005-07-18  1:26           ` David M. Strang
@ 2005-07-18  1:31             ` David M. Strang
  0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18  1:31 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Oops -

forgot this part:

(gdb) print array
$1 = {major_version = -1208614704, minor_version = 134596176, patch_version 
= -1079477788, ctime = -1209047397, level = 5, size = -1079477052, nr_disks 
= 134623392, raid_disks = 134623456,
  md_minor = -1079477212, not_persistent = 0, utime = -1208577344, state 
= -1208582100, active_disks = -1208448832, working_disks = -1079477752, 
failed_disks = -1209047199, spare_disks = 5,
  layout = -1079477052, chunk_size = 134623392}

-- David M. Strang
----- Original Message ----- 
From: David M. Strang
To: unlisted-recipients: ; no To-header on input
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:26 PM
Subject: Re: Raid5 Failure


-(root@abyss)-(~)- # gdb mdadm
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

(gdb) b 'Manage_subdevs'
Breakpoint 1 at 0x804fcb6: file Manage.c, line 174.
(gdb) run --manage --add /dev/md0 /dev/sdaa
Starting program: /sbin/mdadm --manage --add /dev/md0 /dev/sdaa
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Breakpoint 1, Manage_subdevs (devname=0xbfe0de75 "/dev/md0", fd=7,
devlist=0x8067018) at Manage.c:174
174             void *dsuper = NULL;
(gdb) n
176             if (ioctl(fd, GET_ARRAY_INFO, &array)) {
(gdb) n
181             for (dv = devlist ; dv; dv=dv->next) {
(gdb) n
182                     if (stat(dv->devname, &stb)) {
(gdb) n
187                     if ((stb.st_mode & S_IFMT) != S_IFBLK) {
(gdb) n
192                     switch(dv->disposition){
(gdb) n
200                             tfd = open(dv->devname, O_RDONLY|O_EXCL);
(gdb) n
201                             if (tfd < 0) {
(gdb) n
206                             close(tfd);
(gdb) n
210                                     if (md_get_version(fd)%100 < 2) {
(gdb) n
212                                     if (ioctl(fd, HOT_ADD_DISK,
(gdb) n
219                                     fprintf(stderr, Name ": hot add
failed for %s: %s\n",
(gdb) n
mdadm: hot add failed for /dev/sdaa: Invalid argument
221                                     return 1;
(gdb) n
307     }
(gdb) n
main (argc=5, argv=0xbfe0cb14) at mdadm.c:810
810                     if (!rv && readonly < 0)
(gdb) n
812                     if (!rv && runstop)
(gdb) n
1072            exit(rv);
(gdb) n

Program exited with code 01.

-- David M. Strang
----- Original Message ----- 
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:06 PM
Subject: Re: Raid5 Failure


On Sunday July 17, dstrang@shellpower.net wrote:
> I'm not real good with GDB... but I'm giving it a shot.
>
> (gdb) run -a /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
>
> Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7,
> devlist=0x8067018) at Manage.c:174
> 174             void *dsuper = NULL;
> (gdb) c

At this point you need to use 'n' for 'next', to step through the code
one statement at a time.
When you see:
 176            if (ioctl(fd, GET_ARRAY_INFO, &array)) {
enter 'n' again, to execute that, then
  print array
to print the 'array' structure.
Then continue with 'n' repeatedly.

NeilBrown



* Re: Raid5 Failure
       [not found]           ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
@ 2005-07-18  1:33             ` Neil Brown
  2005-07-18  1:46               ` David M. Strang
  2005-07-18  2:09               ` bug report: mdadm-devel-2 , superblock version 1 Tyler
  0 siblings, 2 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-18  1:33 UTC (permalink / raw)
  To: David M. Strang; +Cc: linux-raid

On Sunday July 17, dstrang@shellpower.net wrote:
> -(root@abyss)-(~)- # gdb mdadm
> GNU gdb 6.2
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
> 
> (gdb) b 'Manage_subdevs'
> Breakpoint 1 at 0x804fcb6: file Manage.c, line 174.
> (gdb) run --manage --add /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm --manage --add /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
> 
> Breakpoint 1, Manage_subdevs (devname=0xbfe0de75 "/dev/md0", fd=7, devlist=0x8067018) at Manage.c:174
> 174             void *dsuper = NULL;
> (gdb) n
> 176             if (ioctl(fd, GET_ARRAY_INFO, &array)) {
> (gdb) n
> 181             for (dv = devlist ; dv; dv=dv->next) {
> (gdb) n
> 182                     if (stat(dv->devname, &stb)) {
> (gdb) n
> 187                     if ((stb.st_mode & S_IFMT) != S_IFBLK) {
> (gdb) n
> 192                     switch(dv->disposition){
> (gdb) n
> 200                             tfd = open(dv->devname, O_RDONLY|O_EXCL);
> (gdb) n
> 201                             if (tfd < 0) {
> (gdb) n
> 206                             close(tfd);
> (gdb) n
> 210                                     if (md_get_version(fd)%100 <
> 2) {

Ahhhh...  I cannot read my own code, that is the problem!!

This patch should fix it.

Thanks for persisting.

NeilBrown

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./Manage.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff ./Manage.c~current~ ./Manage.c
--- ./Manage.c~current~	2005-07-07 09:19:53.000000000 +1000
+++ ./Manage.c	2005-07-18 11:31:57.000000000 +1000
@@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
 				return 1;
 			}
 			close(tfd);
-#if 0
-			if (array.major_version == 0) {
-#else
-				if (md_get_version(fd)%100 < 2) {
-#endif
+			if (array.major_version == 0 &&
+			    md_get_version(fd)%100 < 2) {
 				if (ioctl(fd, HOT_ADD_DISK,
 					  (unsigned long)stb.st_rdev)==0) {
 					fprintf(stderr, Name ": hot added %s\n",


* Re: Raid5 Failure
  2005-07-18  1:33             ` Neil Brown
@ 2005-07-18  1:46               ` David M. Strang
  2005-07-18  2:10                 ` Tyler
  2005-07-18  2:15                 ` Neil Brown
  2005-07-18  2:09               ` bug report: mdadm-devel-2 , superblock version 1 Tyler
  1 sibling, 2 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18  1:46 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil --

That worked; the device has been added to the array. Now, I think the next
problem is my own ignorance.

-(root@abyss)-(/)- # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.01
  Creation Time : Wed Dec 31 19:00:00 1969
     Raid Level : raid5
     Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
    Device Size : 71687296 (68.37 GiB 73.41 GB)
   Raid Devices : 28
  Total Devices : 28
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Jul 17 17:32:12 2005
          State : clean, degraded
 Active Devices : 27
Working Devices : 28
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-asymmetric
     Chunk Size : 128K

           UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
         Events : 176939

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/evms/.nodes/sda
       1       8       16        1      active sync   /dev/evms/.nodes/sdb
       2       8       32        2      active sync   /dev/evms/.nodes/sdc
       3       8       48        3      active sync   /dev/evms/.nodes/sdd
       4       8       64        4      active sync   /dev/evms/.nodes/sde
       5       8       80        5      active sync   /dev/evms/.nodes/sdf
       6       8       96        6      active sync   /dev/evms/.nodes/sdg
       7       8      112        7      active sync   /dev/evms/.nodes/sdh
       8       8      128        8      active sync   /dev/evms/.nodes/sdi
       9       8      144        9      active sync   /dev/evms/.nodes/sdj
      10       8      160       10      active sync   /dev/evms/.nodes/sdk
      11       8      176       11      active sync   /dev/evms/.nodes/sdl
      12       8      192       12      active sync   /dev/evms/.nodes/sdm
      13       8      208       13      active sync   /dev/evms/.nodes/sdn
      14       8      224       14      active sync   /dev/evms/.nodes/sdo
      15       8      240       15      active sync   /dev/evms/.nodes/sdp
      16      65        0       16      active sync   /dev/evms/.nodes/sdq
      17      65       16       17      active sync   /dev/evms/.nodes/sdr
      18      65       32       18      active sync   /dev/evms/.nodes/sds
      19      65       48       19      active sync   /dev/evms/.nodes/sdt
      20      65       64       20      active sync   /dev/evms/.nodes/sdu
      21      65       80       21      active sync   /dev/evms/.nodes/sdv
      22      65       96       22      active sync   /dev/evms/.nodes/sdw
      23      65      112       23      active sync   /dev/evms/.nodes/sdx
      24      65      128       24      active sync   /dev/evms/.nodes/sdy
      25      65      144       25      active sync   /dev/evms/.nodes/sdz
      26       0        0        -      removed
      27      65      176       27      active sync   /dev/evms/.nodes/sdab

      28      65      160        -      spare   /dev/evms/.nodes/sdaa


I've got 28 devices, 1 spare, 27 active. I'm still running as clean, 
degraded.

What do I do next? What I wanted to do was to put /dev/sdaa back in as 
device 26, but now it's device 28 - and flagged as spare. How do I make it 
active in the array again?

-- David M. Strang


----- Original Message ----- 
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:33 PM
Subject: Re: Raid5 Failure

Ahhhh...  I cannot read my own code, that is the problem!!

This patch should fix it.

Thanks for persisting.

NeilBrown

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./Manage.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff ./Manage.c~current~ ./Manage.c
--- ./Manage.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./Manage.c 2005-07-18 11:31:57.000000000 +1000
@@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
  return 1;
  }
  close(tfd);
-#if 0
- if (array.major_version == 0) {
-#else
- if (md_get_version(fd)%100 < 2) {
-#endif
+ if (array.major_version == 0 &&
+     md_get_version(fd)%100 < 2) {
  if (ioctl(fd, HOT_ADD_DISK,
    (unsigned long)stb.st_rdev)==0) {
  fprintf(stderr, Name ": hot added %s\n",
- 



* bug report: mdadm-devel-2 , superblock version 1
  2005-07-18  1:33             ` Neil Brown
  2005-07-18  1:46               ` David M. Strang
@ 2005-07-18  2:09               ` Tyler
  2005-07-18  2:19                 ` Tyler
  2005-07-25  0:36                 ` Neil Brown
  1 sibling, 2 replies; 22+ messages in thread
From: Tyler @ 2005-07-18  2:09 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

# uname -a
Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
# ./mdadm -V
mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1

** stop the current array

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -C /dev/md1 -l 5 --raid-devices 3 -e 1 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm: /dev/sda2 appears to be part of a raid array:
    level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
mdadm: /dev/sdb2 appears to be part of a raid array:
    level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
mdadm: /dev/sdc2 appears to be part of a raid array:
    level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
Continue creating array? y
mdadm: array /dev/md1 started.

** create and start new array using 3 drives, superblock version 1

root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
      128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

** mdstat mostly okay, except sdc2 is listed as device3 instead of device2 (from 0,1,2)

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
/dev/md1:
        Version : 01.00.01
  Creation Time : Mon Jul 18 03:56:40 2005
     Raid Level : raid5
     Array Size : 128384 (125.40 MiB 131.47 MB)
    Device Size : 64192 (62.70 MiB 65.73 MB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Jul 18 03:56:42 2005
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
         Events : 1

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/.static/dev/sda2
       1       8       18        1      active sync   /dev/.static/dev/sdb2
       2       0        0        -      removed

       3       8       34        2      active sync   /dev/.static/dev/sdc2

** reports version 01.00.01 superblock, but reports as if there were 4 devices used

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1

** stop the array

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
Segmentation fault

** try to assemble the array

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
mdadm: md device /dev/md1 does not appear to be active.

** check if its active at all

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2 seems ok
mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2 seems ok
mdadm: /dev/md1 has been started with 3 drives.

** try restarting it with drive details, and it starts

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Mon Jul 18 02:53:55 2005
     Raid Level : raid5
     Array Size : 128384 (125.40 MiB 131.47 MB)
    Device Size : 64192 (62.70 MiB 65.73 MB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Jul 18 02:53:57 2005
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
         Events : 0.2

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/.static/dev/sda2
       1       8       18        1      active sync   /dev/.static/dev/sdb2
       2       8       34        2      active sync   /dev/.static/dev/sdc2

** magically, we now have a v00.90.01 superblock, it reports the proper list of drives

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
Segmentation fault

** try to stop and restart again, doesn't work

root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
/dev/sda2:
          Magic : a92b4efc
        Version : 01.00
     Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
           Name :
  Creation Time : Mon Jul 18 03:56:40 2005
     Raid Level : raid5
   Raid Devices : 3

    Device Size : 128504 (62.76 MiB 65.79 MB)
   Super Offset : 128504 sectors
          State : clean
    Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
    Update Time : Mon Jul 18 03:56:42 2005
       Checksum : 903062ed - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

   Array State : Uuu 1 failed

** the drives themselves still report a version 1 superblock... weird


* Re: Raid5 Failure
  2005-07-18  1:46               ` David M. Strang
@ 2005-07-18  2:10                 ` Tyler
  2005-07-18  2:12                   ` David M. Strang
  2005-07-18  2:15                 ` Neil Brown
  1 sibling, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-18  2:10 UTC (permalink / raw)
  To: David M. Strang; +Cc: Neil Brown, linux-raid

If you cat /proc/mdstat, it should show the array resyncing... when it's done, it will put the drive back as device 26.
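
(For example, the generic md commands for watching that, nothing specific to
this thread:

   cat /proc/mdstat
   mdadm --detail /dev/md0

While a recovery is running, /proc/mdstat shows a progress bar and --detail
reports a "Rebuild Status" line, as in the earlier output in this thread.)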

Tyler.

David M. Strang wrote:

> Neil --
>
> That worked, the device has been added to the array. Now, I think the 
> next problem is my own ignorance.
>
> -(root@abyss)-(/)- # mdadm --detail /dev/md0
> /dev/md0:
>        Version : 01.00.01
>  Creation Time : Wed Dec 31 19:00:00 1969
>     Raid Level : raid5
>     Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
>    Device Size : 71687296 (68.37 GiB 73.41 GB)
>   Raid Devices : 28
>  Total Devices : 28
> Preferred Minor : 0
>    Persistence : Superblock is persistent
>
>    Update Time : Sun Jul 17 17:32:12 2005
>          State : clean, degraded
> Active Devices : 27
> Working Devices : 28
> Failed Devices : 0
>  Spare Devices : 1
>
>         Layout : left-asymmetric
>     Chunk Size : 128K
>
>           UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
>         Events : 176939
>
>    Number   Major   Minor   RaidDevice State
>       0       8        0        0      active sync   /dev/evms/.nodes/sda
>       1       8       16        1      active sync   /dev/evms/.nodes/sdb
>       2       8       32        2      active sync   /dev/evms/.nodes/sdc
>       3       8       48        3      active sync   /dev/evms/.nodes/sdd
>       4       8       64        4      active sync   /dev/evms/.nodes/sde
>       5       8       80        5      active sync   /dev/evms/.nodes/sdf
>       6       8       96        6      active sync   /dev/evms/.nodes/sdg
>       7       8      112        7      active sync   /dev/evms/.nodes/sdh
>       8       8      128        8      active sync   /dev/evms/.nodes/sdi
>       9       8      144        9      active sync   /dev/evms/.nodes/sdj
>      10       8      160       10      active sync   /dev/evms/.nodes/sdk
>      11       8      176       11      active sync   /dev/evms/.nodes/sdl
>      12       8      192       12      active sync   /dev/evms/.nodes/sdm
>      13       8      208       13      active sync   /dev/evms/.nodes/sdn
>      14       8      224       14      active sync   /dev/evms/.nodes/sdo
>      15       8      240       15      active sync   /dev/evms/.nodes/sdp
>      16      65        0       16      active sync   /dev/evms/.nodes/sdq
>      17      65       16       17      active sync   /dev/evms/.nodes/sdr
>      18      65       32       18      active sync   /dev/evms/.nodes/sds
>      19      65       48       19      active sync   /dev/evms/.nodes/sdt
>      20      65       64       20      active sync   /dev/evms/.nodes/sdu
>      21      65       80       21      active sync   /dev/evms/.nodes/sdv
>      22      65       96       22      active sync   /dev/evms/.nodes/sdw
>      23      65      112       23      active sync   /dev/evms/.nodes/sdx
>      24      65      128       24      active sync   /dev/evms/.nodes/sdy
>      25      65      144       25      active sync   /dev/evms/.nodes/sdz
>      26       0        0        -      removed
>      27      65      176       27      active sync   
> /dev/evms/.nodes/sdab
>
>      28      65      160        -      spare   /dev/evms/.nodes/sdaa
>
>
> I've got 28 devices, 1 spare, 27 active. I'm still running as clean, 
> degraded.
>
> What do I do next? What I wanted to do was to put /dev/sdaa back in as 
> device 26, but now it's device 28 - and flagged as spare. How do I 
> make it active in the array again?
>
> -- David M. Strang
>
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 9:33 PM
> Subject: Re: Raid5 Failure
>
> Ahhhh...  I cannot read my own code, that is the problem!!
>
> This patch should fix it.
>
> Thanks for persisting.
>
> NeilBrown
>
> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
>
> ### Diffstat output
> ./Manage.c |    7 ++-----
> 1 files changed, 2 insertions(+), 5 deletions(-)
>
> diff ./Manage.c~current~ ./Manage.c
> --- ./Manage.c~current~ 2005-07-07 09:19:53.000000000 +1000
> +++ ./Manage.c 2005-07-18 11:31:57.000000000 +1000
> @@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
>  return 1;
>  }
>  close(tfd);
> -#if 0
> - if (array.major_version == 0) {
> -#else
> - if (md_get_version(fd)%100 < 2) {
> -#endif
> + if (array.major_version == 0 &&
> +     md_get_version(fd)%100 < 2) {
>  if (ioctl(fd, HOT_ADD_DISK,
>    (unsigned long)stb.st_rdev)==0) {
>  fprintf(stderr, Name ": hot added %s\n",
> -
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Raid5 Failure
  2005-07-18  2:10                 ` Tyler
@ 2005-07-18  2:12                   ` David M. Strang
  0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18  2:12 UTC (permalink / raw)
  To: Tyler; +Cc: Neil Brown, linux-raid

-(root@abyss)-(/)- # cat /proc/mdstat
Personalities : [raid5] [multipath]
md0 : active raid5 sdaa[28] sda[0] sdab[27] sdz[25] sdy[24] sdx[23] sdw[22] 
sdv[21] sdu[20] sdt[19] sds[18] sdr[17] sdq[16] sdp[15] sdo[14] sdn[13] 
sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] 
sdc[2] sdb[1]
      1935556992 blocks level 5, 128k chunk, algorithm 0 [28/27] 
[UUUUUUUUUUUUUUUUUUUUUUUUUU_U]

unused devices: <none>

Do I need to do anything else to 'kick it off'?

-- David M. Strang

----- Original Message ----- 
From: Tyler
To: David M. Strang
Cc: Neil Brown ; linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 10:10 PM
Subject: Re: Raid5 Failure


If you cat /proc/mdstat, it should show the array resyncing... when its
done, it will put the drive back as device 26.

Tyler.

David M. Strang wrote:

> Neil --
>
> That worked, the device has been added to the array. Now, I think the next 
> problem is my own ignorance.
>
> -(root@abyss)-(/)- # mdadm --detail /dev/md0
> /dev/md0:
>        Version : 01.00.01
>  Creation Time : Wed Dec 31 19:00:00 1969
>     Raid Level : raid5
>     Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
>    Device Size : 71687296 (68.37 GiB 73.41 GB)
>   Raid Devices : 28
>  Total Devices : 28
> Preferred Minor : 0
>    Persistence : Superblock is persistent
>
>    Update Time : Sun Jul 17 17:32:12 2005
>          State : clean, degraded
> Active Devices : 27
> Working Devices : 28
> Failed Devices : 0
>  Spare Devices : 1
>
>         Layout : left-asymmetric
>     Chunk Size : 128K
>
>           UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
>         Events : 176939
>
>    Number   Major   Minor   RaidDevice State
>       0       8        0        0      active sync   /dev/evms/.nodes/sda
>       1       8       16        1      active sync   /dev/evms/.nodes/sdb
>       2       8       32        2      active sync   /dev/evms/.nodes/sdc
>       3       8       48        3      active sync   /dev/evms/.nodes/sdd
>       4       8       64        4      active sync   /dev/evms/.nodes/sde
>       5       8       80        5      active sync   /dev/evms/.nodes/sdf
>       6       8       96        6      active sync   /dev/evms/.nodes/sdg
>       7       8      112        7      active sync   /dev/evms/.nodes/sdh
>       8       8      128        8      active sync   /dev/evms/.nodes/sdi
>       9       8      144        9      active sync   /dev/evms/.nodes/sdj
>      10       8      160       10      active sync   /dev/evms/.nodes/sdk
>      11       8      176       11      active sync   /dev/evms/.nodes/sdl
>      12       8      192       12      active sync   /dev/evms/.nodes/sdm
>      13       8      208       13      active sync   /dev/evms/.nodes/sdn
>      14       8      224       14      active sync   /dev/evms/.nodes/sdo
>      15       8      240       15      active sync   /dev/evms/.nodes/sdp
>      16      65        0       16      active sync   /dev/evms/.nodes/sdq
>      17      65       16       17      active sync   /dev/evms/.nodes/sdr
>      18      65       32       18      active sync   /dev/evms/.nodes/sds
>      19      65       48       19      active sync   /dev/evms/.nodes/sdt
>      20      65       64       20      active sync   /dev/evms/.nodes/sdu
>      21      65       80       21      active sync   /dev/evms/.nodes/sdv
>      22      65       96       22      active sync   /dev/evms/.nodes/sdw
>      23      65      112       23      active sync   /dev/evms/.nodes/sdx
>      24      65      128       24      active sync   /dev/evms/.nodes/sdy
>      25      65      144       25      active sync   /dev/evms/.nodes/sdz
>      26       0        0        -      removed
>      27      65      176       27      active sync   /dev/evms/.nodes/sdab
>
>      28      65      160        -      spare   /dev/evms/.nodes/sdaa
>
>
> I've got 28 devices, 1 spare, 27 active. I'm still running as clean, 
> degraded.
>
> What do I do next? What I wanted to do was to put /dev/sdaa back in as 
> device 26, but now it's device 28 - and flagged as spare. How do I make it 
> active in the array again?
>
> -- David M. Strang
>
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 9:33 PM
> Subject: Re: Raid5 Failure
>
> Ahhhh...  I cannot read my own code, that is the problem!!
>
> This patch should fix it.
>
> Thanks for persisting.
>
> NeilBrown
>
> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
>
> ### Diffstat output
> ./Manage.c |    7 ++-----
> 1 files changed, 2 insertions(+), 5 deletions(-)
>
> diff ./Manage.c~current~ ./Manage.c
> --- ./Manage.c~current~ 2005-07-07 09:19:53.000000000 +1000
> +++ ./Manage.c 2005-07-18 11:31:57.000000000 +1000
> @@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
>  return 1;
>  }
>  close(tfd);
> -#if 0
> - if (array.major_version == 0) {
> -#else
> - if (md_get_version(fd)%100 < 2) {
> -#endif
> + if (array.major_version == 0 &&
> +     md_get_version(fd)%100 < 2) {
>  if (ioctl(fd, HOT_ADD_DISK,
>    (unsigned long)stb.st_rdev)==0) {
>  fprintf(stderr, Name ": hot added %s\n",
> -
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Raid5 Failure
  2005-07-18  1:46               ` David M. Strang
  2005-07-18  2:10                 ` Tyler
@ 2005-07-18  2:15                 ` Neil Brown
  2005-07-18  2:24                   ` David M. Strang
  1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-18  2:15 UTC (permalink / raw)
  To: David M. Strang; +Cc: linux-raid

On Sunday July 17, dstrang@shellpower.net wrote:
> Neil --
> 
> That worked, the device has been added to the array. Now, I think the next 
> problem is my own ignorance.

It looks like your kernel is missing the following patch (dated 31st
May 2005).  You're near the bleeding edge working with version-1
superblocks (and I do thank you for being a guinea pig :-) and should
use an ultra-recent kernel if at all possible.

If you don't have the array mounted (or can unmount it safely) then
you might be able to convince it to start the rebuild by setting
it read-only, then writable.
i.e.
  mdadm --readonly /dev/md0
  mdadm --readwrite /dev/md0

Alternatively, stop and re-assemble the array.
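
Roughly (sketch only; substitute the real member devices of your array,
and the real mount point, for the ones shown):

  umount /array
  mdadm --stop /dev/md0
  mdadm --assemble /dev/md0 /dev/sd[a-z] /dev/sda[ab]   # glob shorthand for the 28 members
  mount /array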

NeilBrown



-----------------------

Make sure recovery happens when add_new_disk is used for hot_add

Currently if add_new_disk is used to hot-add a drive to a degraded
array, recovery doesn't start ... because we didn't tell it to.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    2 ++
 1 files changed, 2 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-05-31 13:40:35.000000000 +1000
+++ ./drivers/md/md.c	2005-05-31 13:40:34.000000000 +1000
@@ -2232,6 +2232,8 @@ static int add_new_disk(mddev_t * mddev,
 		err = bind_rdev_to_array(rdev, mddev);
 		if (err)
 			export_rdev(rdev);
+
+		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		if (mddev->thread)
 			md_wakeup_thread(mddev->thread);
 		return err;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug report: mdadm-devel-2 , superblock version 1
  2005-07-18  2:09               ` bug report: mdadm-devel-2 , superblock version 1 Tyler
@ 2005-07-18  2:19                 ` Tyler
  2005-07-25  0:37                   ` Neil Brown
  2005-07-25  0:36                 ` Neil Brown
  1 sibling, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-18  2:19 UTC (permalink / raw)
  To: linux-raid; +Cc: Neil Brown

As a side note, do I need any kernel patches to use the version 1
superblocks, Neil?  That may be what I'm missing, as the kernel in use
currently is a vanilla 2.6.12.3.  I'm not trying to use bitmaps or
anything else at the moment.

Tyler.

Tyler wrote:

> # uname -a
> Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
> # ./mdadm -V
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 
> July 2005
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
>
> ** stop the current array
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -C /dev/md1 -l 5 
> --raid-devices 3 -e 1 /dev/sda2 /dev/sdb2 /dev/sdc2
> mdadm: /dev/sda2 appears to be part of a raid array:
>    level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
> mdadm: /dev/sdb2 appears to be part of a raid array:
>    level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
> mdadm: /dev/sdc2 appears to be part of a raid array:
>    level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
> Continue creating array? y
> mdadm: array /dev/md1 started.
>
> ** create and start new array using 3 drives, superblock version 1
>
> root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
> Personalities : [raid5]
> md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
>      128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> unused devices: <none>
>
> ** mdstat mostly okay, except sdc2 is listed as device3 instead of 
> device2 (from 0,1,2)
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
>        Version : 01.00.01
>  Creation Time : Mon Jul 18 03:56:40 2005
>     Raid Level : raid5
>     Array Size : 128384 (125.40 MiB 131.47 MB)
>    Device Size : 64192 (62.70 MiB 65.73 MB)
>   Raid Devices : 3
>  Total Devices : 3
> Preferred Minor : 1
>    Persistence : Superblock is persistent
>
>    Update Time : Mon Jul 18 03:56:42 2005
>          State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
>  Spare Devices : 0
>
>         Layout : left-symmetric
>     Chunk Size : 64K
>
>           UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>         Events : 1
>
>    Number   Major   Minor   RaidDevice State
>       0       8        2        0      active sync   
> /dev/.static/dev/sda2
>       1       8       18        1      active sync   
> /dev/.static/dev/sdb2
>       2       0        0        -      removed
>
>       3       8       34        2      active sync   
> /dev/.static/dev/sdc2
>
> ** reports version 01.00.01 superblock, but reports as if there were 4 
> devices used
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
>
> ** stop the array
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
>
> ** try to assemble the array
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> mdadm: md device /dev/md1 does not appear to be active.
>
> ** check if its active at all
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2 
> /dev/sdb2 /dev/sdc2
> mdadm: device 1 in /dev/md1 has wrong state in superblock, but 
> /dev/sdb2 seems ok
> mdadm: device 2 in /dev/md1 has wrong state in superblock, but 
> /dev/sdc2 seems ok
> mdadm: /dev/md1 has been started with 3 drives.
>
> ** try restarting it with drive details, and it starts
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
>        Version : 00.90.01
>  Creation Time : Mon Jul 18 02:53:55 2005
>     Raid Level : raid5
>     Array Size : 128384 (125.40 MiB 131.47 MB)
>    Device Size : 64192 (62.70 MiB 65.73 MB)
>   Raid Devices : 3
>  Total Devices : 3
> Preferred Minor : 1
>    Persistence : Superblock is persistent
>
>    Update Time : Mon Jul 18 02:53:57 2005
>          State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
>  Spare Devices : 0
>
>         Layout : left-symmetric
>     Chunk Size : 64K
>
>           UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
>         Events : 0.2
>
>    Number   Major   Minor   RaidDevice State
>       0       8        2        0      active sync   
> /dev/.static/dev/sda2
>       1       8       18        1      active sync   
> /dev/.static/dev/sdb2
>       2       8       34        2      active sync   
> /dev/.static/dev/sdc2
>
> ** magically, we now have a v00.90.01 superblock, it reports the 
> proper list of drives
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
>
> ** try to stop and restart again, doesn't work
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
> /dev/sda2:
>          Magic : a92b4efc
>        Version : 01.00
>     Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>           Name :
>  Creation Time : Mon Jul 18 03:56:40 2005
>     Raid Level : raid5
>   Raid Devices : 3
>
>    Device Size : 128504 (62.76 MiB 65.79 MB)
>   Super Offset : 128504 sectors
>          State : clean
>    Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>    Update Time : Mon Jul 18 03:56:42 2005
>       Checksum : 903062ed - correct
>         Events : 1
>
>         Layout : left-symmetric
>     Chunk Size : 64K
>
>   Array State : Uuu 1 failed
>
> ** the drives themselves still report a version 1 superblock... wierd
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Raid5 Failure
  2005-07-18  2:15                 ` Neil Brown
@ 2005-07-18  2:24                   ` David M. Strang
  0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18  2:24 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil -

I'm using 2.6.12.

-(root@abyss)-(/)- # uname -ar
Linux abyss 2.6.12 #2 SMP Mon Jun 20 22:15:25 EDT 2005 i686 unknown unknown 
GNU/Linux

I unmounted the raid, used the readonly, then readwrite - then remounted the 
raid.

Jul 17 22:19:14 abyss kernel: md: md0 switched to read-only mode.
Jul 17 22:19:23 abyss kernel: md: md0 switched to read-write mode.
Jul 17 22:19:23 abyss kernel: RAID5 conf printout:
Jul 17 22:19:23 abyss kernel:  --- rd:28 wd:27 fd:1
Jul 17 22:19:23 abyss kernel:  disk 0, o:1, dev:sda
Jul 17 22:19:23 abyss kernel:  disk 1, o:1, dev:sdb
Jul 17 22:19:23 abyss kernel:  disk 2, o:1, dev:sdc
Jul 17 22:19:23 abyss kernel:  disk 3, o:1, dev:sdd
Jul 17 22:19:23 abyss kernel:  disk 4, o:1, dev:sde
Jul 17 22:19:23 abyss kernel:  disk 5, o:1, dev:sdf
Jul 17 22:19:23 abyss kernel:  disk 6, o:1, dev:sdg
Jul 17 22:19:23 abyss kernel:  disk 7, o:1, dev:sdh
Jul 17 22:19:23 abyss kernel:  disk 8, o:1, dev:sdi
Jul 17 22:19:23 abyss kernel:  disk 9, o:1, dev:sdj
Jul 17 22:19:23 abyss kernel:  disk 10, o:1, dev:sdk
Jul 17 22:19:23 abyss kernel:  disk 11, o:1, dev:sdl
Jul 17 22:19:23 abyss kernel:  disk 12, o:1, dev:sdm
Jul 17 22:19:23 abyss kernel:  disk 13, o:1, dev:sdn
Jul 17 22:19:23 abyss kernel:  disk 14, o:1, dev:sdo
Jul 17 22:19:23 abyss kernel:  disk 15, o:1, dev:sdp
Jul 17 22:19:23 abyss kernel:  disk 16, o:1, dev:sdq
Jul 17 22:19:23 abyss kernel:  disk 17, o:1, dev:sdr
Jul 17 22:19:23 abyss kernel:  disk 18, o:1, dev:sds
Jul 17 22:19:23 abyss kernel:  disk 19, o:1, dev:sdt
Jul 17 22:19:23 abyss kernel:  disk 20, o:1, dev:sdu
Jul 17 22:19:23 abyss kernel:  disk 21, o:1, dev:sdv
Jul 17 22:19:23 abyss kernel:  disk 22, o:1, dev:sdw
Jul 17 22:19:23 abyss kernel:  disk 23, o:1, dev:sdx
Jul 17 22:19:23 abyss kernel:  disk 24, o:1, dev:sdy
Jul 17 22:19:23 abyss kernel:  disk 25, o:1, dev:sdz
Jul 17 22:19:23 abyss kernel:  disk 26, o:1, dev:sdaa
Jul 17 22:19:23 abyss kernel:  disk 27, o:1, dev:sdab
Jul 17 22:19:23 abyss kernel: .<6>md: syncing RAID array md0
Jul 17 22:19:23 abyss kernel: md: minimum _guaranteed_ reconstruction speed: 
1000 KB/sec/disc.
Jul 17 22:19:23 abyss kernel: md: using maximum available idle IO bandwith 
(but not more than 200000 KB/sec) for reconstruction.
Jul 17 22:19:23 abyss kernel: md: using 128k window, over a total of 
71687296 blocks.


-(root@abyss)-(/)- # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.01
  Creation Time : Wed Dec 31 19:00:00 1969
     Raid Level : raid5
     Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
    Device Size : 71687296 (68.37 GiB 73.41 GB)
   Raid Devices : 28
  Total Devices : 28
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Jul 17 22:20:09 2005
          State : clean, degraded, recovering
 Active Devices : 27
Working Devices : 28
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-asymmetric
     Chunk Size : 128K

 Rebuild Status : 0% complete

           UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
         Events : 176947

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/evms/.nodes/sda
       1       8       16        1      active sync   /dev/evms/.nodes/sdb
       2       8       32        2      active sync   /dev/evms/.nodes/sdc
       3       8       48        3      active sync   /dev/evms/.nodes/sdd
       4       8       64        4      active sync   /dev/evms/.nodes/sde
       5       8       80        5      active sync   /dev/evms/.nodes/sdf
       6       8       96        6      active sync   /dev/evms/.nodes/sdg
       7       8      112        7      active sync   /dev/evms/.nodes/sdh
       8       8      128        8      active sync   /dev/evms/.nodes/sdi
       9       8      144        9      active sync   /dev/evms/.nodes/sdj
      10       8      160       10      active sync   /dev/evms/.nodes/sdk
      11       8      176       11      active sync   /dev/evms/.nodes/sdl
      12       8      192       12      active sync   /dev/evms/.nodes/sdm
      13       8      208       13      active sync   /dev/evms/.nodes/sdn
      14       8      224       14      active sync   /dev/evms/.nodes/sdo
      15       8      240       15      active sync   /dev/evms/.nodes/sdp
      16      65        0       16      active sync   /dev/evms/.nodes/sdq
      17      65       16       17      active sync   /dev/evms/.nodes/sdr
      18      65       32       18      active sync   /dev/evms/.nodes/sds
      19      65       48       19      active sync   /dev/evms/.nodes/sdt
      20      65       64       20      active sync   /dev/evms/.nodes/sdu
      21      65       80       21      active sync   /dev/evms/.nodes/sdv
      22      65       96       22      active sync   /dev/evms/.nodes/sdw
      23      65      112       23      active sync   /dev/evms/.nodes/sdx
      24      65      128       24      active sync   /dev/evms/.nodes/sdy
      25      65      144       25      active sync   /dev/evms/.nodes/sdz
      26       0        0        -      removed
      27      65      176       27      active sync   /dev/evms/.nodes/sdab

      28      65      160       26      spare rebuilding 
/dev/evms/.nodes/sdaa


It is re-syncing now. Thanks!

This is from my drivers/md/md.c, lines 2215-2249.

        if (rdev->faulty) {
                printk(KERN_WARNING
                        "md: can not hot-add faulty %s disk to %s!\n",
                        bdevname(rdev->bdev,b), mdname(mddev));
                err = -EINVAL;
                goto abort_export;
        }
        rdev->in_sync = 0;
        rdev->desc_nr = -1;
        bind_rdev_to_array(rdev, mddev);

        /*
         * The rest should better be atomic, we can have disk failures
         * noticed in interrupt contexts ...
         */

        if (rdev->desc_nr == mddev->max_disks) {
                printk(KERN_WARNING "%s: can not hot-add to full array!\n",
                        mdname(mddev));
                err = -EBUSY;
                goto abort_unbind_export;
        }

        rdev->raid_disk = -1;

        md_update_sb(mddev);

        /*
         * Kick recovery, maybe this spare has to be added to the
         * array immediately.
         */
        set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
        md_wakeup_thread(mddev->thread);

        return 0;

-- David M. Strang

----- Original Message ----- 
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 10:15 PM
Subject: Re: Raid5 Failure


On Sunday July 17, dstrang@shellpower.net wrote:
> Neil --
>
> That worked, the device has been added to the array. Now, I think the next
> problem is my own ignorance.

It looks like your kernel is missing the following patch (dated 31st
may 2005).  You're near the bleeding edge working with version-1
superblocks (and I do thank you for being a guinea pig:-) and should
use an ultra-recent kernel if at all possible.

If you don't have the array mounted (or can unmount it safely) then
you might be able to convince it to start the rebuild with by setting
it read-only, then writable.
i.e
  mdadm --readonly /dev/md0
  mdadm --readwrite /dev/md0

alternately stop and re-assemble the array.

NeilBrown



-----------------------

Make sure recovery happens when add_new_disk is used for hot_add

Currently if add_new_disk is used to hot-add a drive to a degraded
array, recovery doesn't start ... because we didn't tell it to.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    2 ++
 1 files changed, 2 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-05-31 13:40:35.000000000 +1000
+++ ./drivers/md/md.c 2005-05-31 13:40:34.000000000 +1000
@@ -2232,6 +2232,8 @@ static int add_new_disk(mddev_t * mddev,
  err = bind_rdev_to_array(rdev, mddev);
  if (err)
  export_rdev(rdev);
+
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
  if (mddev->thread)
  md_wakeup_thread(mddev->thread);
  return err; 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug report: mdadm-devel-2 , superblock version 1
  2005-07-18  2:09               ` bug report: mdadm-devel-2 , superblock version 1 Tyler
  2005-07-18  2:19                 ` Tyler
@ 2005-07-25  0:36                 ` Neil Brown
  2005-07-25  3:47                   ` Tyler
  1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-25  0:36 UTC (permalink / raw)
  To: Tyler; +Cc: linux-raid

On Sunday July 17, pml@dtbb.net wrote:
> # uname -a
> Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
> # ./mdadm -V
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
> 

...
> root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
> Personalities : [raid5]
> md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
>       128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> 
> unused devices: <none>
> 
> ** mdstat mostly okay, except sdc2 is listed as device3 instead of 

Hmmm, yes....  It is device number 3 in the array, but it is playing
role-2 in the raid5.  When using version-1 superblocks, we don't move
devices around in the "list of all devices".  We just assign them
different roles (device-N or 'spare').


> device2 (from 0,1,2)
> 
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
>         Version : 01.00.01
>   Creation Time : Mon Jul 18 03:56:40 2005
>      Raid Level : raid5
>      Array Size : 128384 (125.40 MiB 131.47 MB)
>     Device Size : 64192 (62.70 MiB 65.73 MB)
>    Raid Devices : 3
>   Total Devices : 3
> Preferred Minor : 1
>     Persistence : Superblock is persistent
> 
>     Update Time : Mon Jul 18 03:56:42 2005
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>            UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>          Events : 1
> 
>     Number   Major   Minor   RaidDevice State
>        0       8        2        0      active sync   /dev/.static/dev/sda2
>        1       8       18        1      active sync   /dev/.static/dev/sdb2
>        2       0        0        -      removed
> 
>        3       8       34        2      active sync   /dev/.static/dev/sdc2
> 
> ** reports version 01.00.01 superblock, but reports as if there were 4 
> devices used

Ok, this output definitely needs fixing.  But as you can see, there
are 3 devices playing roles (RaidDevice) 0, 1, and 2.  They reside in
slots 0, 1, and 3 of the array.

> 
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
> 
> ** try to assemble the array

This is not how you assemble an array.  You need to tell mdadm which
component devices to use, either on the command line or in /etc/mdadm.conf
(and give --scan).
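
For example (sketch only; the UUID is just the one from your -E output
above, and the DEVICE line is an assumption about your setup):

  mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2

or put something like

  DEVICE /dev/sd*
  ARRAY /dev/md1 UUID=1baa875e87:9ec208b2:7f5e6a27:db1f5e

in /etc/mdadm.conf and then run

  mdadm --assemble --scan /dev/md1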

> 
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> mdadm: md device /dev/md1 does not appear to be active.
> 
> ** check if its active at all
> 
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2 
> /dev/sdb2 /dev/sdc2
> mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2 
> seems ok
> mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2 
> seems ok
> mdadm: /dev/md1 has been started with 3 drives.
> 
> ** try restarting it with drive details, and it starts

Those messages are a bother though.  I think I know roughly what is
going on.  I'll look into it shortly.

> 
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
>         Version : 00.90.01
>   Creation Time : Mon Jul 18 02:53:55 2005
>      Raid Level : raid5
>      Array Size : 128384 (125.40 MiB 131.47 MB)
>     Device Size : 64192 (62.70 MiB 65.73 MB)
>    Raid Devices : 3
>   Total Devices : 3
> Preferred Minor : 1
>     Persistence : Superblock is persistent
> 
>     Update Time : Mon Jul 18 02:53:57 2005
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>            UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
>          Events : 0.2
> 
>     Number   Major   Minor   RaidDevice State
>        0       8        2        0      active sync   /dev/.static/dev/sda2
>        1       8       18        1      active sync   /dev/.static/dev/sdb2
>        2       8       34        2      active sync   /dev/.static/dev/sdc2
> 
> ** magically, we now have a v00.90.01 superblock, it reports the proper 
> list of drives

Ahhh...  You have assembled a different array (look at the create time too).
Version-1 superblocks live at a different location to version-0.90
superblocks, so it is possible to have both on the one drive.  It is
supposed to pick the newest, but appears not to have done so.  You should
really remove old superblocks.... maybe mdadm should do that for you???
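
Something like

  mdadm --zero-superblock /dev/sda2 /dev/sdb2 /dev/sdc2

run against the stopped, un-assembled devices before re-creating with -e 1
should do it (sketch only; with two superblock versions on the same
partition it is worth checking afterwards which one actually got wiped).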

> 
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
> 
> ** try to stop and restart again, doesn't work

Again, don't do that!

> 
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
> /dev/sda2:
>           Magic : a92b4efc
>         Version : 01.00
>      Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>            Name :
>   Creation Time : Mon Jul 18 03:56:40 2005
>      Raid Level : raid5
>    Raid Devices : 3
> 
>     Device Size : 128504 (62.76 MiB 65.79 MB)
>    Super Offset : 128504 sectors
>           State : clean
>     Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>     Update Time : Mon Jul 18 03:56:42 2005
>        Checksum : 903062ed - correct
>          Events : 1
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>    Array State : Uuu 1 failed
> 
> ** the drives themselves still report a version 1 superblock... wierd

Yeh.  Assemble and Examine should pick the same one by default.  It
appears they don't.  I'll look into it.

Thanks for the very helpful feedback.

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug report: mdadm-devel-2 , superblock version 1
  2005-07-18  2:19                 ` Tyler
@ 2005-07-25  0:37                   ` Neil Brown
  0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-25  0:37 UTC (permalink / raw)
  To: Tyler; +Cc: linux-raid

On Sunday July 17, pml@dtbb.net wrote:
> As a side note, do I need any kernel patches to use the version 1 
> superblocks Neil?  That may be what i'm missing.. as the kernel in use 
> currently is a vanilla 2.6.12.3.  I'm not trying to use bitmaps or 
> anything else at the moment.

Version-1 superblocks should mostly work in 2.6.12.3.  I'm not 100%
sure no related patches have gone in since 2.6.12 though...

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug report: mdadm-devel-2 , superblock version 1
  2005-07-25  0:36                 ` Neil Brown
@ 2005-07-25  3:47                   ` Tyler
  2005-07-27  2:08                     ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-25  3:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> On Sunday July 17, pml@dtbb.net wrote:
> 
>># uname -a
>>Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
>># ./mdadm -V
>>mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
>>
> ...
> 
>>root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
>>Personalities : [raid5]
>>md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
>>      128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>>unused devices: <none>
>>
>>** mdstat mostly okay, except sdc2 is listed as device3 instead of 
> 
> Hmmm, yes....  It is device number 3 in the array, but it is playing
> role-2 in the raid5.  When using Version-1 superblocks, we don't moved
> devices around, in the "list of all devices".  We just assign them
> different roles. (device-N or 'spare').
> 

So if I were to add (as an example) 7 spares to a 3-disk raid-5 array,
and later removed them for use elsewhere, a raid using a v1.x superblock
would keep a permanent listing of those drives even after being removed?
Is there a possibility (for either aesthetics, or just keeping things
easier to read and possibly diagnose at a later date during manual
recoveries) of adding a command line option to "re-order and remove" old
devices that are marked as removed, that could only function if the
array was clean and non-degraded?  (This would be a manual feature we
would run, especially since automatically doing this might actually
confuse us during times of trouble-shooting.)

>>device2 (from 0,1,2)
>>
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
>>/dev/md1:
>>        Version : 01.00.01
>>  Creation Time : Mon Jul 18 03:56:40 2005
>>     Raid Level : raid5
>>     Array Size : 128384 (125.40 MiB 131.47 MB)
>>    Device Size : 64192 (62.70 MiB 65.73 MB)
>>   Raid Devices : 3
>>  Total Devices : 3
>>Preferred Minor : 1
>>    Persistence : Superblock is persistent
>>
>>    Update Time : Mon Jul 18 03:56:42 2005
>>          State : clean
>> Active Devices : 3
>>Working Devices : 3
>> Failed Devices : 0
>>  Spare Devices : 0
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>           UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>>         Events : 1
>>
>>    Number   Major   Minor   RaidDevice State
>>       0       8        2        0      active sync   /dev/.static/dev/sda2
>>       1       8       18        1      active sync   /dev/.static/dev/sdb2
>>       2       0        0        -      removed
>>
>>       3       8       34        2      active sync   /dev/.static/dev/sdc2
>>
>>** reports version 01.00.01 superblock, but reports as if there were 4 
>>devices used
> 
> Ok, this output definitely needs fixing.  But as you can see, there
> are 3 devices playing roles (RaidDevice) 0, 1, and 2.  They reside in
> slots 0, 1, and 3 of the array.

Depending on your answer to the first question up above, a new question
based on your comment here comes to mind... if we assume, as you say
above, that it is normal for v1 superblocks to keep old removed drives
listed, but down here you say the output needs fixing, which output is
wrong in the example showing devices 0,1,2,3, with device #2 removed
and device 3 acting as RaidDevice 2?  If the v1 superblocks are
designed to keep removed drives listed, then the above output makes
sense... now that you've pointed out the "feature".

>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
>>Segmentation fault
>>
>>** try to assemble the array
> 
> This is not how you assemble an array.  You need to tell mdadm which
> component devices to use, either on command line or in /etc/mdadm.conf
> (and give --scan).

I failed to mention that I had an up-to-date mdadm.conf file, with the
raid UUID in it, and (I will have to verify this) I believe the command
as I typed it above works with the 1.12 mdadm.  The mdadm.conf file has
a DEVICE=/dev/hd[b-z] /dev/sd* line at the beginning of the config file,
and then the standard options (but no devices= line).  Does -A still
need *some* options even if the config file is up to date?  (As I said,
I'll have to verify whether 1.12 works with just the -A.)

Also, if -A requires some other options on the command line, should it 
not complain, instead of segfaulting? :D

>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
>>mdadm: md device /dev/md1 does not appear to be active.
>>
>>** check if its active at all
>>
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2 
>>/dev/sdb2 /dev/sdc2
>>mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2 
>>seems ok
>>mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2 
>>seems ok
>>mdadm: /dev/md1 has been started with 3 drives.
>>
>>** try restarting it with drive details, and it starts
> 
> Those message are a bother though.  I think I know roughly what is
> going on.  I'll look into it shortly.

Is this possibly where the v1 superblocks are being mangled, and so it 
reverts back to the v0.90 superblocks that it finds on the disk?

>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
>>/dev/md1:
>>        Version : 00.90.01
>>  Creation Time : Mon Jul 18 02:53:55 2005
>>     Raid Level : raid5
>>     Array Size : 128384 (125.40 MiB 131.47 MB)
>>    Device Size : 64192 (62.70 MiB 65.73 MB)
>>   Raid Devices : 3
>>  Total Devices : 3
>>Preferred Minor : 1
>>    Persistence : Superblock is persistent
>>
>>    Update Time : Mon Jul 18 02:53:57 2005
>>          State : clean
>> Active Devices : 3
>>Working Devices : 3
>> Failed Devices : 0
>>  Spare Devices : 0
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>           UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
>>         Events : 0.2
>>
>>    Number   Major   Minor   RaidDevice State
>>       0       8        2        0      active sync   /dev/.static/dev/sda2
>>       1       8       18        1      active sync   /dev/.static/dev/sdb2
>>       2       8       34        2      active sync   /dev/.static/dev/sdc2
>>
>>** magically, we now have a v00.90.01 superblock, it reports the proper 
>>list of drives
> 
> Ahhh...  You have assembled a different array (look at create time too).
> version-1 superblocks live at a different location to version-0.90
> superblocks.  So it is possible to have both on the one drive.  It is
> supposed to pick the newest, but appears not to have done.  You should
> really remove old superblocks.... maybe mdadm should do that for you
> ???

*I* didn't assemble a different array... mdadm did ;)  Yes, I agree: if
you create a *new* raid device, it should erase any form of old
superblock, considering that it warns during creation if it detects a
drive as being part of another array, and prompts for a Y/N continue.

>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
>>Segmentation fault
>>
>>** try to stop and restart again, doesn't work
> 
> Again, don't do that!

Okay.. I will begin using --scan (or short form -s) from now on.. but I 
*swear* <grin> that it worked without scan with the older MDADM, as long 
as you had a valid DEVICE= line in the config file and possibly an ARRAY 
definition also.  Once again though, it shouldn't segfault, but complain 
that it needs other options (and possibly list the options available 
with that command).

A good example of a program that offers such insights when you mistype 
or fail to provide enough options, is smartmontools.. if you type 
"smartctl -t" or "smartctl -t /dev/hda" for example, leaving out the 
*type* of test you wanted it to do, it will then list off the possible 
test options.  If you run "smartctl -t long" but forget a device name to 
run the test on, it will tell you that you need to specify a device, and 
gives an example.

>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
>>/dev/sda2:
>>          Magic : a92b4efc
>>        Version : 01.00
>>     Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>>           Name :
>>  Creation Time : Mon Jul 18 03:56:40 2005
>>     Raid Level : raid5
>>   Raid Devices : 3
>>
>>    Device Size : 128504 (62.76 MiB 65.79 MB)
>>   Super Offset : 128504 sectors
>>          State : clean
>>    Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>>    Update Time : Mon Jul 18 03:56:42 2005
>>       Checksum : 903062ed - correct
>>         Events : 1
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>   Array State : Uuu 1 failed
>>
>>** the drives themselves still report a version 1 superblock... wierd

> Yeh.  Assemble and Examine should pick the say one by default.  It
> appears they don't.  I'll look into it.
> 
> Thanks for the very helpful  feedback.
 >
> NeilBrown

My pleasure Neil.. it was actually quite simple and quick testing, just 
using the last little bit of space left over on 3 drives that were 
slightly larger than the other 5 drives in the main array.

You can email me a patch directly, or to the list, and I can do some 
more testing.  I'd really like to get v1 superblocks going, but haven't 
had much (reliable) luck in testing yet.

Tyler.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug report: mdadm-devel-2 , superblock version 1
  2005-07-25  3:47                   ` Tyler
@ 2005-07-27  2:08                     ` Neil Brown
  0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-27  2:08 UTC (permalink / raw)
  To: Tyler; +Cc: linux-raid

On Sunday July 24, pml@dtbb.net wrote:
> Neil Brown wrote:
> > On Sunday July 17, pml@dtbb.net wrote:
> > 
> >># uname -a
> >>Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
> >># ./mdadm -V
> >>mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
> >>
> > ...
> > 
> >>root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
> >>Personalities : [raid5]
> >>md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
> >>      128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> >>
> >>unused devices: <none>
> >>
> >>** mdstat mostly okay, except sdc2 is listed as device3 instead of 
> > 
> > Hmmm, yes....  It is device number 3 in the array, but it is playing
> > role-2 in the raid5.  When using Version-1 superblocks, we don't moved
> > devices around, in the "list of all devices".  We just assign them
> > different roles. (device-N or 'spare').
> > 
> 
> So if I were to add (as an example) 7 spares to a 3 disk raid-5 array, 
> and later removed them for use elsewhere, a raid using a v1.x superblock 
> would keep a permanent listing of those drives even after being
> removed? 

No.  A device retains its slot number only while it is a member of
the array.  Once it is removed from the array it is forgotten.  If it
is re-added, it appears simply as a new drive and is allocated the
first free slot.
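
e.g., on your small test array (sketch only):

  mdadm /dev/md1 --fail /dev/sdc2 --remove /dev/sdc2
  mdadm /dev/md1 --add /dev/sdc2     # re-added drive takes the first free slot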


>    Is there a possibility (for either asthetics, or just keeping things 
> easier to read and possibly diagnose at a later date during manual 
> recoveries) of adding a command line option to "re-order and remove" old 
> devices that are marked as removed, that could only function if the 
> array was clean, and non-degraded?  (this would be a manual feature we 
> would run, especially if automatically doing this might actually confuse 
> us during times of trouble-shooting?)

I'd rather change the output of mdadm to display the important
information (role in array) more prominently than the less important
info (position in array).
> >>
> >>    Number   Major   Minor   RaidDevice State
> >>       0       8        2        0      active sync   /dev/.static/dev/sda2
> >>       1       8       18        1      active sync   /dev/.static/dev/sdb2
> >>       2       0        0        -      removed
> >>
> >>       3       8       34        2      active sync   /dev/.static/dev/sdc2
> >>
> >>** reports version 01.00.01 superblock, but reports as if there were 4 
> >>devices used
> > 
> > Ok, this output definitely needs fixing.  But as you can see, there
> > are 3 devices playing roles (RaidDevice) 0, 1, and 2.  They reside in
> > slots 0, 1, and 3 of the array.
> 
> Depending on your answer to the first question up above, a new question 
> based on your comment here comes to mind... if we assume, as you say 
> above that it is normal for v1 superblocks to keep old removed drives 
> listed, but down here you say the output needs fixing, which output is 
> wrong in the example showing 0,1,2,3 devices, with device #2 removed, 
> and device 3 acting as raiddevice 2 ?  If the v1 superblocks are 
> designed to keep removed drives listed, then the above output makes 
> sense.. now that you've pointed out the "feature".

It should look more like:

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/.static/dev/sda2
       1       8       18        1      active sync   /dev/.static/dev/sdb2
       3       8       34        2      active sync   /dev/.static/dev/sdc2

i.e. printing something that is 'removed' is pointless.  And the list
should be sorted by 'RaidDevice', not 'Number'.


> 
> >>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> >>Segmentation fault
> >>
> >>** try to assemble the array
> > 
> > This is not how you assemble an array.  You need to tell mdadm which
> > component devices to use, either on command line or in /etc/mdadm.conf
> > (and give --scan).
> 
> I failed to mention that I had an up to date mdadm.conf file, with the 
> raid UUID in it, and (I will have to verify this) I believe the command 
> as I typed it above, works with the 1.12 mdadm.  The mdadm.conf file has 
> a DEVICE=/dev/hd[b-z] /dev/sd* line at the beginning of the config file, 
> and then the standard options (but no devices= line).  Does -A still 
> need *some* options even if the config file is up to date??  (as I said, 
> I'll have to verify if 1.12 works with just the -A).

mdadm won't look at the config file unless you tell it to (with
--scan or --configfile).  At least that is what I intended.
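
i.e. once the config file is filled in, something like

  ./mdadm --assemble --scan /dev/md1

(or with --config=/etc/mdadm.conf given explicitly); just a sketch of the
intended usage.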



> 
> Also, if -A requires some other options on the command line, should it 
> not complain, instead of segfaulting? :D

Certainly!


> 
> >>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> >>mdadm: md device /dev/md1 does not appear to be active.
> >>
> >>** check if its active at all
> >>
> >>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2 
> >>/dev/sdb2 /dev/sdc2
> >>mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2 
> >>seems ok
> >>mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2 
> >>seems ok
> >>mdadm: /dev/md1 has been started with 3 drives.
> >>
> >>** try restarting it with drive details, and it starts
> > 
> > Those message are a bother though.  I think I know roughly what is
> > going on.  I'll look into it shortly.
> 
> Is this possibly where the v1 superblocks are being mangled, and so it 
> reverts back to the v0.90 superblocks that it finds on the disk?

I'm not sure until I look carefully through the code.

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2005-07-27  2:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-17 15:44 Raid5 Failure David M. Strang
2005-07-17 22:05 ` Neil Brown
2005-07-17 23:15   ` David M. Strang
2005-07-18  0:05     ` Tyler
2005-07-18  0:23       ` David M. Strang
2005-07-18  0:06     ` Neil Brown
2005-07-18  0:52       ` David M. Strang
2005-07-18  1:06         ` Neil Brown
2005-07-18  1:26           ` David M. Strang
2005-07-18  1:31             ` David M. Strang
     [not found]           ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
2005-07-18  1:33             ` Neil Brown
2005-07-18  1:46               ` David M. Strang
2005-07-18  2:10                 ` Tyler
2005-07-18  2:12                   ` David M. Strang
2005-07-18  2:15                 ` Neil Brown
2005-07-18  2:24                   ` David M. Strang
2005-07-18  2:09               ` bug report: mdadm-devel-2 , superblock version 1 Tyler
2005-07-18  2:19                 ` Tyler
2005-07-25  0:37                   ` Neil Brown
2005-07-25  0:36                 ` Neil Brown
2005-07-25  3:47                   ` Tyler
2005-07-27  2:08                     ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).