* Re: Raid5 Failure
@ 2005-07-17 15:44 David M. Strang
2005-07-17 22:05 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: David M. Strang @ 2005-07-17 15:44 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
-(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument
Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with version-0
superblocks.
What, if anything, can I do since I'm using a version 1.0 superblock?
-- David M. Strang
----- Original Message -----
From: David M. Strang
To: Neil Brown
Cc: linux-raid@vger.kernel.org
Sent: Friday, July 15, 2005 4:25 PM
Subject: Re: Raid5 Failure
Okay, the array rebuilt - but I had 2 failed devices this morning.
/dev/sdj & /dev/sdaa were marked faulty.
Jul 15 01:47:53 abyss kernel: qla2200 0000:00:0d.0: LOOP DOWN detected.
Jul 15 01:47:53 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f7f7).
Jul 15 01:47:53 abyss kernel: qla2200 0000:00:0d.0: LOOP UP detected (1
Gbps).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured
(f7c3).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f7c3).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured
(f775).
Jul 15 01:52:12 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f775).
Jul 15 01:59:50 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured
(f7c3).
Jul 15 01:59:50 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f7c3).
Jul 15 02:00:42 abyss kernel: qla2200 0000:00:0d.0: LIP reset occured
(f776).
Jul 15 02:00:42 abyss kernel: qla2200 0000:00:0d.0: LIP occured (f776).
Jul 15 02:01:17 abyss kernel: rport-2:0-9: blocked FC remote port time out:
removing target
Jul 15 02:01:17 abyss kernel: rport-2:0-26: blocked FC remote port time
out: removing target
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector
53238272
Jul 15 02:01:17 abyss kernel: raid5: Disk failure on sdj, disabling device.
Operation continuing on 27 devices
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector
53239552
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector
53239296
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector
53239040
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector
53238784
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 9 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdj, sector
53238528
Jul 15 02:01:17 abyss kernel: scsi2 (9:0): rejecting I/O to dead device
Jul 15 02:01:17 abyss kernel: SCSI error : <2 0 26 0> return code = 0x10000
Jul 15 02:01:17 abyss kernel: end_request: I/O error, dev sdaa, sector
53238016
Jul 15 02:01:17 abyss kernel: raid5: Disk failure on sdaa, disabling device.
Operation continuing on 26 devices
I have switched out the QLA2200 controller for a different one, and used:
mdadm -A /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn
/dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv
/dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab -f
to start the array; it is now running.
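Whether -f (--force) is needed depends on how far the event counters on the
member superblocks have drifted apart. A quick way to compare them before
assembling, sketched here with the same 28 device names used above (the exact
--examine field names vary a little between mdadm versions), is something like:

# Compare event counter and state on every member's superblock.
# Field names may differ slightly between mdadm versions.
for d in /dev/sd[a-z] /dev/sda[ab]; do
    echo "== $d =="
    mdadm --examine "$d" | grep -E 'Events|State'
done

Any member whose event count lags well behind the others is the one md will
refuse to trust without --force.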
Jul 15 08:55:51 abyss kernel: md: bind<sdb>
Jul 15 08:55:51 abyss kernel: md: bind<sdc>
Jul 15 08:55:51 abyss kernel: md: bind<sdd>
Jul 15 08:55:51 abyss kernel: md: bind<sde>
Jul 15 08:55:51 abyss kernel: md: bind<sdf>
Jul 15 08:55:51 abyss kernel: md: bind<sdg>
Jul 15 08:55:51 abyss kernel: md: bind<sdh>
Jul 15 08:55:51 abyss kernel: md: bind<sdi>
Jul 15 08:55:51 abyss kernel: md: bind<sdj>
Jul 15 08:55:51 abyss kernel: md: bind<sdk>
Jul 15 08:55:51 abyss kernel: md: bind<sdl>
Jul 15 08:55:51 abyss kernel: md: bind<sdm>
Jul 15 08:55:51 abyss kernel: md: bind<sdn>
Jul 15 08:55:51 abyss kernel: md: bind<sdo>
Jul 15 08:55:51 abyss kernel: md: bind<sdp>
Jul 15 08:55:51 abyss kernel: md: bind<sdq>
Jul 15 08:55:51 abyss kernel: md: bind<sdr>
Jul 15 08:55:51 abyss kernel: md: bind<sds>
Jul 15 08:55:51 abyss kernel: md: bind<sdt>
Jul 15 08:55:51 abyss kernel: md: bind<sdu>
Jul 15 08:55:51 abyss kernel: md: bind<sdv>
Jul 15 08:55:51 abyss kernel: md: bind<sdw>
Jul 15 08:55:51 abyss kernel: md: bind<sdx>
Jul 15 08:55:51 abyss kernel: md: bind<sdy>
Jul 15 08:55:51 abyss kernel: md: bind<sdz>
Jul 15 08:55:51 abyss kernel: md: bind<sdaa>
Jul 15 08:55:51 abyss kernel: md: bind<sdab>
Jul 15 08:55:51 abyss kernel: md: bind<sda>
Jul 15 08:55:51 abyss kernel: md: kicking non-fresh sdaa from array!
Jul 15 08:55:51 abyss kernel: md: unbind<sdaa>
Jul 15 08:55:51 abyss kernel: md: export_rdev(sdaa)
Jul 15 08:55:51 abyss kernel: raid5: device sda operational as raid disk 0
Jul 15 08:55:51 abyss kernel: raid5: device sdab operational as raid disk 27
Jul 15 08:55:51 abyss kernel: raid5: device sdz operational as raid disk 25
Jul 15 08:55:51 abyss kernel: raid5: device sdy operational as raid disk 24
Jul 15 08:55:51 abyss kernel: raid5: device sdx operational as raid disk 23
Jul 15 08:55:51 abyss kernel: raid5: device sdw operational as raid disk 22
Jul 15 08:55:51 abyss kernel: raid5: device sdv operational as raid disk 21
Jul 15 08:55:51 abyss kernel: raid5: device sdu operational as raid disk 20
Jul 15 08:55:51 abyss kernel: raid5: device sdt operational as raid disk 19
Jul 15 08:55:51 abyss kernel: raid5: device sds operational as raid disk 18
Jul 15 08:55:51 abyss kernel: raid5: device sdr operational as raid disk 17
Jul 15 08:55:51 abyss kernel: raid5: device sdq operational as raid disk 16
Jul 15 08:55:51 abyss kernel: raid5: device sdp operational as raid disk 15
Jul 15 08:55:51 abyss kernel: raid5: device sdo operational as raid disk 14
Jul 15 08:55:51 abyss kernel: raid5: device sdn operational as raid disk 13
Jul 15 08:55:51 abyss kernel: raid5: device sdm operational as raid disk 12
Jul 15 08:55:51 abyss kernel: raid5: device sdl operational as raid disk 11
Jul 15 08:55:51 abyss kernel: raid5: device sdk operational as raid disk 10
Jul 15 08:55:51 abyss kernel: raid5: device sdj operational as raid disk 9
Jul 15 08:55:51 abyss kernel: raid5: device sdi operational as raid disk 8
Jul 15 08:55:51 abyss kernel: raid5: device sdh operational as raid disk 7
Jul 15 08:55:51 abyss kernel: raid5: device sdg operational as raid disk 6
Jul 15 08:55:51 abyss kernel: raid5: device sdf operational as raid disk 5
Jul 15 08:55:51 abyss kernel: raid5: device sde operational as raid disk 4
Jul 15 08:55:51 abyss kernel: raid5: device sdd operational as raid disk 3
Jul 15 08:55:51 abyss kernel: raid5: device sdc operational as raid disk 2
Jul 15 08:55:51 abyss kernel: raid5: device sdb operational as raid disk 1
Jul 15 08:55:51 abyss kernel: raid5: allocated 29215kB for md0
Jul 15 08:55:51 abyss kernel: raid5: raid level 5 set md0 active with 27 out
of 28 devices, algorithm 0
Jul 15 08:55:51 abyss kernel: RAID5 conf printout:
Jul 15 08:55:51 abyss kernel: --- rd:28 wd:27 fd:1
Jul 15 08:55:51 abyss kernel: disk 0, o:1, dev:sda
Jul 15 08:55:51 abyss kernel: disk 1, o:1, dev:sdb
Jul 15 08:55:51 abyss kernel: disk 2, o:1, dev:sdc
Jul 15 08:55:51 abyss kernel: disk 3, o:1, dev:sdd
Jul 15 08:55:51 abyss kernel: disk 4, o:1, dev:sde
Jul 15 08:55:51 abyss kernel: disk 5, o:1, dev:sdf
Jul 15 08:55:51 abyss kernel: disk 6, o:1, dev:sdg
Jul 15 08:55:51 abyss kernel: disk 7, o:1, dev:sdh
Jul 15 08:55:51 abyss kernel: disk 8, o:1, dev:sdi
Jul 15 08:55:51 abyss kernel: disk 9, o:1, dev:sdj
Jul 15 08:55:51 abyss kernel: disk 10, o:1, dev:sdk
Jul 15 08:55:51 abyss kernel: disk 11, o:1, dev:sdl
Jul 15 08:55:51 abyss kernel: disk 12, o:1, dev:sdm
Jul 15 08:55:51 abyss kernel: disk 13, o:1, dev:sdn
Jul 15 08:55:51 abyss kernel: disk 14, o:1, dev:sdo
Jul 15 08:55:51 abyss kernel: disk 15, o:1, dev:sdp
Jul 15 08:55:51 abyss kernel: disk 16, o:1, dev:sdq
Jul 15 08:55:51 abyss kernel: disk 17, o:1, dev:sdr
Jul 15 08:55:51 abyss kernel: disk 18, o:1, dev:sds
Jul 15 08:55:51 abyss kernel: disk 19, o:1, dev:sdt
Jul 15 08:55:51 abyss kernel: disk 20, o:1, dev:sdu
Jul 15 08:55:51 abyss kernel: disk 21, o:1, dev:sdv
Jul 15 08:55:51 abyss kernel: disk 22, o:1, dev:sdw
Jul 15 08:55:51 abyss kernel: disk 23, o:1, dev:sdx
Jul 15 08:55:51 abyss kernel: disk 24, o:1, dev:sdy
Jul 15 08:55:51 abyss kernel: disk 25, o:1, dev:sdz
Jul 15 08:55:51 abyss kernel: disk 27, o:1, dev:sdab
Jul 15 08:56:22 abyss kernel: ReiserFS: md0: found reiserfs format "3.6"
with standard journal
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: using ordered data mode
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: journal params: device md0,
size 8192, journal first block 18, max trans len 1024, max batch 900, max
commit age 30, max trans age 30
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: checking transaction log (md0)
Jul 15 08:56:26 abyss kernel: ReiserFS: md0: replayed 1 transactions in 0
seconds
Jul 15 08:56:27 abyss kernel: ReiserFS: md0: Using r5 hash to sort names
/dev/md0:
Version : 01.00.01
Creation Time : Wed Dec 31 19:00:00 1969
Raid Level : raid5
Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
Device Size : 71687296 (68.37 GiB 73.41 GB)
Raid Devices : 28
Total Devices : 27
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Jul 15 16:11:05 2005
State : clean, degraded
Active Devices : 27
Working Devices : 27
Failed Devices : 0
Spare Devices : 0
Layout : left-asymmetric
Chunk Size : 128K
UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
Events : 173177
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/evms/.nodes/sda
1 8 16 1 active sync /dev/evms/.nodes/sdb
2 8 32 2 active sync /dev/evms/.nodes/sdc
3 8 48 3 active sync /dev/evms/.nodes/sdd
4 8 64 4 active sync /dev/evms/.nodes/sde
5 8 80 5 active sync /dev/evms/.nodes/sdf
6 8 96 6 active sync /dev/evms/.nodes/sdg
7 8 112 7 active sync /dev/evms/.nodes/sdh
8 8 128 8 active sync /dev/evms/.nodes/sdi
9 8 144 9 active sync /dev/evms/.nodes/sdj
10 8 160 10 active sync /dev/evms/.nodes/sdk
11 8 176 11 active sync /dev/evms/.nodes/sdl
12 8 192 12 active sync /dev/evms/.nodes/sdm
13 8 208 13 active sync /dev/evms/.nodes/sdn
14 8 224 14 active sync /dev/evms/.nodes/sdo
15 8 240 15 active sync /dev/evms/.nodes/sdp
16 65 0 16 active sync /dev/evms/.nodes/sdq
17 65 16 17 active sync /dev/evms/.nodes/sdr
18 65 32 18 active sync /dev/evms/.nodes/sds
19 65 48 19 active sync /dev/evms/.nodes/sdt
20 65 64 20 active sync /dev/evms/.nodes/sdu
21 65 80 21 active sync /dev/evms/.nodes/sdv
22 65 96 22 active sync /dev/evms/.nodes/sdw
23 65 112 23 active sync /dev/evms/.nodes/sdx
24 65 128 24 active sync /dev/evms/.nodes/sdy
25 65 144 25 active sync /dev/evms/.nodes/sdz
26 0 0 - removed
27 65 176 27 active sync /dev/evms/.nodes/sdab
It's been running most of the day - with no problems, just in a degraded
state. How do I get /dev/sdaa back into the array?
-- David M. Strang
----- Original Message -----
From: David M. Strang
To: Neil Brown
Cc: linux-raid@vger.kernel.org
Sent: Thursday, July 14, 2005 10:16 PM
Subject: Re: Raid5 Failure
Neil -
You are the man; the array went w/o force - and is rebuilding now!
-(root@abyss)-(/)- # mdadm --detail /dev/md0
/dev/md0:
Version : 01.00.01
Creation Time : Wed Dec 31 19:00:00 1969
Raid Level : raid5
Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
Device Size : 71687296 (68.37 GiB 73.41 GB)
Raid Devices : 28
Total Devices : 28
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Jul 14 22:07:18 2005
State : active, resyncing
Active Devices : 28
Working Devices : 28
Failed Devices : 0
Spare Devices : 0
Layout : left-asymmetric
Chunk Size : 128K
Rebuild Status : 0% complete
UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
Events : 172760
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/evms/.nodes/sda
1 8 16 1 active sync /dev/evms/.nodes/sdb
2 8 32 2 active sync /dev/evms/.nodes/sdc
3 8 48 3 active sync /dev/evms/.nodes/sdd
4 8 64 4 active sync /dev/evms/.nodes/sde
5 8 80 5 active sync /dev/evms/.nodes/sdf
6 8 96 6 active sync /dev/evms/.nodes/sdg
7 8 112 7 active sync /dev/evms/.nodes/sdh
8 8 128 8 active sync /dev/evms/.nodes/sdi
9 8 144 9 active sync /dev/evms/.nodes/sdj
10 8 160 10 active sync /dev/evms/.nodes/sdk
11 8 176 11 active sync /dev/evms/.nodes/sdl
12 8 192 12 active sync /dev/evms/.nodes/sdm
13 8 208 13 active sync /dev/evms/.nodes/sdn
14 8 224 14 active sync /dev/evms/.nodes/sdo
15 8 240 15 active sync /dev/evms/.nodes/sdp
16 65 0 16 active sync /dev/evms/.nodes/sdq
17 65 16 17 active sync /dev/evms/.nodes/sdr
18 65 32 18 active sync /dev/evms/.nodes/sds
19 65 48 19 active sync /dev/evms/.nodes/sdt
20 65 64 20 active sync /dev/evms/.nodes/sdu
21 65 80 21 active sync /dev/evms/.nodes/sdv
22 65 96 22 active sync /dev/evms/.nodes/sdw
23 65 112 23 active sync /dev/evms/.nodes/sdx
24 65 128 24 active sync /dev/evms/.nodes/sdy
25 65 144 25 active sync /dev/evms/.nodes/sdz
26 65 160 26 active sync /dev/evms/.nodes/sdaa
27 65 176 27 active sync /dev/evms/.nodes/sdab
-- David M. Strang
----- Original Message -----
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Thursday, July 14, 2005 9:43 PM
Subject: Re: Raid5 Failure
On Thursday July 14, dstrang@shellpower.net wrote:
>
> It looks like the first 'segment of discs' sda->sdm are all marked clean;
> while sdn->sdab are marked active.
>
> What can I do to resolve this issue? Any assistance would be greatly
> appreciated.
Apply the following patch to mdadm-2.0-devel2 (it fixes a few bugs and
particularly makes --assemble work), then try:
mdadm -A /dev/md0 /dev/sd[a-z] /dev/sd....
Just list all 28 SCSI devices; I'm not sure what their names are.
This will quite probably fail.
If it does, try again with
--force
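Spelled out with shell globs (from the rest of the thread the 28 devices
appear to be /dev/sda through /dev/sdz plus /dev/sdaa and /dev/sdab), that
would be something like:

mdadm -A /dev/md0 /dev/sd[a-z] /dev/sda[ab]
# and only if the plain assemble refuses to run:
mdadm -A /dev/md0 /dev/sd[a-z] /dev/sda[ab] --force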
NeilBrown
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./Assemble.c | 13 ++++++++++++-
./Query.c | 33 +++++++++++++++++++--------------
./mdadm.h | 2 +-
./super0.c | 1 +
./super1.c | 4 ++--
5 files changed, 35 insertions(+), 18 deletions(-)
diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~ 2005-07-15 10:13:04.000000000 +1000
+++ ./Assemble.c 2005-07-15 10:37:59.000000000 +1000
@@ -473,6 +473,7 @@ int Assemble(struct supertype *st, char
if (!devices[j].uptodate)
continue;
info.disk.number = i;
+ info.disk.raid_disk = i;
info.disk.state = desired_state;
if (devices[j].uptodate &&
@@ -526,7 +527,17 @@ int Assemble(struct supertype *st, char
/* Almost ready to actually *do* something */
if (!old_linux) {
- if (ioctl(mdfd, SET_ARRAY_INFO, NULL) != 0) {
+ int rv;
+ if ((vers % 100) >= 1) { /* can use different versions */
+ mdu_array_info_t inf;
+ memset(&inf, 0, sizeof(inf));
+ inf.major_version = st->ss->major;
+ inf.minor_version = st->minor_version;
+ rv = ioctl(mdfd, SET_ARRAY_INFO, &inf);
+ } else
+ rv = ioctl(mdfd, SET_ARRAY_INFO, NULL);
+
+ if (rv) {
fprintf(stderr, Name ": SET_ARRAY_INFO failed for %s: %s\n",
mddev, strerror(errno));
return 1;
diff ./Query.c~current~ ./Query.c
--- ./Query.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./Query.c 2005-07-15 11:38:18.000000000 +1000
@@ -105,26 +105,31 @@ int Query(char *dev)
if (superror == 0) {
/* array might be active... */
st->ss->getinfo_super(&info, super);
- mddev = get_md_name(info.array.md_minor);
- disc.number = info.disk.number;
- activity = "undetected";
- if (mddev && (fd = open(mddev, O_RDONLY))>=0) {
- if (md_get_version(fd) >= 9000 &&
- ioctl(fd, GET_ARRAY_INFO, &array)>= 0) {
- if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
- makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
- activity = "active";
- else
- activity = "mismatch";
+ if (st->ss->major == 0) {
+ mddev = get_md_name(info.array.md_minor);
+ disc.number = info.disk.number;
+ activity = "undetected";
+ if (mddev && (fd = open(mddev, O_RDONLY))>=0) {
+ if (md_get_version(fd) >= 9000 &&
+ ioctl(fd, GET_ARRAY_INFO, &array)>= 0) {
+ if (ioctl(fd, GET_DISK_INFO, &disc) >= 0 &&
+ makedev((unsigned)disc.major,(unsigned)disc.minor) == stb.st_rdev)
+ activity = "active";
+ else
+ activity = "mismatch";
+ }
+ close(fd);
}
- close(fd);
+ } else {
+ activity = "unknown";
+ mddev = "array";
}
- printf("%s: device %d in %d device %s %s md%d. Use mdadm --examine for
more detail.\n",
+ printf("%s: device %d in %d device %s %s %s. Use mdadm --examine for more
detail.\n",
dev,
info.disk.number, info.array.raid_disks,
activity,
map_num(pers, info.array.level),
- info.array.md_minor);
+ mddev);
}
return 0;
}
diff ./mdadm.h~current~ ./mdadm.h
--- ./mdadm.h~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./mdadm.h 2005-07-15 10:15:51.000000000 +1000
@@ -73,7 +73,7 @@ struct mdinfo {
mdu_array_info_t array;
mdu_disk_info_t disk;
__u64 events;
- unsigned int uuid[4];
+ int uuid[4];
};
#define Name "mdadm"
diff ./super0.c~current~ ./super0.c
--- ./super0.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./super0.c 2005-07-15 11:27:12.000000000 +1000
@@ -205,6 +205,7 @@ static void getinfo_super0(struct mdinfo
info->disk.major = sb->this_disk.major;
info->disk.minor = sb->this_disk.minor;
info->disk.raid_disk = sb->this_disk.raid_disk;
+ info->disk.number = sb->this_disk.number;
info->events = md_event(sb);
diff ./super1.c~current~ ./super1.c
--- ./super1.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./super1.c 2005-07-15 11:25:04.000000000 +1000
@@ -278,7 +278,7 @@ static void getinfo_super1(struct mdinfo
info->disk.major = 0;
info->disk.minor = 0;
-
+ info->disk.number = __le32_to_cpu(sb->dev_number);
if (__le32_to_cpu(sb->dev_number) >= __le32_to_cpu(sb->max_dev) ||
__le32_to_cpu(sb->max_dev) > 512)
role = 0xfffe;
@@ -303,7 +303,7 @@ static void getinfo_super1(struct mdinfo
for (i=0; i< __le32_to_cpu(sb->max_dev); i++) {
role = __le16_to_cpu(sb->dev_roles[i]);
- if (role == 0xFFFF || role < info->array.raid_disks)
+ if (/*role == 0xFFFF || */role < info->array.raid_disks)
working++;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-17 15:44 Raid5 Failure David M. Strang
@ 2005-07-17 22:05 ` Neil Brown
2005-07-17 23:15 ` David M. Strang
0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-17 22:05 UTC (permalink / raw)
To: David M. Strang; +Cc: linux-raid
On Sunday July 17, dstrang@shellpower.net wrote:
> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
>
> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with version-0
> superblocks.
>
> What, if anything, can I do since I'm using a version 1.0
> superblock?
Use a newer mdadm. This works with v2.0-devel-2 (I just checked).
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-17 22:05 ` Neil Brown
@ 2005-07-17 23:15 ` David M. Strang
2005-07-18 0:05 ` Tyler
2005-07-18 0:06 ` Neil Brown
0 siblings, 2 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-17 23:15 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil -
-(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument
-(root@abyss)-(/)- # mdadm --version
mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
-(root@abyss)-(/)- #
Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0
superblocks.
I'm using the devel-2 version, with the patch you posted previously.
-- David M. Strang
----- Original Message -----
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 6:05 PM
Subject: Re: Raid5 Failure
On Sunday July 17, dstrang@shellpower.net wrote:
> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
>
> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with version-0
> superblocks.
>
> What, if anything, can I do since I'm using a version 1.0
> superblock?
Use a newer mdadm. This works with v2.0-devel-2 (I just checked).
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-17 23:15 ` David M. Strang
@ 2005-07-18 0:05 ` Tyler
2005-07-18 0:23 ` David M. Strang
2005-07-18 0:06 ` Neil Brown
1 sibling, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-18 0:05 UTC (permalink / raw)
To: David M. Strang; +Cc: Neil Brown, linux-raid
Try it with -a or --add, not -add; also, you don't need the --manage bit.
Regards,
Tyler.
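For reference, both of these forms should be accepted (the array and the
option can come in either order, but the option needs both dashes):

mdadm /dev/md0 --add /dev/sdaa
mdadm --add /dev/md0 /dev/sdaa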
David M. Strang wrote:
> Neil -
>
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7
> July 2005
> -(root@abyss)-(/)- #
>
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with
> version-0 superblocks.
>
> I'm using the devel-2 version, with the patch you posted previously.
>
> -- David M. Strang
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 6:05 PM
> Subject: Re: Raid5 Failure
>
>
> On Sunday July 17, dstrang@shellpower.net wrote:
>
>> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
>> mdadm: hot add failed for /dev/sdaa: Invalid argument
>>
>> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with
>> version-0
>> superblocks.
>>
>> What, if anything, can I do since I'm using a version 1.0
>> superblock?
>
>
> Use a newer mdadm. This works with v2.0-devel-2 (I just checked).
>
> NeilBrown
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 0:05 ` Tyler
@ 2005-07-18 0:23 ` David M. Strang
0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18 0:23 UTC (permalink / raw)
To: Tyler; +Cc: Neil Brown, linux-raid
-(root@abyss)-(/)- # mdadm -a /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument
-(root@abyss)-(/)- # mdadm --add /dev/md0 /dev/sdaa
mdadm: hot add failed for /dev/sdaa: Invalid argument
Jul 17 20:21:42 abyss kernel: md0: HOT_ADD may only be used with version-0
superblocks.
Jul 17 20:22:05 abyss kernel: md0: HOT_ADD may only be used with version-0
superblocks.
Still no go with -a or --add.
-- David M. Strang
----- Original Message -----
From: Tyler
To: David M. Strang
Cc: Neil Brown ; linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 8:05 PM
Subject: Re: Raid5 Failure
Try it with -a or --add, not -add; also, you don't need the --manage bit.
Regards,
Tyler.
David M. Strang wrote:
> Neil -
>
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July
> 2005
> -(root@abyss)-(/)- #
>
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0
> superblocks.
>
> I'm using the devel-2 version, with the patch you posted previously.
>
> -- David M. Strang
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 6:05 PM
> Subject: Re: Raid5 Failure
>
>
> On Sunday July 17, dstrang@shellpower.net wrote:
>
>> -(root@abyss)-(/)- # mdadm --manage --add /dev/md0 /dev/sdaa
>> mdadm: hot add failed for /dev/sdaa: Invalid argument
>>
>> Jul 17 11:42:38 abyss kernel: md0: HOT_ADD may only be used with
>> version-0
>> superblocks.
>>
>> What, if anything, can I do since I'm using a version 1.0
>> superblock?
>
>
> Use a newer mdadm. This works with v2.0-devel-2 (I just checked).
>
> NeilBrown
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-17 23:15 ` David M. Strang
2005-07-18 0:05 ` Tyler
@ 2005-07-18 0:06 ` Neil Brown
2005-07-18 0:52 ` David M. Strang
1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-18 0:06 UTC (permalink / raw)
To: David M. Strang; +Cc: linux-raid
On Sunday July 17, dstrang@shellpower.net wrote:
> Neil -
>
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
> -(root@abyss)-(/)- #
>
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0
> superblocks.
>
> I'm using the devel-2 version, with the patch you posted previously.
That's really odd, because the only time mdadm-2 uses HOT_ADD_DISK is
inside an
if (array.major_version == 0)
statement.
Are you any good with 'gdb'?
Could you try running mdadm under gdb, put a break point at
'Manage_subdevs', then step through from there and see what happens?
Print the value of 'array' after the GET_ARRAY_INFO ioctl, and then
keep stepping through until the error occurs..
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 0:06 ` Neil Brown
@ 2005-07-18 0:52 ` David M. Strang
2005-07-18 1:06 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: David M. Strang @ 2005-07-18 0:52 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
I'm not real good with GDB... but I'm giving it a shot.
(gdb) run -a /dev/md0 /dev/sdaa
Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7,
devlist=0x8067018) at Manage.c:174
174 void *dsuper = NULL;
(gdb) c
Continuing.
mdadm: hot add failed for /dev/sdaa: Invalid argument
Program exited with code 01.
(gdb)
-- David M. Strang
----- Original Message -----
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 8:06 PM
Subject: Re: Raid5 Failure
On Sunday July 17, dstrang@shellpower.net wrote:
> Neil -
>
> -(root@abyss)-(/)- # mdadm --manage -add /dev/md0 /dev/sdaa
> mdadm: hot add failed for /dev/sdaa: Invalid argument
> -(root@abyss)-(/)- # mdadm --version
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July
> 2005
> -(root@abyss)-(/)- #
>
> Jul 17 19:13:57 abyss kernel: md0: HOT_ADD may only be used with version-0
> superblocks.
>
> I'm using the devel-2 version, with the patch you posted previously.
That's really odd, because the only time mdadm-2 uses HOT_ADD_DISK is
inside an
if (array.major_version == 0)
statement.
Are you any good with 'gdb'?
Could you try running mdadm under gdb, put a break point at
'Manage_subdevs', then step through from there and see what happens?
Print the value of 'array' after the GET_ARRAY_INFO ioctl, and then
keep stepping through until the error occurs..
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 0:52 ` David M. Strang
@ 2005-07-18 1:06 ` Neil Brown
2005-07-18 1:26 ` David M. Strang
[not found] ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
0 siblings, 2 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-18 1:06 UTC (permalink / raw)
To: David M. Strang; +Cc: linux-raid
On Sunday July 17, dstrang@shellpower.net wrote:
> I'm not real good with GDB... but I'm giving it a shot.
>
> (gdb) run -a /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
>
> Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7,
> devlist=0x8067018) at Manage.c:174
> 174 void *dsuper = NULL;
> (gdb) c
At this point you need to use 'n' for 'next', to step through the code
one statement at a time.
When you see:
176 if (ioctl(fd, GET_ARRAY_INFO, &array)) {
enter 'n' again, to execute that, then
print array
to print the 'array' structure.
Then continue with 'n' repeatedly.
NeilBrown
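The same session can also be scripted instead of typed, for example by feeding
gdb a command file. A rough sketch, assuming mdadm was built with debugging
symbols and that the line numbers match the source quoted above:

cat > /tmp/mdadm-trace.gdb <<'EOF'
break Manage_subdevs
run --add /dev/md0 /dev/sdaa
next
next
print array
EOF
gdb --command=/tmp/mdadm-trace.gdb /sbin/mdadm
# gdb stays at its prompt when the script ends, so further 'n' steps
# can be entered by hand from there.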
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 1:06 ` Neil Brown
@ 2005-07-18 1:26 ` David M. Strang
2005-07-18 1:31 ` David M. Strang
[not found] ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
1 sibling, 1 reply; 22+ messages in thread
From: David M. Strang @ 2005-07-18 1:26 UTC (permalink / raw)
Cc: linux-raid
-(root@abyss)-(~)- # gdb mdadm
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".
(gdb) b 'Manage_subdevs'
Breakpoint 1 at 0x804fcb6: file Manage.c, line 174.
(gdb) run --manage --add /dev/md0 /dev/sdaa
Starting program: /sbin/mdadm --manage --add /dev/md0 /dev/sdaa
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Breakpoint 1, Manage_subdevs (devname=0xbfe0de75 "/dev/md0", fd=7,
devlist=0x8067018) at Manage.c:174
174 void *dsuper = NULL;
(gdb) n
176 if (ioctl(fd, GET_ARRAY_INFO, &array)) {
(gdb) n
181 for (dv = devlist ; dv; dv=dv->next) {
(gdb) n
182 if (stat(dv->devname, &stb)) {
(gdb) n
187 if ((stb.st_mode & S_IFMT) != S_IFBLK) {
(gdb) n
192 switch(dv->disposition){
(gdb) n
200 tfd = open(dv->devname, O_RDONLY|O_EXCL);
(gdb) n
201 if (tfd < 0) {
(gdb) n
206 close(tfd);
(gdb) n
210 if (md_get_version(fd)%100 < 2) {
(gdb) n
212 if (ioctl(fd, HOT_ADD_DISK,
(gdb) n
219 fprintf(stderr, Name ": hot add
failed for %s: %s\n",
(gdb) n
mdadm: hot add failed for /dev/sdaa: Invalid argument
221 return 1;
(gdb) n
307 }
(gdb) n
main (argc=5, argv=0xbfe0cb14) at mdadm.c:810
810 if (!rv && readonly < 0)
(gdb) n
812 if (!rv && runstop)
(gdb) n
1072 exit(rv);
(gdb) n
Program exited with code 01.
-- David M. Strang
----- Original Message -----
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:06 PM
Subject: Re: Raid5 Failure
On Sunday July 17, dstrang@shellpower.net wrote:
> I'm not real good with GDB... but I'm giving it a shot.
>
> (gdb) run -a /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
>
> Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7,
> devlist=0x8067018) at Manage.c:174
> 174 void *dsuper = NULL;
> (gdb) c
At this point you need to use 'n' for 'next', to step through the code
one statement at a time.
When you see:
176 if (ioctl(fd, GET_ARRAY_INFO, &array)) {
enter 'n' again, to execute that, then
print array
to print the 'array' structure.
Then continue with 'n' repeatedly.
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 1:26 ` David M. Strang
@ 2005-07-18 1:31 ` David M. Strang
0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18 1:31 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Oops -
forgot this part:
(gdb) print array
$1 = {major_version = -1208614704, minor_version = 134596176, patch_version
= -1079477788, ctime = -1209047397, level = 5, size = -1079477052, nr_disks
= 134623392, raid_disks = 134623456,
md_minor = -1079477212, not_persistent = 0, utime = -1208577344, state
= -1208582100, active_disks = -1208448832, working_disks = -1079477752,
failed_disks = -1209047199, spare_disks = 5,
layout = -1079477052, chunk_size = 134623392}
-- David M. Strang
----- Original Message -----
From: David M. Strang
To: unlisted-recipients: ; no To-header on input
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:26 PM
Subject: Re: Raid5 Failure
-(root@abyss)-(~)- # gdb mdadm
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".
(gdb) b 'Manage_subdevs'
Breakpoint 1 at 0x804fcb6: file Manage.c, line 174.
(gdb) run --manage --add /dev/md0 /dev/sdaa
Starting program: /sbin/mdadm --manage --add /dev/md0 /dev/sdaa
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Breakpoint 1, Manage_subdevs (devname=0xbfe0de75 "/dev/md0", fd=7,
devlist=0x8067018) at Manage.c:174
174 void *dsuper = NULL;
(gdb) n
176 if (ioctl(fd, GET_ARRAY_INFO, &array)) {
(gdb) n
181 for (dv = devlist ; dv; dv=dv->next) {
(gdb) n
182 if (stat(dv->devname, &stb)) {
(gdb) n
187 if ((stb.st_mode & S_IFMT) != S_IFBLK) {
(gdb) n
192 switch(dv->disposition){
(gdb) n
200 tfd = open(dv->devname, O_RDONLY|O_EXCL);
(gdb) n
201 if (tfd < 0) {
(gdb) n
206 close(tfd);
(gdb) n
210 if (md_get_version(fd)%100 < 2) {
(gdb) n
212 if (ioctl(fd, HOT_ADD_DISK,
(gdb) n
219 fprintf(stderr, Name ": hot add
failed for %s: %s\n",
(gdb) n
mdadm: hot add failed for /dev/sdaa: Invalid argument
221 return 1;
(gdb) n
307 }
(gdb) n
main (argc=5, argv=0xbfe0cb14) at mdadm.c:810
810 if (!rv && readonly < 0)
(gdb) n
812 if (!rv && runstop)
(gdb) n
1072 exit(rv);
(gdb) n
Program exited with code 01.
-- David M. Strang
----- Original Message -----
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:06 PM
Subject: Re: Raid5 Failure
On Sunday July 17, dstrang@shellpower.net wrote:
> I'm not real good with GDB... but I'm giving it a shot.
>
> (gdb) run -a /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm -a /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
>
> Breakpoint 1, Manage_subdevs (devname=0xbfe75e5f "/dev/md0", fd=7,
> devlist=0x8067018) at Manage.c:174
> 174 void *dsuper = NULL;
> (gdb) c
At this point you need to use 'n' for 'next', to step through the code
one statement at a time.
When you see:
176 if (ioctl(fd, GET_ARRAY_INFO, &array)) {
enter 'n' again, to execute that, then
print array
to print the 'array' structure.
Then continue with 'n' repeatedly.
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>]
* Re: Raid5 Failure
[not found] ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
@ 2005-07-18 1:33 ` Neil Brown
2005-07-18 1:46 ` David M. Strang
2005-07-18 2:09 ` bug report: mdadm-devel-2 , superblock version 1 Tyler
0 siblings, 2 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-18 1:33 UTC (permalink / raw)
To: David M. Strang; +Cc: linux-raid
On Sunday July 17, dstrang@shellpower.net wrote:
> -(root@abyss)-(~)- # gdb mdadm
> GNU gdb 6.2
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
>
> (gdb) b 'Manage_subdevs'
> Breakpoint 1 at 0x804fcb6: file Manage.c, line 174.
> (gdb) run --manage --add /dev/md0 /dev/sdaa
> Starting program: /sbin/mdadm --manage --add /dev/md0 /dev/sdaa
> warning: Unable to find dynamic linker breakpoint function.
> GDB will be unable to debug shared library initializers
> and track explicitly loaded dynamic code.
>
> Breakpoint 1, Manage_subdevs (devname=0xbfe0de75 "/dev/md0", fd=7, devlist=0x8067018) at Manage.c:174
> 174 void *dsuper = NULL;
> (gdb) n
> 176 if (ioctl(fd, GET_ARRAY_INFO, &array)) {
> (gdb) n
> 181 for (dv = devlist ; dv; dv=dv->next) {
> (gdb) n
> 182 if (stat(dv->devname, &stb)) {
> (gdb) n
> 187 if ((stb.st_mode & S_IFMT) != S_IFBLK) {
> (gdb) n
> 192 switch(dv->disposition){
> (gdb) n
> 200 tfd = open(dv->devname, O_RDONLY|O_EXCL);
> (gdb) n
> 201 if (tfd < 0) {
> (gdb) n
> 206 close(tfd);
> (gdb) n
> 210 if (md_get_version(fd)%100 <
> 2) {
Ahhhh... I cannot read my own code, that is the problem!!
This patch should fix it.
Thanks for persisting.
NeilBrown
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./Manage.c | 7 ++-----
1 files changed, 2 insertions(+), 5 deletions(-)
diff ./Manage.c~current~ ./Manage.c
--- ./Manage.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./Manage.c 2005-07-18 11:31:57.000000000 +1000
@@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
return 1;
}
close(tfd);
-#if 0
- if (array.major_version == 0) {
-#else
- if (md_get_version(fd)%100 < 2) {
-#endif
+ if (array.major_version == 0 &&
+ md_get_version(fd)%100 < 2) {
if (ioctl(fd, HOT_ADD_DISK,
(unsigned long)stb.st_rdev)==0) {
fprintf(stderr, Name ": hot added %s\n",
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 1:33 ` Neil Brown
@ 2005-07-18 1:46 ` David M. Strang
2005-07-18 2:10 ` Tyler
2005-07-18 2:15 ` Neil Brown
2005-07-18 2:09 ` bug report: mdadm-devel-2 , superblock version 1 Tyler
1 sibling, 2 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18 1:46 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil --
That worked, the device has been added to the array. Now, I think the next
problem is my own ignorance.
-(root@abyss)-(/)- # mdadm --detail /dev/md0
/dev/md0:
Version : 01.00.01
Creation Time : Wed Dec 31 19:00:00 1969
Raid Level : raid5
Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
Device Size : 71687296 (68.37 GiB 73.41 GB)
Raid Devices : 28
Total Devices : 28
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sun Jul 17 17:32:12 2005
State : clean, degraded
Active Devices : 27
Working Devices : 28
Failed Devices : 0
Spare Devices : 1
Layout : left-asymmetric
Chunk Size : 128K
UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
Events : 176939
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/evms/.nodes/sda
1 8 16 1 active sync /dev/evms/.nodes/sdb
2 8 32 2 active sync /dev/evms/.nodes/sdc
3 8 48 3 active sync /dev/evms/.nodes/sdd
4 8 64 4 active sync /dev/evms/.nodes/sde
5 8 80 5 active sync /dev/evms/.nodes/sdf
6 8 96 6 active sync /dev/evms/.nodes/sdg
7 8 112 7 active sync /dev/evms/.nodes/sdh
8 8 128 8 active sync /dev/evms/.nodes/sdi
9 8 144 9 active sync /dev/evms/.nodes/sdj
10 8 160 10 active sync /dev/evms/.nodes/sdk
11 8 176 11 active sync /dev/evms/.nodes/sdl
12 8 192 12 active sync /dev/evms/.nodes/sdm
13 8 208 13 active sync /dev/evms/.nodes/sdn
14 8 224 14 active sync /dev/evms/.nodes/sdo
15 8 240 15 active sync /dev/evms/.nodes/sdp
16 65 0 16 active sync /dev/evms/.nodes/sdq
17 65 16 17 active sync /dev/evms/.nodes/sdr
18 65 32 18 active sync /dev/evms/.nodes/sds
19 65 48 19 active sync /dev/evms/.nodes/sdt
20 65 64 20 active sync /dev/evms/.nodes/sdu
21 65 80 21 active sync /dev/evms/.nodes/sdv
22 65 96 22 active sync /dev/evms/.nodes/sdw
23 65 112 23 active sync /dev/evms/.nodes/sdx
24 65 128 24 active sync /dev/evms/.nodes/sdy
25 65 144 25 active sync /dev/evms/.nodes/sdz
26 0 0 - removed
27 65 176 27 active sync /dev/evms/.nodes/sdab
28 65 160 - spare /dev/evms/.nodes/sdaa
I've got 28 devices, 1 spare, 27 active. I'm still running as clean,
degraded.
What do I do next? What I wanted to do was to put /dev/sdaa back in as
device 26, but now it's device 28 - and flagged as spare. How do I make it
active in the array again?
-- David M. Strang
----- Original Message -----
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 9:33 PM
Subject: Re: Raid5 Failure
Ahhhh... I cannot read my own code, that is the problem!!
This patch should fix it.
Thanks for persisting.
NeilBrown
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./Manage.c | 7 ++-----
1 files changed, 2 insertions(+), 5 deletions(-)
diff ./Manage.c~current~ ./Manage.c
--- ./Manage.c~current~ 2005-07-07 09:19:53.000000000 +1000
+++ ./Manage.c 2005-07-18 11:31:57.000000000 +1000
@@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
return 1;
}
close(tfd);
-#if 0
- if (array.major_version == 0) {
-#else
- if (md_get_version(fd)%100 < 2) {
-#endif
+ if (array.major_version == 0 &&
+ md_get_version(fd)%100 < 2) {
if (ioctl(fd, HOT_ADD_DISK,
(unsigned long)stb.st_rdev)==0) {
fprintf(stderr, Name ": hot added %s\n",
-
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 1:46 ` David M. Strang
@ 2005-07-18 2:10 ` Tyler
2005-07-18 2:12 ` David M. Strang
2005-07-18 2:15 ` Neil Brown
1 sibling, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-18 2:10 UTC (permalink / raw)
To: David M. Strang; +Cc: Neil Brown, linux-raid
If you cat /proc/mdstat, it should show the array resyncing... when it's
done, it will put the drive back as device 26.
Tyler.
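A convenient way to keep an eye on the rebuild, assuming watch(1) is
installed, is something like:

watch -n 10 cat /proc/mdstat
# or simply poll it by hand:
cat /proc/mdstat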
David M. Strang wrote:
> Neil --
>
> That worked, the device has been added to the array. Now, I think the
> next problem is my own ignorance.
>
> -(root@abyss)-(/)- # mdadm --detail /dev/md0
> /dev/md0:
> Version : 01.00.01
> Creation Time : Wed Dec 31 19:00:00 1969
> Raid Level : raid5
> Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
> Device Size : 71687296 (68.37 GiB 73.41 GB)
> Raid Devices : 28
> Total Devices : 28
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Sun Jul 17 17:32:12 2005
> State : clean, degraded
> Active Devices : 27
> Working Devices : 28
> Failed Devices : 0
> Spare Devices : 1
>
> Layout : left-asymmetric
> Chunk Size : 128K
>
> UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
> Events : 176939
>
> Number Major Minor RaidDevice State
> 0 8 0 0 active sync /dev/evms/.nodes/sda
> 1 8 16 1 active sync /dev/evms/.nodes/sdb
> 2 8 32 2 active sync /dev/evms/.nodes/sdc
> 3 8 48 3 active sync /dev/evms/.nodes/sdd
> 4 8 64 4 active sync /dev/evms/.nodes/sde
> 5 8 80 5 active sync /dev/evms/.nodes/sdf
> 6 8 96 6 active sync /dev/evms/.nodes/sdg
> 7 8 112 7 active sync /dev/evms/.nodes/sdh
> 8 8 128 8 active sync /dev/evms/.nodes/sdi
> 9 8 144 9 active sync /dev/evms/.nodes/sdj
> 10 8 160 10 active sync /dev/evms/.nodes/sdk
> 11 8 176 11 active sync /dev/evms/.nodes/sdl
> 12 8 192 12 active sync /dev/evms/.nodes/sdm
> 13 8 208 13 active sync /dev/evms/.nodes/sdn
> 14 8 224 14 active sync /dev/evms/.nodes/sdo
> 15 8 240 15 active sync /dev/evms/.nodes/sdp
> 16 65 0 16 active sync /dev/evms/.nodes/sdq
> 17 65 16 17 active sync /dev/evms/.nodes/sdr
> 18 65 32 18 active sync /dev/evms/.nodes/sds
> 19 65 48 19 active sync /dev/evms/.nodes/sdt
> 20 65 64 20 active sync /dev/evms/.nodes/sdu
> 21 65 80 21 active sync /dev/evms/.nodes/sdv
> 22 65 96 22 active sync /dev/evms/.nodes/sdw
> 23 65 112 23 active sync /dev/evms/.nodes/sdx
> 24 65 128 24 active sync /dev/evms/.nodes/sdy
> 25 65 144 25 active sync /dev/evms/.nodes/sdz
> 26 0 0 - removed
> 27 65 176 27 active sync
> /dev/evms/.nodes/sdab
>
> 28 65 160 - spare /dev/evms/.nodes/sdaa
>
>
> I've got 28 devices, 1 spare, 27 active. I'm still running as clean,
> degraded.
>
> What do I do next? What I wanted to do was to put /dev/sdaa back in as
> device 26, but now it's device 28 - and flagged as spare. How do I
> make it active in the array again?
>
> -- David M. Strang
>
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 9:33 PM
> Subject: Re: Raid5 Failure
>
> Ahhhh... I cannot read my own code, that is the problem!!
>
> This patch should fix it.
>
> Thanks for persisting.
>
> NeilBrown
>
> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
>
> ### Diffstat output
> ./Manage.c | 7 ++-----
> 1 files changed, 2 insertions(+), 5 deletions(-)
>
> diff ./Manage.c~current~ ./Manage.c
> --- ./Manage.c~current~ 2005-07-07 09:19:53.000000000 +1000
> +++ ./Manage.c 2005-07-18 11:31:57.000000000 +1000
> @@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
> return 1;
> }
> close(tfd);
> -#if 0
> - if (array.major_version == 0) {
> -#else
> - if (md_get_version(fd)%100 < 2) {
> -#endif
> + if (array.major_version == 0 &&
> + md_get_version(fd)%100 < 2) {
> if (ioctl(fd, HOT_ADD_DISK,
> (unsigned long)stb.st_rdev)==0) {
> fprintf(stderr, Name ": hot added %s\n",
> -
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 2:10 ` Tyler
@ 2005-07-18 2:12 ` David M. Strang
0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18 2:12 UTC (permalink / raw)
To: Tyler; +Cc: Neil Brown, linux-raid
-(root@abyss)-(/)- # cat /proc/mdstat
Personalities : [raid5] [multipath]
md0 : active raid5 sdaa[28] sda[0] sdab[27] sdz[25] sdy[24] sdx[23] sdw[22]
sdv[21] sdu[20] sdt[19] sds[18] sdr[17] sdq[16] sdp[15] sdo[14] sdn[13]
sdm[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3]
sdc[2] sdb[1]
1935556992 blocks level 5, 128k chunk, algorithm 0 [28/27]
[UUUUUUUUUUUUUUUUUUUUUUUUUU_U]
unused devices: <none>
Do I need to do anything else to 'kick it off'?
-- David M. Strang
----- Original Message -----
From: Tyler
To: David M. Strang
Cc: Neil Brown ; linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 10:10 PM
Subject: Re: Raid5 Failure
If you cat /proc/mdstat, it should show the array resyncing... when it's
done, it will put the drive back as device 26.
Tyler.
David M. Strang wrote:
> Neil --
>
> That worked, the device has been added to the array. Now, I think the next
> problem is my own ignorance.
>
> -(root@abyss)-(/)- # mdadm --detail /dev/md0
> /dev/md0:
> Version : 01.00.01
> Creation Time : Wed Dec 31 19:00:00 1969
> Raid Level : raid5
> Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
> Device Size : 71687296 (68.37 GiB 73.41 GB)
> Raid Devices : 28
> Total Devices : 28
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Sun Jul 17 17:32:12 2005
> State : clean, degraded
> Active Devices : 27
> Working Devices : 28
> Failed Devices : 0
> Spare Devices : 1
>
> Layout : left-asymmetric
> Chunk Size : 128K
>
> UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
> Events : 176939
>
> Number Major Minor RaidDevice State
> 0 8 0 0 active sync /dev/evms/.nodes/sda
> 1 8 16 1 active sync /dev/evms/.nodes/sdb
> 2 8 32 2 active sync /dev/evms/.nodes/sdc
> 3 8 48 3 active sync /dev/evms/.nodes/sdd
> 4 8 64 4 active sync /dev/evms/.nodes/sde
> 5 8 80 5 active sync /dev/evms/.nodes/sdf
> 6 8 96 6 active sync /dev/evms/.nodes/sdg
> 7 8 112 7 active sync /dev/evms/.nodes/sdh
> 8 8 128 8 active sync /dev/evms/.nodes/sdi
> 9 8 144 9 active sync /dev/evms/.nodes/sdj
> 10 8 160 10 active sync /dev/evms/.nodes/sdk
> 11 8 176 11 active sync /dev/evms/.nodes/sdl
> 12 8 192 12 active sync /dev/evms/.nodes/sdm
> 13 8 208 13 active sync /dev/evms/.nodes/sdn
> 14 8 224 14 active sync /dev/evms/.nodes/sdo
> 15 8 240 15 active sync /dev/evms/.nodes/sdp
> 16 65 0 16 active sync /dev/evms/.nodes/sdq
> 17 65 16 17 active sync /dev/evms/.nodes/sdr
> 18 65 32 18 active sync /dev/evms/.nodes/sds
> 19 65 48 19 active sync /dev/evms/.nodes/sdt
> 20 65 64 20 active sync /dev/evms/.nodes/sdu
> 21 65 80 21 active sync /dev/evms/.nodes/sdv
> 22 65 96 22 active sync /dev/evms/.nodes/sdw
> 23 65 112 23 active sync /dev/evms/.nodes/sdx
> 24 65 128 24 active sync /dev/evms/.nodes/sdy
> 25 65 144 25 active sync /dev/evms/.nodes/sdz
> 26 0 0 - removed
> 27 65 176 27 active sync /dev/evms/.nodes/sdab
>
> 28 65 160 - spare /dev/evms/.nodes/sdaa
>
>
> I've got 28 devices, 1 spare, 27 active. I'm still running as clean,
> degraded.
>
> What do I do next? What I wanted to do was to put /dev/sdaa back in as
> device 26, but now it's device 28 - and flagged as spare. How do I make it
> active in the array again?
>
> -- David M. Strang
>
>
> ----- Original Message ----- From: Neil Brown
> To: David M. Strang
> Cc: linux-raid@vger.kernel.org
> Sent: Sunday, July 17, 2005 9:33 PM
> Subject: Re: Raid5 Failure
>
> Ahhhh... I cannot read my own code, that is the problem!!
>
> This patch should fix it.
>
> Thanks for persisting.
>
> NeilBrown
>
> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
>
> ### Diffstat output
> ./Manage.c | 7 ++-----
> 1 files changed, 2 insertions(+), 5 deletions(-)
>
> diff ./Manage.c~current~ ./Manage.c
> --- ./Manage.c~current~ 2005-07-07 09:19:53.000000000 +1000
> +++ ./Manage.c 2005-07-18 11:31:57.000000000 +1000
> @@ -204,11 +204,8 @@ int Manage_subdevs(char *devname, int fd
> return 1;
> }
> close(tfd);
> -#if 0
> - if (array.major_version == 0) {
> -#else
> - if (md_get_version(fd)%100 < 2) {
> -#endif
> + if (array.major_version == 0 &&
> + md_get_version(fd)%100 < 2) {
> if (ioctl(fd, HOT_ADD_DISK,
> (unsigned long)stb.st_rdev)==0) {
> fprintf(stderr, Name ": hot added %s\n",
> -
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 1:46 ` David M. Strang
2005-07-18 2:10 ` Tyler
@ 2005-07-18 2:15 ` Neil Brown
2005-07-18 2:24 ` David M. Strang
1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-18 2:15 UTC (permalink / raw)
To: David M. Strang; +Cc: linux-raid
On Sunday July 17, dstrang@shellpower.net wrote:
> Neil --
>
> That worked, the device has been added to the array. Now, I think the next
> problem is my own ignorance.
It looks like your kernel is missing the following patch (dated 31st
May 2005). You're near the bleeding edge working with version-1
superblocks (and I do thank you for being a guinea pig:-) and should
use an ultra-recent kernel if at all possible.
If you don't have the array mounted (or can unmount it safely), then
you might be able to convince it to start the rebuild by setting
it read-only, then writable, i.e.
mdadm --readonly /dev/md0
mdadm --readwrite /dev/md0
Alternatively, stop and re-assemble the array.
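The stop-and-reassemble route, sketched with the device set used earlier in
the thread (only with the filesystem unmounted, and substituting the real
mount point):

umount /dev/md0
mdadm --stop /dev/md0
mdadm -A /dev/md0 /dev/sd[a-z] /dev/sda[ab]
mount /dev/md0 /mnt    # use the array's real mount point here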
NeilBrown
-----------------------
Make sure recovery happens when add_new_disk is used for hot_add
Currently if add_new_disk is used to hot-add a drive to a degraded
array, recovery doesn't start ... because we didn't tell it to.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 2 ++
1 files changed, 2 insertions(+)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-05-31 13:40:35.000000000 +1000
+++ ./drivers/md/md.c 2005-05-31 13:40:34.000000000 +1000
@@ -2232,6 +2232,8 @@ static int add_new_disk(mddev_t * mddev,
err = bind_rdev_to_array(rdev, mddev);
if (err)
export_rdev(rdev);
+
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
if (mddev->thread)
md_wakeup_thread(mddev->thread);
return err;
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Raid5 Failure
2005-07-18 2:15 ` Neil Brown
@ 2005-07-18 2:24 ` David M. Strang
0 siblings, 0 replies; 22+ messages in thread
From: David M. Strang @ 2005-07-18 2:24 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil -
I'm using 2.6.12.
-(root@abyss)-(/)- # uname -ar
Linux abyss 2.6.12 #2 SMP Mon Jun 20 22:15:25 EDT 2005 i686 unknown unknown
GNU/Linux
I unmounted the raid, used the readonly, then readwrite - then remounted the
raid.
Jul 17 22:19:14 abyss kernel: md: md0 switched to read-only mode.
Jul 17 22:19:23 abyss kernel: md: md0 switched to read-write mode.
Jul 17 22:19:23 abyss kernel: RAID5 conf printout:
Jul 17 22:19:23 abyss kernel: --- rd:28 wd:27 fd:1
Jul 17 22:19:23 abyss kernel: disk 0, o:1, dev:sda
Jul 17 22:19:23 abyss kernel: disk 1, o:1, dev:sdb
Jul 17 22:19:23 abyss kernel: disk 2, o:1, dev:sdc
Jul 17 22:19:23 abyss kernel: disk 3, o:1, dev:sdd
Jul 17 22:19:23 abyss kernel: disk 4, o:1, dev:sde
Jul 17 22:19:23 abyss kernel: disk 5, o:1, dev:sdf
Jul 17 22:19:23 abyss kernel: disk 6, o:1, dev:sdg
Jul 17 22:19:23 abyss kernel: disk 7, o:1, dev:sdh
Jul 17 22:19:23 abyss kernel: disk 8, o:1, dev:sdi
Jul 17 22:19:23 abyss kernel: disk 9, o:1, dev:sdj
Jul 17 22:19:23 abyss kernel: disk 10, o:1, dev:sdk
Jul 17 22:19:23 abyss kernel: disk 11, o:1, dev:sdl
Jul 17 22:19:23 abyss kernel: disk 12, o:1, dev:sdm
Jul 17 22:19:23 abyss kernel: disk 13, o:1, dev:sdn
Jul 17 22:19:23 abyss kernel: disk 14, o:1, dev:sdo
Jul 17 22:19:23 abyss kernel: disk 15, o:1, dev:sdp
Jul 17 22:19:23 abyss kernel: disk 16, o:1, dev:sdq
Jul 17 22:19:23 abyss kernel: disk 17, o:1, dev:sdr
Jul 17 22:19:23 abyss kernel: disk 18, o:1, dev:sds
Jul 17 22:19:23 abyss kernel: disk 19, o:1, dev:sdt
Jul 17 22:19:23 abyss kernel: disk 20, o:1, dev:sdu
Jul 17 22:19:23 abyss kernel: disk 21, o:1, dev:sdv
Jul 17 22:19:23 abyss kernel: disk 22, o:1, dev:sdw
Jul 17 22:19:23 abyss kernel: disk 23, o:1, dev:sdx
Jul 17 22:19:23 abyss kernel: disk 24, o:1, dev:sdy
Jul 17 22:19:23 abyss kernel: disk 25, o:1, dev:sdz
Jul 17 22:19:23 abyss kernel: disk 26, o:1, dev:sdaa
Jul 17 22:19:23 abyss kernel: disk 27, o:1, dev:sdab
Jul 17 22:19:23 abyss kernel: .<6>md: syncing RAID array md0
Jul 17 22:19:23 abyss kernel: md: minimum _guaranteed_ reconstruction speed:
1000 KB/sec/disc.
Jul 17 22:19:23 abyss kernel: md: using maximum available idle IO bandwith
(but not more than 200000 KB/sec) for reconstruction.
Jul 17 22:19:23 abyss kernel: md: using 128k window, over a total of
71687296 blocks.
-(root@abyss)-(/)- # mdadm --detail /dev/md0
/dev/md0:
Version : 01.00.01
Creation Time : Wed Dec 31 19:00:00 1969
Raid Level : raid5
Array Size : 1935556992 (1845.89 GiB 1982.01 GB)
Device Size : 71687296 (68.37 GiB 73.41 GB)
Raid Devices : 28
Total Devices : 28
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sun Jul 17 22:20:09 2005
State : clean, degraded, recovering
Active Devices : 27
Working Devices : 28
Failed Devices : 0
Spare Devices : 1
Layout : left-asymmetric
Chunk Size : 128K
Rebuild Status : 0% complete
UUID : 4e2b6b0a8e:92e91c0c:018a4bf0:9bb74d
Events : 176947
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/evms/.nodes/sda
1 8 16 1 active sync /dev/evms/.nodes/sdb
2 8 32 2 active sync /dev/evms/.nodes/sdc
3 8 48 3 active sync /dev/evms/.nodes/sdd
4 8 64 4 active sync /dev/evms/.nodes/sde
5 8 80 5 active sync /dev/evms/.nodes/sdf
6 8 96 6 active sync /dev/evms/.nodes/sdg
7 8 112 7 active sync /dev/evms/.nodes/sdh
8 8 128 8 active sync /dev/evms/.nodes/sdi
9 8 144 9 active sync /dev/evms/.nodes/sdj
10 8 160 10 active sync /dev/evms/.nodes/sdk
11 8 176 11 active sync /dev/evms/.nodes/sdl
12 8 192 12 active sync /dev/evms/.nodes/sdm
13 8 208 13 active sync /dev/evms/.nodes/sdn
14 8 224 14 active sync /dev/evms/.nodes/sdo
15 8 240 15 active sync /dev/evms/.nodes/sdp
16 65 0 16 active sync /dev/evms/.nodes/sdq
17 65 16 17 active sync /dev/evms/.nodes/sdr
18 65 32 18 active sync /dev/evms/.nodes/sds
19 65 48 19 active sync /dev/evms/.nodes/sdt
20 65 64 20 active sync /dev/evms/.nodes/sdu
21 65 80 21 active sync /dev/evms/.nodes/sdv
22 65 96 22 active sync /dev/evms/.nodes/sdw
23 65 112 23 active sync /dev/evms/.nodes/sdx
24 65 128 24 active sync /dev/evms/.nodes/sdy
25 65 144 25 active sync /dev/evms/.nodes/sdz
26 0 0 - removed
27 65 176 27 active sync /dev/evms/.nodes/sdab
28 65 160 26 spare rebuilding /dev/evms/.nodes/sdaa
It is re-syncing now. Thanks!
This is from my drivers/md/md.c - lines 2215->2249.
if (rdev->faulty) {
    printk(KERN_WARNING
        "md: can not hot-add faulty %s disk to %s!\n",
        bdevname(rdev->bdev,b), mdname(mddev));
    err = -EINVAL;
    goto abort_export;
}
rdev->in_sync = 0;
rdev->desc_nr = -1;
bind_rdev_to_array(rdev, mddev);
/*
 * The rest should better be atomic, we can have disk failures
 * noticed in interrupt contexts ...
 */
if (rdev->desc_nr == mddev->max_disks) {
    printk(KERN_WARNING "%s: can not hot-add to full array!\n",
        mdname(mddev));
    err = -EBUSY;
    goto abort_unbind_export;
}
rdev->raid_disk = -1;
md_update_sb(mddev);
/*
 * Kick recovery, maybe this spare has to be added to the
 * array immediately.
 */
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
return 0;
-- David M. Strang
----- Original Message -----
From: Neil Brown
To: David M. Strang
Cc: linux-raid@vger.kernel.org
Sent: Sunday, July 17, 2005 10:15 PM
Subject: Re: Raid5 Failure
On Sunday July 17, dstrang@shellpower.net wrote:
> Neil --
>
> That worked, the device has been added to the array. Now, I think the next
> problem is my own ignorance.
It looks like your kernel is missing the following patch (dated 31st
May 2005). You're near the bleeding edge working with version-1
superblocks (and I do thank you for being a guinea pig :-) and should
use an ultra-recent kernel if at all possible.
If you don't have the array mounted (or can unmount it safely) then
you might be able to convince it to start the rebuild by setting
it read-only, then writable, i.e.:
mdadm --readonly /dev/md0
mdadm --readwrite /dev/md0
Alternatively, stop and re-assemble the array.
NeilBrown
-----------------------
Make sure recovery happens when add_new_disk is used for hot_add
Currently if add_new_disk is used to hot-add a drive to a degraded
array, recovery doesn't start ... because we didn't tell it to.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
### Diffstat output
./drivers/md/md.c | 2 ++
1 files changed, 2 insertions(+)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-05-31 13:40:35.000000000 +1000
+++ ./drivers/md/md.c 2005-05-31 13:40:34.000000000 +1000
@@ -2232,6 +2232,8 @@ static int add_new_disk(mddev_t * mddev,
err = bind_rdev_to_array(rdev, mddev);
if (err)
export_rdev(rdev);
+
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
if (mddev->thread)
md_wakeup_thread(mddev->thread);
return err;
* bug report: mdadm-devel-2 , superblock version 1
2005-07-18 1:33 ` Neil Brown
2005-07-18 1:46 ` David M. Strang
@ 2005-07-18 2:09 ` Tyler
2005-07-18 2:19 ` Tyler
2005-07-25 0:36 ` Neil Brown
1 sibling, 2 replies; 22+ messages in thread
From: Tyler @ 2005-07-18 2:09 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
# uname -a
Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
# ./mdadm -V
mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
** stop the current array
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -C /dev/md1 -l 5
--raid-devices 3 -e 1 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm: /dev/sda2 appears to be part of a raid array:
level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
mdadm: /dev/sdb2 appears to be part of a raid array:
level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
mdadm: /dev/sdc2 appears to be part of a raid array:
level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
Continue creating array? y
mdadm: array /dev/md1 started.
** create and start new array using 3 drives, superblock version 1
root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
** mdstat mostly okay, except sdc2 is listed as device3 instead of
device2 (from 0,1,2)
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
/dev/md1:
Version : 01.00.01
Creation Time : Mon Jul 18 03:56:40 2005
Raid Level : raid5
Array Size : 128384 (125.40 MiB 131.47 MB)
Device Size : 64192 (62.70 MiB 65.73 MB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Jul 18 03:56:42 2005
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
Events : 1
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/.static/dev/sda2
1 8 18 1 active sync /dev/.static/dev/sdb2
2 0 0 - removed
3 8 34 2 active sync /dev/.static/dev/sdc2
** reports version 01.00.01 superblock, but reports as if there were 4
devices used
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
** stop the array
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
Segmentation fault
** try to assemble the array
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
mdadm: md device /dev/md1 does not appear to be active.
** check if it's active at all
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2
/dev/sdb2 /dev/sdc2
mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2
seems ok
mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2
seems ok
mdadm: /dev/md1 has been started with 3 drives.
** try restarting it with drive details, and it starts
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Mon Jul 18 02:53:55 2005
Raid Level : raid5
Array Size : 128384 (125.40 MiB 131.47 MB)
Device Size : 64192 (62.70 MiB 65.73 MB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Jul 18 02:53:57 2005
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
Events : 0.2
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/.static/dev/sda2
1 8 18 1 active sync /dev/.static/dev/sdb2
2 8 34 2 active sync /dev/.static/dev/sdc2
** magically, we now have a v00.90.01 superblock, it reports the proper
list of drives
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
Segmentation fault
** try to stop and restart again, doesn't work
root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
/dev/sda2:
Magic : a92b4efc
Version : 01.00
Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
Name :
Creation Time : Mon Jul 18 03:56:40 2005
Raid Level : raid5
Raid Devices : 3
Device Size : 128504 (62.76 MiB 65.79 MB)
Super Offset : 128504 sectors
State : clean
Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
Update Time : Mon Jul 18 03:56:42 2005
Checksum : 903062ed - correct
Events : 1
Layout : left-symmetric
Chunk Size : 64K
Array State : Uuu 1 failed
** the drives themselves still report a version 1 superblock... weird
* Re: bug report: mdadm-devel-2 , superblock version 1
2005-07-18 2:09 ` bug report: mdadm-devel-2 , superblock version 1 Tyler
@ 2005-07-18 2:19 ` Tyler
2005-07-25 0:37 ` Neil Brown
2005-07-25 0:36 ` Neil Brown
1 sibling, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-18 2:19 UTC (permalink / raw)
To: linux-raid; +Cc: Neil Brown
As a side note, do I need any kernel patches to use the version 1
superblocks, Neil? That may be what I'm missing, as the kernel in use
currently is a vanilla 2.6.12.3. I'm not trying to use bitmaps or
anything else at the moment.
Tyler.
Tyler wrote:
> # uname -a
> Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
> # ./mdadm -V
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7
> July 2005
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
>
> ** stop the current array
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -C /dev/md1 -l 5
> --raid-devices 3 -e 1 /dev/sda2 /dev/sdb2 /dev/sdc2
> mdadm: /dev/sda2 appears to be part of a raid array:
> level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
> mdadm: /dev/sdb2 appears to be part of a raid array:
> level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
> mdadm: /dev/sdc2 appears to be part of a raid array:
> level=5 devices=3 ctime=Mon Jul 18 03:22:05 2005
> Continue creating array? y
> mdadm: array /dev/md1 started.
>
> ** create and start new array using 3 drives, superblock version 1
>
> root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
> Personalities : [raid5]
> md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
> 128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> unused devices: <none>
>
> ** mdstat mostly okay, except sdc2 is listed as device3 instead of
> device2 (from 0,1,2)
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
> Version : 01.00.01
> Creation Time : Mon Jul 18 03:56:40 2005
> Raid Level : raid5
> Array Size : 128384 (125.40 MiB 131.47 MB)
> Device Size : 64192 (62.70 MiB 65.73 MB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 1
> Persistence : Superblock is persistent
>
> Update Time : Mon Jul 18 03:56:42 2005
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
> Events : 1
>
> Number Major Minor RaidDevice State
> 0 8 2 0 active sync
> /dev/.static/dev/sda2
> 1 8 18 1 active sync
> /dev/.static/dev/sdb2
> 2 0 0 - removed
>
> 3 8 34 2 active sync
> /dev/.static/dev/sdc2
>
> ** reports version 01.00.01 superblock, but reports as if there were 4
> devices used
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
>
> ** stop the array
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
>
> ** try to assemble the array
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> mdadm: md device /dev/md1 does not appear to be active.
>
> ** check if its active at all
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2
> /dev/sdb2 /dev/sdc2
> mdadm: device 1 in /dev/md1 has wrong state in superblock, but
> /dev/sdb2 seems ok
> mdadm: device 2 in /dev/md1 has wrong state in superblock, but
> /dev/sdc2 seems ok
> mdadm: /dev/md1 has been started with 3 drives.
>
> ** try restarting it with drive details, and it starts
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
> Version : 00.90.01
> Creation Time : Mon Jul 18 02:53:55 2005
> Raid Level : raid5
> Array Size : 128384 (125.40 MiB 131.47 MB)
> Device Size : 64192 (62.70 MiB 65.73 MB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 1
> Persistence : Superblock is persistent
>
> Update Time : Mon Jul 18 02:53:57 2005
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
> Events : 0.2
>
> Number Major Minor RaidDevice State
> 0 8 2 0 active sync
> /dev/.static/dev/sda2
> 1 8 18 1 active sync
> /dev/.static/dev/sdb2
> 2 8 34 2 active sync
> /dev/.static/dev/sdc2
>
> ** magically, we now have a v00.90.01 superblock, it reports the
> proper list of drives
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
>
> ** try to stop and restart again, doesn't work
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
> /dev/sda2:
> Magic : a92b4efc
> Version : 01.00
> Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
> Name :
> Creation Time : Mon Jul 18 03:56:40 2005
> Raid Level : raid5
> Raid Devices : 3
>
> Device Size : 128504 (62.76 MiB 65.79 MB)
> Super Offset : 128504 sectors
> State : clean
> Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
> Update Time : Mon Jul 18 03:56:42 2005
> Checksum : 903062ed - correct
> Events : 1
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Array State : Uuu 1 failed
>
> ** the drives themselves still report a version 1 superblock... wierd
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: bug report: mdadm-devel-2 , superblock version 1
2005-07-18 2:19 ` Tyler
@ 2005-07-25 0:37 ` Neil Brown
0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-25 0:37 UTC (permalink / raw)
To: Tyler; +Cc: linux-raid
On Sunday July 17, pml@dtbb.net wrote:
> As a side note, do I need any kernel patches to use the version 1
> superblocks Neil? That may be what i'm missing.. as the kernel in use
> currently is a vanilla 2.6.12.3. I'm not trying to use bitmaps or
> anything else at the moment.
Version-1 superblocks should mostly work in 2.6.12.3. I'm not 100%
sure no related patches have gone in since 2.6.12, though...
NeilBrown
* Re: bug report: mdadm-devel-2 , superblock version 1
2005-07-18 2:09 ` bug report: mdadm-devel-2 , superblock version 1 Tyler
2005-07-18 2:19 ` Tyler
@ 2005-07-25 0:36 ` Neil Brown
2005-07-25 3:47 ` Tyler
1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-07-25 0:36 UTC (permalink / raw)
To: Tyler; +Cc: linux-raid
On Sunday July 17, pml@dtbb.net wrote:
> # uname -a
> Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
> # ./mdadm -V
> mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
>
...
> root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
> Personalities : [raid5]
> md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
> 128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>
> unused devices: <none>
>
> ** mdstat mostly okay, except sdc2 is listed as device3 instead of
Hmmm, yes.... It is device number 3 in the array, but it is playing
role-2 in the raid5. When using Version-1 superblocks, we don't move
devices around in the "list of all devices". We just assign them
different roles. (device-N or 'spare').
> device2 (from 0,1,2)
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
> Version : 01.00.01
> Creation Time : Mon Jul 18 03:56:40 2005
> Raid Level : raid5
> Array Size : 128384 (125.40 MiB 131.47 MB)
> Device Size : 64192 (62.70 MiB 65.73 MB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 1
> Persistence : Superblock is persistent
>
> Update Time : Mon Jul 18 03:56:42 2005
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
> Events : 1
>
> Number Major Minor RaidDevice State
> 0 8 2 0 active sync /dev/.static/dev/sda2
> 1 8 18 1 active sync /dev/.static/dev/sdb2
> 2 0 0 - removed
>
> 3 8 34 2 active sync /dev/.static/dev/sdc2
>
> ** reports version 01.00.01 superblock, but reports as if there were 4
> devices used
Ok, this output definitely needs fixing. But as you can see, there
are 3 devices playing roles (RaidDevice) 0, 1, and 2. They reside in
slots 0, 1, and 3 of the array.
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
>
> ** try to assemble the array
This is not how you assemble an array. You need to tell mdadm which
component devices to use, either on the command line or in /etc/mdadm.conf
(and give --scan).
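For example (a rough sketch using the component devices from your report):
mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdc2
or, with a DEVICE line and an ARRAY entry for md1 in /etc/mdadm.conf:
mdadm --assemble --scan /dev/md1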
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> mdadm: md device /dev/md1 does not appear to be active.
>
> ** check if its active at all
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2
> /dev/sdb2 /dev/sdc2
> mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2
> seems ok
> mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2
> seems ok
> mdadm: /dev/md1 has been started with 3 drives.
>
> ** try restarting it with drive details, and it starts
Those messages are a bother, though. I think I know roughly what is
going on. I'll look into it shortly.
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> /dev/md1:
> Version : 00.90.01
> Creation Time : Mon Jul 18 02:53:55 2005
> Raid Level : raid5
> Array Size : 128384 (125.40 MiB 131.47 MB)
> Device Size : 64192 (62.70 MiB 65.73 MB)
> Raid Devices : 3
> Total Devices : 3
> Preferred Minor : 1
> Persistence : Superblock is persistent
>
> Update Time : Mon Jul 18 02:53:57 2005
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
> Events : 0.2
>
> Number Major Minor RaidDevice State
> 0 8 2 0 active sync /dev/.static/dev/sda2
> 1 8 18 1 active sync /dev/.static/dev/sdb2
> 2 8 34 2 active sync /dev/.static/dev/sdc2
>
> ** magically, we now have a v00.90.01 superblock, it reports the proper
> list of drives
Ahhh... You have assembled a different array (look at the create time too).
Version-1 superblocks live at a different location to version-0.90
superblocks, so it is possible to have both on the same drive. It is
supposed to pick the newest, but appears not to have done so. You should
really remove old superblocks... maybe mdadm should do that for you?
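One rough manual way to do that on a throwaway test array like this one is
mdadm's --zero-superblock (note: it wipes the md metadata it finds on the
named partitions, and depending on the mdadm version it may only know about
one superblock format, so check with --examine afterwards):
mdadm --stop /dev/md1
mdadm --zero-superblock /dev/sda2 /dev/sdb2 /dev/sdc2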
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> Segmentation fault
>
> ** try to stop and restart again, doesn't work
Again, don't do that!
>
> root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
> /dev/sda2:
> Magic : a92b4efc
> Version : 01.00
> Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
> Name :
> Creation Time : Mon Jul 18 03:56:40 2005
> Raid Level : raid5
> Raid Devices : 3
>
> Device Size : 128504 (62.76 MiB 65.79 MB)
> Super Offset : 128504 sectors
> State : clean
> Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
> Update Time : Mon Jul 18 03:56:42 2005
> Checksum : 903062ed - correct
> Events : 1
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Array State : Uuu 1 failed
>
> ** the drives themselves still report a version 1 superblock... wierd
Yeh. Assemble and Examine should pick the same one by default. It
appears they don't. I'll look into it.
Thanks for the very helpful feedback.
NeilBrown
* Re: bug report: mdadm-devel-2 , superblock version 1
2005-07-25 0:36 ` Neil Brown
@ 2005-07-25 3:47 ` Tyler
2005-07-27 2:08 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: Tyler @ 2005-07-25 3:47 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
> On Sunday July 17, pml@dtbb.net wrote:
>
>># uname -a
>>Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
>># ./mdadm -V
>>mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
>>
> ...
>
>>root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
>>Personalities : [raid5]
>>md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
>> 128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>
>>unused devices: <none>
>>
>>** mdstat mostly okay, except sdc2 is listed as device3 instead of
>
> Hmmm, yes.... It is device number 3 in the array, but it is playing
> role-2 in the raid5. When using Version-1 superblocks, we don't moved
> devices around, in the "list of all devices". We just assign them
> different roles. (device-N or 'spare').
>
So if I were to add (as an example) 7 spares to a 3-disk raid-5 array,
and later removed them for use elsewhere, would an array using a v1.x
superblock keep a permanent listing of those drives even after they were
removed?
Is there a possibility (for aesthetics, or just to keep things easier to
read and diagnose at a later date during manual recoveries) of adding a
command-line option to "re-order and remove" old devices that are marked
as removed, which could only function if the array was clean and
non-degraded? (This would be a manual feature we would run, especially
since doing it automatically might actually confuse us while
trouble-shooting.)
>>device2 (from 0,1,2)
>>
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
>>/dev/md1:
>> Version : 01.00.01
>> Creation Time : Mon Jul 18 03:56:40 2005
>> Raid Level : raid5
>> Array Size : 128384 (125.40 MiB 131.47 MB)
>> Device Size : 64192 (62.70 MiB 65.73 MB)
>> Raid Devices : 3
>> Total Devices : 3
>>Preferred Minor : 1
>> Persistence : Superblock is persistent
>>
>> Update Time : Mon Jul 18 03:56:42 2005
>> State : clean
>> Active Devices : 3
>>Working Devices : 3
>> Failed Devices : 0
>> Spare Devices : 0
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>> Events : 1
>>
>> Number Major Minor RaidDevice State
>> 0 8 2 0 active sync /dev/.static/dev/sda2
>> 1 8 18 1 active sync /dev/.static/dev/sdb2
>> 2 0 0 - removed
>>
>> 3 8 34 2 active sync /dev/.static/dev/sdc2
>>
>>** reports version 01.00.01 superblock, but reports as if there were 4
>>devices used
>
> Ok, this output definitely needs fixing. But as you can see, there
> are 3 devices playing roles (RaidDevice) 0, 1, and 2. They reside in
> slots 0, 1, and 3 of the array.
Depending on your answer to the first question up above, a new question
based on your comment here comes to mind... If, as you say above, it is
normal for v1 superblocks to keep old removed drives listed, but down
here you say the output needs fixing, then what exactly is wrong in the
example showing devices 0,1,2,3, with device #2 removed and device 3
acting as RaidDevice 2? If the v1 superblocks are designed to keep
removed drives listed, then the above output makes sense, now that
you've pointed out the "feature".
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
>>Segmentation fault
>>
>>** try to assemble the array
>
> This is not how you assemble an array. You need to tell mdadm which
> component devices to use, either on command line or in /etc/mdadm.conf
> (and give --scan).
I failed to mention that I had an up-to-date mdadm.conf file, with the
raid UUID in it, and (I will have to verify this) I believe the command
as I typed it above works with mdadm 1.12. The mdadm.conf file has
a DEVICE=/dev/hd[b-z] /dev/sd* line at the beginning of the config file,
and then the standard options (but no devices= line). Does -A still
need *some* options even if the config file is up to date? (As I said,
I'll have to verify whether 1.12 works with just the -A.)
Also, if -A requires some other options on the command line, shouldn't it
complain instead of segfaulting? :D
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
>>mdadm: md device /dev/md1 does not appear to be active.
>>
>>** check if its active at all
>>
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2
>>/dev/sdb2 /dev/sdc2
>>mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2
>>seems ok
>>mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2
>>seems ok
>>mdadm: /dev/md1 has been started with 3 drives.
>>
>>** try restarting it with drive details, and it starts
>
> Those message are a bother though. I think I know roughly what is
> going on. I'll look into it shortly.
Is this possibly where the v1 superblocks are being mangled, and so it
reverts back to the v0.90 superblocks that it finds on the disk?
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
>>/dev/md1:
>> Version : 00.90.01
>> Creation Time : Mon Jul 18 02:53:55 2005
>> Raid Level : raid5
>> Array Size : 128384 (125.40 MiB 131.47 MB)
>> Device Size : 64192 (62.70 MiB 65.73 MB)
>> Raid Devices : 3
>> Total Devices : 3
>>Preferred Minor : 1
>> Persistence : Superblock is persistent
>>
>> Update Time : Mon Jul 18 02:53:57 2005
>> State : clean
>> Active Devices : 3
>>Working Devices : 3
>> Failed Devices : 0
>> Spare Devices : 0
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> UUID : e798f37d:baf98c2f:e714b50c:8d1018b1
>> Events : 0.2
>>
>> Number Major Minor RaidDevice State
>> 0 8 2 0 active sync /dev/.static/dev/sda2
>> 1 8 18 1 active sync /dev/.static/dev/sdb2
>> 2 8 34 2 active sync /dev/.static/dev/sdc2
>>
>>** magically, we now have a v00.90.01 superblock, it reports the proper
>>list of drives
>
> Ahhh... You have assembled a different array (look at create time too).
> version-1 superblocks live at a different location to version-0.90
> superblocks. So it is possible to have both on the one drive. It is
> supposed to pick the newest, but appears not to have done. You should
> really remove old superblocks.... maybe mdadm should do that for you
> ???
*I* didn't assemble a different array... mdadm did ;) Yes, I agree: if
you create a *new* raid device, it should erase any form of old
superblock, considering that it warns during creation if it detects a
drive as being part of another array, and prompts for a Y/N continue.
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -S /dev/md1
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
>>Segmentation fault
>>
>>** try to stop and restart again, doesn't work
>
> Again, don't do that!
Okay, I will begin using --scan (or the short form -s) from now on, but I
*swear* <grin> that it worked without --scan with the older mdadm, as long
as you had a valid DEVICE= line in the config file and possibly an ARRAY
definition also. Once again though, it shouldn't segfault, but complain
that it needs other options (and possibly list the options available
with that command).
A good example of a program that offers such insights when you mistype
or fail to provide enough options, is smartmontools.. if you type
"smartctl -t" or "smartctl -t /dev/hda" for example, leaving out the
*type* of test you wanted it to do, it will then list off the possible
test options. If you run "smartctl -t long" but forget a device name to
run the test on, it will tell you that you need to specify a device, and
gives an example.
>>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -E /dev/sda2
>>/dev/sda2:
>> Magic : a92b4efc
>> Version : 01.00
>> Array UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>> Name :
>> Creation Time : Mon Jul 18 03:56:40 2005
>> Raid Level : raid5
>> Raid Devices : 3
>>
>> Device Size : 128504 (62.76 MiB 65.79 MB)
>> Super Offset : 128504 sectors
>> State : clean
>> Device UUID : 1baa875e87:9ec208b2:7f5e6a27:db1f5e
>> Update Time : Mon Jul 18 03:56:42 2005
>> Checksum : 903062ed - correct
>> Events : 1
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Array State : Uuu 1 failed
>>
>>** the drives themselves still report a version 1 superblock... wierd
> Yeh. Assemble and Examine should pick the say one by default. It
> appears they don't. I'll look into it.
>
> Thanks for the very helpful feedback.
>
> NeilBrown
My pleasure, Neil. It was actually quite simple and quick testing, just
using the last little bit of space left over on 3 drives that were
slightly larger than the other 5 drives in the main array.
You can email me a patch directly, or to the list, and I can do some
more testing. I'd really like to get v1 superblocks going, but haven't
had much (reliable) luck in testing yet.
Tyler.
* Re: bug report: mdadm-devel-2 , superblock version 1
2005-07-25 3:47 ` Tyler
@ 2005-07-27 2:08 ` Neil Brown
0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2005-07-27 2:08 UTC (permalink / raw)
To: Tyler; +Cc: linux-raid
On Sunday July 24, pml@dtbb.net wrote:
> Neil Brown wrote:
> > On Sunday July 17, pml@dtbb.net wrote:
> >
> >># uname -a
> >>Linux server 2.6.12.3 #3 SMP Sun Jul 17 14:38:12 CEST 2005 i686 GNU/Linux
> >># ./mdadm -V
> >>mdadm - v2.0-devel-2 - DEVELOPMENT VERSION NOT FOR REGULAR USE - 7 July 2005
> >>
> > ...
> >
> >>root@server:~/dev/mdadm-2.0-devel-2# cat /proc/mdstat
> >>Personalities : [raid5]
> >>md1 : active raid5 sdc2[3] sdb2[1] sda2[0]
> >> 128384 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> >>
> >>unused devices: <none>
> >>
> >>** mdstat mostly okay, except sdc2 is listed as device3 instead of
> >
> > Hmmm, yes.... It is device number 3 in the array, but it is playing
> > role-2 in the raid5. When using Version-1 superblocks, we don't moved
> > devices around, in the "list of all devices". We just assign them
> > different roles. (device-N or 'spare').
> >
>
> So if I were to add (as an example) 7 spares to a 3 disk raid-5 array,
> and later removed them for use elsewhere, a raid using a v1.x superblock
> would keep a permanent listing of those drives even after being
> removed?
No. A device retains its slot number only while it is a member of
the array. Once it is removed from the array it is forgotten. If it
is re-added, it appears simply as a new drive and is allocated the
first free slot.
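For example, a rough sequence on a small test array (device names assumed
from the earlier output):
mdadm /dev/md1 --fail /dev/sdc2 --remove /dev/sdc2
mdadm /dev/md1 --add /dev/sdc2
mdadm --detail /dev/md1
The re-added device should then simply show up in the first free slot.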
> Is there a possibility (for either asthetics, or just keeping things
> easier to read and possibly diagnose at a later date during manual
> recoveries) of adding a command line option to "re-order and remove" old
> devices that are marked as removed, that could only function if the
> array was clean, and non-degraded? (this would be a manual feature we
> would run, especially if automatically doing this might actually confuse
> us during times of trouble-shooting?)
I'd rather change the output of mdadm to display the important
information (role in array) more prominently than the less important
info (position in array).
> >>
> >> Number Major Minor RaidDevice State
> >> 0 8 2 0 active sync /dev/.static/dev/sda2
> >> 1 8 18 1 active sync /dev/.static/dev/sdb2
> >> 2 0 0 - removed
> >>
> >> 3 8 34 2 active sync /dev/.static/dev/sdc2
> >>
> >>** reports version 01.00.01 superblock, but reports as if there were 4
> >>devices used
> >
> > Ok, this output definitely needs fixing. But as you can see, there
> > are 3 devices playing roles (RaidDevice) 0, 1, and 2. They reside in
> > slots 0, 1, and 3 of the array.
>
> Depending on your answer to the first question up above, a new question
> based on your comment here comes to mind... if we assume, as you say
> above that it is normal for v1 superblocks to keep old removed drives
> listed, but down here you say the output needs fixing, which output is
> wrong in the example showing 0,1,2,3 devices, with device #2 removed,
> and device 3 acting as raiddevice 2 ? If the v1 superblocks are
> designed to keep removed drives listed, then the above output makes
> sense.. now that you've pointed out the "feature".
It should look more like:
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/.static/dev/sda2
1 8 18 1 active sync /dev/.static/dev/sdb2
3 8 34 2 active sync /dev/.static/dev/sdc2
i.e. printing something that is 'removed' is pointless. And the list
should be sorted by 'RaidDevice', not 'Number'.
>
> >>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1
> >>Segmentation fault
> >>
> >>** try to assemble the array
> >
> > This is not how you assemble an array. You need to tell mdadm which
> > component devices to use, either on command line or in /etc/mdadm.conf
> > (and give --scan).
>
> I failed to mention that I had an up to date mdadm.conf file, with the
> raid UUID in it, and (I will have to verify this) I believe the command
> as I typed it above, works with the 1.12 mdadm. The mdadm.conf file has
> a DEVICE=/dev/hd[b-z] /dev/sd* line at the beginning of the config file,
> and then the standard options (but no devices= line). Does -A still
> need *some* options even if the config file is up to date?? (as I said,
> I'll have to verify if 1.12 works with just the -A).
mdadm won't look at the config file unless you tell it to (with
--scan or --configfile). At least that is what I intended.
>
> Also, if -A requires some other options on the command line, should it
> not complain, instead of segfaulting? :D
Certainly!
>
> >>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -D /dev/md1
> >>mdadm: md device /dev/md1 does not appear to be active.
> >>
> >>** check if its active at all
> >>
> >>root@server:~/dev/mdadm-2.0-devel-2# ./mdadm -A /dev/md1 /dev/sda2
> >>/dev/sdb2 /dev/sdc2
> >>mdadm: device 1 in /dev/md1 has wrong state in superblock, but /dev/sdb2
> >>seems ok
> >>mdadm: device 2 in /dev/md1 has wrong state in superblock, but /dev/sdc2
> >>seems ok
> >>mdadm: /dev/md1 has been started with 3 drives.
> >>
> >>** try restarting it with drive details, and it starts
> >
> > Those message are a bother though. I think I know roughly what is
> > going on. I'll look into it shortly.
>
> Is this possibly where the v1 superblocks are being mangled, and so it
> reverts back to the v0.90 superblocks that it finds on the disk?
I won't be sure until I've looked carefully through the code.
NeilBrown
Thread overview: 22+ messages
2005-07-17 15:44 Raid5 Failure David M. Strang
2005-07-17 22:05 ` Neil Brown
2005-07-17 23:15 ` David M. Strang
2005-07-18 0:05 ` Tyler
2005-07-18 0:23 ` David M. Strang
2005-07-18 0:06 ` Neil Brown
2005-07-18 0:52 ` David M. Strang
2005-07-18 1:06 ` Neil Brown
2005-07-18 1:26 ` David M. Strang
2005-07-18 1:31 ` David M. Strang
[not found] ` <001601c58b37$620c69d0$c200a8c0@NCNF5131FTH>
2005-07-18 1:33 ` Neil Brown
2005-07-18 1:46 ` David M. Strang
2005-07-18 2:10 ` Tyler
2005-07-18 2:12 ` David M. Strang
2005-07-18 2:15 ` Neil Brown
2005-07-18 2:24 ` David M. Strang
2005-07-18 2:09 ` bug report: mdadm-devel-2 , superblock version 1 Tyler
2005-07-18 2:19 ` Tyler
2005-07-25 0:37 ` Neil Brown
2005-07-25 0:36 ` Neil Brown
2005-07-25 3:47 ` Tyler
2005-07-27 2:08 ` Neil Brown