* raidhotadd works, mdadm --add doesn't
@ 2006-09-10 20:30 Leon Avery
2006-09-11 17:37 ` Steve Cousins
2006-09-14 22:30 ` Bill Davidsen
0 siblings, 2 replies; 4+ messages in thread
From: Leon Avery @ 2006-09-10 20:30 UTC (permalink / raw)
To: linux-raid
I've been using RAID for a long time, but have been using the old
raidtools. Having just discovered mdadm, I want to switch, but I'm
having trouble. I'm trying to figure out how to use mdadm to replace
a failed disk. Here is my /proc/mdstat:
Personalities : [linear] [raid1]
read_ahead 1024 sectors
md5 : active linear md3[1] md4[0]
      1024504832 blocks 64k rounding

md4 : active raid1 hdf5[0] hdh5[1]
      731808832 blocks [2/2] [UU]

md3 : active raid1 hde5[0] hdg5[1]
      292696128 blocks [2/2] [UU]

md2 : active raid1 hda5[0] hdc5[1]
      48339456 blocks [2/2] [UU]

md0 : active raid1 hda3[0] hdc3[1]
      9765376 blocks [2/2] [UU]

unused devices: <none>
The relevant parts are md0 and md2. Physical disk hda failed, which
left md0 and md2 running in degraded mode. Having an old spare used
disk sitting on the shelf, I plugged it in, repartitioned it, and said
mdadm --add /dev/md0 /dev/hda3
This appeared to work, but when I looked at mdstat, hda3 was marked
as failed, and md0 was still running degraded. I then foolishly tried
mdadm --add /dev/md0 /dev/hda3 --run
That caused a kernel panic and crashed my system.
I rebooted and said
raidhotadd /dev/md0 /dev/hda3
That worked perfectly, and reconstruction started immediately. So,
although I don't actually have a problem at the moment, I still
haven't figured out how to make mdadm hot-add a replacement disk.
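For anyone checking the same thing: a degraded mirror normally shows up in /proc/mdstat as a member-status field containing an underscore, for example [2/1] [_U], whereas the listing above shows every array as [UU]. Below is a minimal sketch that lists degraded arrays from mdstat-formatted text. The function name degraded_arrays is made up for illustration and is not part of mdadm or raidtools.

```shell
#!/bin/sh
# Print the name of every md array whose member-status field contains "_"
# (a missing or failed member). Reads mdstat-formatted text on stdin.
# degraded_arrays is a hypothetical helper name, not an mdadm command.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }    # remember the array being described
        /\[[U_]*_[U_]*\]/ { print name }   # status such as [_U] or [U_]
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

The sketch only reads text, so it can be tried on a saved copy of mdstat without touching any array.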
Examination of the syslog was interesting if not exactly
informative. Here's the relevant extract from the attempt to use mdadm:
Sep 10 06:50:28 eatworms kernel: md: trying to hot-add hda3 to md0 ...
Sep 10 06:50:28 eatworms kernel: md: bind<hda3,2>
Sep 10 06:50:28 eatworms kernel: RAID1 conf printout:
Sep 10 06:50:28 eatworms kernel: --- wd:1 rd:2 nd:1
Sep 10 06:50:28 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
Sep 10 06:50:28 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:hdc3
...snip...
Sep 10 06:50:28 eatworms kernel: RAID1 conf printout:
Sep 10 06:50:28 eatworms kernel: --- wd:1 rd:2 nd:2
Sep 10 06:50:28 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
Sep 10 06:50:28 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:hdc3
Sep 10 06:50:28 eatworms kernel: disk 2, s:1, o:0, n:2 rd:2 us:1 dev:hda3
...snip...
Sep 10 06:50:28 eatworms kernel: md: updating md0 RAID superblock on device
Sep 10 06:50:28 eatworms kernel: md: hda3 [events: 0000038c]<6>(write) hda3's sb offset: -64
Sep 10 06:50:28 eatworms kernel: attempt to access beyond end of device
Sep 10 06:50:28 eatworms kernel: 03:03: rw=1, want=2147483588, limit=1
Sep 10 06:50:28 eatworms kernel: md: write_disk_sb failed for device hda3
...followed by several retries of this before giving up
The problem seems to be the negative superblock offset. In contrast,
the section after the raidhotadd looks like this:
Sep 10 07:12:29 eatworms kernel: md: trying to hot-add hda3 to md0 ...
Sep 10 07:12:29 eatworms kernel: md: bind<hda3,2>
Sep 10 07:12:29 eatworms kernel: RAID1 conf printout:
Sep 10 07:12:29 eatworms kernel: --- wd:1 rd:2 nd:1
Sep 10 07:12:29 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
Sep 10 07:12:29 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:hdc3
...snip...
Sep 10 07:12:29 eatworms kernel: RAID1 conf printout:
Sep 10 07:12:29 eatworms kernel: --- wd:1 rd:2 nd:2
Sep 10 07:12:29 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
Sep 10 07:12:29 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:hdc3
Sep 10 07:12:29 eatworms kernel: disk 2, s:1, o:0, n:2 rd:2 us:1 dev:hda3
...snip...
Sep 10 07:12:29 eatworms kernel: md: updating md0 RAID superblock on device
Sep 10 07:12:29 eatworms kernel: md: hda3 [events: 00000459]<6>(write) hda3's sb offset: 9765440
Sep 10 07:12:29 eatworms kernel: md: hdc3 [events: 00000459]<6>(write) hdc3's sb offset: 9765440
Here we have a reasonable offset of 9765440 and everything works fine.
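The two offsets can be checked by hand. The sketch below assumes the placement rule for the old (version 0.90) superblock format: sizes are in 1 KiB blocks, the device size is rounded down to a multiple of 64, and 64 is subtracted. The function name sb_offset is made up for illustration. Under that rule any device smaller than 64 KiB yields exactly -64, and the "limit=1" line in the failing log reads as if the kernel saw hda3 as a 1 KiB device at that moment; that reading is an inference from the log, not something confirmed in this thread. The size 9765504 is likewise an assumed value, chosen only because it reproduces the logged offset; the real size of hda3 is not stated.

```shell
#!/bin/sh
# Superblock offset under the assumed 0.90 rule, all values in 1 KiB blocks:
# round the device size down to a multiple of 64, then step back 64 blocks.
# sb_offset is a hypothetical helper name.
sb_offset() {
    size_kib=$1
    echo $(( ($size_kib & ~63) - 64 ))
}

sb_offset 1         # device seen as 1 KiB: prints -64, as in the mdadm attempt
sb_offset 9765504   # assumed partition size: prints 9765440, as in the raidhotadd log
```

If this reading is right, the negative offset points at the device size the kernel had for hda3 during the first attempt rather than at anything written into the superblock itself.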
I suppose this could be an mdadm bug, but it seems more likely that
I'm doing something stupid. Could someone enlighten me?
My system config (uname -a):
Linux eatworms.swmed.edu 2.4.22e #1 Tue Feb 17 13:37:36 CST 2004 i686 unknown unknown GNU/Linux
--
Leon Avery (214) 648-4931 (voice)
Department of Molecular Biology -1488 (fax)
University of Texas Southwestern Medical Center
6000 Harry Hines Blvd leon@eatworms.swmed.edu
Dallas, TX 75390-9148 http://eatworms.swmed.edu/~leon/
* raidhotadd works, mdadm --add doesn't
@ 2006-09-10 20:40 Leon Avery
0 siblings, 0 replies; 4+ messages in thread
From: Leon Avery @ 2006-09-10 20:40 UTC (permalink / raw)
To: linux-raid
I'm having trouble using mdadm to hot-add a replacement disk. I
e-mailed a detailed description to the list, only to have it rejected
by Bogofilter. I have therefore placed it on my web server at
http://eatworms.swmed.edu/~leon/raid_problem/06_09_10.txt . Sorry
for the extra trouble.
--
Leon Avery (214) 648-4931 (voice)
Department of Molecular Biology -1488 (fax)
University of Texas Southwestern Medical Center
6000 Harry Hines Blvd leon@eatworms.swmed.edu
Dallas, TX 75390-9148 http://eatworms.swmed.edu/~leon/
* Re: raidhotadd works, mdadm --add doesn't
2006-09-10 20:30 raidhotadd works, mdadm --add doesn't Leon Avery
@ 2006-09-11 17:37 ` Steve Cousins
2006-09-14 22:30 ` Bill Davidsen
1 sibling, 0 replies; 4+ messages in thread
From: Steve Cousins @ 2006-09-11 17:37 UTC (permalink / raw)
To: Leon Avery; +Cc: linux-raid
Leon Avery wrote:
> I've been using RAID for a long time, but have been using the old
> raidtools. Having just discovered mdadm, I want to switch, but I'm
> having trouble. I'm trying to figure out how to use mdadm to replace a
> failed disk. Here is my /proc/mdstat:
>
> Personalities : [linear] [raid1]
> read_ahead 1024 sectors
> md5 : active linear md3[1] md4[0]
> 1024504832 blocks 64k rounding
>
> md4 : active raid1 hdf5[0] hdh5[1]
> 731808832 blocks [2/2] [UU]
>
> md3 : active raid1 hde5[0] hdg5[1]
> 292696128 blocks [2/2] [UU]
>
> md2 : active raid1 hda5[0] hdc5[1]
> 48339456 blocks [2/2] [UU]
>
> md0 : active raid1 hda3[0] hdc3[1]
> 9765376 blocks [2/2] [UU]
>
> unused devices: <none>
>
> The relevant parts are md0 and md2. Physical disk hda failed, which
> left md0 and md2 running in degraded mode. Having an old spare used
> disk sitting on the shelf, I plugged it in, repartitioned it, and said
>
> mdadm --add /dev/md0 /dev/hda3
I think the thing to do is to list the md device before the --add option:
mdadm /dev/md0 --add /dev/hda3
I use the short -a form:
mdadm /dev/md0 -a /dev/hda3
Steve
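A minimal sketch of the array-first form suggested above, written as a dry run so that it only prints commands and never touches an array. The helper name plan_replace and its third argument are hypothetical; --fail, --remove and --add are mdadm manage-mode options. Whether argument order alone explains the failure reported in this thread is not established here.

```shell
#!/bin/sh
# Dry run: print the mdadm manage-mode commands for re-adding a member,
# array device first, then the option, then the component device.
# plan_replace is a hypothetical helper, not part of mdadm.
plan_replace() {
    array=$1          # e.g. /dev/md0
    member=$2         # e.g. /dev/hda3
    stale=${3:-no}    # "yes" if the old member is still listed in the array
    if [ "$stale" = yes ]; then
        # A member still attached must be failed and removed before re-adding.
        echo "mdadm $array --fail $member"
        echo "mdadm $array --remove $member"
    fi
    echo "mdadm $array --add $member"
}

plan_replace /dev/md0 /dev/hda3
```

To act on the plan, review the printed commands and run them as root, then watch /proc/mdstat for the resync.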
* Re: raidhotadd works, mdadm --add doesn't
2006-09-10 20:30 raidhotadd works, mdadm --add doesn't Leon Avery
2006-09-11 17:37 ` Steve Cousins
@ 2006-09-14 22:30 ` Bill Davidsen
1 sibling, 0 replies; 4+ messages in thread
From: Bill Davidsen @ 2006-09-14 22:30 UTC (permalink / raw)
To: Leon Avery; +Cc: linux-raid
Leon Avery wrote:
> I've been using RAID for a long time, but have been using the old
> raidtools. Having just discovered mdadm, I want to switch, but I'm
> having trouble. I'm trying to figure out how to use mdadm to replace
> a failed disk. Here is my /proc/mdstat:
>
> Personalities : [linear] [raid1]
> read_ahead 1024 sectors
> md5 : active linear md3[1] md4[0]
> 1024504832 blocks 64k rounding
>
> md4 : active raid1 hdf5[0] hdh5[1]
> 731808832 blocks [2/2] [UU]
>
> md3 : active raid1 hde5[0] hdg5[1]
> 292696128 blocks [2/2] [UU]
>
> md2 : active raid1 hda5[0] hdc5[1]
> 48339456 blocks [2/2] [UU]
>
> md0 : active raid1 hda3[0] hdc3[1]
> 9765376 blocks [2/2] [UU]
>
> unused devices: <none>
>
> The relevant parts are md0 and md2. Physical disk hda failed, which
> left md0 and md2 running in degraded mode. Having an old spare used
> disk sitting on the shelf, I plugged it in, repartitioned it, and said
>
> mdadm --add /dev/md0 /dev/hda3
Did you remove the failed hda partition from the array first?
>
> This appeared to work, but when I looked at mdstat, hda3 was marked as
> failed, and md0 was still running degraded. I then foolishly tried
>
> mdadm --add /dev/md0 /dev/hda3 --run
>
> That caused a kernel panic and crashed my system.
>
> I rebooted and said
>
> raidhotadd /dev/md0 /dev/hda3
>
> That worked perfectly, and reconstruction started immediately. So,
> although I don't actually have a problem at the moment, I still
> haven't figured out how to make mdadm hot-add a replacement disk.
>
> Examination of the syslog was interesting if not exactly informative.
> Here's the relevant extract from the attempt to use mdadm:
>
> Sep 10 06:50:28 eatworms kernel: md: trying to hot-add hda3 to md0
> ...
> Sep 10 06:50:28 eatworms kernel: md: bind<hda3,2>
> Sep 10 06:50:28 eatworms kernel: RAID1 conf printout:
> Sep 10 06:50:28 eatworms kernel: --- wd:1 rd:2 nd:1
> Sep 10 06:50:28 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1
> dev:[dev 00:00]
> Sep 10 06:50:28 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1
> dev:hdc3
> ...snip...
> Sep 10 06:50:28 eatworms kernel: RAID1 conf printout:
> Sep 10 06:50:28 eatworms kernel: --- wd:1 rd:2 nd:2
> Sep 10 06:50:28 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1
> dev:[dev 00:00]
> Sep 10 06:50:28 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1
> dev:hdc3
> Sep 10 06:50:28 eatworms kernel: disk 2, s:1, o:0, n:2 rd:2 us:1
> dev:hda3
> ...snip...
> Sep 10 06:50:28 eatworms kernel: md: updating md0 RAID superblock
> on device
> Sep 10 06:50:28 eatworms kernel: md: hda3 [events:
> 0000038c]<6>(write) hda3's sb offset: -64
> Sep 10 06:50:28 eatworms kernel: attempt to access beyond end of
> device
> Sep 10 06:50:28 eatworms kernel: 03:03: rw=1, want=2147483588,
> limit=1
> Sep 10 06:50:28 eatworms kernel: md: write_disk_sb failed for
> device hda3
> ...followed by several retries of this before giving up
>
> The problem seems to be the negative superblock offset. In contrast,
> the section after the raidhotadd looks like this:
>
> Sep 10 07:12:29 eatworms kernel: md: trying to hot-add hda3 to md0
> ...
> Sep 10 07:12:29 eatworms kernel: md: bind<hda3,2>
> Sep 10 07:12:29 eatworms kernel: RAID1 conf printout:
> Sep 10 07:12:29 eatworms kernel: --- wd:1 rd:2 nd:1
> Sep 10 07:12:29 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1
> dev:[dev 00:00]
> Sep 10 07:12:29 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1
> dev:hdc3
> ...snip...
> Sep 10 07:12:29 eatworms kernel: RAID1 conf printout:
> Sep 10 07:12:29 eatworms kernel: --- wd:1 rd:2 nd:2
> Sep 10 07:12:29 eatworms kernel: disk 0, s:0, o:0, n:0 rd:0 us:1
> dev:[dev 00:00]
> Sep 10 07:12:29 eatworms kernel: disk 1, s:0, o:1, n:1 rd:1 us:1
> dev:hdc3
> Sep 10 07:12:29 eatworms kernel: disk 2, s:1, o:0, n:2 rd:2 us:1
> dev:hda3
> ...snip...
> Sep 10 07:12:29 eatworms kernel: md: updating md0 RAID superblock
> on device
> Sep 10 07:12:29 eatworms kernel: md: hda3 [events:
> 00000459]<6>(write) hda3's sb offset: 9765440
> Sep 10 07:12:29 eatworms kernel: md: hdc3 [events:
> 00000459]<6>(write) hdc3's sb offset: 9765440
>
> Here we have a reasonable offset of 9765440 and everything works fine.
>
> I suppose this could be an mdadm bug, but it seems more likely that
> I'm doing something stupid. Could someone enlighten me?
>
> My system config (uname -a):
>
> Linux eatworms.swmed.edu 2.4.22e #1 Tue Feb 17 13:37:36 CST 2004
> i686 unknown unknown GNU/Linux
>
>
> --
> Leon Avery (214) 648-4931 (voice)
> Department of Molecular Biology -1488 (fax)
> University of Texas Southwestern Medical Center
> 6000 Harry Hines Blvd leon@eatworms.swmed.edu
> Dallas, TX 75390-9148 http://eatworms.swmed.edu/~leon/
>
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979