* Bug in mdadm?
@ 2003-07-07 13:34 Tapani Utriainen
0 siblings, 0 replies; 4+ messages in thread
From: Tapani Utriainen @ 2003-07-07 13:34 UTC
To: linux-raid
Hi,
I have run into something that seems to be a bug in mdadm and/or in the kernel (2.4.20).
I wanted to create a RAID 5 with 6 disks with mdadm:
# mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=6 --spare-devices=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/hdg /dev/hde
Despite the explicit statement of SIX disks and NO spares it
created an array of SEVEN disks, with ONE spare and ONE missing/failed+removed?!
Output from 'mdadm -D' is found at the end of this message. This is very
likely to be a bug in mdadm.
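A hedged aside rather than a diagnosis: current mdadm man pages state that when creating a RAID5 array, mdadm deliberately starts it degraded with the last device attached as a spare (rebuilding onto a spare is generally faster than an initial parity resync), and that --force turns this off. If that is what happened here, the 'mdadm -D' counters should show exactly one missing slot, one failed device and one spare. The sketch below checks for that pattern in 'mdadm -D' text read on stdin. The names md_field and md_create_pattern are made up for illustration, and the "Name : value" layout is assumed from the output at the end of this message.

```shell
#!/bin/sh
# md_field NAME: print the value of the first "NAME : value" line found in
# `mdadm -D` style text on stdin. Hypothetical helper, not part of mdadm.
md_field() {
    awk -F' : ' -v key="$1" '
        {
            name = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim blanks around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim blanks around the value
        }
        name == key { print val; exit }
    '
}

# md_create_pattern: read `mdadm -D` text on stdin and report whether the
# counters match "one slot missing, one failed, one spare". Hypothetical helper.
md_create_pattern() {
    text=$(cat)
    raid=$(printf '%s\n' "$text" | md_field 'Raid Devices')
    active=$(printf '%s\n' "$text" | md_field 'Active Devices')
    failed=$(printf '%s\n' "$text" | md_field 'Failed Devices')
    spare=$(printf '%s\n' "$text" | md_field 'Spare Devices')
    if [ "$active" -eq $((raid - 1)) ] && [ "$failed" -eq 1 ] && [ "$spare" -eq 1 ]; then
        echo "degraded-with-spare"
    else
        echo "other"
    fi
}
```

On a live system this would be run as 'mdadm -D /dev/md0 | md_create_pattern'. With the counters from this report (6 raid devices, 5 active, 1 failed, 1 spare) it prints degraded-with-spare.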
Now, in case this was just a quirk of some other kind, I proceeded to
create a reiserfs on the array, mount it, fiddle with it, and test the
redundancy by marking a drive as failed.
# mdadm /dev/md0 -f /dev/sdf
After this, all processes accessing the fs go into disk sleep.
(If functional, the array was expected to go into degraded mode, with me
still being able to access the fs.)
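For reference, that expectation can be written as arithmetic: RAID5 keeps one chunk of parity per stripe, so an array of N members keeps working with N-1 of them and cannot serve all data with fewer. A minimal sketch, assuming the 'rd:' and 'wd:' counters in the header of the kernel's "RAID5 conf printout" mean raid disks and working disks. The function name raid5_state is hypothetical.

```shell
#!/bin/sh
# raid5_state LINE: classify a RAID5 array from a header line such as
# "--- rd:6 wd:4 fd:2". Hypothetical helper; the field meanings are assumed.
raid5_state() {
    line=$1
    # Pull out the number that follows "rd:" (raid disks) and "wd:" (working disks).
    rd=$(printf '%s\n' "$line" | sed -n 's/.*rd:\([0-9][0-9]*\).*/\1/p')
    wd=$(printf '%s\n' "$line" | sed -n 's/.*wd:\([0-9][0-9]*\).*/\1/p')
    if [ -z "$rd" ] || [ -z "$wd" ]; then
        echo "unparsed"
        return 1
    fi
    missing=$((rd - wd))
    if [ "$missing" -le 0 ]; then
        echo "complete"      # every member present
    elif [ "$missing" -eq 1 ]; then
        echo "degraded"      # one member short: parity still covers it
    else
        echo "failed"        # two or more short: beyond what RAID5 tolerates
    fi
}
```

Applied to the header in the dump at the end of this message, raid5_state '--- rd:6 wd:4 fd:2' prints failed: the array was already one member short (slot 5) before /dev/sdf was marked faulty, so the second loss left nothing to fall back on.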
In the logs there is an indication of a kernel bug. (See the dump at the
very end of this message)
However, I am no software RAID expert, and this might just be the result of
severe misuse or misunderstanding of the tools.
//Tapani
* * * * * MISCONFIGURED ARRAY ? * * * * *
# mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=6 --spare-devices=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/hdg /dev/hde
# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Mon Jul 7 13:45:47 2003
Raid Level : raid5
Array Size : 603136000 (575.20 GiB 617.61 GB)
Device Size : 120627200 (115.04 GiB 123.52 GB)
Raid Devices : 6
Total Devices : 7
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Jul 7 13:45:47 2003
State : dirty, no-errors
Active Devices : 5
Working Devices : 6
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 48 1 active sync /dev/sdd
2 8 64 2 active sync /dev/sde
3 8 80 3 active sync /dev/sdf
4 34 0 4 active sync /dev/hdg
5 0 0 5 faulty
6 33 0 6 /dev/hde
UUID : faa0f80f:a5bacb7f:1caf43d5:d0147d81
Events : 0.1
* * * * * KERNEL BUG ? * * * * *
# mdadm /dev/md0 -f /dev/sdf
mdadm: set /dev/sdf faulty in /dev/md0
From the logs:
# dmesg
...
raid5: Disk failure on sdf, disabling device. Operation continuing on 4 devices
md: updating md0 RAID superblock on device
md: hde [events: 00000002]<6>(write) hde's sb offset: 120627264
md: hdg [events: 00000002]<6>(write) hdg's sb offset: 120627264
md: md_do_sync() got signal ... exiting
RAID5 conf printout:
--- rd:6 wd:4 fd:2
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
--- rd:6 wd:4 fd:2
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: recovery thread finished ...
md: recovery thread got woken up ...
md0: resyncing spare disk hde to replace failed disk
RAID5 conf printout:
--- rd:6 wd:4 fd:2
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
--- rd:6 wd:4 fd:2
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 100000 KB/sec) for reconstruction.
md: using 124k window, over a total of 120627200 blocks.
md: md_do_sync() got signal ... exiting
RAID5 conf printout:
--- rd:6 wd:4 fd:2
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
--- rd:6 wd:4 fd:2
disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: recovery thread finished ...
md: (skipping faulty sdf )
md: sde [events: 00000002]<6>(write) sde's sb offset: 120627264
md: sdd [events: 00000002]<6>(write) sdd's sb offset: 120627264
md: sdc [events: 00000002]<6>(write) sdc's sb offset: 120627264
journal-601, buffer write failed
kernel BUG at prints.c:334!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01aa4b8>] Not tainted
EFLAGS: 00010282
eax: 00000024 ebx: f7b3b000 ecx: 00000012 edx: ef66ff7c
esi: 00000000 edi: f7b3b000 ebp: 00000003 esp: f7bd3ec0
ds: 0018 es: 0018 ss: 0018
Process kupdated (pid: 7, stackpage=f7bd3000)
Stack: c02bade6 c0355ce0 f7b3b000 f8d264ec c01b584a f7b3b000 c02bc900 00001000
eecbef80 00000006 00000004 00000000 ee621e40 00000000 00000008 ecb5c000
00000004 c01b9991 f7b3b000 f8d264ec 00000001 00000006 f8d2f58c 00000004
Call Trace: [<c01b584a>] [<c01b9991>] [<c01b8ba4>] [<c01a7240>] [<c0141d0a>]
[<c0140e14>] [<c014118d>] [<c0105000>] [<c0105000>] [<c01058ce>] [<c0141090>]
Code: 0f 0b 4e 01 ec ad 2b c0 85 db 74 0e 0f b7 43 08 89 04 24 e8
* bug in mdadm?
@ 2004-05-29 12:17 Bernd Schubert
2004-05-29 13:59 ` Mario 'BitKoenig' Holbe
0 siblings, 1 reply; 4+ messages in thread
From: Bernd Schubert @ 2004-05-29 12:17 UTC
To: linux-raid
Hello,
for one of our raid1 devices 'mdadm -D' reports 3 devices and 1
failed device, though I'm pretty sure that I specified
'--raid-devices=2' when I created that raid-array.
On another system, 'mdadm -D' reports the correct numbers.
The data from /proc/mdstat report the correct numbers.
Any ideas what the reason for this is? Is it a bug in mdadm, or does the
superblock really contain wrong data?
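To put the two sources side by side, the "[n/m]" pair can be pulled out of /proc/mdstat and compared with the 'Raid Devices' and 'Active Devices' lines of 'mdadm -D'. A rough sketch, assuming the usual mdstat layout in which "[n/m]" gives the configured and the currently active member counts. The function name mdstat_counts is made up for illustration.

```shell
#!/bin/sh
# mdstat_counts NAME: read /proc/mdstat style text on stdin and print the
# "n/m" pair for the array NAME (e.g. md0). Hypothetical helper.
mdstat_counts() {
    awk -v dev="$1" '
        $1 == dev { found = 1 }          # entry for this array starts here
        found && match($0, /\[[0-9]+\/[0-9]+\]/) {
            # strip the surrounding brackets from "[n/m]"
            print substr($0, RSTART + 1, RLENGTH - 2)
            exit
        }
    '
}
```

Typical use would be 'mdstat_counts md0 < /proc/mdstat'. If the pair agrees with the member table but not with the summary counters of 'mdadm -D', that points at the counters rather than at the array itself.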
debye:~# ./mdadm -D /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Sun May 16 15:22:03 2004
Raid Level : raid1
Array Size : 2441728 (2.33 GiB 2.50 GB)
Device Size : 2441728 (2.33 GiB 2.50 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon May 24 20:41:17 2004
State : dirty, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Number Major Minor RaidDevice State
0 22 1 0 active sync /dev/hdc1
1 3 1 1 active sync /dev/hda1
UUID : fb539f04:91afe349:3591ed8e:f46b8ef1
Events : 0.26
Thanks,
Bernd
* Re: bug in mdadm?
2004-05-29 12:17 bug in mdadm? Bernd Schubert
@ 2004-05-29 13:59 ` Mario 'BitKoenig' Holbe
2004-05-29 15:46 ` Guy
0 siblings, 1 reply; 4+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-05-29 13:59 UTC
To: linux-raid
Bernd Schubert <Bernd.Schubert@tc.pci.uni-heidelberg.de> wrote:
> for one of our raid1 devices 'mdadm -D' reports 3 devices and 1
> failed device, though I'm pretty sure that I specified
> '--raid-devices=2' when I created that raid-array.
[...]
> Raid Devices : 2
You did.
> Total Devices : 3
Plus one spare disk.
> Active Devices : 2
> Working Devices : 2
Two mirrors up and running.
> Failed Devices : 1
> Spare Devices : 0
One disk failed or out-of-sync or something like that.
[moved from above]
> On another system, 'mdadm -D' reports the correct numbers.
What do you expect as 'correct'?
Did you move *all* the physical disks of the one
system to the other?
Did you also move your mdadm.conf (if you didn't
move the disk with the root-fs), if there is one?
> The data from /proc/mdstat report the correct numbers.
> Any ideas what the reason for this is? Is it a bug in mdadm, or does the
> superblock really contain wrong data?
Well, perhaps there is a partition somewhere else
on your disks with the same UUID, which gets merged
into md0 as a spare disk. Did you remove a mirror from
md0 in the past and add another one?
Another possibility is that you are using mdadm's 'spare
groups'. I don't know what mdadm shows in that
case.
regards,
Mario
--
Being rich does not mean buying a Ferrari, but burning one
Dietmar Wischmeier
* RE: bug in mdadm?
2004-05-29 13:59 ` Mario 'BitKoenig' Holbe
@ 2004-05-29 15:46 ` Guy
0 siblings, 0 replies; 4+ messages in thread
From: Guy @ 2004-05-29 15:46 UTC
To: 'Mario 'BitKoenig' Holbe', linux-raid
No! It's a bug. It's been reported here before. I have a RAID5 with 14
disks and 1 spare. It reports Raid devices 14, Total 13, Active 14, working
12, failed 1 and spare 1. I should have data loss! But nothing is wrong,
see below.
Guy
# mdadm -D /dev/md2
/dev/md2:
Version : 00.90.00
Creation Time : Fri Dec 12 17:29:50 2003
Raid Level : raid5
Array Size : 230980672 (220.28 GiB 236.57 GB)
Device Size : 17767744 (16.94 GiB 18.24 GB)
Raid Devices : 14
Total Devices : 13
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Mon Mar 1 20:32:41 2004
State : dirty, no-errors
Active Devices : 14
Working Devices : 12
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
0 8 49 0 active sync /dev/sdd1
1 8 161 1 active sync /dev/sdk1
2 8 65 2 active sync /dev/sde1
3 8 177 3 active sync /dev/sdl1
4 8 81 4 active sync /dev/sdf1
5 8 193 5 active sync /dev/sdm1
6 8 97 6 active sync /dev/sdg1
7 8 209 7 active sync /dev/sdn1
8 8 113 8 active sync /dev/sdh1
9 8 225 9 active sync /dev/sdo1
10 8 129 10 active sync /dev/sdi1
11 8 241 11 active sync /dev/sdp1
12 8 145 12 active sync /dev/sdj1
13 65 1 13 active sync /dev/sdq1
14 8 33 14 /dev/sdc1
UUID : 8357a389:8853c2d1:f160d155:6b4e1b99
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md2 : active raid5 sdc1[14] sdq1[13] sdj1[12] sdp1[11] sdi1[10] sdo1[9]
sdh1[8] sdn1[7] sdg1[6] sdm1[5] sdf1[4] sdl1[3] sde1[2] sdk1[1] sdd1[0]
230980672 blocks level 5, 64k chunk, algorithm 2 [14/14]
[UUUUUUUUUUUUUU]
md0 : active raid1 sdb1[1] sda1[0]
264960 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
17510784 blocks [2/2] [UU]
unused devices: <none>
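The mismatch can be checked mechanically by counting the member rows in the 'mdadm -D' table and comparing the result with the 'Total Devices' line. A small sketch: md_listed and md_total are hypothetical helper names, and the row test (at least five columns, last column starting with /dev/) is an assumption based on the table layout shown above.

```shell
#!/bin/sh
# md_listed: count the member rows in `mdadm -D` text on stdin, i.e. lines
# with at least five columns whose last column is a /dev/ path.
md_listed() {
    awk 'NF >= 5 && $NF ~ /^\/dev\// { n++ } END { print n + 0 }'
}

# md_total: print the value of the "Total Devices" summary line from
# `mdadm -D` text on stdin.
md_total() {
    awk -F' : ' '
        {
            name = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim blanks around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim blanks around the value
        }
        name == "Total Devices" { print val; exit }
    '
}
```

Run against the md2 output above, md_listed gives 15 (14 active members plus the spare sdc1), while md_total gives 13. The table and /proc/mdstat agree with each other; only the summary counter is off.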