bug in mdadm?


* bug in mdadm?
@ 2004-05-29 12:17 Bernd Schubert
From: Bernd Schubert @ 2004-05-29 12:17 UTC (permalink / raw)
  To: linux-raid

Hello,

for one of our raid1 devices 'mdadm -D' reports 3 devices and 1
failed device, though I'm pretty sure that I specified
'--raid-devices=2' when I created that raid-array.
On another system, 'mdadm -D' reports the correct numbers.
The data in /proc/mdstat also shows the correct numbers.

Any ideas what the reason for this is? Is it a bug in mdadm, or does the
superblock really contain wrong data?

debye:~# ./mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.00
  Creation Time : Sun May 16 15:22:03 2004
     Raid Level : raid1
     Array Size : 2441728 (2.33 GiB 2.50 GB)
    Device Size : 2441728 (2.33 GiB 2.50 GB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon May 24 20:41:17 2004
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1       3        1        1      active sync   /dev/hda1
           UUID : fb539f04:91afe349:3591ed8e:f46b8ef1
         Events : 0.26


Thanks,
	Bernd
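One quick way to see the discrepancy Bernd describes is to compare the 'Total Devices' counter with the number of rows mdadm actually prints in its device table. Below is a minimal Python sketch written only for the output format shown in this thread. It assumes 'Key : Value' header lines and a table introduced by a 'Number ... RaidDevice State' line. The helper names `parse_detail` and `total_vs_listed` are hypothetical and are not part of mdadm.

```python
import re

# Bernd's `mdadm -D /dev/md0` output from this message, trimmed to the relevant lines.
BERND_DETAIL = """\
/dev/md0:
     Raid Level : raid1
   Raid Devices : 2
  Total Devices : 3
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0      22        1        0      active sync   /dev/hdc1
       1       3        1        1      active sync   /dev/hda1
           UUID : fb539f04:91afe349:3591ed8e:f46b8ef1
         Events : 0.26
"""


def parse_detail(text):
    """Split `mdadm -D` text into 'Key : Value' fields and device-table rows (assumed format)."""
    fields, rows, in_table = {}, [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Number") and "RaidDevice" in stripped:
            in_table = True  # table header seen; numbered rows follow
            continue
        if in_table and re.match(r"\d+\s", stripped):
            rows.append(stripped.split())
            continue
        if " : " in line:
            key, _, value = line.partition(" : ")
            fields[key.strip()] = value.strip()
    return fields, rows


def total_vs_listed(text):
    """Return (Total Devices as reported, number of device rows actually listed)."""
    fields, rows = parse_detail(text)
    return int(fields["Total Devices"]), len(rows)


if __name__ == "__main__":
    print(total_vs_listed(BERND_DETAIL))  # (3, 2)
```

On Bernd's output the sketch finds three devices claimed but only two listed. It cannot tell whether the counter or the table is the stale value.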


* Re: bug in mdadm?
@ 2004-05-29 13:59 Mario 'BitKoenig' Holbe
From: Mario 'BitKoenig' Holbe @ 2004-05-29 13:59 UTC (permalink / raw)
  To: linux-raid

Bernd Schubert <Bernd.Schubert@tc.pci.uni-heidelberg.de> wrote:
> for one of our raid1 devices 'mdadm -D' reports 3 devices and 1
> failed device, though I'm pretty sure that I specified
> '--raid-devices=2' when I created that raid-array.
[...]
>    Raid Devices : 2

You did.

>   Total Devices : 3

Plus one spare disk.

>  Active Devices : 2
> Working Devices : 2

Two mirrors up and running.

>  Failed Devices : 1
>   Spare Devices : 0

One disk failed or out-of-sync or something like that.
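Mario reads the counters as parts of a whole. That reading can be written down as two consistency checks. The relations used here (working = active + spare, total = working + failed) are assumptions inferred from the way the counters are described in this thread. They are not a documented mdadm guarantee, and the class and function names are hypothetical. The second sample uses the counters Guy reports later in this thread.

```python
from dataclasses import dataclass


@dataclass
class DetailCounters:
    raid: int      # "Raid Devices"
    total: int     # "Total Devices"
    active: int    # "Active Devices"
    working: int   # "Working Devices"
    failed: int    # "Failed Devices"
    spare: int     # "Spare Devices"


def violated_invariants(c):
    """List the assumed relations between mdadm -D counters that do not hold."""
    problems = []
    # Assumption: every non-failed device is either an active member or a spare.
    if c.working != c.active + c.spare:
        problems.append("working != active + spare")
    # Assumption: every device counted in Total is either working or failed.
    if c.total != c.working + c.failed:
        problems.append("total != working + failed")
    return problems


if __name__ == "__main__":
    bernd = DetailCounters(raid=2, total=3, active=2, working=2, failed=1, spare=0)
    guy = DetailCounters(raid=14, total=13, active=14, working=12, failed=1, spare=1)
    print(violated_invariants(bernd))  # []
    print(violated_invariants(guy))    # ['working != active + spare']
```

Under these assumptions, Bernd's counters are self-consistent: one failed entry is recorded alongside two working mirrors. Guy's counters are not, because 14 active devices plus 1 spare cannot add up to only 12 working devices.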

[moved from above]
> On another system, 'mdadm -D' reports the correct numbers.

What do you consider 'correct'?
Did you move *all* the physical disks from the one
system to the other?
If there is an mdadm.conf, did you move it too (in
case you didn't move the disk with the root fs)?

> The data in /proc/mdstat also shows the correct numbers.
> Any ideas what the reason for this is? Is it a bug in mdadm, or does the
> superblock really contain wrong data?

Perhaps there is another partition somewhere
on your disks with the same UUID, which gets merged
into md0 as a spare disk. Did you remove a mirror from
md0 in the past and add another one?
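Mario's first hypothesis can be checked by collecting the array UUID recorded on each partition. One way is to run `mdadm --examine` on every candidate partition and note its UUID line. You then look for partitions that claim md0's UUID without being one of its intended mirrors. The sketch below does only the grouping step, on a hand-filled mapping. The partitions /dev/hdb5 and /dev/hda2 and the all-zero UUID are hypothetical entries for illustration.

```python
from collections import defaultdict

MD0_UUID = "fb539f04:91afe349:3591ed8e:f46b8ef1"

# Hand-filled mapping partition -> array UUID; /dev/hdb5, /dev/hda2 and the zero UUID are hypothetical.
SUPERBLOCKS = {
    "/dev/hda1": MD0_UUID,
    "/dev/hdc1": MD0_UUID,
    "/dev/hdb5": MD0_UUID,
    "/dev/hda2": "00000000:00000000:00000000:00000000",
}


def group_by_uuid(superblocks):
    """Map each array UUID to the sorted partitions whose superblock carries it."""
    groups = defaultdict(list)
    for device, uuid in superblocks.items():
        groups[uuid].append(device)
    return {uuid: sorted(devices) for uuid, devices in groups.items()}


def unexpected_members(superblocks, uuid, expected):
    """Partitions claiming membership in `uuid` that are not in the `expected` set."""
    return [d for d in group_by_uuid(superblocks).get(uuid, []) if d not in expected]


if __name__ == "__main__":
    print(unexpected_members(SUPERBLOCKS, MD0_UUID, {"/dev/hda1", "/dev/hdc1"}))
```

With the sample mapping, the stray /dev/hdb5 is the only partition reported. A real check would need the UUIDs of every partition on every disk in the machine.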

Another possibility is that you are using mdadm's
'spare groups'. I don't know what mdadm shows in
this case.

regards,
   Mario
-- 
Being rich doesn't mean buying a Ferrari, but burning
one.
                                               Dietmar Wischmeier


* RE: bug in mdadm?
@ 2004-05-29 15:46 Guy
From: Guy @ 2004-05-29 15:46 UTC (permalink / raw)
  To: 'Mario 'BitKoenig' Holbe', linux-raid

No! It's a bug. It's been reported here before. I have a RAID5 with 14
disks and 1 spare. mdadm reports Raid Devices 14, Total 13, Active 14,
Working 12, Failed 1 and Spare 1. By those numbers I should have data loss!
But nothing is wrong; see below.

Guy

# mdadm -D /dev/md2
/dev/md2:
        Version : 00.90.00
  Creation Time : Fri Dec 12 17:29:50 2003
     Raid Level : raid5
     Array Size : 230980672 (220.28 GiB 236.57 GB)
    Device Size : 17767744 (16.94 GiB 18.24 GB)
   Raid Devices : 14
  Total Devices : 13
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Mon Mar  1 20:32:41 2004
          State : dirty, no-errors
 Active Devices : 14
Working Devices : 12
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8      161        1      active sync   /dev/sdk1
       2       8       65        2      active sync   /dev/sde1
       3       8      177        3      active sync   /dev/sdl1
       4       8       81        4      active sync   /dev/sdf1
       5       8      193        5      active sync   /dev/sdm1
       6       8       97        6      active sync   /dev/sdg1
       7       8      209        7      active sync   /dev/sdn1
       8       8      113        8      active sync   /dev/sdh1
       9       8      225        9      active sync   /dev/sdo1
      10       8      129       10      active sync   /dev/sdi1
      11       8      241       11      active sync   /dev/sdp1
      12       8      145       12      active sync   /dev/sdj1
      13      65        1       13      active sync   /dev/sdq1
      14       8       33       14        /dev/sdc1
           UUID : 8357a389:8853c2d1:f160d155:6b4e1b99

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md2 : active raid5 sdc1[14] sdq1[13] sdj1[12] sdp1[11] sdi1[10] sdo1[9] sdh1[8] sdn1[7] sdg1[6] sdm1[5] sdf1[4] sdl1[3] sde1[2] sdk1[1] sdd1[0]
      230980672 blocks level 5, 64k chunk, algorithm 2 [14/14] [UUUUUUUUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
      264960 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      17510784 blocks [2/2] [UU]

unused devices: <none>
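The kernel's own view is the `[n/m] [UU...]` field in /proc/mdstat. Here it is read as slots configured / slots in use, with one `U` per member that is up and `_` for a missing one. Below is a minimal parser for that field, written against the mdstat output above. It shows that the kernel counts all 14 members of md2 as up, unlike the mdadm -D counters. The function name `mdstat_health` is hypothetical. The `\s*` in the pattern also tolerates the two bracketed fields being split across lines.

```python
import re

GUY_MDSTAT = """\
Personalities : [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md2 : active raid5 sdc1[14] sdq1[13] sdj1[12] sdp1[11] sdi1[10] sdo1[9] sdh1[8] sdn1[7] sdg1[6] sdm1[5] sdf1[4] sdl1[3] sde1[2] sdk1[1] sdd1[0]
      230980672 blocks level 5, 64k chunk, algorithm 2 [14/14] [UUUUUUUUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
      264960 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      17510784 blocks [2/2] [UU]

unused devices: <none>
"""

# "[configured/in-use] [U_...]"; \s* also accepts a line break between the two brackets.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]")


def mdstat_health(text):
    """Return {array: (configured, in_use, status)} for every mdN entry in /proc/mdstat text."""
    health = {}
    # Each array entry starts with "mdN :" at the beginning of a line.
    for block in re.split(r"^(?=md\d+\s*:)", text, flags=re.MULTILINE):
        name = re.match(r"(md\d+)\s*:", block)
        status = STATUS.search(block)
        if name and status:
            health[name.group(1)] = (int(status.group(1)), int(status.group(2)), status.group(3))
    return health


if __name__ == "__main__":
    for array, (configured, in_use, status) in mdstat_health(GUY_MDSTAT).items():
        print(array, configured, in_use, status.count("_"), "missing")
```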



* Bug in mdadm?
@ 2003-07-07 13:34 Tapani Utriainen
From: Tapani Utriainen @ 2003-07-07 13:34 UTC (permalink / raw)
  To: linux-raid


Hi,

I have run into something that seems to be a bug in mdadm and/or in the kernel (2.4.20).

I wanted to create a RAID 5 with 6 disks with mdadm:

# mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=6 --spare-devices=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/hdg /dev/hde

Despite explicitly specifying SIX disks and NO spares, it
created an array of SEVEN disks, with ONE spare and ONE missing/failed+removed?!

The output from 'mdadm -D' is included below. This is very
likely a bug in mdadm.
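For reference, the device count in the command itself was consistent: six member devices for `--raid-devices=6` plus `--spare-devices=0`. Below is a minimal sketch that assembles such a command line and refuses mismatched counts before anything is run. The helper `build_create_argv` is hypothetical. It uses only the options already present in the command above, and it never invokes mdadm.

```python
def build_create_argv(md_device, level, chunk_kib, raid_devices, spare_devices, members):
    """Assemble (but do not run) an `mdadm --create` argument list."""
    expected = raid_devices + spare_devices
    # Sanity check of this sketch: every listed device is either a raid member or a spare.
    if len(members) != expected:
        raise ValueError(f"{len(members)} devices listed, but {expected} expected")
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--chunk={chunk_kib}",
        f"--raid-devices={raid_devices}",
        f"--spare-devices={spare_devices}",
        *members,
    ]


if __name__ == "__main__":
    disks = ["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/hdg", "/dev/hde"]
    print(" ".join(build_create_argv("/dev/md0", 5, 256, 6, 0, disks)))
```

The printed line reproduces the command above exactly. That suggests the seven-entry layout reported by 'mdadm -D' did not come from a miscounted argument list.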

Now, in case this was just a quirk of some other kind, I went on to
create a reiserfs, mount it, fiddle with it, and test redundancy by
marking a drive as failed.

# mdadm /dev/md0 -f /dev/sdf

After this, all processes accessing the fs go into disk sleep.
(If everything worked, the array should have gone into degraded mode, and I
should still have been able to access the fs.)

In the logs there is an indication of a kernel bug. (See the dump at the
very end of this message)

However, I am no software RAID expert, and this might just be the result of
severe misuse or misunderstanding of the tools.

//Tapani



* * * * *  MISCONFIGURED ARRAY ?  * * * * *




# mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=6 --spare-devices=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/hdg /dev/hde

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.00
  Creation Time : Mon Jul  7 13:45:47 2003
     Raid Level : raid5
     Array Size : 603136000 (575.20 GiB 617.61 GB)
    Device Size : 120627200 (115.04 GiB 123.52 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Jul  7 13:45:47 2003
          State : dirty, no-errors
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4      34        0        4      active sync   /dev/hdg
       5       0        0        5      faulty
       6      33        0        6        /dev/hde
           UUID : faa0f80f:a5bacb7f:1caf43d5:d0147d81
         Events : 0.1





* * * * *  KERNEL BUG ?  * * * * *




# mdadm /dev/md0 -f /dev/sdf
mdadm: set /dev/sdf faulty in /dev/md0

From the logs:

# dmesg

...

raid5: Disk failure on sdf, disabling device. Operation continuing on 4
devices
md: updating md0 RAID superblock on device
md: hde [events: 00000002]<6>(write) hde's sb offset: 120627264
md: hdg [events: 00000002]<6>(write) hdg's sb offset: 120627264
md: md_do_sync() got signal ... exiting
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: recovery thread finished ...
md: recovery thread got woken up ...
md0: resyncing spare disk hde to replace failed disk
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 100000 KB/sec) for reconstruction.
md: using 124k window, over a total of 120627200 blocks.
md: md_do_sync() got signal ... exiting
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: recovery thread finished ...
md: (skipping faulty sdf )
md: sde [events: 00000002]<6>(write) sde's sb offset: 120627264
md: sdd [events: 00000002]<6>(write) sdd's sb offset: 120627264
md: sdc [events: 00000002]<6>(write) sdc's sb offset: 120627264
journal-601, buffer write failed
kernel BUG at prints.c:334!

invalid operand: 0000
CPU:    0
EIP:    0010:[<c01aa4b8>]    Not tainted
EFLAGS: 00010282
eax: 00000024   ebx: f7b3b000   ecx: 00000012   edx: ef66ff7c
esi: 00000000   edi: f7b3b000   ebp: 00000003   esp: f7bd3ec0
ds: 0018   es: 0018   ss: 0018
Process kupdated (pid: 7, stackpage=f7bd3000)
Stack: c02bade6 c0355ce0 f7b3b000 f8d264ec c01b584a f7b3b000 c02bc900 00001000
       eecbef80 00000006 00000004 00000000 ee621e40 00000000 00000008 ecb5c000
       00000004 c01b9991 f7b3b000 f8d264ec 00000001 00000006 f8d2f58c 00000004
Call Trace:    [<c01b584a>] [<c01b9991>] [<c01b8ba4>] [<c01a7240>] [<c0141d0a>]
  [<c0140e14>] [<c014118d>] [<c0105000>] [<c0105000>] [<c01058ce>] [<c0141090>]

Code: 0f 0b 4e 01 ec ad 2b c0 85 db 74 0e 0f b7 43 08 89 04 24 e8

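The repeated 'RAID5 conf printout' blocks in the log can be read mechanically. The sketch below takes `rd`, `wd` and `fd` to mean raid disks, working disks and failed disks. It takes the per-disk `o:` flag to mean operational. These readings are assumptions based on the log itself, not on kernel documentation, and the function names are hypothetical. The one firm fact the sketch relies on is that RAID5 stores one parity block per stripe. It can therefore reconstruct data with at most one member missing.

```python
import re

# One "RAID5 conf printout" block copied from the log above.
CONF_PRINTOUT = """\
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
"""

HEADER = re.compile(r"rd:(\d+)\s+wd:(\d+)\s+fd:(\d+)")
DISK = re.compile(r"disk (\d+), s:\d+, o:(\d), .*dev:(.+)$")


def parse_conf_printout(text):
    """Return (rd, wd, fd, {slot: (device, operational)}) for one printout block."""
    rd, wd, fd = (int(g) for g in HEADER.search(text).groups())
    disks = {}
    for line in text.splitlines():
        m = DISK.search(line)
        if m:
            disks[int(m.group(1))] = (m.group(3), m.group(2) == "1")
    return rd, wd, fd, disks


def raid5_can_serve_data(raid_disks, working_disks):
    """RAID5 tolerates at most one missing member (one parity block per stripe)."""
    return raid_disks - working_disks <= 1


if __name__ == "__main__":
    rd, wd, fd, disks = parse_conf_printout(CONF_PRINTOUT)
    down = [dev for dev, operational in disks.values() if not operational]
    print(rd, wd, fd, down, raid5_can_serve_data(rd, wd))
```

For this printout the sketch finds six slots and four working members, with sdf and the empty [dev 00:00] slot down. That is two missing members, more than RAID5 parity can cover, and it matches the 'Operation continuing on 4 devices' line for a six-disk array.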

