From: Brad Campbell <lists2009@fnarfbargle.com>
To: linux-raid@vger.kernel.org
Cc: neilb@suse.de
Subject: Re: What the heck happened to my array?
Date: Tue, 05 Apr 2011 08:47:16 +0800
Message-ID: <4D9A6694.4040606@fnarfbargle.com>
In-Reply-To: <BANLkTi=prv_vzfJr2JJt3LLhdB0GFSMy4w@mail.gmail.com>

On 05/04/11 00:49, Roberto Spadim wrote:
> I don't know, but this happened to me on an HP server with Linux
> 2.6.37. I changed to an older kernel release and the problem went away.
> Check with Neil and the other md guys what the real problem is.
> Maybe the realtime module and other changes inside the kernel are the
> problem, maybe not...
> Just a quick solution idea: try an older kernel.
>

Quick precis:
- Started reshape 512k to 64k chunk size.
- sdd got bad sector and was kicked.
- Array froze all IO.
- Reboot required to get system back.
- Restarted reshape with 9 drives.
- sdl suffered IO error and was kicked.
- Array froze all IO.
- Reboot required to get system back.
- Array will no longer mount with 8/10 drives.
- Mdadm 3.1.5 segfaults when trying to start reshape.
   Naively tried to run it under gdb to get a backtrace but was unable to stop it forking.
- Got array started with mdadm 3.2.1
- Attempted to re-add sdd/sdl (now marked as spares)

root@srv:~/mdadm-3.1.5# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdl[1](S) sdd[6](S) sdc[0] sdh[9] sda[8] sde[7] sdg[5] sdb[4] sdf[3] sdm[2]
       7814078464 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/8] [U_UUUU_UUU]
       	resync=DELAYED

md2 : active raid5 sdi[0] sdk[3] sdj[1]
       1465146368 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md6 : active raid1 sdo6[0] sdn6[1]
       821539904 blocks [2/2] [UU]

md5 : active raid1 sdo5[0] sdn5[1]
       104864192 blocks [2/2] [UU]

md4 : active raid1 sdo3[0] sdn3[1]
       20980800 blocks [2/2] [UU]

md3 : active (auto-read-only) raid1 sdo2[0] sdn2[1]
       8393856 blocks [2/2] [UU]

md1 : active raid1 sdo1[0] sdn1[1]
       20980736 blocks [2/2] [UU]

unused devices: <none>
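
For anyone scripting around this, the "[10/8] [U_UUUU_UUU]" field on the
md0 line can be read mechanically. A minimal sketch, assuming only the
status format visible in the output above; the function name
parse_status is made up for illustration and is not part of mdadm:

```python
import re

# Matches the "[total/working] [U_U...]" status field printed in /proc/mdstat.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_status(line):
    """Return (total, working, missing_slots) for one mdstat status line."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("no [n/m] [UU_] status field found")
    total = int(match.group(1))
    working = int(match.group(2))
    flags = match.group(3)
    # Each "_" marks a raid slot with no working device in it.
    missing = [slot for slot, flag in enumerate(flags) if flag == "_"]
    return total, working, missing


# Status line copied from the md0 entry above.
md0_line = ("7814078464 blocks super 1.2 level 6, 512k chunk, "
            "algorithm 2 [10/8] [U_UUUU_UUU]")
total, working, missing = parse_status(md0_line)
print(total, working, missing)  # 10 8 [1, 6]
```

Slots 1 and 6 are the two raid devices shown as "removed" in the
mdadm --detail output.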


[  303.640776] md: bind<sdl>
[  303.677461] md: bind<sdm>
[  303.837358] md: bind<sdf>
[  303.846291] md: bind<sdb>
[  303.851476] md: bind<sdg>
[  303.860725] md: bind<sdd>
[  303.861055] md: bind<sde>
[  303.861982] md: bind<sda>
[  303.862830] md: bind<sdh>
[  303.863128] md: bind<sdc>
[  303.863306] md: kicking non-fresh sdd from array!
[  303.863353] md: unbind<sdd>
[  303.900207] md: export_rdev(sdd)
[  303.900260] md: kicking non-fresh sdl from array!
[  303.900306] md: unbind<sdl>
[  303.940100] md: export_rdev(sdl)
[  303.942181] md/raid:md0: reshape will continue
[  303.942242] md/raid:md0: device sdc operational as raid disk 0
[  303.942285] md/raid:md0: device sdh operational as raid disk 9
[  303.942327] md/raid:md0: device sda operational as raid disk 8
[  303.942368] md/raid:md0: device sde operational as raid disk 7
[  303.942409] md/raid:md0: device sdg operational as raid disk 5
[  303.942449] md/raid:md0: device sdb operational as raid disk 4
[  303.942490] md/raid:md0: device sdf operational as raid disk 3
[  303.942531] md/raid:md0: device sdm operational as raid disk 2
[  303.943733] md/raid:md0: allocated 10572kB
[  303.943866] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[  303.943912] RAID conf printout:
[  303.943916]  --- level:6 rd:10 wd:8
[  303.943920]  disk 0, o:1, dev:sdc
[  303.943924]  disk 2, o:1, dev:sdm
[  303.943927]  disk 3, o:1, dev:sdf
[  303.943931]  disk 4, o:1, dev:sdb
[  303.943934]  disk 5, o:1, dev:sdg
[  303.943938]  disk 7, o:1, dev:sde
[  303.943941]  disk 8, o:1, dev:sda
[  303.943945]  disk 9, o:1, dev:sdh
[  303.944061] md0: detected capacity change from 0 to 8001616347136
[  303.944366] md: md0 switched to read-write mode.
[  303.944427] md: reshape of RAID array md0
[  303.944469] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  303.944511] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[  303.944573] md: using 128k window, over a total of 976759808 blocks.
[  304.054875]  md0: unknown partition table
[  304.393245] mdadm[5940]: segfault at 7f2000 ip 00000000004480d2 sp 00007fffa04777b8 error 4 in mdadm[400000+64000]
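
Since gdb would not give a backtrace, the kernel's segfault line is the
only pointer to where mdadm died. A rough sketch of decoding it, assuming
the usual x86 "segfault at ADDR ip IP sp SP error N in OBJ[BASE+SIZE]"
layout seen above; decode_segfault is a hypothetical helper name:

```python
import re

# Layout of the x86 kernel segfault message as it appears in the log above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract the faulting instruction's offset inside the mapped object."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a recognised segfault line")
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    size = int(match.group("size"), 16)
    return {
        "object": match.group("obj"),
        "fault_addr": int(match.group("addr"), 16),
        "error_code": int(match.group("err")),
        "offset": ip - base,                     # offset into the mapping
        "in_mapping": base <= ip < base + size,  # sanity check on the ip
    }


log_line = ("mdadm[5940]: segfault at 7f2000 ip 00000000004480d2 "
            "sp 00007fffa04777b8 error 4 in mdadm[400000+64000]")
info = decode_segfault(log_line)
print(info["object"], hex(info["offset"]), info["in_mapping"])
# mdadm 0x480d2 True
```

Error code 4 on x86 means a user-mode read of a page that was not
mapped. If the binary was built with debug info and is not relocated,
something like "addr2line -e ./mdadm 0x4480d2" may turn the ip into a
source line, though that has not been tried here.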


root@srv:~# mdadm --detail /dev/md0
/dev/md0:
         Version : 1.2
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
      Array Size : 7814078464 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
    Raid Devices : 10
   Total Devices : 10
     Persistence : Superblock is persistent

     Update Time : Tue Apr  5 07:54:30 2011
           State : active, degraded
  Active Devices : 8
Working Devices : 10
  Failed Devices : 0
   Spare Devices : 2

          Layout : left-symmetric
      Chunk Size : 512K

   New Chunksize : 64K

            Name : srv:server  (local to host srv)
            UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
          Events : 633835

     Number   Major   Minor   RaidDevice State
        0       8       32        0      active sync   /dev/sdc
        1       0        0        1      removed
        2       8      192        2      active sync   /dev/sdm
        3       8       80        3      active sync   /dev/sdf
        4       8       16        4      active sync   /dev/sdb
        5       8       96        5      active sync   /dev/sdg
        6       0        0        6      removed
        7       8       64        7      active sync   /dev/sde
        8       8        0        8      active sync   /dev/sda
        9       8      112        9      active sync   /dev/sdh

        1       8      176        -      spare   /dev/sdl
        6       8       48        -      spare   /dev/sdd
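
As a sanity check on the sizes reported above, RAID6 gives up two
devices' worth of space to parity, so the array size should be
(Raid Devices - 2) x Used Dev Size. A small worked sketch, assuming
the --detail figures are in KiB (which matches the GiB values printed
next to them); the function name is made up for illustration:

```python
def raid6_array_size_kib(raid_devices, used_dev_size_kib):
    """Usable RAID6 capacity: two devices' worth of each stripe is parity."""
    return (raid_devices - 2) * used_dev_size_kib


# Values copied from the mdadm --detail output above.
size_kib = raid6_array_size_kib(10, 976759808)
print(size_kib)         # 7814078464
print(size_kib * 1024)  # 8001616347136
```

The first number matches "Array Size" and the second matches the
"detected capacity change from 0 to 8001616347136" line in the kernel
log, so the geometry itself looks consistent.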

root@srv:~# for i in /dev/sd? ; do mdadm --examine $i ; done
/dev/sda:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 9beb9a0f:2a73328c:f0c17909:89da70fd

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : c58ed095 - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 8
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 75d997f8:d9372d90:c068755b:81c8206b

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : 72321703 - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 4
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 5738a232:85f23a16:0c7a9454:d770199c

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : 5c61ea2e - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdd:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 83a2c731:ba2846d0:2ce97d83:de624339

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : e1a5ebbc - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : f1e3c1d3:ea9dc52e:a4e6b70e:e25a0321

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : 551997d7 - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 7
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdf:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : c32dff71:0b8c165c:9f589b0f:bcbc82da

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : db0aa39b - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdg:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 194bc75c:97d3f507:4915b73a:51a50172

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : 344cadbe - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 5
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdh:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 1326457e:4fc0a6be:0073ccae:398d5c7f

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : 8debbb14 - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 9
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdi:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : e39d73c3:75be3b52:44d195da:b240c146
            Name : srv:2  (local to host srv)
   Creation Time : Sat Jul 10 21:14:29 2010
      Raid Level : raid5
    Raid Devices : 3

  Avail Dev Size : 1465147120 (698.64 GiB 750.16 GB)
      Array Size : 2930292736 (1397.27 GiB 1500.31 GB)
   Used Dev Size : 1465146368 (698.64 GiB 750.15 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : b577b308:56f2e4c9:c78175f4:cf10c77f

     Update Time : Tue Apr  5 07:46:18 2011
        Checksum : 57ee683f - correct
          Events : 455775

          Layout : left-symmetric
      Chunk Size : 64K

    Device Role : Active device 0
    Array State : AAA ('A' == active, '.' == missing)
/dev/sdj:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : e39d73c3:75be3b52:44d195da:b240c146
            Name : srv:2  (local to host srv)
   Creation Time : Sat Jul 10 21:14:29 2010
      Raid Level : raid5
    Raid Devices : 3

  Avail Dev Size : 1465147120 (698.64 GiB 750.16 GB)
      Array Size : 2930292736 (1397.27 GiB 1500.31 GB)
   Used Dev Size : 1465146368 (698.64 GiB 750.15 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : b127f002:a4aa8800:735ef8d7:6018564e

     Update Time : Tue Apr  5 07:46:18 2011
        Checksum : 3ae0b4c6 - correct
          Events : 455775

          Layout : left-symmetric
      Chunk Size : 64K

    Device Role : Active device 1
    Array State : AAA ('A' == active, '.' == missing)
/dev/sdk:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : e39d73c3:75be3b52:44d195da:b240c146
            Name : srv:2  (local to host srv)
   Creation Time : Sat Jul 10 21:14:29 2010
      Raid Level : raid5
    Raid Devices : 3

  Avail Dev Size : 1465147120 (698.64 GiB 750.16 GB)
      Array Size : 2930292736 (1397.27 GiB 1500.31 GB)
   Used Dev Size : 1465146368 (698.64 GiB 750.15 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : clean
     Device UUID : 90fddf63:03d5dba4:3fcdc476:9ce3c44c

     Update Time : Tue Apr  5 07:46:18 2011
        Checksum : dd5eef0e - correct
          Events : 455775

          Layout : left-symmetric
      Chunk Size : 64K

    Device Role : Active device 2
    Array State : AAA ('A' == active, '.' == missing)
/dev/sdl:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 769940af:66733069:37cea27d:7fb28a23

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : dc756202 - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : spare
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
/dev/sdm:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x4
      Array UUID : d00a11d7:fe0435af:07c8d4d6:e3b8e34e
            Name : srv:server  (local to host srv)
   Creation Time : Sat Jan  8 11:25:17 2011
      Raid Level : raid6
    Raid Devices : 10

  Avail Dev Size : 1953523120 (931.51 GiB 1000.20 GB)
      Array Size : 15628156928 (7452.09 GiB 8001.62 GB)
   Used Dev Size : 1953519616 (931.51 GiB 1000.20 GB)
     Data Offset : 2048 sectors
    Super Offset : 8 sectors
           State : active
     Device UUID : 7e564e2c:7f21125b:c3b1907a:b640178f

   Reshape pos'n : 3437035520 (3277.81 GiB 3519.52 GB)
   New Chunksize : 64K

     Update Time : Tue Apr  5 07:54:30 2011
        Checksum : b3df3ee7 - correct
          Events : 633835

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : A.AAAA.AAA ('A' == active, '.' == missing)
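
That is a lot of output to compare by eye. A minimal sketch of pulling
the per-device fields into a dictionary so that roles and event counts
can be compared, assuming only the "Key : value" layout shown above;
parse_examine is a hypothetical helper and the sample is trimmed to
three of the devices:

```python
import re


def parse_examine(text):
    """Split concatenated `mdadm --examine` output into {device: {field: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(/dev/\S+):\s*$", line)
        if header:
            # A "/dev/sdX:" line starts a new device section.
            current = devices.setdefault(header.group(1), {})
        elif current is not None and " : " in line:
            # Split on the first " : " only, so UUIDs and times stay intact.
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return devices


# Trimmed sample using values from the output above.
sample = """\
/dev/sda:
          Events : 633835
    Device Role : Active device 8
/dev/sdd:
          Events : 633835
    Device Role : spare
/dev/sdl:
          Events : 633835
    Device Role : spare
"""

parsed = parse_examine(sample)
spares = sorted(dev for dev, fields in parsed.items()
                if fields["Device Role"] == "spare")
event_counts = {fields["Events"] for fields in parsed.values()}
print(spares)        # ['/dev/sdd', '/dev/sdl']
print(event_counts)  # {'633835'}
```

In the full output the two re-added drives carry the same event count as
the active members but report the role "spare".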

root@srv:~/mdadm-3.1.5# ./mdadm --version
mdadm - v3.1.5 - 23rd March 2011

root@srv:~/mdadm-3.1.5# uname -a
Linux srv 2.6.38 #19 SMP Wed Mar 23 09:57:05 WST 2011 x86_64 GNU/Linux

Now, the array restarted with mdadm 3.2.1, but of course it's now
reshaping with 8 out of 10 disks, has no redundancy, and is going at
600k/s, which will take over 10 days. Is there anything I can do to give
it some redundancy while it completes, or am I better off copying the
data off, blowing it away and starting again? All the important stuff is
backed up anyway; I just wanted to avoid restoring 8TB from backup if I
could.
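
For reference, the "over 10 days" figure can be reproduced from the
numbers above. This is a back-of-envelope sketch under a few
assumptions: the reshape position is in KiB of array space spread over
the 8 data disks, 600k/s is per-device progress as mdstat reports it,
and the rate stays constant. The function name is made up for
illustration:

```python
SECONDS_PER_DAY = 86400


def reshape_days_remaining(dev_size_kib, reshape_pos_kib, data_disks, rate_kib_s):
    """Estimate days left, measuring progress per device as mdstat does."""
    done_per_device = reshape_pos_kib / data_disks
    remaining_per_device = dev_size_kib - done_per_device
    return remaining_per_device / rate_kib_s / SECONDS_PER_DAY


# 976759808: per-device total from the "over a total of ... blocks" log line.
# 3437035520: "Reshape pos'n" from mdadm --examine.
# 8: data disks in a 10-device RAID6. 600: observed rate in KiB/s.
days = reshape_days_remaining(976759808, 3437035520, 8, 600)
print(f"{days:.1f} days")  # 10.6 days
```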

Regards,
Brad
