* mdadm hangs during raid5 grow
From: Sebastian Reichel @ 2017-01-04  6:42 UTC (permalink / raw)
  To: linux-raid


Hi,

I have a problem with a hanging mdadm reshape task: the kernel thread
"md2_raid5" spins at 100% CPU load, and any operation on the raid
(e.g. mdadm -S) hangs as well. Rebooting worked, but after
re-triggering the reshape (mdadm --readwrite /dev/md2) I get the same
behaviour.

dmesg has this stacktrace:

[ 1813.500745] INFO: task md2_resync:3377 blocked for more than 120 seconds.
[ 1813.500778]       Not tainted 4.8.0-2-amd64 #1
[ 1813.500795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.500822] md2_resync      D ffff93207bc98180     0  3377      2 0x00000000
[ 1813.500827]  ffff93206f46d000 ffff93207642a1c0 0000000000000246 ffff932059607bf0
[ 1813.500829]  ffff932059608000 ffff93206effc400 ffff93206effc688 ffff932059607d24
[ 1813.500830]  ffff932059607bf0 ffff93206e0a3000 ffffffffbc7eb6d1 ffff93206e0a3000
[ 1813.500832] Call Trace:
[ 1813.500841]  [<ffffffffbc7eb6d1>] ? schedule+0x31/0x80
[ 1813.500847]  [<ffffffffc0356924>] ? reshape_request+0x7b4/0x910 [raid456]
[ 1813.500851]  [<ffffffffbc2bce80>] ? wake_atomic_t_function+0x60/0x60
[ 1813.500854]  [<ffffffffc0356da3>] ? raid5_sync_request+0x323/0x3a0 [raid456]
[ 1813.500862]  [<ffffffffc0271b50>] ? is_mddev_idle+0x98/0xf3 [md_mod]
[ 1813.500866]  [<ffffffffc02649a9>] ? md_do_sync+0x959/0xed0 [md_mod]
[ 1813.500868]  [<ffffffffbc2bce80>] ? wake_atomic_t_function+0x60/0x60
[ 1813.500872]  [<ffffffffc0261363>] ? md_thread+0x133/0x140 [md_mod]
[ 1813.500873]  [<ffffffffbc7eb1c9>] ? __schedule+0x289/0x760
[ 1813.500877]  [<ffffffffc0261230>] ? find_pers+0x70/0x70 [md_mod]
[ 1813.500879]  [<ffffffffbc29aecd>] ? kthread+0xcd/0xf0
[ 1813.500881]  [<ffffffffbc7efcaf>] ? ret_from_fork+0x1f/0x40
[ 1813.500883]  [<ffffffffbc29ae00>] ? kthread_create_on_node+0x190/0x190

Is this a known bug, and is a patch available? I found similar
reports on serverfault [0][1].

[0] http://serverfault.com/questions/773244/mdadm-stuck-reshape-operation
[1] http://serverfault.com/questions/697193/raid-5-reshape-freeze

-- Sebastian

FWIW, here is some information about the raid:

mars# uname -a
Linux mars 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux
sre@mars ~ % cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10] 
md2 : active raid5 sdl[4] sdk[6] sdm[3] sdj[5]
      7813774720 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  0.2% (10197888/3906887360) finish=157305.2min speed=412K/sec
...
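The finish estimate can be sanity-checked from the same mdstat line:
remaining 1K-blocks divided by the reported K/sec speed. A rough check
with the figures above (md's own number differs slightly because it
averages the speed over a sliding window):

```shell
# Figures taken from the /proc/mdstat line above:
# block counts are in 1K units, speed is in K/sec.
done_kb=10197888
total_kb=3906887360
speed_kps=412

echo "$(( (total_kb - done_kb) / speed_kps / 60 )) minutes remaining"
```

That comes out to ~157633 minutes, i.e. over a hundred days, which by
itself suggests the reshape is not making real progress.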
mars# for disk in /dev/sd[jmlk]; smartctl -i $disk | grep "Device Model"
Device Model:     WDC WD40EFRX-68WT0N0
Device Model:     WDC WD40EFRX-68WT0N0
Device Model:     WDC WD40EFRX-68WT0N0
Device Model:     WDC WD40EFRX-68WT0N0
mars# for disk in /dev/sd[jmlk]; if smartctl -l scterc,70,70 $disk > /dev/null ; then echo "$disk is good"; fi
/dev/sdj is good
/dev/sdk is good
/dev/sdl is good
/dev/sdm is good
mars# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
     Array Size : 7813774720 (7451.80 GiB 8001.31 GB)
  Used Dev Size : 3906887360 (3725.90 GiB 4000.65 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Wed Jan  4 06:09:53 2017
          State : clean, reshaping 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Reshape Status : 0% complete
  Delta Devices : 1, (3->4)

           Name : mars:2  (local to host mars)
           UUID : a9946f57:e2c50b35:d192467d:fa495817
         Events : 39050

    Number   Major   Minor   RaidDevice State
       4       8      176        0      active sync   /dev/sdl
       5       8      144        1      active sync   /dev/sdj
       3       8      192        2      active sync   /dev/sdm
       6       8      160        3      active sync   /dev/sdk
mars# mdadm --examine /dev/sd[jmlk]    
/dev/sdj:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 94bb69dc:955c3040:5cc4ecbb:28130785

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : ed4b675b - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdk:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 213a97fc:46865f42:0e8b06be:7309eac1

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4ea4cc90 - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : a25cccf5:d23f2274:30dc0ffa:d8a79f14

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 85b982a - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdm:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xc
     Array UUID : a9946f57:e2c50b35:d192467d:fa495817
           Name : mars:2  (local to host mars)
  Creation Time : Wed Jan 15 12:59:19 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775280 (3725.90 GiB 4000.65 GB)
     Array Size : 11720662080 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774720 (3725.90 GiB 4000.65 GB)
    Data Offset : 261888 sectors
   Super Offset : 8 sectors
   Unused Space : before=261800 sectors, after=560 sectors
          State : active
    Device UUID : 21517788:cdc63605:5bdc0bee:d23cc234

  Reshape pos'n : 29073984 (27.73 GiB 29.77 GB)
  Delta Devices : 1 (3->4)

    Update Time : Wed Jan  4 06:09:53 2017
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 4039a8c7 - correct
         Events : 39050

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
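As a unit cross-check (the mdadm figures are easy to misread): the
"Reshape pos'n" is in 1K blocks, and all four members report the same
position, so none of them has drifted. The parenthesised sizes follow
from:

```shell
# Reshape pos'n as reported identically by all four members (1K blocks)
pos_kb=29073984
awk -v kb="$pos_kb" 'BEGIN { printf "%.2f GiB, %.2f GB\n", kb / 1048576, kb * 1024 / 1e9 }'
```

which matches the "(27.73 GiB 29.77 GB)" printed by mdadm --examine.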


* Re: mdadm hangs during raid5 grow
From: Sebastian Reichel @ 2017-01-08 20:50 UTC (permalink / raw)
  To: linux-raid


Hi,

On Wed, Jan 04, 2017 at 07:42:11AM +0100, Sebastian Reichel wrote:
> I have a problem with hanging mdadm reshape task at 100% CPU load
> (kernel thread "md2_raid5"). Any operation on the raid (i.e. mdadm
> -S) is also hanging.  Rebooting worked, but after triggering the
> reshape (mdadm --readwrite /dev/md2) I get the same behaviour.
> 
> [...]
> 
> Is this a known bug / some patch available?

I also tried 4.9.0-rc8, which was easy to test without having to find
a working kernel config for this system; the error persists. I have
some more information:

mars# gdb /usr/lib/debug/lib/modules/4.9.0-rc8-amd64/kernel/drivers/md/raid456.ko
(gdb) info line *reshape_request+0x6dd
Line 5481 of "./drivers/md/raid5.c" starts at address 0xe8e4 <reshape_request+1748> and ends at 0xe8fd <reshape_request+1773>.

=> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/md/raid5.c?h=v4.9-rc8#n5481
=> wait_event(conf->wait_for_overlap, atomic_read(&conf->reshape_stripes) == 0 || test_bit(MD_RECOVERY_INTR, &mddev->recovery));

Also, if I enable the debug messages in raid5.c, I get the following
from [md2_raid5], which is running at 100% CPU on one of the cores.
(Note regarding the "failed" state below: I am pretty sure the disks
are fine, and I can use the raid if I mount it read-only.)

# echo "file raid5.c +p" > /sys/kernel/debug/dynamic_debug/control
# dmesg | grep-random-sample
[41658.396897] locked=1 uptodate=4 to_read=0 to_write=0 failed=2 failed_num=2,1
[41658.396904] check 3: state 0x13 read           (null) write           (null) written           (null)
[41658.397113] check 0: state 0x811 read           (null) write           (null) written           (null)
[41658.397139] check 0: state 0x811 read           (null) write           (null) written           (null)
[41658.397163] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397192] handling stripe 20206728, state=0x1401 cnt=1, pd_idx=2, qd_idx=-1
, check:0, reconstruct:6
[41658.397222] schedule_reconstruction: stripe 20206728 locked: 2 ops_request: 10
[41658.397251] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397277] ops_complete_reconstruct: stripe 20206736
[41658.397296] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397328] check 0: state 0x811 read           (null) write           (null) written           (null)
[41658.397351] check 3: state 0x13 read           (null) write           (null) written           (null)
[41658.397368] locked=1 uptodate=4 to_read=0 to_write=0 failed=2 failed_num=2,1
[41658.397387] ops_complete_reconstruct: stripe 20206112
[41658.397395] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.397417] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397476] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397560] check 2: state 0xa01 read           (null) write           (null) written           (null)
[41658.397592] check 3: state 0x13 read           (null) write           (null) written           (null)
[41658.397606] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.397614] handling stripe 20206792, state=0x1401 cnt=1, pd_idx=2, qd_idx=-1
, check:0, reconstruct:6
[41658.397621] schedule_reconstruction: stripe 20206792 locked: 2 ops_request: 10
[41658.397630] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397638] ops_complete_reconstruct: stripe 20206800
[41658.397840] locked=1 uptodate=4 to_read=0 to_write=0 failed=2 failed_num=2,1
[41658.397863] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397869] ops_complete_reconstruct: stripe 20206976
[41658.397874] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.397883] ops_complete_reconstruct: stripe 20206984
[41658.397890] check 0: state 0x13 read           (null) write           (null) written           (null)
[41658.397899] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.397907] schedule_reconstruction: stripe 20207000 locked: 2 ops_request: 10
[41658.397952] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.398030] check 2: state 0xa01 read           (null) write           (null) written           (null)
[41658.398079] check 1: state 0x203 read           (null) write           (null) written           (null)
[41658.398105] check 2: state 0xa01 read           (null) write           (null) written           (null)
[41658.398118] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.398124] locked=1 uptodate=4 to_read=0 to_write=0 failed=2 failed_num=2,1
[41658.398130] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.398136] schedule_reconstruction: stripe 20206904 locked: 2 ops_request: 10
[41658.398145] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.398162] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.398226] check 2: state 0xa01 read           (null) write           (null) written           (null)
[41658.398294] ops_run_reconstruct5: stripe 20206936
[41658.398308] check 0: state 0x811 read           (null) write           (null) written           (null)
[41658.398318] check 1: state 0x203 read           (null) write           (null) written           (null)
[41658.398342] check 2: state 0xa01 read           (null) write           (null) written           (null)
[41658.398367] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.398389] ops_complete_reconstruct: stripe 20206184
[41658.398415] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.398469] locked=1 uptodate=4 to_read=0 to_write=0 failed=2 failed_num=2,1
[41658.398494] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.398519] schedule_reconstruction: stripe 20206216 locked: 2 ops_request: 10
[41658.398542] check 2: state 0x203 read           (null) write           (null) written           (null)
[41658.398554] ops_run_reconstruct5: stripe 20206224
[41658.398560] check 2: state 0x203 read           (null) write           (null) written           (null)
[41658.398582] ops_run_reconstruct5: stripe 20206240
[41658.398600] check 2: state 0x203 read           (null) write           (null) written           (null)
[41658.398627] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.398659] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.398733] ops_run_reconstruct5: stripe 20206288
[41658.398754] check 2: state 0x203 read           (null) write           (null) written           (null)
[41658.398785] ops_complete_reconstruct: stripe 20206296
[41658.398792] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.398799] ops_complete_reconstruct: stripe 20206304
[41658.398807] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.398813] handling stripe 20206312, state=0x1401 cnt=1, pd_idx=2, qd_idx=-1
, check:0, reconstruct:6
[41658.398845] check 0: state 0x13 read           (null) write           (null) written           (null)
[41658.398870] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.398901] check 3: state 0x811 read           (null) write           (null) written           (null)
[41658.398990] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.399008] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.399035] check 2: state 0xa01 read           (null) write           (null) written           (null)
[41658.399039] schedule_reconstruction: stripe 20206040 locked: 2 ops_request: 10
[41658.399048] check 1: state 0x203 read           (null) write           (null) written           (null)
[41658.399057] ops_complete_reconstruct: stripe 20206336
[41658.399063] check 1: state 0x203 read           (null) write           (null) written           (null)
[41658.399090] check 1: state 0x203 read           (null) write           (null) written           (null)
[41658.399117] check 1: state 0xa01 read           (null) write           (null) written           (null)
[41658.399142] __get_priority_stripe: handle: busy hold: empty full_writes: 0 bypass_count: 0
[41658.399220] check 3: state 0x13 read           (null) write           (null) written           (null)

I also have an idea about the cause: usually I add fully zeroed disks
to the array. This time I only zeroed the first few MB, so that mdadm
would not complain about the old superblock, and assumed md would take
care of the rest. Is it safe to remove the new disk, zero it out
completely and re-add it?
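
If so, the sequence I have in mind would be roughly the following
(sdX standing in for the newly added disk; this is only a sketch, not
something I have tried against a reshaping array, hence the question):

```
mdadm /dev/md2 --fail /dev/sdX        # fail the suspect new disk
mdadm /dev/md2 --remove /dev/sdX      # remove it from the array
mdadm --zero-superblock /dev/sdX      # wipe the md metadata
dd if=/dev/zero of=/dev/sdX bs=1M     # full zero pass this time
mdadm /dev/md2 --add /dev/sdX         # re-add it as a fresh device
```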

-- Sebastian

