Array died during grow; now resync stopped

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Array died during grow; now resync stopped
@ 2015-02-02  9:41 Jörg Habenicht
  2015-02-02 19:40 ` NeilBrown
  0 siblings, 1 reply; 6+ messages in thread
From: Jörg Habenicht @ 2015-02-02  9:41 UTC (permalink / raw)
  To: linux-raid

Hi all,

I had a server crash during an array grow.
Commandline was "mdadm --grow /dev/md0 --raid-devices=6 --chunk=1M"

Now the sync is stuck at 27% and wont continue.
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[0] sdg1[9] sdc1[6] sdb1[7] sdd1[8] sdf1[5]
      5860548608 blocks super 1.0 level 5, 256k chunk, algorithm 2 [6/6]
[UUUUUU]
      [=====>...............]  reshape = 27.9% (410229760/1465137152)
finish=8670020128.0min speed=0K/sec
      
unused devices: <none>


$ mdadm -D /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Thu Oct  7 09:28:04 2010
     Raid Level : raid5
     Array Size : 5860548608 (5589.05 GiB 6001.20 GB)
  Used Dev Size : 1465137152 (1397.26 GiB 1500.30 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Sun Feb  1 13:30:05 2015
          State : clean, reshaping 
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 256K

 Reshape Status : 27% complete
  Delta Devices : 1, (5->6)
  New Chunksize : 1024K

           Name : stelli:3  (local to host stelli)
           UUID : 52857d77:3806e446:477d4865:d711451e
         Events : 2254869

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       5       8       81        1      active sync   /dev/sdf1
       8       8       49        2      active sync   /dev/sdd1
       7       8       17        3      active sync   /dev/sdb1
       6       8       33        4      active sync   /dev/sdc1
       9       8       97        5      active sync   /dev/sdg1


smartcrl reports the disks are OK. No remapped sectors, no pending writes, etc.

The system load keeps at 2.0:
$ cat /proc/loadavg 
2.00 2.00 1.95 1/140 2937
which may be caused by udevd and md0_reshape
$ ps fax
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
...
 1671 ?        D      0:00  \_ [md0_reshape]
...
 1289 ?        Ss     0:01 /sbin/udevd --daemon
 1672 ?        D      0:00  \_ /sbin/udevd --daemon


Could this be caused by a software lock?

The system got 2G RAM and 2G swap. Is this sufficient to complete?
$ free
             total       used       free     shared    buffers     cached
Mem:       1799124     351808    1447316        540      14620     286216
-/+ buffers/cache:      50972    1748152
Swap:      2104508          0    2104508


And in dmesg I found this:
$ dmesg | less
[    5.456941] md: bind<sdg1>
[   11.015014] xor: measuring software checksum speed
[   11.051384]    prefetch64-sse:  3291.000 MB/sec
[   11.091375]    generic_sse:  3129.000 MB/sec
[   11.091378] xor: using function: prefetch64-sse (3291.000 MB/sec)
[   11.159365] raid6: sse2x1    1246 MB/s
[   11.227343] raid6: sse2x2    2044 MB/s
[   11.295327] raid6: sse2x4    2487 MB/s
[   11.295331] raid6: using algorithm sse2x4 (2487 MB/s)
[   11.295334] raid6: using intx1 recovery algorithm
[   11.328771] md: raid6 personality registered for level 6
[   11.328776] md: raid5 personality registered for level 5
[   11.328779] md: raid4 personality registered for level 4
[   19.840890] bio: create slab <bio-1> at 1
[  159.701406] md: md0 stopped.
[  159.701413] md: unbind<sdg1>
[  159.709902] md: export_rdev(sdg1)
[  159.709980] md: unbind<sdd1>
[  159.721856] md: export_rdev(sdd1)
[  159.721955] md: unbind<sdb1>
[  159.733883] md: export_rdev(sdb1)
[  159.733991] md: unbind<sdc1>
[  159.749856] md: export_rdev(sdc1)
[  159.749954] md: unbind<sdf1>
[  159.769885] md: export_rdev(sdf1)
[  159.769985] md: unbind<sde1>
[  159.781873] md: export_rdev(sde1)
[  160.471460] md: md0 stopped.
[  160.490329] md: bind<sdf1>
[  160.490478] md: bind<sdd1>
[  160.490689] md: bind<sdb1>
[  160.490911] md: bind<sdc1>
[  160.491164] md: bind<sdg1>
[  160.491408] md: bind<sde1>
[  160.492616] md/raid:md0: reshape will continue
[  160.492638] md/raid:md0: device sde1 operational as raid disk 0
[  160.492640] md/raid:md0: device sdg1 operational as raid disk 5
[  160.492641] md/raid:md0: device sdc1 operational as raid disk 4
[  160.492642] md/raid:md0: device sdb1 operational as raid disk 3
[  160.492644] md/raid:md0: device sdd1 operational as raid disk 2
[  160.492645] md/raid:md0: device sdf1 operational as raid disk 1
[  160.493187] md/raid:md0: allocated 0kB
[  160.493253] md/raid:md0: raid level 5 active with 6 out of 6 devices,
algorithm 2
[  160.493256] RAID conf printout:
[  160.493257]  --- level:5 rd:6 wd:6
[  160.493259]  disk 0, o:1, dev:sde1
[  160.493261]  disk 1, o:1, dev:sdf1
[  160.493262]  disk 2, o:1, dev:sdd1
[  160.493263]  disk 3, o:1, dev:sdb1
[  160.493264]  disk 4, o:1, dev:sdc1
[  160.493266]  disk 5, o:1, dev:sdg1
[  160.493336] md0: detected capacity change from 0 to 6001201774592
[  160.493340] md: reshape of RAID array md0
[  160.493342] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  160.493343] md: using maximum available idle IO bandwidth (but not more
than 200000 KB/sec) for reshape.
[  160.493351] md: using 128k window, over a total of 1465137152k.
[  160.951404]  md0: unknown partition table
[  190.984871] udevd[1289]: worker [1672] /devices/virtual/block/md0
timeout; kill it
[  190.984901] udevd[1289]: seq 2259 '/devices/virtual/block/md0' killed



$ mdadm --version
mdadm - v3.3.1 - 5th June 2014

uname -a
Linux XXXXX 3.14.14-gentoo #3 SMP Sat Jan 31 18:45:04 CET 2015 x86_64 AMD
Athlon(tm) II X2 240e Processor AuthenticAMD GNU/Linux


Currently I can't access the array to read the remaining data, nor can I
continue the array grow.
Can you help me get it running?


best regards 
Jörg




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Array died during grow; now resync stopped
  2015-02-02  9:41 Array died during grow; now resync stopped Jörg Habenicht
@ 2015-02-02 19:40 ` NeilBrown
  2015-02-03 10:35   ` Aw: " "Jörg Habenicht"
  0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2015-02-02 19:40 UTC (permalink / raw)
  To: Jörg Habenicht; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 6762 bytes --]

On Mon, 2 Feb 2015 09:41:02 +0000 (UTC) Jörg Habenicht <j.habenicht@gmx.de>
wrote:

> Hi all,
> 
> I had a server crash during an array grow.
> Commandline was "mdadm --grow /dev/md0 --raid-devices=6 --chunk=1M"
> 
> Now the sync is stuck at 27% and wont continue.
> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] 
> md0 : active raid5 sde1[0] sdg1[9] sdc1[6] sdb1[7] sdd1[8] sdf1[5]
>       5860548608 blocks super 1.0 level 5, 256k chunk, algorithm 2 [6/6]
> [UUUUUU]
>       [=====>...............]  reshape = 27.9% (410229760/1465137152)
> finish=8670020128.0min speed=0K/sec
>       
> unused devices: <none>
> 
> 
> $ mdadm -D /dev/md0
> /dev/md0:
>         Version : 1.0
>   Creation Time : Thu Oct  7 09:28:04 2010
>      Raid Level : raid5
>      Array Size : 5860548608 (5589.05 GiB 6001.20 GB)
>   Used Dev Size : 1465137152 (1397.26 GiB 1500.30 GB)
>    Raid Devices : 6
>   Total Devices : 6
>     Persistence : Superblock is persistent
> 
>     Update Time : Sun Feb  1 13:30:05 2015
>           State : clean, reshaping 
>  Active Devices : 6
> Working Devices : 6
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 256K
> 
>  Reshape Status : 27% complete
>   Delta Devices : 1, (5->6)
>   New Chunksize : 1024K
> 
>            Name : stelli:3  (local to host stelli)
>            UUID : 52857d77:3806e446:477d4865:d711451e
>          Events : 2254869
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       65        0      active sync   /dev/sde1
>        5       8       81        1      active sync   /dev/sdf1
>        8       8       49        2      active sync   /dev/sdd1
>        7       8       17        3      active sync   /dev/sdb1
>        6       8       33        4      active sync   /dev/sdc1
>        9       8       97        5      active sync   /dev/sdg1
> 
> 
> smartcrl reports the disks are OK. No remapped sectors, no pending writes, etc.
> 
> The system load keeps at 2.0:
> $ cat /proc/loadavg 
> 2.00 2.00 1.95 1/140 2937
> which may be caused by udevd and md0_reshape
> $ ps fax
>   PID TTY      STAT   TIME COMMAND
>     2 ?        S      0:00 [kthreadd]
> ...
>  1671 ?        D      0:00  \_ [md0_reshape]
> ...
>  1289 ?        Ss     0:01 /sbin/udevd --daemon
>  1672 ?        D      0:00  \_ /sbin/udevd --daemon
> 
> 
> Could this be caused by a software lock?

Some sort of software problem I suspect.
What does
  cat /proc/1671/stack
  cat /proc/1672/stack
show?

Alternatively,
  echo w > /proc/sysrq-trigger
and see what appears in 'dmesg'.


> 
> The system got 2G RAM and 2G swap. Is this sufficient to complete?

Memory shouldn't be a problem.
However it wouldn't hurt to see what value is in 
  /sys/block/md0/md/stripe_cache_size
and double it.

If all else fails a reboot should be safe and will probably start the reshape
properly.  md is very careful about surviving reboots.

NeilBrown



> $ free
>              total       used       free     shared    buffers     cached
> Mem:       1799124     351808    1447316        540      14620     286216
> -/+ buffers/cache:      50972    1748152
> Swap:      2104508          0    2104508
> 
> 
> And in dmesg I found this:
> $ dmesg | less
> [    5.456941] md: bind<sdg1>
> [   11.015014] xor: measuring software checksum speed
> [   11.051384]    prefetch64-sse:  3291.000 MB/sec
> [   11.091375]    generic_sse:  3129.000 MB/sec
> [   11.091378] xor: using function: prefetch64-sse (3291.000 MB/sec)
> [   11.159365] raid6: sse2x1    1246 MB/s
> [   11.227343] raid6: sse2x2    2044 MB/s
> [   11.295327] raid6: sse2x4    2487 MB/s
> [   11.295331] raid6: using algorithm sse2x4 (2487 MB/s)
> [   11.295334] raid6: using intx1 recovery algorithm
> [   11.328771] md: raid6 personality registered for level 6
> [   11.328776] md: raid5 personality registered for level 5
> [   11.328779] md: raid4 personality registered for level 4
> [   19.840890] bio: create slab <bio-1> at 1
> [  159.701406] md: md0 stopped.
> [  159.701413] md: unbind<sdg1>
> [  159.709902] md: export_rdev(sdg1)
> [  159.709980] md: unbind<sdd1>
> [  159.721856] md: export_rdev(sdd1)
> [  159.721955] md: unbind<sdb1>
> [  159.733883] md: export_rdev(sdb1)
> [  159.733991] md: unbind<sdc1>
> [  159.749856] md: export_rdev(sdc1)
> [  159.749954] md: unbind<sdf1>
> [  159.769885] md: export_rdev(sdf1)
> [  159.769985] md: unbind<sde1>
> [  159.781873] md: export_rdev(sde1)
> [  160.471460] md: md0 stopped.
> [  160.490329] md: bind<sdf1>
> [  160.490478] md: bind<sdd1>
> [  160.490689] md: bind<sdb1>
> [  160.490911] md: bind<sdc1>
> [  160.491164] md: bind<sdg1>
> [  160.491408] md: bind<sde1>
> [  160.492616] md/raid:md0: reshape will continue
> [  160.492638] md/raid:md0: device sde1 operational as raid disk 0
> [  160.492640] md/raid:md0: device sdg1 operational as raid disk 5
> [  160.492641] md/raid:md0: device sdc1 operational as raid disk 4
> [  160.492642] md/raid:md0: device sdb1 operational as raid disk 3
> [  160.492644] md/raid:md0: device sdd1 operational as raid disk 2
> [  160.492645] md/raid:md0: device sdf1 operational as raid disk 1
> [  160.493187] md/raid:md0: allocated 0kB
> [  160.493253] md/raid:md0: raid level 5 active with 6 out of 6 devices,
> algorithm 2
> [  160.493256] RAID conf printout:
> [  160.493257]  --- level:5 rd:6 wd:6
> [  160.493259]  disk 0, o:1, dev:sde1
> [  160.493261]  disk 1, o:1, dev:sdf1
> [  160.493262]  disk 2, o:1, dev:sdd1
> [  160.493263]  disk 3, o:1, dev:sdb1
> [  160.493264]  disk 4, o:1, dev:sdc1
> [  160.493266]  disk 5, o:1, dev:sdg1
> [  160.493336] md0: detected capacity change from 0 to 6001201774592
> [  160.493340] md: reshape of RAID array md0
> [  160.493342] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  160.493343] md: using maximum available idle IO bandwidth (but not more
> than 200000 KB/sec) for reshape.
> [  160.493351] md: using 128k window, over a total of 1465137152k.
> [  160.951404]  md0: unknown partition table
> [  190.984871] udevd[1289]: worker [1672] /devices/virtual/block/md0
> timeout; kill it
> [  190.984901] udevd[1289]: seq 2259 '/devices/virtual/block/md0' killed
> 
> 
> 
> $ mdadm --version
> mdadm - v3.3.1 - 5th June 2014
> 
> uname -a
> Linux XXXXX 3.14.14-gentoo #3 SMP Sat Jan 31 18:45:04 CET 2015 x86_64 AMD
> Athlon(tm) II X2 240e Processor AuthenticAMD GNU/Linux
> 
> 
> Currently I can't access the array to read the remaining data, nor can I
> continue the array grow.
> Can you help me get it running?
> 
> 
> best regards 
> Jörg
> 
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Aw: Re: Array died during grow; now resync stopped
  2015-02-02 19:40 ` NeilBrown
@ 2015-02-03 10:35   ` "Jörg Habenicht"
  2015-02-04  6:45     ` NeilBrown
  0 siblings, 1 reply; 6+ messages in thread
From: "Jörg Habenicht" @ 2015-02-03 10:35 UTC (permalink / raw)
  To: linux-raid

Hello Neil,

thank you for caring.

(And sorry for the malformed structure, I have to use webmail.. )


On Mon, 2 Feb 2015 09:41:02 +0000 (UTC) Jörg Habenicht <j.habenicht@gmx.de>
wrote:

> Hi all,
>
> I had a server crash during an array grow.
> Commandline was "mdadm --grow /dev/md0 --raid-devices=6 --chunk=1M"
>
>
>
> Could this be caused by a software lock?

>Some sort of software problem I suspect.
>What does
>cat /proc/1671/stack
>cat /proc/1672/stack
>show?

$ cat /proc/1671/stack
cat: /proc/1671/stack: No such file or directory

Huch?
$ ls /proc/1671
ls: cannot read symbolic link /proc/1671/exe: No such file or directory
attr        comm             fdinfo     mounts      oom_score      stat
autogroup   coredump_filter  io         mountstats  oom_score_adj  statm
auxv        cwd              limits     net         pagemap        status
cgroup      environ          maps       ns          personality    syscall
clear_refs  exe              mem        numa_maps   root           task
cmdline     fd               mountinfo  oom_adj     smaps          wchan
$ id 
uid=0(root) gid=0(root) groups=0(root), ...

$ cat /proc/1672/stack
cat: /proc/1672/stack: No such file or directory


>Alternatively,
>echo w > /proc/sysrq-trigger
>and see what appears in 'dmesg'.

No good:

[99166.625796] SysRq : Show Blocked State
[99166.625829]   task                        PC stack   pid father
[99166.625845] md0_reshape     D ffff88006cb81e08     0  1671      2 0x00000000
[99166.625854]  ffff88006a17fb30 0000000000000046 000000000000a000 ffff88006cc9b7e0
[99166.625861]  ffff88006a17ffd8 ffff88006cc9b7e0 ffff88006fc11830 ffff88006fc11830
[99166.625866]  0000000000000001 ffffffff81068670 ffff88006ca56848 ffff88006fc11830
[99166.625871] Call Trace:
[99166.625884]  [<ffffffff81068670>] ? __dequeue_entity+0x40/0x50
[99166.625891]  [<ffffffff8106b966>] ? pick_next_task_fair+0x56/0x1b0
[99166.625898]  [<ffffffff813f4a50>] ? __schedule+0x2a0/0x820
[99166.625905]  [<ffffffff8106273d>] ? ttwu_do_wakeup+0xd/0x80
[99166.625914]  [<ffffffffa027b4c5>] ? get_active_stripe+0x185/0x5c0 [raid456]
[99166.625922]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
[99166.625929]  [<ffffffffa027e83a>] ? reshape_request+0x21a/0x860 [raid456]
[99166.625935]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
[99166.625942]  [<ffffffffa02744f6>] ? sync_request+0x236/0x380 [raid456]
[99166.625955]  [<ffffffffa01557ad>] ? md_do_sync+0x82d/0xd00 [md_mod]
[99166.625961]  [<ffffffff810684b4>] ? update_curr+0x64/0xe0
[99166.625971]  [<ffffffffa0152197>] ? md_thread+0xf7/0x110 [md_mod]
[99166.625977]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
[99166.625985]  [<ffffffffa01520a0>] ? md_register_thread+0xf0/0xf0 [md_mod]
[99166.625991]  [<ffffffff81059de8>] ? kthread+0xb8/0xd0
[99166.625997]  [<ffffffff81059d30>] ? kthread_create_on_node+0x180/0x180
[99166.626003]  [<ffffffff813f837c>] ? ret_from_fork+0x7c/0xb0
[99166.626008]  [<ffffffff81059d30>] ? kthread_create_on_node+0x180/0x180
[99166.626012] udevd           D ffff88006cb81e08     0  1672   1289 0x00000004
[99166.626017]  ffff88006a1819e8 0000000000000086 000000000000a000 ffff88006c4967a0
[99166.626022]  ffff88006a181fd8 ffff88006c4967a0 0000000000000000 0000000000000000
[99166.626027]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[99166.626032] Call Trace:
[99166.626039]  [<ffffffff810c24ed>] ? zone_statistics+0x9d/0xa0
[99166.626044]  [<ffffffff810c24ed>] ? zone_statistics+0x9d/0xa0
[99166.626050]  [<ffffffff810b13e7>] ? get_page_from_freelist+0x507/0x850
[99166.626057]  [<ffffffffa027b4c5>] ? get_active_stripe+0x185/0x5c0 [raid456]
[99166.626063]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
[99166.626069]  [<ffffffffa027f627>] ? make_request+0x7a7/0xa00 [raid456]
[99166.626075]  [<ffffffff81080afd>] ? ktime_get_ts+0x3d/0xd0
[99166.626080]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
[99166.626089]  [<ffffffffa014ea12>] ? md_make_request+0xd2/0x210 [md_mod]
[99166.626096]  [<ffffffff811e649d>] ? generic_make_request_checks+0x23d/0x270
[99166.626100]  [<ffffffff810acc68>] ? mempool_alloc+0x58/0x140
[99166.626106]  [<ffffffff811e7238>] ? generic_make_request+0xa8/0xf0
[99166.626111]  [<ffffffff811e72e7>] ? submit_bio+0x67/0x130
[99166.626117]  [<ffffffff8112a638>] ? bio_alloc_bioset+0x1b8/0x2a0
[99166.626123]  [<ffffffff81126a57>] ? _submit_bh+0x127/0x200
[99166.626129]  [<ffffffff8112815d>] ? block_read_full_page+0x1fd/0x290
[99166.626133]  [<ffffffff8112b680>] ? I_BDEV+0x10/0x10
[99166.626140]  [<ffffffff810aad2b>] ? add_to_page_cache_locked+0x6b/0xc0
[99166.626146]  [<ffffffff810b5520>] ? __do_page_cache_readahead+0x1b0/0x220
[99166.626152]  [<ffffffff810b5812>] ? force_page_cache_readahead+0x62/0xa0
[99166.626159]  [<ffffffff810ac936>] ? generic_file_aio_read+0x4b6/0x6c0
[99166.626166]  [<ffffffff810f9f87>] ? do_sync_read+0x57/0x90
[99166.626172]  [<ffffffff810fa571>] ? vfs_read+0xa1/0x180
[99166.626178]  [<ffffffff810fb0ab>] ? SyS_read+0x4b/0xc0
[99166.626183]  [<ffffffff813f7f72>] ? page_fault+0x22/0x30
[99166.626190]  [<ffffffff813f8422>] ? system_call_fastpath+0x16/0x1b


>
> The system got 2G RAM and 2G swap. Is this sufficient to complete?

>Memory shouldn't be a problem.
>However it wouldn't hurt to see what value is in
>/sys/block/md0/md/stripe_cache_size
>and double it.

$ cat /sys/block/md0/md/stripe_cache_size
256

I did not change it due to the crash in md_reshape


>If all else fails a reboot should be safe and will probably start the reshape
>properly. md is very careful about surviving reboots.

I already did reboot twice before I wrote to the list. Same result.


>NeilBrown

 
cu,
Joerg 
 

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Array died during grow; now resync stopped
  2015-02-03 10:35   ` Aw: " "Jörg Habenicht"
@ 2015-02-04  6:45     ` NeilBrown
  2015-02-04 11:58       ` Aw: " "Jörg Habenicht"
  0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2015-02-04  6:45 UTC (permalink / raw)
  To: Jörg Habenicht; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 7305 bytes --]

On Tue, 3 Feb 2015 11:35:53 +0100 "Jörg Habenicht" <j.habenicht@gmx.de> wrote:

> Hello Neil,
> 
> thank you for caring.
> 
> (And sorry for the malformed structure, I have to use webmail.. )
> 
> 
> On Mon, 2 Feb 2015 09:41:02 +0000 (UTC) Jörg Habenicht <j.habenicht@gmx.de>
> wrote:
> 
> > Hi all,
> >
> > I had a server crash during an array grow.
> > Commandline was "mdadm --grow /dev/md0 --raid-devices=6 --chunk=1M"
> >
> >
> >
> > Could this be caused by a software lock?
> 
> >Some sort of software problem I suspect.
> >What does
> >cat /proc/1671/stack
> >cat /proc/1672/stack
> >show?
> 
> $ cat /proc/1671/stack
> cat: /proc/1671/stack: No such file or directory

I guess you don't have that feature compiled into your kernel.

> 
> Huch?
> $ ls /proc/1671
> ls: cannot read symbolic link /proc/1671/exe: No such file or directory
> attr        comm             fdinfo     mounts      oom_score      stat
> autogroup   coredump_filter  io         mountstats  oom_score_adj  statm
> auxv        cwd              limits     net         pagemap        status
> cgroup      environ          maps       ns          personality    syscall
> clear_refs  exe              mem        numa_maps   root           task
> cmdline     fd               mountinfo  oom_adj     smaps          wchan
> $ id 
> uid=0(root) gid=0(root) groups=0(root), ...
> 
> $ cat /proc/1672/stack
> cat: /proc/1672/stack: No such file or directory
> 
> 
> >Alternatively,
> >echo w > /proc/sysrq-trigger
> >and see what appears in 'dmesg'.
> 
> No good:

Quite the reverse, this is exactly what I wanted.  It shows the stack trace
of pid 1671 and 1672..


> 
> [99166.625796] SysRq : Show Blocked State
> [99166.625829]   task                        PC stack   pid father
> [99166.625845] md0_reshape     D ffff88006cb81e08     0  1671      2 0x00000000
> [99166.625854]  ffff88006a17fb30 0000000000000046 000000000000a000 ffff88006cc9b7e0
> [99166.625861]  ffff88006a17ffd8 ffff88006cc9b7e0 ffff88006fc11830 ffff88006fc11830
> [99166.625866]  0000000000000001 ffffffff81068670 ffff88006ca56848 ffff88006fc11830
> [99166.625871] Call Trace:
> [99166.625884]  [<ffffffff81068670>] ? __dequeue_entity+0x40/0x50
> [99166.625891]  [<ffffffff8106b966>] ? pick_next_task_fair+0x56/0x1b0
> [99166.625898]  [<ffffffff813f4a50>] ? __schedule+0x2a0/0x820
> [99166.625905]  [<ffffffff8106273d>] ? ttwu_do_wakeup+0xd/0x80
> [99166.625914]  [<ffffffffa027b4c5>] ? get_active_stripe+0x185/0x5c0 [raid456]
> [99166.625922]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> [99166.625929]  [<ffffffffa027e83a>] ? reshape_request+0x21a/0x860 [raid456]
> [99166.625935]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> [99166.625942]  [<ffffffffa02744f6>] ? sync_request+0x236/0x380 [raid456]
> [99166.625955]  [<ffffffffa01557ad>] ? md_do_sync+0x82d/0xd00 [md_mod]
> [99166.625961]  [<ffffffff810684b4>] ? update_curr+0x64/0xe0
> [99166.625971]  [<ffffffffa0152197>] ? md_thread+0xf7/0x110 [md_mod]
> [99166.625977]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> [99166.625985]  [<ffffffffa01520a0>] ? md_register_thread+0xf0/0xf0 [md_mod]
> [99166.625991]  [<ffffffff81059de8>] ? kthread+0xb8/0xd0
> [99166.625997]  [<ffffffff81059d30>] ? kthread_create_on_node+0x180/0x180
> [99166.626003]  [<ffffffff813f837c>] ? ret_from_fork+0x7c/0xb0
> [99166.626008]  [<ffffffff81059d30>] ? kthread_create_on_node+0x180/0x180

That's not surprise.  Whenever anything goes wrong in raid5, something gets
stuck in get_active_stripe()...


> [99166.626012] udevd           D ffff88006cb81e08     0  1672   1289 0x00000004
> [99166.626017]  ffff88006a1819e8 0000000000000086 000000000000a000 ffff88006c4967a0
> [99166.626022]  ffff88006a181fd8 ffff88006c4967a0 0000000000000000 0000000000000000
> [99166.626027]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [99166.626032] Call Trace:
> [99166.626039]  [<ffffffff810c24ed>] ? zone_statistics+0x9d/0xa0
> [99166.626044]  [<ffffffff810c24ed>] ? zone_statistics+0x9d/0xa0
> [99166.626050]  [<ffffffff810b13e7>] ? get_page_from_freelist+0x507/0x850
> [99166.626057]  [<ffffffffa027b4c5>] ? get_active_stripe+0x185/0x5c0 [raid456]
> [99166.626063]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> [99166.626069]  [<ffffffffa027f627>] ? make_request+0x7a7/0xa00 [raid456]
> [99166.626075]  [<ffffffff81080afd>] ? ktime_get_ts+0x3d/0xd0
> [99166.626080]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> [99166.626089]  [<ffffffffa014ea12>] ? md_make_request+0xd2/0x210 [md_mod]
> [99166.626096]  [<ffffffff811e649d>] ? generic_make_request_checks+0x23d/0x270
> [99166.626100]  [<ffffffff810acc68>] ? mempool_alloc+0x58/0x140
> [99166.626106]  [<ffffffff811e7238>] ? generic_make_request+0xa8/0xf0
> [99166.626111]  [<ffffffff811e72e7>] ? submit_bio+0x67/0x130
> [99166.626117]  [<ffffffff8112a638>] ? bio_alloc_bioset+0x1b8/0x2a0
> [99166.626123]  [<ffffffff81126a57>] ? _submit_bh+0x127/0x200
> [99166.626129]  [<ffffffff8112815d>] ? block_read_full_page+0x1fd/0x290
> [99166.626133]  [<ffffffff8112b680>] ? I_BDEV+0x10/0x10
> [99166.626140]  [<ffffffff810aad2b>] ? add_to_page_cache_locked+0x6b/0xc0
> [99166.626146]  [<ffffffff810b5520>] ? __do_page_cache_readahead+0x1b0/0x220
> [99166.626152]  [<ffffffff810b5812>] ? force_page_cache_readahead+0x62/0xa0
> [99166.626159]  [<ffffffff810ac936>] ? generic_file_aio_read+0x4b6/0x6c0
> [99166.626166]  [<ffffffff810f9f87>] ? do_sync_read+0x57/0x90
> [99166.626172]  [<ffffffff810fa571>] ? vfs_read+0xa1/0x180
> [99166.626178]  [<ffffffff810fb0ab>] ? SyS_read+0x4b/0xc0
> [99166.626183]  [<ffffffff813f7f72>] ? page_fault+0x22/0x30
> [99166.626190]  [<ffffffff813f8422>] ? system_call_fastpath+0x16/0x1b

And this is stuck in the same place.... what what is consuming all the
stripes I wonder....

> 
> 
> >
> > The system got 2G RAM and 2G swap. Is this sufficient to complete?
> 
> >Memory shouldn't be a problem.
> >However it wouldn't hurt to see what value is in
> >/sys/block/md0/md/stripe_cache_size
> >and double it.
> 
> $ cat /sys/block/md0/md/stripe_cache_size
> 256


You are setting the chunk size to 1M, which is 256 4K pages.
So this stripe_cache only just has enough space to store one full stripe at
the new chunk size.  That isn't enough.

If you double it, the problem should go away.

mdadm should  do that for you .... I wonder why it didn't.


> 
> I did not change it due to the crash in md_reshape

What crash is that?  The above stack traces that you said "No good" about?
That isn't a crash.  That is the kernel showing you stack traces because you
asked for them.

 echo 1024 > /sys/block/md0/md/stripe_cache_size

should make it work.

NeilBrown


> 
> 
> >If all else fails a reboot should be safe and will probably start the reshape
> >properly. md is very careful about surviving reboots.
> 
> I already did reboot twice before I wrote to the list. Same result.
> 
> 
> >NeilBrown
> 
>  
> cu,
> Joerg 
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Aw: Re: Array died during grow; now resync stopped
  2015-02-04  6:45     ` NeilBrown
@ 2015-02-04 11:58       ` "Jörg Habenicht"
  2015-02-05  4:40         ` NeilBrown
  0 siblings, 1 reply; 6+ messages in thread
From: "Jörg Habenicht" @ 2015-02-04 11:58 UTC (permalink / raw)
  To: linux-raid

It works! :-)) The array syncs!


> > >What does
> > >cat /proc/1671/stack
> > >cat /proc/1672/stack
> > >show?
> > 
> > $ cat /proc/1671/stack
> > cat: /proc/1671/stack: No such file or directory
> 
> I guess you don't have that feature compiled into your kernel.

Guess so. I'm going to look to the missing CONFIG_ line.

> > >Alternatively,
> > >echo w > /proc/sysrq-trigger
> > >and see what appears in 'dmesg'.
> > 
> > No good:
> 
> Quite the reverse, this is exactly what I wanted.  It shows the stack trace
> of pid 1671 and 1672..

Ahh, OK. I thought it being a crash dump.

> > 
> > [99166.625796] SysRq : Show Blocked State
> > [99166.625829]   task                        PC stack   pid father
> > [99166.625845] md0_reshape     D ffff88006cb81e08     0  1671      2 0x00000000
> > [99166.625854]  ffff88006a17fb30 0000000000000046 000000000000a000 ffff88006cc9b7e0
> > [99166.625861]  ffff88006a17ffd8 ffff88006cc9b7e0 ffff88006fc11830 ffff88006fc11830
> > [99166.625866]  0000000000000001 ffffffff81068670 ffff88006ca56848 ffff88006fc11830
> > [99166.625871] Call Trace:
> > [99166.625884]  [<ffffffff81068670>] ? __dequeue_entity+0x40/0x50
> > [99166.625891]  [<ffffffff8106b966>] ? pick_next_task_fair+0x56/0x1b0
> > [99166.625898]  [<ffffffff813f4a50>] ? __schedule+0x2a0/0x820
> > [99166.625905]  [<ffffffff8106273d>] ? ttwu_do_wakeup+0xd/0x80
> > [99166.625914]  [<ffffffffa027b4c5>] ? get_active_stripe+0x185/0x5c0 [raid456]
> > [99166.625922]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> > [99166.625929]  [<ffffffffa027e83a>] ? reshape_request+0x21a/0x860 [raid456]
> > [99166.625935]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> > [99166.625942]  [<ffffffffa02744f6>] ? sync_request+0x236/0x380 [raid456]
> > [99166.625955]  [<ffffffffa01557ad>] ? md_do_sync+0x82d/0xd00 [md_mod]
> > [99166.625961]  [<ffffffff810684b4>] ? update_curr+0x64/0xe0
> > [99166.625971]  [<ffffffffa0152197>] ? md_thread+0xf7/0x110 [md_mod]
> > [99166.625977]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> > [99166.625985]  [<ffffffffa01520a0>] ? md_register_thread+0xf0/0xf0 [md_mod]
> > [99166.625991]  [<ffffffff81059de8>] ? kthread+0xb8/0xd0
> > [99166.625997]  [<ffffffff81059d30>] ? kthread_create_on_node+0x180/0x180
> > [99166.626003]  [<ffffffff813f837c>] ? ret_from_fork+0x7c/0xb0
> > [99166.626008]  [<ffffffff81059d30>] ? kthread_create_on_node+0x180/0x180
> 
> That's not surprise.  Whenever anything goes wrong in raid5, something gets
> stuck in get_active_stripe()...
> 
> 
> > [99166.626012] udevd           D ffff88006cb81e08     0  1672   1289 0x00000004
> > [99166.626017]  ffff88006a1819e8 0000000000000086 000000000000a000 ffff88006c4967a0
> > [99166.626022]  ffff88006a181fd8 ffff88006c4967a0 0000000000000000 0000000000000000
> > [99166.626027]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [99166.626032] Call Trace:
> > [99166.626039]  [<ffffffff810c24ed>] ? zone_statistics+0x9d/0xa0
> > [99166.626044]  [<ffffffff810c24ed>] ? zone_statistics+0x9d/0xa0
> > [99166.626050]  [<ffffffff810b13e7>] ? get_page_from_freelist+0x507/0x850
> > [99166.626057]  [<ffffffffa027b4c5>] ? get_active_stripe+0x185/0x5c0 [raid456]
> > [99166.626063]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> > [99166.626069]  [<ffffffffa027f627>] ? make_request+0x7a7/0xa00 [raid456]
> > [99166.626075]  [<ffffffff81080afd>] ? ktime_get_ts+0x3d/0xd0
> > [99166.626080]  [<ffffffff81072110>] ? __wake_up_sync+0x10/0x10
> > [99166.626089]  [<ffffffffa014ea12>] ? md_make_request+0xd2/0x210 [md_mod]
> > [99166.626096]  [<ffffffff811e649d>] ? generic_make_request_checks+0x23d/0x270
> > [99166.626100]  [<ffffffff810acc68>] ? mempool_alloc+0x58/0x140
> > [99166.626106]  [<ffffffff811e7238>] ? generic_make_request+0xa8/0xf0
> > [99166.626111]  [<ffffffff811e72e7>] ? submit_bio+0x67/0x130
> > [99166.626117]  [<ffffffff8112a638>] ? bio_alloc_bioset+0x1b8/0x2a0
> > [99166.626123]  [<ffffffff81126a57>] ? _submit_bh+0x127/0x200
> > [99166.626129]  [<ffffffff8112815d>] ? block_read_full_page+0x1fd/0x290
> > [99166.626133]  [<ffffffff8112b680>] ? I_BDEV+0x10/0x10
> > [99166.626140]  [<ffffffff810aad2b>] ? add_to_page_cache_locked+0x6b/0xc0
> > [99166.626146]  [<ffffffff810b5520>] ? __do_page_cache_readahead+0x1b0/0x220
> > [99166.626152]  [<ffffffff810b5812>] ? force_page_cache_readahead+0x62/0xa0
> > [99166.626159]  [<ffffffff810ac936>] ? generic_file_aio_read+0x4b6/0x6c0
> > [99166.626166]  [<ffffffff810f9f87>] ? do_sync_read+0x57/0x90
> > [99166.626172]  [<ffffffff810fa571>] ? vfs_read+0xa1/0x180
> > [99166.626178]  [<ffffffff810fb0ab>] ? SyS_read+0x4b/0xc0
> > [99166.626183]  [<ffffffff813f7f72>] ? page_fault+0x22/0x30
> > [99166.626190]  [<ffffffff813f8422>] ? system_call_fastpath+0x16/0x1b
> 
> And this is stuck in the same place.... what what is consuming all the
> stripes I wonder....

Do you like me to collect more information?

> > >
> > > The system got 2G RAM and 2G swap. Is this sufficient to complete?
> > 
> > >Memory shouldn't be a problem.
> > >However it wouldn't hurt to see what value is in
> > >/sys/block/md0/md/stripe_cache_size
> > >and double it.
> > 
> > $ cat /sys/block/md0/md/stripe_cache_size
> > 256
> 
> 
> You are setting the chunk size to 1M, which is 256 4K pages.
> So this stripe_cache only just has enough space to store one full stripe at
> the new chunk size.  That isn't enough.
> 
> If you double it, the problem should go away.
> 
> mdadm should  do that for you .... I wonder why it didn't.
> 

Do you like to have more test results?

> > 
> > I did not change it due to the crash in md_reshape
> 
> What crash is that?  The above stack traces that you said "No good" about?
> That isn't a crash.  That is the kernel showing you stack traces because you
> asked for them.

Ok, Thanks, learned something more :-)

> 
>  echo 1024 > /sys/block/md0/md/stripe_cache_size
> 
> should make it work.

Yes it did!

$ echo 1024 > /sys/block/md0/md/stripe_cache_size
$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[0] sdg1[9] sdc1[6] sdb1[7] sdd1[8] sdf1[5]
      5860548608 blocks super 1.0 level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
      [=====>...............]  reshape = 28.0% (410341376/1465137152) finish=29602912.5min speed=0K/sec
      
unused devices: <none>
$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[0] sdg1[9] sdc1[6] sdb1[7] sdd1[8] sdf1[5]
      5860548608 blocks super 1.0 level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
      [=====>...............]  reshape = 28.0% (410656252/1465137152) finish=7746625.6min speed=2K/sec
      
unused devices: <none>
$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[0] sdg1[9] sdc1[6] sdb1[7] sdd1[8] sdf1[5]
      5860548608 blocks super 1.0 level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
      [=====>...............]  reshape = 28.0% (410851328/1465137152) finish=5314609.8min speed=3K/sec
      
unused devices: <none>
$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[0] sdg1[9] sdc1[6] sdb1[7] sdd1[8] sdf1[5]
      5860548608 blocks super 1.0 level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU]
      [=====>...............]  reshape = 28.4% (416577276/1465137152) finish=870.3min speed=20079K/sec
      
unused devices: <none>

Immediatly it starts to sync.
I wonder why it got stuck at 27% and not at 0%? Shouldn't it get stuck at the beginning if the cache size is not sufficient?
Or is it because of a reboot which happened at 27% sync status?


Anyways,
Thank you for your help.

> 
> NeilBrown
> 
> 

cu,
Joerg



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Array died during grow; now resync stopped
  2015-02-04 11:58       ` Aw: " "Jörg Habenicht"
@ 2015-02-05  4:40         ` NeilBrown
  0 siblings, 0 replies; 6+ messages in thread
From: NeilBrown @ 2015-02-05  4:40 UTC (permalink / raw)
  To: Jörg Habenicht; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1253 bytes --]

On Wed, 4 Feb 2015 12:58:06 +0100 "Jörg Habenicht" <j.habenicht@gmx.de> wrote:

> It works! :-)) The array syncs!

Excellent!

> 
> 
> > > >What does
> > > >cat /proc/1671/stack
> > > >cat /proc/1672/stack
> > > >show?
> > > 
> > > $ cat /proc/1671/stack
> > > cat: /proc/1671/stack: No such file or directory
> > 
> > I guess you don't have that feature compiled into your kernel.
> 
> Guess so. I'm going to look to the missing CONFIG_ line.

CONFIG_STACKTRACE ... in case you haven't found it already.

> > 
> > And this is stuck in the same place.... what what is consuming all the
> > stripes I wonder....
> 
> Do you like me to collect more information?

No thanks, the required information was further down.  They were all consumed
by the reshape (which needs at least twice the largest chunk size).


> > If you double it, the problem should go away.
> > 
> > mdadm should  do that for you .... I wonder why it didn't.
> > 
> 
> Do you like to have more test results?

I had a look at the code and it was fairly obvious.  mdadm adjusts the stripe
cache based on the 'old' chunk size.  It never includes the 'new' chunk size
in those calculations.  I've made a note to fix that.

Thanks,
NeilBrown


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-02-05  4:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-02  9:41 Array died during grow; now resync stopped Jörg Habenicht
2015-02-02 19:40 ` NeilBrown
2015-02-03 10:35   ` Aw: " "Jörg Habenicht"
2015-02-04  6:45     ` NeilBrown
2015-02-04 11:58       ` Aw: " "Jörg Habenicht"
2015-02-05  4:40         ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).