* md raid5 fsync deadlock
@ 2012-02-29 23:31 Milan Broz
2012-03-01 1:53 ` NeilBrown
0 siblings, 1 reply; 4+ messages in thread
From: Milan Broz @ 2012-02-29 23:31 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 11405 bytes --]
Hi Neil,
I am repeatedly hitting a deadlock with MD raid5 while running an fio check.
I created the array like this:
# mdadm -C -l 5 -n 4 -c 64 --assume-clean /dev/md0 /dev/sd[bcde]
and ran this test (on a quad-core CPU):
# fio --name=global --rw=randwrite --size=1G --bsrange=1k-128k --filename=/dev/md0 --name=job1 --name=job2 --name=job3 --name=job4 --end_fsync=1
It deadlocks in the final fsync().
I can reproduce it on the Fedora 3.2.7 kernel (and also on 3.3.0-rc5);
below is part of the sysrq trace (the full sysrq output is in the attached gz archive).
I was able to reproduce it even while a resync was running; it stalled the
resync process as well.
Please let me know if you need more information.
Thanks,
Milan
Log from 3.2.7-1.fc16.x86_64:
[ 1182.945100] md0_raid5 S ffff880130f9e040 0 1021 2 0x00000000
[ 1182.952259] ffff88012b8f3da0 0000000000000046 ffff880100000000 7fffffffffffffff
[ 1182.959778] ffff880130f9dc80 ffff88012b8f3fd8 ffff88012b8f3fd8 ffff88012b8f3fd8
[ 1182.967272] ffffffff81a0d020 ffff880130f9dc80 ffff88012b8f3e60 00000001a01a25cf
[ 1182.974853] Call Trace:
[ 1182.977271] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1182.982342] [<ffffffff815e0505>] schedule_timeout+0x2a5/0x320
[ 1182.988266] [<ffffffff8109035e>] ? prepare_to_wait+0x5e/0x90
[ 1182.994104] [<ffffffff814662cc>] md_thread+0xec/0x140
[ 1182.999327] [<ffffffff81090440>] ? remove_wait_queue+0x50/0x50
[ 1183.005337] [<ffffffff814661e0>] ? md_rdev_init+0x130/0x130
[ 1183.010998] [<ffffffff8108fb9c>] kthread+0x8c/0xa0
[ 1183.015893] [<ffffffff815ebef4>] kernel_thread_helper+0x4/0x10
[ 1183.021928] [<ffffffff8108fb10>] ? kthread_worker_fn+0x190/0x190
[ 1183.028113] [<ffffffff815ebef0>] ? gs_change+0x13/0x13
[ 1183.033420] flush-9:0 D ffff88012f314920 0 1031 2 0x00000000
[ 1183.040598] ffff88011e8e35e0 0000000000000046 0003520000000000 ffff88011e8e3590
[ 1183.048133] ffff88012f314560 ffff88011e8e3fd8 ffff88011e8e3fd8 ffff88011e8e3fd8
[ 1183.055765] ffff880130f41720 ffff88012f314560 ffff88011e8e35a0 000000011e8e35a0
[ 1183.063304] Call Trace:
[ 1183.065810] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1183.070861] [<ffffffffa019e16b>] get_active_stripe+0x20b/0x710 [raid456]
[ 1183.077662] [<ffffffff81065420>] ? try_to_wake_up+0x290/0x290
[ 1183.083613] [<ffffffffa01a42ba>] make_request+0x19a/0x460 [raid456]
[ 1183.090064] [<ffffffff81090440>] ? remove_wait_queue+0x50/0x50
[ 1183.096076] [<ffffffff81465cc6>] md_make_request+0xc6/0x200
[ 1183.101824] [<ffffffff8111ca55>] ? mempool_alloc_slab+0x15/0x20
[ 1183.107924] [<ffffffff8129c257>] generic_make_request+0xc7/0x100
[ 1183.114110] [<ffffffff8129c317>] submit_bio+0x87/0x110
[ 1183.119421] [<ffffffff811acd9a>] ? bio_alloc_bioset+0x5a/0xf0
[ 1183.125344] [<ffffffff811a73cb>] submit_bh+0xeb/0x120
[ 1183.130479] [<ffffffff811a9590>] __block_write_full_page+0x210/0x3a0
[ 1183.137035] [<ffffffff811a8cd0>] ? end_buffer_async_read+0x130/0x130
[ 1183.143574] [<ffffffff811ae8d0>] ? thaw_bdev+0x90/0x90
[ 1183.148884] [<ffffffff811ae8d0>] ? thaw_bdev+0x90/0x90
[ 1183.154193] [<ffffffff811aa02d>] block_write_full_page_endio+0xcd/0x100
[ 1183.160993] [<ffffffff8111a7a4>] ? find_get_pages_tag+0x44/0x140
[ 1183.167180] [<ffffffff811aa075>] block_write_full_page+0x15/0x20
[ 1183.173365] [<ffffffff811ae528>] blkdev_writepage+0x18/0x20
[ 1183.179111] [<ffffffff81123667>] __writepage+0x17/0x40
[ 1183.184420] [<ffffffff81123acd>] write_cache_pages+0x20d/0x460
[ 1183.190431] [<ffffffff81123650>] ? set_page_dirty_lock+0x60/0x60
[ 1183.196532] [<ffffffff81123d6a>] generic_writepages+0x4a/0x70
[ 1183.202476] [<ffffffff81125401>] do_writepages+0x21/0x40
[ 1183.207960] [<ffffffff811a00b1>] writeback_single_inode+0x141/0x3e0
[ 1183.214410] [<ffffffff815df6b4>] ? __schedule+0x3d4/0x8c0
[ 1183.219985] [<ffffffff811a04f0>] writeback_sb_inodes+0x1a0/0x240
[ 1183.226172] [<ffffffff811a062e>] __writeback_inodes_wb+0x9e/0xd0
[ 1183.232360] [<ffffffff811a10ab>] wb_writeback+0x25b/0x310
[ 1183.237846] [<ffffffff81191fa2>] ? get_nr_inodes+0x52/0x70
[ 1183.243443] [<ffffffff81192f02>] ? get_nr_dirty_inodes+0x52/0x80
[ 1183.249650] [<ffffffff811a11ff>] wb_check_old_data_flush+0x9f/0xb0
[ 1183.255925] [<ffffffff811a20d9>] wb_do_writeback+0x149/0x1d0
[ 1183.261783] [<ffffffff815df6b4>] ? __schedule+0x3d4/0x8c0
[ 1183.267355] [<ffffffff8107c520>] ? lock_timer_base+0x70/0x70
[ 1183.273192] [<ffffffff811a21e3>] bdi_writeback_thread+0x83/0x2a0
[ 1183.279380] [<ffffffff811a2160>] ? wb_do_writeback+0x1d0/0x1d0
[ 1183.285393] [<ffffffff8108fb9c>] kthread+0x8c/0xa0
[ 1183.290264] [<ffffffff815ebef4>] kernel_thread_helper+0x4/0x10
[ 1183.296211] [<ffffffff8108fb10>] ? kthread_worker_fn+0x190/0x190
[ 1183.302330] [<ffffffff815ebef0>] ? gs_change+0x13/0x13
[ 1183.307661] kworker/0:1 S ffff880130f5e040 0 1042 2 0x00000000
[ 1183.314834] ffff8800af62de60 0000000000000046 ffff8800af62de00 ffffffff812b1468
[ 1183.322417] ffff880130f5dc80 ffff8800af62dfd8 ffff8800af62dfd8 ffff8800af62dfd8
[ 1183.329984] ffff88012bb2ae40 ffff880130f5dc80 ffff88012c6e2478 0000000130f5dc80
[ 1183.337570] Call Trace:
[ 1183.340074] [<ffffffff812b1468>] ? cfq_kick_queue+0x38/0x60
[ 1183.345824] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1183.350872] [<ffffffff8108b2b9>] worker_thread+0x239/0x350
[ 1183.356532] [<ffffffff8108b080>] ? manage_workers+0x230/0x230
[ 1183.362456] [<ffffffff8108fb9c>] kthread+0x8c/0xa0
[ 1183.367416] [<ffffffff815ebef4>] kernel_thread_helper+0x4/0x10
[ 1183.373428] [<ffffffff8108fb10>] ? kthread_worker_fn+0x190/0x190
[ 1183.379615] [<ffffffff815ebef0>] ? gs_change+0x13/0x13
[ 1183.384924] kworker/0:2 S ffff88012bee9ae0 0 1070 2 0x00000000
[ 1183.392098] ffff8801131bde60 0000000000000046 ffff8801131bde00 ffffffff812b1468
[ 1183.399665] ffff88012bee9720 ffff8801131bdfd8 ffff8801131bdfd8 ffff8801131bdfd8
[ 1183.407230] ffff88012bb2ae40 ffff88012bee9720 ffff88012c6e2478 000000012bee9720
[ 1183.414765] Call Trace:
[ 1183.417268] [<ffffffff812b1468>] ? cfq_kick_queue+0x38/0x60
[ 1183.423017] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1183.428065] [<ffffffff8108b2b9>] worker_thread+0x239/0x350
[ 1183.433725] [<ffffffff8108b080>] ? manage_workers+0x230/0x230
[ 1183.439646] [<ffffffff8108fb9c>] kthread+0x8c/0xa0
[ 1183.444607] [<ffffffff815ebef4>] kernel_thread_helper+0x4/0x10
[ 1183.450620] [<ffffffff8108fb10>] ? kthread_worker_fn+0x190/0x190
[ 1183.456807] [<ffffffff815ebef0>] ? gs_change+0x13/0x13
[ 1183.462116] fio S ffff88012c579ae0 0 1091 994 0x00000000
[ 1183.469203] ffff88012bce3e58 0000000000000082 ffff880100000000 ffffffff81093b11
[ 1183.476851] ffff88012c579720 ffff88012bce3fd8 ffff88012bce3fd8 ffff88012bce3fd8
[ 1183.484339] ffff880130f99720 ffff88012c579720 ffff88012bce3e38 0000000100000282
[ 1183.491903] Call Trace:
[ 1183.494408] [<ffffffff81093b11>] ? lock_hrtimer_base+0x31/0x60
[ 1183.500419] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1183.505465] [<ffffffff815e0dac>] do_nanosleep+0x9c/0xd0
[ 1183.510774] [<ffffffff81094cc8>] hrtimer_nanosleep+0xb8/0x180
[ 1183.516721] [<ffffffff81093900>] ? update_rmtp+0x70/0x70
[ 1183.522204] [<ffffffff81094614>] ? hrtimer_start_range_ns+0x14/0x20
[ 1183.528656] [<ffffffff81094df1>] sys_nanosleep+0x61/0x80
[ 1183.534142] [<ffffffff815e9d82>] system_call_fastpath+0x16/0x1b
[ 1183.540239] fio S ffff88012c57b200 0 1092 994 0x00000000
[ 1183.547413] ffff88012bc73e58 0000000000000082 ffff880100000000 ffffffff8135f33e
[ 1183.554858] ffff88012c57ae40 ffff88012bc73fd8 ffff88012bc73fd8 ffff88012bc73fd8
[ 1183.562505] ffff880130f41720 ffff88012c57ae40 ffff88012bd81810 0000000100000282
[ 1183.570090] Call Trace:
[ 1183.572594] [<ffffffff8135f33e>] ? tty_ldisc_deref+0xe/0x10
[ 1183.578342] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1183.583301] [<ffffffff815e0dac>] do_nanosleep+0x9c/0xd0
[ 1183.588633] [<ffffffff81094cc8>] hrtimer_nanosleep+0xb8/0x180
[ 1183.594578] [<ffffffff81093900>] ? update_rmtp+0x70/0x70
[ 1183.600062] [<ffffffff81094614>] ? hrtimer_start_range_ns+0x14/0x20
[ 1183.606510] [<ffffffff81094df1>] sys_nanosleep+0x61/0x80
[ 1183.611908] [<ffffffff815e9d82>] system_call_fastpath+0x16/0x1b
[ 1183.618027] fio D ffff880130f46040 0 1096 1091 0x00000000
[ 1183.625149] ffff88011469d928 0000000000000082 0003520000000000 ffff88011469d8d8
[ 1183.632733] ffff880130f45c80 ffff88011469dfd8 ffff88011469dfd8 ffff88011469dfd8
[ 1183.640316] ffff880130f99720 ffff880130f45c80 ffff88011469d8e8 000000011469d8e8
[ 1183.647896] Call Trace:
[ 1183.650402] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1183.655453] [<ffffffffa019e16b>] get_active_stripe+0x20b/0x710 [raid456]
[ 1183.662342] [<ffffffff81065420>] ? try_to_wake_up+0x290/0x290
[ 1183.668268] [<ffffffffa01a42ba>] make_request+0x19a/0x460 [raid456]
[ 1183.675248] [<ffffffff81090440>] ? remove_wait_queue+0x50/0x50
[ 1183.681259] [<ffffffff81465cc6>] md_make_request+0xc6/0x200
[ 1183.687007] [<ffffffff8111ca55>] ? mempool_alloc_slab+0x15/0x20
[ 1183.693110] [<ffffffff8129c257>] generic_make_request+0xc7/0x100
[ 1183.699300] [<ffffffff8129c317>] submit_bio+0x87/0x110
[ 1183.704610] [<ffffffff811acd9a>] ? bio_alloc_bioset+0x5a/0xf0
[ 1183.710531] [<ffffffff811a73cb>] submit_bh+0xeb/0x120
[ 1183.715754] [<ffffffff811a9590>] __block_write_full_page+0x210/0x3a0
[ 1183.722290] [<ffffffff811a8cd0>] ? end_buffer_async_read+0x130/0x130
[ 1183.728828] [<ffffffff811ae8d0>] ? thaw_bdev+0x90/0x90
[ 1183.734139] [<ffffffff811ae8d0>] ? thaw_bdev+0x90/0x90
[ 1183.739361] [<ffffffff811aa02d>] block_write_full_page_endio+0xcd/0x100
[ 1183.746183] [<ffffffff811aa075>] block_write_full_page+0x15/0x20
[ 1183.752369] [<ffffffff811ae528>] blkdev_writepage+0x18/0x20
[ 1183.758116] [<ffffffff81123667>] __writepage+0x17/0x40
[ 1183.763425] [<ffffffff81123acd>] write_cache_pages+0x20d/0x460
[ 1183.769353] [<ffffffff8113ee47>] ? handle_pte_fault+0xf7/0xac0
[ 1183.775385] [<ffffffff81123650>] ? set_page_dirty_lock+0x60/0x60
[ 1183.781574] [<ffffffff81095467>] ? __srcu_read_unlock+0x27/0x30
[ 1183.787673] [<ffffffff81123d6a>] generic_writepages+0x4a/0x70
[ 1183.793597] [<ffffffff81125401>] do_writepages+0x21/0x40
[ 1183.799084] [<ffffffff8111b27b>] __filemap_fdatawrite_range+0x5b/0x60
[ 1183.805709] [<ffffffff8111b2d2>] filemap_write_and_wait_range+0x52/0x80
[ 1183.812510] [<ffffffff811ae814>] blkdev_fsync+0x24/0x50
[ 1183.817906] [<ffffffff811a59b6>] do_fsync+0x56/0x80
[ 1183.822953] [<ffffffff811a5ce0>] sys_fsync+0x10/0x20
[ 1183.828001] [<ffffffff815e9d82>] system_call_fastpath+0x16/0x1b
[ 1183.834121] kworker/0:0 S ffff88012bb2b200 0 1097 2 0x00000000
[ 1183.841296] ffff88009c38de60 0000000000000046 ffff880100000000 ffff880137c17200
[ 1183.848864] ffff88012bb2ae40 ffff88009c38dfd8 ffff88009c38dfd8 ffff88009c38dfd8
[ 1183.856396] ffffffff81a0d020 ffff88012bb2ae40 ffff880137c10d68 0000000197fbfe80
[ 1183.863966] Call Trace:
[ 1183.866470] [<ffffffff815dfebf>] schedule+0x3f/0x60
[ 1183.871516] [<ffffffff8108b2b9>] worker_thread+0x239/0x350
[ 1183.877176] [<ffffffff8108b080>] ? manage_workers+0x230/0x230
[ 1183.883099] [<ffffffff8108fb9c>] kthread+0x8c/0xa0
[ 1183.887969] [<ffffffff815ebef4>] kernel_thread_helper+0x4/0x10
[ 1183.894002] [<ffffffff8108fb10>] ? kthread_worker_fn+0x190/0x190
[ 1183.900188] [<ffffffff815ebef0>] ? gs_change+0x13/0x13
...
[-- Attachment #2: messages3.gz --]
[-- Type: application/gzip, Size: 26643 bytes --]
* Re: md raid5 fsync deadlock
2012-02-29 23:31 md raid5 fsync deadlock Milan Broz
@ 2012-03-01 1:53 ` NeilBrown
2012-03-01 8:46 ` Milan Broz
0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2012-03-01 1:53 UTC (permalink / raw)
To: Milan Broz; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]
On Thu, 01 Mar 2012 00:31:08 +0100 Milan Broz <mbroz@redhat.com> wrote:
> Hi Neil,
>
> I am repeatedly hitting a deadlock with MD raid5 while running an fio check.
>
> I created the array like this:
> # mdadm -C -l 5 -n 4 -c 64 --assume-clean /dev/md0 /dev/sd[bcde]
>
> and ran this test (on a quad-core CPU):
>
> # fio --name=global --rw=randwrite --size=1G --bsrange=1k-128k --filename=/dev/md0 --name=job1 --name=job2 --name=job3 --name=job4 --end_fsync=1
>
> It deadlocks in the final fsync().
>
> I can reproduce it on the Fedora 3.2.7 kernel (and also on 3.3.0-rc5);
> below is part of the sysrq trace (the full sysrq output is in the attached gz archive).
>
> I was able to reproduce it even while a resync was running; it stalled the
> resync process as well.
>
> Please let me know if you need more information.
>
Are you certain it is a deadlock? No forward progress at all?
What is in md/stripe_cache_size? Does it change?
What happens if you double the number in stripe_cache_size? What if you
double it again?
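For reference, the stripe cache is exposed per array in sysfs and can be checked
and resized on the fly; assuming the /dev/md0 array from above, something like:
# cat /sys/block/md0/md/stripe_cache_size
# echo 512 > /sys/block/md0/md/stripe_cache_size
# cat /sys/block/md0/md/stripe_cache_active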
NeilBrown
* Re: md raid5 fsync deadlock
2012-03-01 1:53 ` NeilBrown
@ 2012-03-01 8:46 ` Milan Broz
2012-03-04 9:20 ` NeilBrown
0 siblings, 1 reply; 4+ messages in thread
From: Milan Broz @ 2012-03-01 8:46 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 03/01/2012 02:53 AM, NeilBrown wrote:
> On Thu, 01 Mar 2012 00:31:08 +0100 Milan Broz <mbroz@redhat.com> wrote:
> Are you certain it is a deadlock? No forward progress at all?
It seems so; it sat in this state for several hours without progress.
> What is in md/stripe_cache_size? Does it change?
> What happens if you double the number in stripe_cache_size? What if you
> double it again?
stripe_cache_size was 256; I doubled it to 512. Now:
stripe_cache_active is 390
stripe_cache_size is 512
and there is no progress.
With stripe_cache_size 1024 it survived a few iterations of the fio run, but now it is
locked up again:
stripe_cache_active is 921
stripe_cache_size is 1024
Thanks,
Milan
* Re: md raid5 fsync deadlock
2012-03-01 8:46 ` Milan Broz
@ 2012-03-04 9:20 ` NeilBrown
0 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2012-03-04 9:20 UTC (permalink / raw)
To: Milan Broz; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]
On Thu, 01 Mar 2012 09:46:11 +0100 Milan Broz <mbroz@redhat.com> wrote:
> On 03/01/2012 02:53 AM, NeilBrown wrote:
> > On Thu, 01 Mar 2012 00:31:08 +0100 Milan Broz <mbroz@redhat.com> wrote:
>
> > Are you certain it is a deadlock? No forward progress at all?
>
> It seems so; it sat in this state for several hours without progress.
>
> > What is in md/stripe_cache_size? Does it change?
>
> > What happens if you double the number in stripe_cache_size? What if you
> > double it again?
>
> stripe_cache_size was 256; I doubled it to 512. Now:
> stripe_cache_active is 390
> stripe_cache_size is 512
> and there is no progress.
>
> With stripe_cache_size 1024 it survived a few iterations of the fio run, but now it is
> locked up again:
> stripe_cache_active is 921
> stripe_cache_size is 1024
>
That definitely looks like something getting stuck inside RAID5. There are
390 (or 921) stripes that should be being processed but they are blocked
waiting for something.
I would suggest modifying the 'status' function in raid5.c to print out some
details about the stripes in the stripe cache.
You would need to take device_lock (a spinlock), then walk each chain of
stripe_hashtbl and print out the 'state' and 'count' of each stripe_head, plus
the flags and the various bio pointers of each dev.
That might be helpful.
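A rough sketch of that walk, hooked into status() in drivers/md/raid5.c (field
names such as conf->stripe_hashtbl, sh->state, sh->count and the per-dev
toread/towrite/written pointers are from the 3.2-era structures; double-check
against your tree):

	/* sketch only: dump the stripe cache; add after the existing seq_printf()s */
	struct stripe_head *sh;
	struct hlist_node *hn;
	int i, d;

	spin_lock_irq(&conf->device_lock);
	for (i = 0; i < NR_HASH; i++) {
		hlist_for_each_entry(sh, hn, &conf->stripe_hashtbl[i], hash) {
			seq_printf(seq, "\nsh %llu: state=0x%lx count=%d",
				   (unsigned long long)sh->sector,
				   sh->state, atomic_read(&sh->count));
			for (d = 0; d < conf->raid_disks; d++)
				seq_printf(seq, " dev%d[f=0x%lx tr=%p tw=%p wr=%p]",
					   d, sh->dev[d].flags,
					   sh->dev[d].toread,
					   sh->dev[d].towrite,
					   sh->dev[d].written);
		}
	}
	spin_unlock_irq(&conf->device_lock);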
NeilBrown