* deadlock with fallocate
From: Thomas Neumann @ 2009-10-07 3:00 UTC
To: linux-kernel
Under 2.6.32-rc1 I observe a very annoying deadlock in fallocate calls with
XFS. When running a very I/O-intensive workload (using posix_fallocate and a
lot of mmaps), the kernel deadlocks semi-reproducibly (ca. 60% of the time).
I have included an excerpt from dmesg below, but I am not sure how useful it
is for debugging.
Is there any way to isolate the problem? Unfortunately it is not 100%
reproducible, and it takes a while to reach the deadlock.
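For illustration, a minimal sketch of the kind of workload described above
(the file name, chunk size, and loop bounds are hypothetical, not taken from
the actual test program):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const off_t chunk = 1 << 20;  /* grow the file in 1 MB steps */
        int fd = open("testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0)
            return 1;
        for (int i = 0; i < 4096; i++) {
            /* reserve the next chunk; glibc issues the fallocate
             * syscall here when the kernel supports it */
            if (posix_fallocate(fd, (off_t)i * chunk, chunk))
                break;
            /* dirty the new range through a mapping, as the
             * workload does */
            char *p = mmap(NULL, chunk, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, (off_t)i * chunk);
            if (p == MAP_FAILED)
                break;
            memset(p, 0xab, chunk);
            munmap(p, chunk);
        }
        close(fd);
        return 0;
    }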
[36600.700131] INFO: task xfsconvertd/0:411 blocked for more than 120 seconds.
[36600.700138] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36600.700143] xfsconvertd/0 D 0000000000000000 0 411 2 0x00000000
[36600.700153] ffff88007b21d3e0 0000000000000046 ffff88007d4e8c40 ffff88007b21dfd8
[36600.700163] ffff88007adfdb40 0000000000015980 0000000000015980 ffff88007b21dfd8
[36600.700171] 0000000000015980 ffff88007b21dfd8 0000000000015980 ffff88007adfdf00
[36600.700180] Call Trace:
[36600.700195] [<ffffffff81526162>] io_schedule+0x42/0x60
[36600.700204] [<ffffffff810df6d5>] sync_page+0x35/0x50
[36600.700211] [<ffffffff815268e5>] __wait_on_bit+0x55/0x80
[36600.700218] [<ffffffff810df6a0>] ? sync_page+0x0/0x50
[36600.700225] [<ffffffff810df7f0>] wait_on_page_bit+0x70/0x80
[36600.700233] [<ffffffff8107afb0>] ? wake_bit_function+0x0/0x30
[36600.700242] [<ffffffff810ecce8>] shrink_page_list+0x3d8/0x550
[36600.700249] [<ffffffff8107b0de>] ? finish_wait+0x5e/0x80
[36600.700258] [<ffffffff810f85da>] ? congestion_wait+0x7a/0x90
[36600.700265] [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
[36600.700273] [<ffffffff810ed7e6>] shrink_inactive_list+0x6b6/0x700
[36600.700281] [<ffffffff81044250>] ? wake_affine+0x220/0x240
[36600.700289] [<ffffffff81048210>] ? __enqueue_entity+0x80/0x90
[36600.700296] [<ffffffff810ed881>] shrink_list+0x51/0xb0
[36600.700303] [<ffffffff810eddea>] shrink_zone+0x1ea/0x200
[36600.700311] [<ffffffff810ee823>] shrink_zones+0x63/0xf0
[36600.700318] [<ffffffff810ee920>] do_try_to_free_pages+0x70/0x280
[36600.700325] [<ffffffff810eec9c>] try_to_free_pages+0x9c/0xc0
[36600.700332] [<ffffffff810ed0e0>] ? isolate_pages_global+0x0/0x50
[36600.700340] [<ffffffff810e6342>] __alloc_pages_slowpath+0x232/0x520
[36600.700347] [<ffffffff810e6776>] __alloc_pages_nodemask+0x146/0x180
[36600.700355] [<ffffffff811143f7>] alloc_pages_current+0x87/0xd0
[36600.700363] [<ffffffff8111939c>] allocate_slab+0x11c/0x1b0
[36600.700369] [<ffffffff8111945b>] new_slab+0x2b/0x190
[36600.700376] [<ffffffff8111b641>] __slab_alloc+0x121/0x230
[36600.700422] [<ffffffffa009b57d>] ? kmem_zone_alloc+0x5d/0xd0 [xfs]
[36600.700452] [<ffffffffa009b57d>] ? kmem_zone_alloc+0x5d/0xd0 [xfs]
[36600.700459] [<ffffffff8111b980>] kmem_cache_alloc+0xf0/0x130
[36600.700489] [<ffffffffa009b57d>] kmem_zone_alloc+0x5d/0xd0 [xfs]
[36600.700519] [<ffffffffa009b609>] kmem_zone_zalloc+0x19/0x50 [xfs]
[36600.700550] [<ffffffffa009368f>] _xfs_trans_alloc+0x2f/0x70 [xfs]
[36600.700580] [<ffffffffa0093832>] xfs_trans_alloc+0x92/0xa0 [xfs]
[36600.700588] [<ffffffff810f7710>] ? __dec_zone_page_state+0x30/0x40
[36600.700619] [<ffffffffa0083691>] xfs_iomap_write_unwritten+0x71/0x200 [xfs]
[36600.700627] [<ffffffff810e1ad2>] ? mempool_free_slab+0x12/0x20
[36600.700634] [<ffffffff810e1b6a>] ? mempool_free+0x8a/0xa0
[36600.700663] [<ffffffffa009c3d0>] ? xfs_end_bio_unwritten+0x0/0x80 [xfs]
[36600.700692] [<ffffffffa009c435>] xfs_end_bio_unwritten+0x65/0x80 [xfs]
[36600.700701] [<ffffffff81075c47>] run_workqueue+0xb7/0x190
[36600.700707] [<ffffffff81076fa6>] worker_thread+0x96/0xf0
[36600.700714] [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
[36600.700721] [<ffffffff81076f10>] ? worker_thread+0x0/0xf0
[36600.700727] [<ffffffff8107ab8e>] kthread+0x8e/0xa0
[36600.700735] [<ffffffff81012f8a>] child_rip+0xa/0x20
[36600.700742] [<ffffffff8107ab00>] ? kthread+0x0/0xa0
[36600.700748] [<ffffffff81012f80>] ? child_rip+0x0/0x20
[36600.700784] INFO: task dataload:3947 blocked for more than 120 seconds.
[36600.700788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36600.700792] dataload D 0000000000000001 0 3947 2121 0x00000000
[36600.700801] ffff880074f37ce8 0000000000000082 0000000000000086 ffff880074f37fd8
[36600.700810] ffff88007b0316d0 0000000000015980 0000000000015980 ffff880074f37fd8
[36600.700818] 0000000000015980 ffff880074f37fd8 0000000000015980 ffff88007b031a90
[36600.700826] Call Trace:
[36600.700859] [<ffffffffa009c0e5>] xfs_ioend_wait+0x85/0xc0 [xfs]
[36600.700868] [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
[36600.700899] [<ffffffffa0097d1d>] xfs_setattr+0x85d/0xb20 [xfs]
[36600.700930] [<ffffffffa0094aa8>] ? xfs_trans_unlocked_item+0x38/0x50 [xfs]
[36600.700960] [<ffffffffa00a2ebd>] xfs_vn_fallocate+0xed/0x100 [xfs]
[36600.700968] [<ffffffff810334a9>] ? default_spin_lock_flags+0x9/0x10
[36600.700976] [<ffffffff81527ecf>] ? _spin_lock_irqsave+0x2f/0x50
[36600.700984] [<ffffffff8112556d>] do_fallocate+0xfd/0x110
[36600.700990] [<ffffffff811255c9>] sys_fallocate+0x49/0x70
[36600.700998] [<ffffffff81011f42>] system_call_fastpath+0x16/0x1b
I have also opened a bug here:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=852
* Re: deadlock with fallocate
From: Andrew Morton @ 2009-10-11 7:57 UTC
To: Thomas Neumann; +Cc: linux-kernel, xfs-masters, Christoph Hellwig
On Wed, 07 Oct 2009 05:00:36 +0200 Thomas Neumann <tneumann@users.sourceforge.net> wrote:
> Under 2.6.32-rc1 I observe a very annoying deadlock in fallocate calls with
> XFS. When running a very I/O-intensive workload (using posix_fallocate and a
> lot of mmaps), the kernel deadlocks semi-reproducibly (ca. 60% of the time).
> I have included an excerpt from dmesg below, but I am not sure how useful it
> is for debugging.
> Is there any way to isolate the problem? Unfortunately it is not 100%
> reproducible, and it takes a while to reach the deadlock.
I'm wondering how serious this bug is.
Will legacy applications fail on newer kernels? Or is it the case that
only recently-written applications which utilise new kernel
functionality will hit this bug?
If the former, then it's time to run around in a panic looking for changes
to revert.
[trace retained for the newly cc'ed]
> [36600.700131] INFO: task xfsconvertd/0:411 blocked for more than 120 seconds.
> [36600.700138] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36600.700143] xfsconvertd/0 D 0000000000000000 0 411 2 0x00000000
> [36600.700153] ffff88007b21d3e0 0000000000000046 ffff88007d4e8c40 ffff88007b21dfd8
> [36600.700163] ffff88007adfdb40 0000000000015980 0000000000015980 ffff88007b21dfd8
> [36600.700171] 0000000000015980 ffff88007b21dfd8 0000000000015980 ffff88007adfdf00
> [36600.700180] Call Trace:
> [36600.700195] [<ffffffff81526162>] io_schedule+0x42/0x60
> [36600.700204] [<ffffffff810df6d5>] sync_page+0x35/0x50
> [36600.700211] [<ffffffff815268e5>] __wait_on_bit+0x55/0x80
> [36600.700218] [<ffffffff810df6a0>] ? sync_page+0x0/0x50
> [36600.700225] [<ffffffff810df7f0>] wait_on_page_bit+0x70/0x80
> [36600.700233] [<ffffffff8107afb0>] ? wake_bit_function+0x0/0x30
> [36600.700242] [<ffffffff810ecce8>] shrink_page_list+0x3d8/0x550
> [36600.700249] [<ffffffff8107b0de>] ? finish_wait+0x5e/0x80
> [36600.700258] [<ffffffff810f85da>] ? congestion_wait+0x7a/0x90
> [36600.700265] [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
> [36600.700273] [<ffffffff810ed7e6>] shrink_inactive_list+0x6b6/0x700
> [36600.700281] [<ffffffff81044250>] ? wake_affine+0x220/0x240
> [36600.700289] [<ffffffff81048210>] ? __enqueue_entity+0x80/0x90
> [36600.700296] [<ffffffff810ed881>] shrink_list+0x51/0xb0
> [36600.700303] [<ffffffff810eddea>] shrink_zone+0x1ea/0x200
> [36600.700311] [<ffffffff810ee823>] shrink_zones+0x63/0xf0
> [36600.700318] [<ffffffff810ee920>] do_try_to_free_pages+0x70/0x280
> [36600.700325] [<ffffffff810eec9c>] try_to_free_pages+0x9c/0xc0
> [36600.700332] [<ffffffff810ed0e0>] ? isolate_pages_global+0x0/0x50
> [36600.700340] [<ffffffff810e6342>] __alloc_pages_slowpath+0x232/0x520
> [36600.700347] [<ffffffff810e6776>] __alloc_pages_nodemask+0x146/0x180
> [36600.700355] [<ffffffff811143f7>] alloc_pages_current+0x87/0xd0
> [36600.700363] [<ffffffff8111939c>] allocate_slab+0x11c/0x1b0
> [36600.700369] [<ffffffff8111945b>] new_slab+0x2b/0x190
> [36600.700376] [<ffffffff8111b641>] __slab_alloc+0x121/0x230
> [36600.700422] [<ffffffffa009b57d>] ? kmem_zone_alloc+0x5d/0xd0 [xfs]
> [36600.700452] [<ffffffffa009b57d>] ? kmem_zone_alloc+0x5d/0xd0 [xfs]
> [36600.700459] [<ffffffff8111b980>] kmem_cache_alloc+0xf0/0x130
> [36600.700489] [<ffffffffa009b57d>] kmem_zone_alloc+0x5d/0xd0 [xfs]
> [36600.700519] [<ffffffffa009b609>] kmem_zone_zalloc+0x19/0x50 [xfs]
> [36600.700550] [<ffffffffa009368f>] _xfs_trans_alloc+0x2f/0x70 [xfs]
> [36600.700580] [<ffffffffa0093832>] xfs_trans_alloc+0x92/0xa0 [xfs]
> [36600.700588] [<ffffffff810f7710>] ? __dec_zone_page_state+0x30/0x40
> [36600.700619] [<ffffffffa0083691>] xfs_iomap_write_unwritten+0x71/0x200 [xfs]
> [36600.700627] [<ffffffff810e1ad2>] ? mempool_free_slab+0x12/0x20
> [36600.700634] [<ffffffff810e1b6a>] ? mempool_free+0x8a/0xa0
> [36600.700663] [<ffffffffa009c3d0>] ? xfs_end_bio_unwritten+0x0/0x80 [xfs]
> [36600.700692] [<ffffffffa009c435>] xfs_end_bio_unwritten+0x65/0x80 [xfs]
> [36600.700701] [<ffffffff81075c47>] run_workqueue+0xb7/0x190
> [36600.700707] [<ffffffff81076fa6>] worker_thread+0x96/0xf0
> [36600.700714] [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
> [36600.700721] [<ffffffff81076f10>] ? worker_thread+0x0/0xf0
> [36600.700727] [<ffffffff8107ab8e>] kthread+0x8e/0xa0
> [36600.700735] [<ffffffff81012f8a>] child_rip+0xa/0x20
> [36600.700742] [<ffffffff8107ab00>] ? kthread+0x0/0xa0
> [36600.700748] [<ffffffff81012f80>] ? child_rip+0x0/0x20
> [36600.700784] INFO: task dataload:3947 blocked for more than 120 seconds.
> [36600.700788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36600.700792] dataload D 0000000000000001 0 3947 2121 0x00000000
> [36600.700801] ffff880074f37ce8 0000000000000082 0000000000000086 ffff880074f37fd8
> [36600.700810] ffff88007b0316d0 0000000000015980 0000000000015980 ffff880074f37fd8
> [36600.700818] 0000000000015980 ffff880074f37fd8 0000000000015980 ffff88007b031a90
> [36600.700826] Call Trace:
> [36600.700859] [<ffffffffa009c0e5>] xfs_ioend_wait+0x85/0xc0 [xfs]
> [36600.700868] [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
> [36600.700899] [<ffffffffa0097d1d>] xfs_setattr+0x85d/0xb20 [xfs]
> [36600.700930] [<ffffffffa0094aa8>] ? xfs_trans_unlocked_item+0x38/0x50 [xfs]
> [36600.700960] [<ffffffffa00a2ebd>] xfs_vn_fallocate+0xed/0x100 [xfs]
> [36600.700968] [<ffffffff810334a9>] ? default_spin_lock_flags+0x9/0x10
> [36600.700976] [<ffffffff81527ecf>] ? _spin_lock_irqsave+0x2f/0x50
> [36600.700984] [<ffffffff8112556d>] do_fallocate+0xfd/0x110
> [36600.700990] [<ffffffff811255c9>] sys_fallocate+0x49/0x70
> [36600.700998] [<ffffffff81011f42>] system_call_fastpath+0x16/0x1b
>
> I have also opened a bug here:
>
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=852
>
* Re: deadlock with fallocate
From: Thomas Neumann @ 2009-10-11 8:53 UTC
To: Andrew Morton; +Cc: linux-kernel, xfs-masters, Christoph Hellwig
> Will legacy applications fail on newer kernels? Or is it the case that
> only recently-written applications which utilise new kernel
> functionality will hit this bug?
In theory posix_fallocate has been around for a while, and glibc will use the
kernel functionality if available, so applications might break. In practice
it is perhaps not that common for applications to use fallocate.
The problem is definitely fallocate-related. When I replace posix_fallocate
with an equivalent ftruncate64 call, the problem goes away. (But then again,
the two calls are not really equivalent, and fallocate has the semantics that
I need.)
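For reference, a sketch of the two variants (fd and size are placeholders;
the difference in allocation behavior is why the calls are not equivalent):

    #define _LARGEFILE64_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    static void grow_with_fallocate(int fd, off_t size)
    {
        /* reserves disk blocks for the whole range up front, so
         * later writes into the range cannot fail with ENOSPC */
        posix_fallocate(fd, 0, size);
    }

    static void grow_with_truncate(int fd, off64_t size)
    {
        /* only sets the file size; the file stays sparse and
         * blocks are allocated lazily on first write */
        ftruncate64(fd, size);
    }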
Furthermore, the deadlocks seem to start occurring after writing more data
than fits in main memory, but I have not investigated this thoroughly.
Thomas
* Re: deadlock with fallocate
From: Christoph Hellwig @ 2009-10-11 12:17 UTC
To: Thomas Neumann
Cc: Andrew Morton, linux-kernel, xfs-masters, Christoph Hellwig
Thomas, I'll look at this ASAP. It's been a bit of a busy time, just
returning from a conference. Any sort of self-contained testcase would
be perfect, though.
* Re: deadlock with fallocate
From: Thomas Neumann @ 2009-10-11 14:19 UTC
To: Christoph Hellwig; +Cc: Andrew Morton, linux-kernel, xfs-masters
On Sunday, 11 October 2009 at 14:17:52, Christoph Hellwig wrote:
> Thomas, I'll look at this ASAP. It's been a bit of a busy time, just
> returning from a conference. Any sort of self-contained testcase would
> be perfect, though.
I will send you an email tomorrow with a pointer to the test data and code.
It is "self-contained" in the sense that it has no external dependencies, but
it is somewhat large...
Thomas