public inbox for linux-xfs@vger.kernel.org
* deadlocks with fallocate
@ 2009-10-08  6:59 Thomas Neumann
  2009-10-14 15:09 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Neumann @ 2009-10-08  6:59 UTC (permalink / raw)
  To: linux-xfs

Hi,

I am currently observing frequent deadlocks in XFS on an I/O-intensive
workload that uses posix_fallocate. This is very annoying, as I have to power
down the machine (in an unclean state) afterwards.
I have observed this with multiple kernel versions including 2.6.32-rc1; the
symptoms are shown in the dmesg extract below.

I am willing to help debug the problem, although it is probably a race
condition, as it does not occur all of the time. Is there anything I should
do to pinpoint the problem?
It always seems to occur when user space calls fallocate (100% of my log
entries contained this function call), but otherwise I am not sure what
triggers it.
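
A minimal pattern that exercises the same code path looks roughly like the
sketch below (illustrative only, not my actual loader; the file name, I/O
size and loop count are made up):

/*
 * Sketch: preallocate with posix_fallocate(), then fill the range with
 * writes.  Illustrative example only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 16];
	int fd, i;

	fd = open("testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0xab, sizeof(buf));

	for (i = 0; i < 1024; i++) {
		off_t off = (off_t)i * sizeof(buf);

		/* reserve the space first ... */
		if (posix_fallocate(fd, off, sizeof(buf)) != 0) {
			fprintf(stderr, "posix_fallocate failed\n");
			return 1;
		}
		/* ... then fill it with data */
		if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf)) {
			perror("pwrite");
			return 1;
		}
	}
	close(fd);
	return 0;
}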

Thomas



INFO: task xfsconvertd/0:411 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsconvertd/0 D 0000000000000000     0   411      2 0x00000000
 ffff88007b21d3e0 0000000000000046 ffff88007d4e8c40 ffff88007b21dfd8
 ffff88007adfdb40 0000000000015980 0000000000015980 ffff88007b21dfd8
 0000000000015980 ffff88007b21dfd8 0000000000015980 ffff88007adfdf00
Call Trace:
 [<ffffffff81526162>] io_schedule+0x42/0x60
 [<ffffffff810df6d5>] sync_page+0x35/0x50
 [<ffffffff815268e5>] __wait_on_bit+0x55/0x80
 [<ffffffff810df6a0>] ? sync_page+0x0/0x50
 [<ffffffff810df7f0>] wait_on_page_bit+0x70/0x80
 [<ffffffff8107afb0>] ? wake_bit_function+0x0/0x30
 [<ffffffff810ecce8>] shrink_page_list+0x3d8/0x550
 [<ffffffff8107b0de>] ? finish_wait+0x5e/0x80
 [<ffffffff810f85da>] ? congestion_wait+0x7a/0x90
 [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810ed7e6>] shrink_inactive_list+0x6b6/0x700
 [<ffffffff81044250>] ? wake_affine+0x220/0x240
 [<ffffffff81048210>] ? __enqueue_entity+0x80/0x90
 [<ffffffff810ed881>] shrink_list+0x51/0xb0
 [<ffffffff810eddea>] shrink_zone+0x1ea/0x200
 [<ffffffff810ee823>] shrink_zones+0x63/0xf0
 [<ffffffff810ee920>] do_try_to_free_pages+0x70/0x280
 [<ffffffff810eec9c>] try_to_free_pages+0x9c/0xc0
 [<ffffffff810ed0e0>] ? isolate_pages_global+0x0/0x50
 [<ffffffff810e6342>] __alloc_pages_slowpath+0x232/0x520
 [<ffffffff810e6776>] __alloc_pages_nodemask+0x146/0x180
 [<ffffffff811143f7>] alloc_pages_current+0x87/0xd0
 [<ffffffff8111939c>] allocate_slab+0x11c/0x1b0
 [<ffffffff8111945b>] new_slab+0x2b/0x190
 [<ffffffff8111b641>] __slab_alloc+0x121/0x230
 [<ffffffffa009b57d>] ? kmem_zone_alloc+0x5d/0xd0 [xfs]
 [<ffffffffa009b57d>] ? kmem_zone_alloc+0x5d/0xd0 [xfs]
 [<ffffffff8111b980>] kmem_cache_alloc+0xf0/0x130
 [<ffffffffa009b57d>] kmem_zone_alloc+0x5d/0xd0 [xfs]
 [<ffffffffa009b609>] kmem_zone_zalloc+0x19/0x50 [xfs]
 [<ffffffffa009368f>] _xfs_trans_alloc+0x2f/0x70 [xfs]
 [<ffffffffa0093832>] xfs_trans_alloc+0x92/0xa0 [xfs]
 [<ffffffff810f7710>] ? __dec_zone_page_state+0x30/0x40
 [<ffffffffa0083691>] xfs_iomap_write_unwritten+0x71/0x200 [xfs]
 [<ffffffff810e1ad2>] ? mempool_free_slab+0x12/0x20
 [<ffffffff810e1b6a>] ? mempool_free+0x8a/0xa0
 [<ffffffffa009c3d0>] ? xfs_end_bio_unwritten+0x0/0x80 [xfs]
 [<ffffffffa009c435>] xfs_end_bio_unwritten+0x65/0x80 [xfs]
 [<ffffffff81075c47>] run_workqueue+0xb7/0x190
 [<ffffffff81076fa6>] worker_thread+0x96/0xf0
 [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81076f10>] ? worker_thread+0x0/0xf0
 [<ffffffff8107ab8e>] kthread+0x8e/0xa0
 [<ffffffff81012f8a>] child_rip+0xa/0x20
 [<ffffffff8107ab00>] ? kthread+0x0/0xa0
 [<ffffffff81012f80>] ? child_rip+0x0/0x20
INFO: task dataload:3947 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dataload     D 0000000000000001     0  3947   2121 0x00000000
 ffff880074f37ce8 0000000000000082 0000000000000086 ffff880074f37fd8
 ffff88007b0316d0 0000000000015980 0000000000015980 ffff880074f37fd8
 0000000000015980 ffff880074f37fd8 0000000000015980 ffff88007b031a90
Call Trace:
 [<ffffffffa009c0e5>] xfs_ioend_wait+0x85/0xc0 [xfs]
 [<ffffffff8107af70>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0097d1d>] xfs_setattr+0x85d/0xb20 [xfs]
 [<ffffffffa0094aa8>] ? xfs_trans_unlocked_item+0x38/0x50 [xfs]
 [<ffffffffa00a2ebd>] xfs_vn_fallocate+0xed/0x100 [xfs]
 [<ffffffff810334a9>] ? default_spin_lock_flags+0x9/0x10
 [<ffffffff81527ecf>] ? _spin_lock_irqsave+0x2f/0x50
 [<ffffffff8112556d>] do_fallocate+0xfd/0x110
 [<ffffffff811255c9>] sys_fallocate+0x49/0x70
 [<ffffffff81011f42>] system_call_fastpath+0x16/0x1b



* Re: deadlocks with fallocate
  2009-10-08  6:59 deadlocks with fallocate Thomas Neumann
@ 2009-10-14 15:09 ` Christoph Hellwig
  2009-10-14 23:16   ` Thomas Neumann
  2009-10-15  5:58   ` Thomas Neumann
  0 siblings, 2 replies; 5+ messages in thread
From: Christoph Hellwig @ 2009-10-14 15:09 UTC (permalink / raw)
  To: Thomas Neumann; +Cc: linux-xfs

On Thu, Oct 08, 2009 at 08:59:45AM +0200, Thomas Neumann wrote:
> I am willing to help debug the problem, although it is probably a race
> condition, as it does not occur all of the time. Is there anything I should
> do to pinpoint the problem?
> It always seems to occur when user space calls fallocate (100% of my log
> entries contained this function call), but otherwise I am not sure what
> triggers it.

I think we're deadlocking here because we have one process waiting for
I/O completions from another task, while the waiting task holds a lock
that the I/O completion needs.

> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> xfsconvertd/0 D 0000000000000000     0   411      2 0x00000000
>  ffff88007b21d3e0 0000000000000046 ffff88007d4e8c40 ffff88007b21dfd8
>  ffff88007adfdb40 0000000000015980 0000000000015980 ffff88007b21dfd8
>  0000000000015980 ffff88007b21dfd8 0000000000015980 ffff88007adfdf00
> Call Trace:
>  [<ffffffff81526162>] io_schedule+0x42/0x60
>  [<ffffffff810df6d5>] sync_page+0x35/0x50
>  [<ffffffff815268e5>] __wait_on_bit+0x55/0x80
>  [<ffffffff810df7f0>] wait_on_page_bit+0x70/0x80
>  [<ffffffff810ecce8>] shrink_page_list+0x3d8/0x550
>  [<ffffffff810ed7e6>] shrink_inactive_list+0x6b6/0x700
>  [<ffffffff810ed881>] shrink_list+0x51/0xb0
>  [<ffffffff810eddea>] shrink_zone+0x1ea/0x200
>  [<ffffffff810ee823>] shrink_zones+0x63/0xf0
>  [<ffffffff810ee920>] do_try_to_free_pages+0x70/0x280
>  [<ffffffff810eec9c>] try_to_free_pages+0x9c/0xc0
>  [<ffffffff810e6342>] __alloc_pages_slowpath+0x232/0x520
>  [<ffffffff810e6776>] __alloc_pages_nodemask+0x146/0x180
>  [<ffffffff811143f7>] alloc_pages_current+0x87/0xd0
>  [<ffffffff8111939c>] allocate_slab+0x11c/0x1b0
>  [<ffffffff8111945b>] new_slab+0x2b/0x190
>  [<ffffffff8111b641>] __slab_alloc+0x121/0x230
>  [<ffffffff8111b980>] kmem_cache_alloc+0xf0/0x130
>  [<ffffffffa009b57d>] kmem_zone_alloc+0x5d/0xd0 [xfs]
>  [<ffffffffa009b609>] kmem_zone_zalloc+0x19/0x50 [xfs]
>  [<ffffffffa009368f>] _xfs_trans_alloc+0x2f/0x70 [xfs]
>  [<ffffffffa0093832>] xfs_trans_alloc+0x92/0xa0 [xfs]
>  [<ffffffffa0083691>] xfs_iomap_write_unwritten+0x71/0x200 [xfs]
>  [<ffffffffa009c435>] xfs_end_bio_unwritten+0x65/0x80 [xfs]
>  [<ffffffff81075c47>] run_workqueue+0xb7/0x190
>  [<ffffffff81076fa6>] worker_thread+0x96/0xf0
>  [<ffffffff81012f8a>] child_rip+0xa/0x20

This thread is completing unwritten I/O; "unwritten" extents are the XFS
name for preallocated space such as that created by posix_fallocate ever
since the fallocate system call was wired up.  It needs to allocate memory
for the transaction structure, and we go all the way down into the page
allocator, where we get stuck, probably waiting for I/O on the same inode.
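
To make the failure mode concrete, here is a small illustration (not actual
XFS code; the structure name is made up) of why a plain sleeping allocation
in an I/O completion handler is dangerous, and what fixing the allocation
flags by hand would look like:

/*
 * Illustration only, not actual XFS code.
 */
#include <linux/gfp.h>
#include <linux/slab.h>

struct demo_trans {
	int	dummy;
};

static struct demo_trans *alloc_from_io_completion(void)
{
	/*
	 * A plain GFP_KERNEL allocation includes __GFP_FS, so under
	 * memory pressure the page allocator may enter direct reclaim
	 * and end up waiting on writeback pages of the same inode --
	 * pages whose completion this very worker thread is supposed
	 * to run.  That is the hang visible in the trace above:
	 *
	 *	tp = kmalloc(sizeof(*tp), GFP_KERNEL);
	 *
	 * Dropping __GFP_FS (GFP_NOFS) still allows the allocation to
	 * sleep, but forbids reclaim from recursing back into
	 * filesystem code:
	 */
	return kmalloc(sizeof(struct demo_trans), GFP_NOFS);
}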

>  [<ffffffffa009c0e5>] xfs_ioend_wait+0x85/0xc0 [xfs]
>  [<ffffffffa0097d1d>] xfs_setattr+0x85d/0xb20 [xfs]
>  [<ffffffffa00a2ebd>] xfs_vn_fallocate+0xed/0x100 [xfs]
>  [<ffffffff8112556d>] do_fallocate+0xfd/0x110
>  [<ffffffff811255c9>] sys_fallocate+0x49/0x70
>  [<ffffffff81011f42>] system_call_fastpath+0x16/0x1b

And this thread is inside the fallocate handler.

Can you check whether the hack below makes the problem go away?


Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2009-10-14 17:06:50.489254023 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2009-10-14 17:07:54.055006445 +0200
@@ -278,6 +278,8 @@ xfs_end_bio_unwritten(
 	xfs_off_t		offset = ioend->io_offset;
 	size_t			size = ioend->io_size;
 
+	current->flags |= PF_FSTRANS;
+
 	if (likely(!ioend->io_error)) {
 		if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
 			int error;


* Re: deadlocks with fallocate
  2009-10-14 15:09 ` Christoph Hellwig
@ 2009-10-14 23:16   ` Thomas Neumann
  2009-10-15  5:58   ` Thomas Neumann
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Neumann @ 2009-10-14 23:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

> And this thread is inside the fallocate handler.
> 
> Can you check whether the hack below makes the problem go away?
> 
> +	current->flags |= PF_FSTRANS;

That seems to do the trick. I will test it some more, but so far it has
already survived a workload that usually triggered the deadlock quite
reliably.


* Re: deadlocks with fallocate
  2009-10-14 15:09 ` Christoph Hellwig
  2009-10-14 23:16   ` Thomas Neumann
@ 2009-10-15  5:58   ` Thomas Neumann
  2009-10-15 12:13     ` Christoph Hellwig
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Neumann @ 2009-10-15  5:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

> Can you check whether the hack below makes the problem go away?
> 
> 
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2009-10-14 17:06:50.489254023 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2009-10-14 17:07:54.055006445 +0200
> @@ -278,6 +278,8 @@ xfs_end_bio_unwritten(
>  	xfs_off_t		offset = ioend->io_offset;
>  	size_t			size = ioend->io_size;
> 
> +	current->flags |= PF_FSTRANS;
> +
>  	if (likely(!ioend->io_error)) {
>  		if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
>  			int error;
> 

I did some more tests, and indeed the patch makes the deadlock go away. Your 
description of it as a hack makes me somewhat concerned about using it 
permanently, but hopefully you will find a proper solution ;)

Thanks for looking at this.


* Re: deadlocks with fallocate
  2009-10-15  5:58   ` Thomas Neumann
@ 2009-10-15 12:13     ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2009-10-15 12:13 UTC (permalink / raw)
  To: Thomas Neumann; +Cc: Christoph Hellwig, linux-xfs

On Thu, Oct 15, 2009 at 07:58:05AM +0200, Thomas Neumann wrote:
> I did some more tests, and indeed the patch makes the deadlock go away. Your 
> description of it as a hack makes me somewhat concerned about using it 
> permanently, but hopefully you will find a proper solution ;)

It's basically a flag saying that allocations from this thread must not
recurse back into filesystem code.  I'll either need to change the
allocations manually or keep this flag and write a comment documenting
why.  I'll decide what to do on my long airplane ride tomorrow and will
make sure it goes off to Linus' tree ASAP after that.
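
For reference, this is roughly how the XFS allocation wrappers consult the
flag (a simplified sketch from memory, not a verbatim copy of
fs/xfs/linux-2.6/kmem.h): every internal KM_* allocation has its flags
converted to gfp flags in one place, and with PF_FSTRANS set on the task
__GFP_FS is masked off, so reclaim cannot re-enter the filesystem from that
context:

static inline gfp_t
kmem_flags_convert(unsigned int flags)
{
	gfp_t	lflags;

	if (flags & KM_NOSLEEP) {
		lflags = GFP_ATOMIC | __GFP_NOWARN;
	} else {
		lflags = GFP_KERNEL | __GFP_NOWARN;
		/* no fs recursion from transaction or I/O completion context */
		if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
			lflags &= ~__GFP_FS;
	}
	return lflags;
}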


Thread overview: 5+ messages
2009-10-08  6:59 deadlocks with fallocate Thomas Neumann
2009-10-14 15:09 ` Christoph Hellwig
2009-10-14 23:16   ` Thomas Neumann
2009-10-15  5:58   ` Thomas Neumann
2009-10-15 12:13     ` Christoph Hellwig
