Date: Tue, 25 Mar 2008 10:39:52 +1100
From: David Chinner
Subject: Re: BUG: xfs on linux lvm - lvconvert random hungs when doing i/o
Message-ID: <20080324233952.GF103491721@sgi.com>
References: <200803211520.16398.stf_xl@wp.pl>
In-Reply-To: <200803211520.16398.stf_xl@wp.pl>
List-Id: xfs
To: Stanislaw Gruszka
Cc: xfs@oss.sgi.com

On Fri, Mar 21, 2008 at 03:20:16PM +0100, Stanislaw Gruszka wrote:
> Hello
>
> I have problems using xfs and lvm snapshots on linux-2.6.24. When I do
> lvconvert to create snapshots and the system is under heavy load,
> lvconvert and the I/O processes randomly hang. I use the script below
> to reproduce, but it is very hard to catch this bug.

This looks like an I/O completion problem. You're writing two files and
taking snapshots while the writes are running.

> xfsdatad/1    D 00000000     0   288      2
> Call Trace:
>  [] rwsem_down_failed_common+0x76/0x170
>  [] rwsem_down_write_failed+0x1d/0x24
>  [] call_rwsem_down_write_failed+0x6/0x8
>  [] down_write+0x12/0x20
>  [] xfs_ilock+0x5a/0xa0
>  [] xfs_setfilesize+0x43/0x130
>  [] xfs_end_bio_delalloc+0x0/0x20
>  [] xfs_end_bio_delalloc+0xd/0x20
>  [] run_workqueue+0x52/0x100
>  [] prepare_to_wait+0x52/0x70
>  [] worker_thread+0x7f/0xc0

This is an XFS I/O completion workqueue thread, waiting to take the
inode ilock in exclusive mode so it can update the file size.
> pdflush       D 00fc61cb     0  7337      2
> Call Trace:
>  [] schedule_timeout+0x47/0x90
>  [] process_timeout+0x0/0x10
>  [] prepare_to_wait+0x20/0x70
>  [] io_schedule_timeout+0x1b/0x30
>  [] congestion_wait+0x7e/0xa0

pdflush is waiting for I/O completion to clear the congestion state.

> lvconvert     D c4010a80     0 12930  12501
> Call Trace:
>  [] flush_cpu_workqueue+0x69/0xa0
>  [] wq_barrier_func+0x0/0x10
>  [] flush_workqueue+0x2c/0x40
>  [] xfs_flush_buftarg+0x17/0x120
>  [] xfs_quiesce_fs+0x16/0x70
>  [] xfs_attr_quiesce+0x20/0x60
>  [] xfs_freeze+0x8/0x10

That's waiting for the I/O completion workqueue to be flushed.

> dd            D 00fc61cb     0 12953  29684
> Call Trace:
>  [] schedule_timeout+0x47/0x90
>  [] process_timeout+0x0/0x10
>  [] prepare_to_wait+0x20/0x70
>  [] io_schedule_timeout+0x1b/0x30
>  [] congestion_wait+0x7e/0xa0

Stuck in congestion. This dd (I've trimmed the stack trace to make it
readable):

> dd            D c4018ab4     0 12113  29734
> Call Trace:
>  [] __down+0x75/0xe0
>  [] dm_unplug_all+0x17/0x30
>  [] __down_failed+0x7/0xc
>  [] blk_backing_dev_unplug+0x0/0x10
>  [] xfs_buf_lock+0x3c/0x50
>  [] _xfs_buf_find+0x151/0x1d0
>  [] xfs_buf_get_flags+0x55/0x130
>  [] xfs_buf_read_flags+0x1c/0x90
>  [] xfs_trans_read_buf+0x16f/0x350
>  [] xfs_itobp+0x7d/0x250
>  [] xfs_iflush+0x99/0x470
>  [] xfs_inode_flush+0x127/0x1f0
>  [] xfs_fs_write_inode+0x22/0x80
>  [] write_inode+0x4b/0x50
>  [] __sync_single_inode+0xf0/0x190
>  [] __writeback_single_inode+0x49/0x1c0
>  [] sync_sb_inodes+0xde/0x1d0
>  [] writeback_inodes+0xa0/0xb0
>  [] balance_dirty_pages+0x193/0x2c0
>  [] generic_perform_write+0x142/0x190
>  [] generic_file_buffered_write+0x87/0x150
>  [] xfs_write+0x61b/0x8c0
>  [] xfs_file_aio_write+0x76/0x90
>  [] do_sync_write+0xbd/0x110
>  [] vfs_write+0x160/0x170
>  [] sys_write+0x41/0x70
>  [] syscall_call+0x7/0xb

is writing to one file, hitting foreground write throttling and flushing
either itself or the other file. It's stuck waiting on I/O completion of
the inode buffer.
I suspect that the I/O completion has been blocked by the fact that it's
trying to get the ilock on the inode being flushed. The xfsdatad process
is blocked on this inode - the inode flush takes the ilock shared, which
is holding off the I/O completion. As soon as the inode buffer I/O is
issued, the inode will be unlocked and completion processing can
continue.

i.e. it seems that either we can't safely take the ilock in I/O
completion without a trylock, or we can't hold the ilock across I/O
submission without a trylock on the buffer lock. Ouch! That's going to
take some fixing....

I'd suggest trying these two patches (already queued for 2.6.26):

http://oss.sgi.com/archives/xfs/2008-01/msg00153.html
http://oss.sgi.com/archives/xfs/2008-01/msg00154.html

They make xfs_iflush do trylocks on the inode buffer in these writeback
cases, which should avoid the problem you are seeing here. It won't
avoid all possible problems, but async inode flushes like the one above
will no longer hang waiting on buffer I/O completion....

> I also would like to ask if you have some propositions how to
> reproduce the bug, because my scripts need to work for hours or even
> days to hang processes.

It's pure chance. Hence I don't think there's much you can do to
improve the reproducibility of this problem....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group