Date: Tue, 25 Mar 2008 10:39:52 +1100
From: David Chinner
Subject: Re: BUG: xfs on linux lvm - lvconvert random hungs when doing i/o
Message-ID: <20080324233952.GF103491721@sgi.com>
References: <200803211520.16398.stf_xl@wp.pl>
In-Reply-To: <200803211520.16398.stf_xl@wp.pl>
List-Id: xfs
To: Stanislaw Gruszka
Cc: xfs@oss.sgi.com

On Fri, Mar 21, 2008 at 03:20:16PM +0100, Stanislaw Gruszka wrote:
> Hello
>
> I have problems using xfs and lvm snapshots on linux-2.6.24. When I do
> lvconvert to create snapshots and the system is under heavy load,
> lvconvert and the I/O processes randomly hang. I use the script below
> to reproduce, but it is very hard to catch this bug.

This looks like an I/O completion problem. You're writing two files and
taking snapshots while the writes are running.

> xfsdatad/1    D 00000000     0   288      2
> Call Trace:
>  [] rwsem_down_failed_common+0x76/0x170
>  [] rwsem_down_write_failed+0x1d/0x24
>  [] call_rwsem_down_write_failed+0x6/0x8
>  [] down_write+0x12/0x20
>  [] xfs_ilock+0x5a/0xa0
>  [] xfs_setfilesize+0x43/0x130
>  [] xfs_end_bio_delalloc+0x0/0x20
>  [] xfs_end_bio_delalloc+0xd/0x20
>  [] run_workqueue+0x52/0x100
>  [] prepare_to_wait+0x52/0x70
>  [] worker_thread+0x7f/0xc0

This is an XFS I/O completion workqueue thread, waiting to take the
inode ilock in exclusive mode so it can update the file size.
> pdflush       D 00fc61cb     0  7337      2
> Call Trace:
>  [] schedule_timeout+0x47/0x90
>  [] process_timeout+0x0/0x10
>  [] prepare_to_wait+0x20/0x70
>  [] io_schedule_timeout+0x1b/0x30
>  [] congestion_wait+0x7e/0xa0

pdflush is waiting for I/O completion to clear the congestion state.

> lvconvert     D c4010a80     0 12930  12501
> Call Trace:
>  [] flush_cpu_workqueue+0x69/0xa0
>  [] wq_barrier_func+0x0/0x10
>  [] flush_workqueue+0x2c/0x40
>  [] xfs_flush_buftarg+0x17/0x120
>  [] xfs_quiesce_fs+0x16/0x70
>  [] xfs_attr_quiesce+0x20/0x60
>  [] xfs_freeze+0x8/0x10

That's waiting for the I/O completion workqueue to be flushed.

> dd            D 00fc61cb     0 12953  29684
> Call Trace:
>  [] schedule_timeout+0x47/0x90
>  [] process_timeout+0x0/0x10
>  [] prepare_to_wait+0x20/0x70
>  [] io_schedule_timeout+0x1b/0x30
>  [] congestion_wait+0x7e/0xa0

Stuck in congestion. This dd (I've trimmed the stack trace to make it
readable):

> dd            D c4018ab4     0 12113  29734
> Call Trace:
>  [] __down+0x75/0xe0
>  [] dm_unplug_all+0x17/0x30
>  [] __down_failed+0x7/0xc
>  [] blk_backing_dev_unplug+0x0/0x10
>  [] xfs_buf_lock+0x3c/0x50
>  [] _xfs_buf_find+0x151/0x1d0
>  [] xfs_buf_get_flags+0x55/0x130
>  [] xfs_buf_read_flags+0x1c/0x90
>  [] xfs_trans_read_buf+0x16f/0x350
>  [] xfs_itobp+0x7d/0x250
>  [] xfs_iflush+0x99/0x470
>  [] xfs_inode_flush+0x127/0x1f0
>  [] xfs_fs_write_inode+0x22/0x80
>  [] write_inode+0x4b/0x50
>  [] __sync_single_inode+0xf0/0x190
>  [] __writeback_single_inode+0x49/0x1c0
>  [] sync_sb_inodes+0xde/0x1d0
>  [] writeback_inodes+0xa0/0xb0
>  [] balance_dirty_pages+0x193/0x2c0
>  [] generic_perform_write+0x142/0x190
>  [] generic_file_buffered_write+0x87/0x150
>  [] xfs_write+0x61b/0x8c0
>  [] xfs_file_aio_write+0x76/0x90
>  [] do_sync_write+0xbd/0x110
>  [] vfs_write+0x160/0x170
>  [] sys_write+0x41/0x70
>  [] syscall_call+0x7/0xb

is writing to one file, hitting foreground write throttling and flushing
either itself or the other file. It's stuck waiting on I/O completion of
the inode buffer.
I suspect that the I/O completion has been blocked by the fact that it's
trying to get the ilock on the inode being flushed. The xfsdatad process
is blocked on this inode - the inode flush takes the ilock shared, which
is holding off the I/O completion. As soon as the inode buffer I/O is
issued, the inode will be unlocked and completion processing can
continue.

i.e. it seems that either we can't safely take the ilock in I/O
completion without a trylock, or we can't hold the ilock across I/O
submission without a trylock on the buffer lock. Ouch! That's going to
take some fixing....

I'd suggest trying these two patches (already queued for 2.6.26):

http://oss.sgi.com/archives/xfs/2008-01/msg00153.html
http://oss.sgi.com/archives/xfs/2008-01/msg00154.html

They make xfs_iflush do trylocks on the inode buffer in these writeback
cases, which should avoid the problem you are seeing here. It won't
avoid all possible problems, but async inode flushes like the one above
will no longer hang waiting on buffer I/O completion....

> I also would like to ask if you have some propositions how to
> reproduce the bug, because my scripts need to work for hours or even
> days to hang processes.

It's pure chance. Hence I don't think there's much you can do to
improve the reproducibility of this problem....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group