Date: Tue, 25 Mar 2008 13:02:23 +1100
From: David Chinner
Subject: Re: BUG: xfs on linux lvm - lvconvert random hungs when doing i/o
Message-ID: <20080325020223.GB108924158@sgi.com>
References: <200803211520.16398.stf_xl@wp.pl> <20080324233952.GF103491721@sgi.com>
In-Reply-To: <20080324233952.GF103491721@sgi.com>
To: David Chinner
Cc: Stanislaw Gruszka, xfs@oss.sgi.com

On Tue, Mar 25, 2008 at 10:39:52AM +1100, David Chinner wrote:
> On Fri, Mar 21, 2008 at 03:20:16PM +0100, Stanislaw Gruszka wrote:
> > Hello
> >
> > I have problems using xfs and lvm snapshots on linux-2.6.24. When I
> > do lvconvert to create snapshots while the system is under heavy
> > load, lvconvert and the I/O processes randomly hang. I use the
> > script below to reproduce, but it is very hard to catch this bug.
>
> This looks like an I/O completion problem.
>
> You're writing 2 files and doing snapshots while they are running.
>
> > xfsdatad/1    D 00000000    0   288      2
> > Call Trace:
> >  [] rwsem_down_failed_common+0x76/0x170
> >  [] rwsem_down_write_failed+0x1d/0x24
> >  [] call_rwsem_down_write_failed+0x6/0x8
> >  [] down_write+0x12/0x20
> >  [] xfs_ilock+0x5a/0xa0
> >  [] xfs_setfilesize+0x43/0x130
> >  [] xfs_end_bio_delalloc+0x0/0x20
> >  [] xfs_end_bio_delalloc+0xd/0x20
> >  [] run_workqueue+0x52/0x100
> >  [] prepare_to_wait+0x52/0x70
> >  [] worker_thread+0x7f/0xc0
>
> This is an xfs I/O completion workqueue, waiting to get the inode
> ilock in exclusive mode to update the file size.

.....
> This dd (I've trimmed the stack trace to make it readable):
>
> > dd            D c4018ab4    0 12113  29734
> > Call Trace:
> >  [] __down+0x75/0xe0
> >  [] dm_unplug_all+0x17/0x30
> >  [] __down_failed+0x7/0xc
> >  [] blk_backing_dev_unplug+0x0/0x10
> >  [] xfs_buf_lock+0x3c/0x50
> >  [] _xfs_buf_find+0x151/0x1d0
> >  [] xfs_buf_get_flags+0x55/0x130
> >  [] xfs_buf_read_flags+0x1c/0x90
> >  [] xfs_trans_read_buf+0x16f/0x350
> >  [] xfs_itobp+0x7d/0x250
> >  [] xfs_iflush+0x99/0x470
> >  [] xfs_inode_flush+0x127/0x1f0
> >  [] xfs_fs_write_inode+0x22/0x80
> >  [] write_inode+0x4b/0x50
> >  [] __sync_single_inode+0xf0/0x190
> >  [] __writeback_single_inode+0x49/0x1c0
> >  [] sync_sb_inodes+0xde/0x1d0
> >  [] writeback_inodes+0xa0/0xb0
> >  [] balance_dirty_pages+0x193/0x2c0
> >  [] generic_perform_write+0x142/0x190
> >  [] generic_file_buffered_write+0x87/0x150
> >  [] xfs_write+0x61b/0x8c0
> >  [] xfs_file_aio_write+0x76/0x90
> >  [] do_sync_write+0xbd/0x110
> >  [] vfs_write+0x160/0x170
> >  [] sys_write+0x41/0x70
> >  [] syscall_call+0x7/0xb
>
> is writing to one file, hitting foreground write throttling and
> flushing either itself or the other file. It's stuck waiting on
> I/O completion of the inode buffer.
>
> I suspect that the I/O completion has been blocked by the ilock it's
> trying to get. The xfsdatad process is blocked on this inode - the
> inode flush takes the ilock shared, which is holding off the I/O
> completion. As soon as the inode buffer I/O is issued, the inode
> will be unlocked and completion processing can continue.
>
> i.e. it seems that either we can't safely take the ilock in I/O
> completion without a trylock, or we can't hold the ilock across
> I/O submission without a trylock on the buffer lock. Ouch! That's
> going to take some fixing....

No, that's not true - the data I/O is queued to the xfsdatad, whilst
metadata gets queued to the xfslogd completion queue. Hence data I/O
completion can't hold up metadata I/O completion, and we can't
deadlock here....
That points to I/O not completing (not an XFS problem at all), or the
filesystem freeze is just taking a long time to run (as it has to sync
everything to disk). Given that this is a snapshot target, writing new
blocks will take quite some time.

Is the system still making writeback progress when in this state, or is
it really hung?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group