From: David Chinner <dgc@sgi.com>
To: David Chinner <dgc@sgi.com>
Cc: Stanislaw Gruszka <stf_xl@wp.pl>, xfs@oss.sgi.com
Subject: Re: BUG: xfs on linux lvm - lvconvert random hungs when doing i/o
Date: Tue, 25 Mar 2008 13:02:23 +1100 [thread overview]
Message-ID: <20080325020223.GB108924158@sgi.com> (raw)
In-Reply-To: <20080324233952.GF103491721@sgi.com>
On Tue, Mar 25, 2008 at 10:39:52AM +1100, David Chinner wrote:
> On Fri, Mar 21, 2008 at 03:20:16PM +0100, Stanislaw Gruszka wrote:
> > Hello
> >
> > I have problems using XFS and LVM snapshots on linux-2.6.24. When I do
> > lvconvert to create snapshots while the system is under heavy load,
> > lvconvert and the I/O processes randomly hang. I use the script below to
> > reproduce, but it is very hard to catch this bug.
>
> This looks like an I/O completion problem.
>
> You're writing 2 files and taking snapshots while the writes are running.
>
> > xfsdatad/1 D 00000000 0 288 2
> > Call Trace:
> > [<c05558e6>] rwsem_down_failed_common+0x76/0x170
> > [<c0555a2d>] rwsem_down_write_failed+0x1d/0x24
> > [<c0555aa2>] call_rwsem_down_write_failed+0x6/0x8
> > [<c05555d2>] down_write+0x12/0x20
> > [<c029906a>] xfs_ilock+0x5a/0xa0
> > [<c02c0093>] xfs_setfilesize+0x43/0x130
> > [<c02c0180>] xfs_end_bio_delalloc+0x0/0x20
> > [<c02c018d>] xfs_end_bio_delalloc+0xd/0x20
> > [<c01334d2>] run_workqueue+0x52/0x100
> > [<c01373a2>] prepare_to_wait+0x52/0x70
> > [<c01335ff>] worker_thread+0x7f/0xc0
>
> This is an xfs I/O completion workqueue, waiting to get the inode
> ilock in exclusive mode to update the file size.
......
> This dd (I've trimmed the stack trace to make it readable):
>
> > dd D c4018ab4 0 12113 29734
> > Call Trace:
> > [<c0555b35>] __down+0x75/0xe0
> > [<c04793c7>] dm_unplug_all+0x17/0x30
> > [<c0555a3b>] __down_failed+0x7/0xc
> > [<c02e1ac0>] blk_backing_dev_unplug+0x0/0x10
> > [<c02c2e4c>] xfs_buf_lock+0x3c/0x50
> > [<c02c27d1>] _xfs_buf_find+0x151/0x1d0
> > [<c02c28a5>] xfs_buf_get_flags+0x55/0x130
> > [<c02c299c>] xfs_buf_read_flags+0x1c/0x90
> > [<c02b48cf>] xfs_trans_read_buf+0x16f/0x350
> > [<c02995fd>] xfs_itobp+0x7d/0x250
> > [<c029cfc9>] xfs_iflush+0x99/0x470
> > [<c02bd907>] xfs_inode_flush+0x127/0x1f0
> > [<c02c9732>] xfs_fs_write_inode+0x22/0x80
> > [<c01963fb>] write_inode+0x4b/0x50
> > [<c01966d0>] __sync_single_inode+0xf0/0x190
> > [<c01967b9>] __writeback_single_inode+0x49/0x1c0
> > [<c0196a0e>] sync_sb_inodes+0xde/0x1d0
> > [<c0196ba0>] writeback_inodes+0xa0/0xb0
> > [<c015d043>] balance_dirty_pages+0x193/0x2c0
> > [<c0158ad2>] generic_perform_write+0x142/0x190
> > [<c0158ba7>] generic_file_buffered_write+0x87/0x150
> > [<c02c8d0b>] xfs_write+0x61b/0x8c0
> > [<c02c45f6>] xfs_file_aio_write+0x76/0x90
> > [<c0178e9d>] do_sync_write+0xbd/0x110
> > [<c0179050>] vfs_write+0x160/0x170
> > [<c0179111>] sys_write+0x41/0x70
> > [<c010418e>] syscall_call+0x7/0xb
>
> Is writing to one file, hitting foreground write throttling and
> flushing either itself or the other file. It's stuck waiting on
> I/O completion of the inode buffer.
>
> I suspect that the I/O completion has been blocked because it is
> trying to take a lock the flush already holds. The xfsdatad process
> is blocked on this inode - the inode flush takes the ilock shared,
> which is holding off the I/O completion. As soon as the inode buffer
> I/O is issued, the inode will be unlocked and completion processing
> can continue.
>
> i.e. it seems that either we can't safely take the ilock in I/O
> completion without a trylock or we can't hold the ilock across
> I/O submission without a trylock on the buffer lock. Ouch! That's
> going to take some fixing....
No, that's not true - the data I/O is queued to the xfsdatad, whilst
metadata gets queued to the xfslogd completion queue. Hence data I/O
completion can't hold up metadata I/O completion and we can't deadlock
here....
That points to I/O not completing (not an XFS problem at all), or to
the filesystem freeze simply taking a long time to run (as it has
to sync everything to disk). Given that this is a snapshot target,
writing new blocks will take quite some time. Is the system still
making writeback progress when in this state, or is it really hung?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Thread overview: 6+ messages
2008-03-21 14:20 BUG: xfs on linux lvm - lvconvert random hungs when doing i/o Stanislaw Gruszka
2008-03-21 17:45 ` Josef 'Jeff' Sipek
2008-03-22 10:20 ` Stanislaw Gruszka
2008-03-24 23:39 ` David Chinner
2008-03-25 2:02 ` David Chinner [this message]
2008-03-26 14:02 ` Stanislaw Gruszka