From: David Chinner <dgc@sgi.com>
To: David Chinner <dgc@sgi.com>
Cc: Stanislaw Gruszka <stf_xl@wp.pl>, xfs@oss.sgi.com
Subject: Re: BUG: xfs on linux lvm - lvconvert random hungs when doing i/o
Date: Tue, 25 Mar 2008 13:02:23 +1100 [thread overview]
Message-ID: <20080325020223.GB108924158@sgi.com> (raw)
In-Reply-To: <20080324233952.GF103491721@sgi.com>
On Tue, Mar 25, 2008 at 10:39:52AM +1100, David Chinner wrote:
> On Fri, Mar 21, 2008 at 03:20:16PM +0100, Stanislaw Gruszka wrote:
> > Hello
> >
> > I have problems using XFS and LVM snapshots on linux-2.6.24. When I do
> > lvconvert to create snapshots while the system is under heavy load,
> > lvconvert and the I/O processes randomly hang. I use the script below to
> > reproduce, but it is very hard to catch this bug.
>
> This looks like an I/O completion problem.
>
> You're writing 2 files and taking snapshots while the writes are running.
>
> > xfsdatad/1 D 00000000 0 288 2
> > Call Trace:
> > [<c05558e6>] rwsem_down_failed_common+0x76/0x170
> > [<c0555a2d>] rwsem_down_write_failed+0x1d/0x24
> > [<c0555aa2>] call_rwsem_down_write_failed+0x6/0x8
> > [<c05555d2>] down_write+0x12/0x20
> > [<c029906a>] xfs_ilock+0x5a/0xa0
> > [<c02c0093>] xfs_setfilesize+0x43/0x130
> > [<c02c0180>] xfs_end_bio_delalloc+0x0/0x20
> > [<c02c018d>] xfs_end_bio_delalloc+0xd/0x20
> > [<c01334d2>] run_workqueue+0x52/0x100
> > [<c01373a2>] prepare_to_wait+0x52/0x70
> > [<c01335ff>] worker_thread+0x7f/0xc0
>
> This is an xfs I/O completion workqueue, waiting to get the inode
> ilock in exclusive mode to update the file size.
......
> This dd (I've trimmed the stack trace to make it readable):
>
> > dd D c4018ab4 0 12113 29734
> > Call Trace:
> > [<c0555b35>] __down+0x75/0xe0
> > [<c04793c7>] dm_unplug_all+0x17/0x30
> > [<c0555a3b>] __down_failed+0x7/0xc
> > [<c02e1ac0>] blk_backing_dev_unplug+0x0/0x10
> > [<c02c2e4c>] xfs_buf_lock+0x3c/0x50
> > [<c02c27d1>] _xfs_buf_find+0x151/0x1d0
> > [<c02c28a5>] xfs_buf_get_flags+0x55/0x130
> > [<c02c299c>] xfs_buf_read_flags+0x1c/0x90
> > [<c02b48cf>] xfs_trans_read_buf+0x16f/0x350
> > [<c02995fd>] xfs_itobp+0x7d/0x250
> > [<c029cfc9>] xfs_iflush+0x99/0x470
> > [<c02bd907>] xfs_inode_flush+0x127/0x1f0
> > [<c02c9732>] xfs_fs_write_inode+0x22/0x80
> > [<c01963fb>] write_inode+0x4b/0x50
> > [<c01966d0>] __sync_single_inode+0xf0/0x190
> > [<c01967b9>] __writeback_single_inode+0x49/0x1c0
> > [<c0196a0e>] sync_sb_inodes+0xde/0x1d0
> > [<c0196ba0>] writeback_inodes+0xa0/0xb0
> > [<c015d043>] balance_dirty_pages+0x193/0x2c0
> > [<c0158ad2>] generic_perform_write+0x142/0x190
> > [<c0158ba7>] generic_file_buffered_write+0x87/0x150
> > [<c02c8d0b>] xfs_write+0x61b/0x8c0
> > [<c02c45f6>] xfs_file_aio_write+0x76/0x90
> > [<c0178e9d>] do_sync_write+0xbd/0x110
> > [<c0179050>] vfs_write+0x160/0x170
> > [<c0179111>] sys_write+0x41/0x70
> > [<c010418e>] syscall_call+0x7/0xb
>
> Is writing to one file, hitting foreground write throttling and
> flushing either itself or the other file. It's stuck waiting on
> I/O completion of the inode buffer.
>
> I suspect that the I/O completion has been blocked because it is
> trying to take a lock the flush already holds. The xfsdatad process
> is blocked on this inode - the inode flush takes the ilock shared,
> which is holding off the I/O completion. As soon as the inode buffer
> I/O is issued, the inode will be unlocked and completion processing
> can continue.
>
> i.e. it seems that either we can't safely take the ilock in I/O
> completion without a trylock or we can't hold the ilock across
> I/O submission without a trylock on the buffer lock. Ouch! That's
> going to take some fixing....
No, that's not true - the data I/O is queued to the xfsdatad, whilst
metadata gets queued to the xfslogd completion queue. Hence data I/O
completion can't hold up metadata I/O completion and we can't deadlock
here....
That points to I/O not completing (not an XFS problem at all), or to
the filesystem freeze simply taking a long time to run (as it has
to sync everything to disk). Given that this is a snapshot target,
writing new blocks will take quite some time. Is the system still
making writeback progress when in this state, or is it really hung?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Thread overview: 6+ messages
2008-03-21 14:20 BUG: xfs on linux lvm - lvconvert random hungs when doing i/o Stanislaw Gruszka
2008-03-21 17:45 ` Josef 'Jeff' Sipek
2008-03-22 10:20 ` Stanislaw Gruszka
2008-03-24 23:39 ` David Chinner
2008-03-25 2:02 ` David Chinner [this message]
2008-03-26 14:02 ` Stanislaw Gruszka