Re: deadlock with latest xfs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>,
	Christoph Hellwig <hch@infradead.org>, xfs-oss <xfs@oss.sgi.com>,
	linux-mm@kvack.org
Subject: Re: deadlock with latest xfs
Date: Sun, 26 Oct 2008 15:20:26 +1100	[thread overview]
Message-ID: <20081026042026.GM18495@disturbed> (raw)
In-Reply-To: <20081026025013.GL18495@disturbed>

On Sun, Oct 26, 2008 at 01:50:13PM +1100, Dave Chinner wrote:
> On Sun, Oct 26, 2008 at 11:53:51AM +1100, Dave Chinner wrote:
> > On Fri, Oct 24, 2008 at 05:48:04PM +1100, Dave Chinner wrote:
> > > OK, I just hung a single-threaded rm -rf after this completed:
> > > 
> > > # fsstress -p 1024 -n 100 -d /mnt/xfs2/fsstress
> > > 
> > > It has hung with this trace:
> > > 
> > > # echo w > /proc/sysrq-trigger
> > ....
> > > [42954211.590000] 794877f8:  [<6002e40a>] update_curr+0x3a/0x50
> > > [42954211.590000] 79487818:  [<60014f0d>] _switch_to+0x6d/0xe0
> > > [42954211.590000] 79487858:  [<60324b21>] schedule+0x171/0x2c0
> > > [42954211.590000] 794878a8:  [<60324e6d>] schedule_timeout+0xad/0xf0
> > > [42954211.590000] 794878c8:  [<60326e98>] _spin_unlock_irqrestore+0x18/0x20
> > > [42954211.590000] 79487908:  [<60195455>] xlog_grant_log_space+0x245/0x470
> > > [42954211.590000] 79487920:  [<60030ba0>] default_wake_function+0x0/0x10
> > > [42954211.590000] 79487978:  [<601957a2>] xfs_log_reserve+0x122/0x140
> > > [42954211.590000] 794879c8:  [<601a36e7>] xfs_trans_reserve+0x147/0x2e0
> > > [42954211.590000] 794879f8:  [<60087374>] kmem_cache_alloc+0x84/0x100
> > > [42954211.590000] 79487a38:  [<601ab01f>] xfs_inactive_symlink_rmt+0x9f/0x450
> > > [42954211.590000] 79487a88:  [<601ada94>] kmem_zone_zalloc+0x34/0x50
> > > [42954211.590000] 79487aa8:  [<601a3a6d>] _xfs_trans_alloc+0x2d/0x70
> > ....
> > 
> > I came back to the system, and found that the hang had gone away - the
> > rm -rf had finished sometime in the ~36 hours between triggering the
> > problem and coming back to look at the corpse....
> > 
> > So nothing to report yet.
> 
> Got it now. I can reproduce this in a couple of minutes now that both
> the test fs and the fs hosting the UML fs images are using lazy-count=1
> (and the frequent 10s long host system freezes have gone away, too).
> 
> Looks like *another* new memory allocation problem [1]:

[snip]

And having fixed that, I'm now seeing the log reservation hang:

[42950307.350000] xfsdatad/0    D 00000000407219f0     0    51      2
[42950307.350000] 7bd1acd8 7bd1a838 60498c40 81074000 81077b40 60014f0d 81044780 81074000
[42950307.350000]        81074000 7e15f808 7bd1a800 81044780 81077b90 60324bc1 81074000 00000250
[42950307.350000]        81074000 81074000 7fffffffffffffff 6646a168 80b6dd28 80b6ddf8 81077bf0 60324f0d <6>Call Trace:
[42950307.350000] 81077b08:  [<60014f0d>] _switch_to+0x6d/0xe0
[42950307.350000] 81077b48:  [<60324bc1>] schedule+0x171/0x2c0
[42950307.350000] 81077b98:  [<60324f0d>] schedule_timeout+0xad/0xf0
[42950307.350000] 81077bb8:  [<60326f38>] _spin_unlock_irqrestore+0x18/0x20
[42950307.350000] 81077bf8:  [<601953e9>] xlog_grant_log_space+0x169/0x470
[42950307.350000] 81077c10:  [<60030ba0>] default_wake_function+0x0/0x10
[42950307.350000] 81077c68:  [<60195812>] xfs_log_reserve+0x122/0x140
[42950307.350000] 81077cb8:  [<601a3757>] xfs_trans_reserve+0x147/0x2e0
[42950307.350000] 81077ce8:  [<601adb14>] kmem_zone_zalloc+0x34/0x50
[42950307.350000] 81077d28:  [<6018f985>] xfs_iomap_write_unwritten+0xa5/0x2d0
[42950307.350000] 81077d38:  [<60326f38>] _spin_unlock_irqrestore+0x18/0x20
[42950307.350000] 81077d48:  [<60085750>] cache_free_debugcheck+0x150/0x2e0
[42950307.350000] 81077d50:  [<60063d12>] mempool_free_slab+0x12/0x20
[42950307.350000] 81077d88:  [<60085e02>] kmem_cache_free+0x72/0xb0
[42950307.350000] 81077dc8:  [<60063dbf>] mempool_free+0x4f/0x90
[42950307.350000] 81077e08:  [<601af66d>] xfs_end_bio_unwritten+0x6d/0xa0
[42950307.350000] 81077e38:  [<60048574>] run_workqueue+0xa4/0x180
[42950307.350000] 81077e50:  [<601af600>] xfs_end_bio_unwritten+0x0/0xa0
[42950307.350000] 81077e58:  [<6004c791>] prepare_to_wait+0x51/0x80
[42950307.350000] 81077e98:  [<600488e0>] worker_thread+0x70/0xd0
[42950307.350000] 81077eb0:  [<6004c5b0>] autoremove_wake_function+0x0/0x40
[42950307.350000] 81077ee8:  [<60048870>] worker_thread+0x0/0xd0
[42950307.350000] 81077f08:  [<6004c204>] kthread+0x64/0xb0
[42950307.350000] 81077f48:  [<60026285>] run_kernel_thread+0x35/0x60
[42950307.350000] 81077f58:  [<6004c1a0>] kthread+0x0/0xb0
[42950307.350000] 81077f98:  [<60026278>] run_kernel_thread+0x28/0x60
[42950307.350000] 81077fc8:  [<60014e71>] new_thread_handler+0x71/0xa0

Basically, the log is too small to fit the number of transaction reservations
that are currently being attempted (roughly 1000 parallel transactions), and so
xlog_grant_log_space() is sleeping.  Because it is sleeping in I/O completion,
the log tail can't move forward because I/O completion is not occurring.

I think that at this point, we need a separate workqueue for unwritten extent
conversion to prevent it from blocking normal data and metadata I/O completion.
that way we can allow it to recurse on allocation and transaction reservation
without introducing I/O completion deadlocks....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

WARNING: multiple messages have this Message-ID (diff)

From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>,
	Christoph Hellwig <hch@infradead.org>, xfs-oss <xfs@oss.sgi.com>,
	linux-mm@kvack.org
Subject: Re: deadlock with latest xfs
Date: Sun, 26 Oct 2008 15:20:26 +1100	[thread overview]
Message-ID: <20081026042026.GM18495@disturbed> (raw)
In-Reply-To: <20081026025013.GL18495@disturbed>

On Sun, Oct 26, 2008 at 01:50:13PM +1100, Dave Chinner wrote:
> On Sun, Oct 26, 2008 at 11:53:51AM +1100, Dave Chinner wrote:
> > On Fri, Oct 24, 2008 at 05:48:04PM +1100, Dave Chinner wrote:
> > > OK, I just hung a single-threaded rm -rf after this completed:
> > > 
> > > # fsstress -p 1024 -n 100 -d /mnt/xfs2/fsstress
> > > 
> > > It has hung with this trace:
> > > 
> > > # echo w > /proc/sysrq-trigger
> > ....
> > > [42954211.590000] 794877f8:  [<6002e40a>] update_curr+0x3a/0x50
> > > [42954211.590000] 79487818:  [<60014f0d>] _switch_to+0x6d/0xe0
> > > [42954211.590000] 79487858:  [<60324b21>] schedule+0x171/0x2c0
> > > [42954211.590000] 794878a8:  [<60324e6d>] schedule_timeout+0xad/0xf0
> > > [42954211.590000] 794878c8:  [<60326e98>] _spin_unlock_irqrestore+0x18/0x20
> > > [42954211.590000] 79487908:  [<60195455>] xlog_grant_log_space+0x245/0x470
> > > [42954211.590000] 79487920:  [<60030ba0>] default_wake_function+0x0/0x10
> > > [42954211.590000] 79487978:  [<601957a2>] xfs_log_reserve+0x122/0x140
> > > [42954211.590000] 794879c8:  [<601a36e7>] xfs_trans_reserve+0x147/0x2e0
> > > [42954211.590000] 794879f8:  [<60087374>] kmem_cache_alloc+0x84/0x100
> > > [42954211.590000] 79487a38:  [<601ab01f>] xfs_inactive_symlink_rmt+0x9f/0x450
> > > [42954211.590000] 79487a88:  [<601ada94>] kmem_zone_zalloc+0x34/0x50
> > > [42954211.590000] 79487aa8:  [<601a3a6d>] _xfs_trans_alloc+0x2d/0x70
> > ....
> > 
> > I came back to the system, and found that the hang had gone away - the
> > rm -rf had finished sometime in the ~36 hours between triggering the
> > problem and coming back to look at the corpse....
> > 
> > So nothing to report yet.
> 
> Got it now. I can reproduce this in a couple of minutes now that both
> the test fs and the fs hosting the UML fs images are using lazy-count=1
> (and the frequent 10s long host system freezes have gone away, too).
> 
> Looks like *another* new memory allocation problem [1]:

[snip]

And having fixed that, I'm now seeing the log reservation hang:

[42950307.350000] xfsdatad/0    D 00000000407219f0     0    51      2
[42950307.350000] 7bd1acd8 7bd1a838 60498c40 81074000 81077b40 60014f0d 81044780 81074000
[42950307.350000]        81074000 7e15f808 7bd1a800 81044780 81077b90 60324bc1 81074000 00000250
[42950307.350000]        81074000 81074000 7fffffffffffffff 6646a168 80b6dd28 80b6ddf8 81077bf0 60324f0d <6>Call Trace:
[42950307.350000] 81077b08:  [<60014f0d>] _switch_to+0x6d/0xe0
[42950307.350000] 81077b48:  [<60324bc1>] schedule+0x171/0x2c0
[42950307.350000] 81077b98:  [<60324f0d>] schedule_timeout+0xad/0xf0
[42950307.350000] 81077bb8:  [<60326f38>] _spin_unlock_irqrestore+0x18/0x20
[42950307.350000] 81077bf8:  [<601953e9>] xlog_grant_log_space+0x169/0x470
[42950307.350000] 81077c10:  [<60030ba0>] default_wake_function+0x0/0x10
[42950307.350000] 81077c68:  [<60195812>] xfs_log_reserve+0x122/0x140
[42950307.350000] 81077cb8:  [<601a3757>] xfs_trans_reserve+0x147/0x2e0
[42950307.350000] 81077ce8:  [<601adb14>] kmem_zone_zalloc+0x34/0x50
[42950307.350000] 81077d28:  [<6018f985>] xfs_iomap_write_unwritten+0xa5/0x2d0
[42950307.350000] 81077d38:  [<60326f38>] _spin_unlock_irqrestore+0x18/0x20
[42950307.350000] 81077d48:  [<60085750>] cache_free_debugcheck+0x150/0x2e0
[42950307.350000] 81077d50:  [<60063d12>] mempool_free_slab+0x12/0x20
[42950307.350000] 81077d88:  [<60085e02>] kmem_cache_free+0x72/0xb0
[42950307.350000] 81077dc8:  [<60063dbf>] mempool_free+0x4f/0x90
[42950307.350000] 81077e08:  [<601af66d>] xfs_end_bio_unwritten+0x6d/0xa0
[42950307.350000] 81077e38:  [<60048574>] run_workqueue+0xa4/0x180
[42950307.350000] 81077e50:  [<601af600>] xfs_end_bio_unwritten+0x0/0xa0
[42950307.350000] 81077e58:  [<6004c791>] prepare_to_wait+0x51/0x80
[42950307.350000] 81077e98:  [<600488e0>] worker_thread+0x70/0xd0
[42950307.350000] 81077eb0:  [<6004c5b0>] autoremove_wake_function+0x0/0x40
[42950307.350000] 81077ee8:  [<60048870>] worker_thread+0x0/0xd0
[42950307.350000] 81077f08:  [<6004c204>] kthread+0x64/0xb0
[42950307.350000] 81077f48:  [<60026285>] run_kernel_thread+0x35/0x60
[42950307.350000] 81077f58:  [<6004c1a0>] kthread+0x0/0xb0
[42950307.350000] 81077f98:  [<60026278>] run_kernel_thread+0x28/0x60
[42950307.350000] 81077fc8:  [<60014e71>] new_thread_handler+0x71/0xa0

Basically, the log is too small to fit the number of transaction reservations
that are currently being attempted (roughly 1000 parallel transactions), and so
xlog_grant_log_space() is sleeping.  Because it is sleeping in I/O completion,
the log tail can't move forward because I/O completion is not occurring.

I think that at this point, we need a separate workqueue for unwritten extent
conversion to prevent it from blocking normal data and metadata I/O completion.
that way we can allow it to recurse on allocation and transaction reservation
without introducing I/O completion deadlocks....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-10-26  4:20 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-23  9:17 deadlock with latest xfs Lachlan McIlroy
2008-10-23 20:57 ` Christoph Hellwig
2008-10-23 22:28   ` Dave Chinner
2008-10-24  3:08   ` Lachlan McIlroy
2008-10-24  5:24     ` Dave Chinner
2008-10-24  6:48       ` Dave Chinner
2008-10-26  0:53         ` Dave Chinner
2008-10-26  2:50           ` Dave Chinner
2008-10-26  2:50             ` Dave Chinner
2008-10-26  4:20             ` Dave Chinner [this message]
2008-10-26  4:20               ` Dave Chinner
2008-10-27  1:42             ` Lachlan McIlroy
2008-10-27  1:42               ` Lachlan McIlroy
2008-10-27  5:30               ` Dave Chinner
2008-10-27  5:30                 ` Dave Chinner
2008-10-27  6:29                 ` Lachlan McIlroy
2008-10-27  6:29                   ` Lachlan McIlroy
2008-10-27  6:54                   ` Dave Chinner
2008-10-27  6:54                     ` Dave Chinner
2008-10-27  7:31                     ` Lachlan McIlroy
2008-10-27  7:31                       ` Lachlan McIlroy
2008-10-28  6:02             ` Nick Piggin
2008-10-28  6:25               ` Dave Chinner
2008-10-28  6:25                 ` Dave Chinner
2008-10-28  8:56                 ` Nick Piggin
2008-10-24  8:46       ` Lachlan McIlroy
2008-10-26 22:39     ` Dave Chinner
2008-10-27  2:30       ` Timothy Shimmin
2008-10-27  5:47         ` Dave Chinner
2008-10-27  7:33       ` Lachlan McIlroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081026042026.GM18495@disturbed \
    --to=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=lachlan@sgi.com \
    --cc=linux-mm@kvack.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.