From: Dave Chinner <david@fromorbit.com>
To: Torsten Kaiser <just.for.lkml@googlemail.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>, xfs@oss.sgi.com
Subject: Re: Hang in XFS reclaim on 3.7.0-rc3
Date: Wed, 21 Nov 2012 07:27:45 +1100
Message-ID: <20121120202745.GG2591@dastard>
In-Reply-To: <CAPVoSvStEdD2uhGQmtb6+qOrme_Cs_AWAuE+dP_XYD5BZyp-kA@mail.gmail.com>

On Tue, Nov 20, 2012 at 08:45:03PM +0100, Torsten Kaiser wrote:
> On Tue, Nov 20, 2012 at 12:53 AM, Dave Chinner <david@fromorbit.com> wrote:
> >        [<ffffffff8108137e>] mark_held_locks+0x7e/0x130
> >        [<ffffffff81081a63>] lockdep_trace_alloc+0x63/0xc0
> >        [<ffffffff810e9dd5>] kmem_cache_alloc+0x35/0xe0
> >        [<ffffffff810dba31>] vm_map_ram+0x271/0x770
> >        [<ffffffff811e1316>] _xfs_buf_map_pages+0x46/0xe0
> >        [<ffffffff811e222a>] xfs_buf_get_map+0x8a/0x130
> >        [<ffffffff81233ab9>] xfs_trans_get_buf_map+0xa9/0xd0
> >        [<ffffffff8121bced>] xfs_ialloc_inode_init+0xcd/0x1d0
> >
> > We shouldn't be mapping buffers there, there's a patch below to fix
> > this. It's probably the source of this report, even though I cannot
> > lockdep seems to be off with the fairies...
> 
> That patch seems to break my system.

You've got an IO problem, not an XFS problem. Everything is hung up
on MD.

 INFO: task kswapd0:725 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kswapd0         D 0000000000000001     0   725      2 0x00000000
  ffff8803280d13f8 0000000000000046 ffff880329a0ab80 ffff8803280d1fd8
  ffff8803280d1fd8 ffff8803280d1fd8 ffff880046b7c880 ffff880329a0ab80
  ffff8803280d1408 ffff8803278dbbd0 ffff8803278db800 00000000ffffffff
 Call Trace:
  [<ffffffff816b1224>] schedule+0x24/0x60
  [<ffffffff814f9dad>] md_super_wait+0x4d/0x80
  [<ffffffff81500753>] bitmap_unplug+0x173/0x180
  [<ffffffff814e8eb8>] raid1_unplug+0x98/0x110
  [<ffffffff81278a6d>] blk_flush_plug_list+0xad/0x240
  [<ffffffff816b15c3>] io_schedule_timeout+0x83/0xf0
  [<ffffffff810b0e1d>] mempool_alloc+0x12d/0x160
  [<ffffffff811263da>] bvec_alloc_bs+0xda/0x100
  [<ffffffff811264ea>] bio_alloc_bioset+0xea/0x110
  [<ffffffff81126656>] bio_clone_bioset+0x16/0x40
  [<ffffffff814f471a>] bio_clone_mddev+0x1a/0x30
  [<ffffffff814edbb1>] make_request+0x551/0xde0
  [<ffffffff814f80bb>] md_make_request+0x21b/0x4d0
  [<ffffffff81276e52>] generic_make_request+0xc2/0x100
  [<ffffffff81276ef5>] submit_bio+0x65/0x110
  [<ffffffff811e07bf>] xfs_submit_ioend_bio.isra.21+0x2f/0x40
  [<ffffffff811e088e>] xfs_submit_ioend+0xbe/0x110
  [<ffffffff811e0c91>] xfs_vm_writepage+0x3b1/0x540
  [<ffffffff810bcd84>] shrink_page_list+0x564/0x890
  [<ffffffff810bd637>] shrink_inactive_list+0x1d7/0x310
  [<ffffffff810bdb9d>] shrink_lruvec+0x42d/0x530
  [<ffffffff810be323>] kswapd+0x683/0xa20
  [<ffffffff8105c246>] kthread+0xd6/0xe0
  [<ffffffff816b31ac>] ret_from_fork+0x7c/0xb0
 no locks held by kswapd0/725.

So kswapd is trying to clean pages, but it's blocked in an unplug
during IO submission. Probably one to report to the linux-raid list.

 INFO: task xfsaild/md4:1742 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 xfsaild/md4     D 0000000000000003     0  1742      2 0x00000000
  ffff88032438bb68 0000000000000046 ffff880329965700 ffff88032438bfd8
  ffff88032438bfd8 ffff88032438bfd8 ffff88032827e580 ffff880329965700
  ffff88032438bb78 ffff8803278dbbd0 ffff8803278db800 00000000ffffffff
 Call Trace:
  [<ffffffff816b1224>] schedule+0x24/0x60
  [<ffffffff814f9dad>] md_super_wait+0x4d/0x80
  [<ffffffff8105ca30>] ? __init_waitqueue_head+0x60/0x60
  [<ffffffff81500753>] bitmap_unplug+0x173/0x180
  [<ffffffff81278c13>] ? blk_finish_plug+0x13/0x50
  [<ffffffff814e8eb8>] raid1_unplug+0x98/0x110
  [<ffffffff81278a6d>] blk_flush_plug_list+0xad/0x240
  [<ffffffff81278c13>] blk_finish_plug+0x13/0x50
  [<ffffffff811e296a>] __xfs_buf_delwri_submit+0x1ca/0x1e0
  [<ffffffff811e2ffb>] xfs_buf_delwri_submit_nowait+0x1b/0x20
  [<ffffffff81233066>] xfsaild+0x226/0x4c0
  [<ffffffff81065dfa>] ? finish_task_switch+0x3a/0x100
  [<ffffffff81232e40>] ? xfs_trans_ail_cursor_first+0xa0/0xa0
  [<ffffffff8105c246>] kthread+0xd6/0xe0
  [<ffffffff816b246b>] ? _raw_spin_unlock_irq+0x2b/0x50
  [<ffffffff8105c170>] ? flush_kthread_worker+0xe0/0xe0
  [<ffffffff816b31ac>] ret_from_fork+0x7c/0xb0
  [<ffffffff8105c170>] ? flush_kthread_worker+0xe0/0xe0
 no locks held by xfsaild/md4/1742.

Same here - metadata writes are backed up waiting for MD to submit
IO. Everything else is stuck on these or MD, too...
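
One rough way to check a "everything is hung up on the same thing"
reading of a set of hung-task reports is to compare the innermost
frames of each blocked task. The following is a minimal sketch in
Python, not something from the original report: the function names
(parse_hung_tasks, shared_innermost) and the abbreviated SAMPLE text
are made up for illustration, and frames marked "?" are skipped on the
assumption that they are unreliable stack entries.

```python
import re

# Header of a hung-task report, e.g. "INFO: task kswapd0:725 blocked for ..."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than")
# One stack frame, e.g. "[<ffffffff814f9dad>] md_super_wait+0x4d/0x80".
# Group 1 is set when the frame carries a "?" (unreliable) marker.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Abbreviated excerpt of the two traces discussed above (illustration only).
SAMPLE = """\
 INFO: task kswapd0:725 blocked for more than 120 seconds.
  [<ffffffff816b1224>] schedule+0x24/0x60
  [<ffffffff814f9dad>] md_super_wait+0x4d/0x80
  [<ffffffff81500753>] bitmap_unplug+0x173/0x180
  [<ffffffff814e8eb8>] raid1_unplug+0x98/0x110
  [<ffffffff81278a6d>] blk_flush_plug_list+0xad/0x240
  [<ffffffff816b15c3>] io_schedule_timeout+0x83/0xf0
  [<ffffffff810b0e1d>] mempool_alloc+0x12d/0x160
 INFO: task xfsaild/md4:1742 blocked for more than 120 seconds.
  [<ffffffff816b1224>] schedule+0x24/0x60
  [<ffffffff814f9dad>] md_super_wait+0x4d/0x80
  [<ffffffff8105ca30>] ? __init_waitqueue_head+0x60/0x60
  [<ffffffff81500753>] bitmap_unplug+0x173/0x180
  [<ffffffff81278c13>] ? blk_finish_plug+0x13/0x50
  [<ffffffff814e8eb8>] raid1_unplug+0x98/0x110
  [<ffffffff81278a6d>] blk_flush_plug_list+0xad/0x240
  [<ffffffff81278c13>] blk_finish_plug+0x13/0x50
  [<ffffffff811e296a>] __xfs_buf_delwri_submit+0x1ca/0x1e0
"""


def parse_hung_tasks(log):
    """Map each blocked task name to its reliable frames, innermost first."""
    tasks = {}
    current = None
    for line in log.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = header.group(1)
            tasks[current] = []
            continue
        frame = FRAME_RE.search(line)
        # Ignore frames seen before any task header and "?" frames.
        if frame and current is not None and not frame.group(1):
            tasks[current].append(frame.group(2))
    return tasks


def shared_innermost(tasks):
    """Return the run of innermost frames that every task has in common."""
    shared = []
    for frames in zip(*tasks.values()):
        if len(set(frames)) != 1:
            break
        shared.append(frames[0])
    return shared


if __name__ == "__main__":
    tasks = parse_hung_tasks(SAMPLE)
    print(sorted(tasks))
    print(shared_innermost(tasks))
```

On the excerpt above, both tasks share the same innermost run ending
in md_super_wait, which is what "blocked in MD" looks like in this
form. A shared prefix only says the tasks are waiting in the same
place; it does not say why MD is not making progress.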

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
