linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: hang on 3.9, 3.10-rc5
Date: Fri, 21 Jun 2013 16:54:54 +0200	[thread overview]
Message-ID: <20130621165454.5a87e2cb@fem.tu-ilmenau.de> (raw)
In-Reply-To: <loom.20130618T190955-784@post.gmane.org>

On Tue, 18 Jun 2013 17:19:04 +0000 (UTC)
Jon Nelson <jnelson+btrfs@jamponi.net> wrote:

> Josef Bacik <jbacik <at> fusionio.com> writes:
> 
> > 
> > On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.
> > > Is this is a known problem?  In this case there is no
> > > powercycling; just a regular ceph-osd workload.
> 
> ..
> 
> 
> I'm able to cause a complete kernel hang by defrag'ing even one 
> file on 3.9.X (3.9.0 through 3.9.4, so far).

I see similar behavior with autodefrag enabled. When fetching mails
with claws (piped through bogofilter) the whole system got stuck more
or less. I can switch between the tasks but everything involving I/O is
hanging. Most time I was able to solve this with the sync command in a
shell. I got only one time a backtrace from hung task checker:

Jun 20 12:37:47 localhost kernel: INFO: task btrfs-cleaner:771 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: btrfs-cleaner   D ffff88011fc91740     0   771      2 0x00000000
Jun 20 12:37:47 localhost kernel: ffff88011aaff530 0000000000000002 ffff880119929fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff880119929fd8 ffff88011aabddc0 ffff880119928000 ffff880119929cb8
Jun 20 12:37:47 localhost kernel: ffff8800aaac22b4 ffff8800aaac22b0 0000000000000000 ffff8800aaac22b8
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff8109856d>] ? ondemand_readahead+0x15d/0x200
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff81171579>] ? btrfs_defrag_file+0x41a/0xa4e
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff81159eb9>] ? btrfs_run_defrag_inodes+0x1f7/0x2d3
Jun 20 12:37:47 localhost kernel: [<ffffffff8115100e>] ? btrfs_run_delayed_iputs+0x44/0xbe
Jun 20 12:37:47 localhost kernel: [<ffffffff811459d5>] ? cleaner_kthread+0x89/0xf3
Jun 20 12:37:47 localhost kernel: [<ffffffff8114594c>] ? transaction_kthread+0x17a/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff81042c00>] ? kthread+0x7d/0x85
Jun 20 12:37:47 localhost kernel: [<ffffffff81040000>] ? thaw_workqueues+0xd3/0xff
Jun 20 12:37:47 localhost kernel: [<ffffffff81042b83>] ? __kthread_parkme+0x59/0x59
Jun 20 12:37:47 localhost kernel: [<ffffffff8148942c>] ? ret_from_fork+0x7c/0xb0
Jun 20 12:37:47 localhost kernel: [<ffffffff81042b83>] ? __kthread_parkme+0x59/0x59
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2384 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0  2384   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82dce20 0000000000000002 ffff8800c0c55fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0c55fd8 ffffffff81a10400 ffff8800c0c54000 ffff8800c0c55e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2385 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2385   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82dd5f0 0000000000000002 ffff8800c0d1dfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0d1dfd8 ffff88011aabddc0 ffff8800c0d1c000 ffff8800c0d1de90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2393 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2393   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82df530 0000000000000002 ffff8800c0daffd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0daffd8 ffff88011aabddc0 ffff8800ddd9ade0 ffff88011816e1e8
Jun 20 12:37:47 localhost kernel: 0000000000000000 0000000000000000 00000000003c0000 0000000000000000
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff8114aa11>] ? wait_current_trans.isra.19+0xaa/0xdc
Jun 20 12:37:47 localhost kernel: [<ffffffff81043492>] ? finish_wait+0x60/0x60
Jun 20 12:37:47 localhost kernel: [<ffffffff8114c663>] ? start_transaction+0x3b9/0x3e9
Jun 20 12:37:47 localhost kernel: [<ffffffff8114d71b>] ? __unlink_start_trans+0x46/0x37f
Jun 20 12:37:47 localhost kernel: [<ffffffff810d9b18>] ? __d_lookup+0xdb/0xed
Jun 20 12:37:47 localhost kernel: [<ffffffff810d242d>] ? path_lookupat+0x6d/0x2a7
Jun 20 12:37:47 localhost kernel: [<ffffffff811537be>] ? btrfs_unlink+0x1f/0x91
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3ee2>] ? vfs_unlink+0x61/0xa8
Jun 20 12:37:47 localhost kernel: [<ffffffff810d4000>] ? do_unlinkat+0xd7/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2398 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff880106fc0000     0  2398   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff880106fc0000 0000000000000002 ffff8800c0fa3fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0fa3fd8 ffff8800c82dd5f0 ffff8800c0fa2000 ffff8800c0fa3e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2400 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2400   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800d8ebc650 0000000000000002 ffff8800bece9fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800bece9fd8 ffff88011aabddc0 ffff8800bece8000 ffff8800bece9e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:12930 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0 12930   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800acf22710 0000000000000002 ffff8801018a1fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8801018a1fd8 ffff88011aabddc0 ffff8801018a0000 ffff8801018a1e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:18283 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0 18283   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8801030f8000 0000000000000002 ffff8800a8a9dfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800a8a9dfd8 ffffffff81a10400 ffff8800a8a9c000 ffff8800a8a9de90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:21478 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0 21478   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff88011aafc650 0000000000000002 ffff88010852bfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff88010852bfd8 ffffffff81a10400 ffff88010852a000 ffff88010852be90
Jun 20 12:37:49 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:49 localhost kernel: Call Trace:
Jun 20 12:37:49 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:49 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:49 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:49 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:49 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:49 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:49 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:49 localhost kernel: INFO: task bogofilter:21625 blocked for more than 120 seconds.
Jun 20 12:37:49 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:49 localhost kernel: bogofilter      D ffff88011fc91740     0 21625      1 0x00000000
Jun 20 12:37:49 localhost kernel: ffff88009f116d60 0000000000000002 ffff8800296e7fd8 0000000000011740
Jun 20 12:37:49 localhost kernel: ffff8800296e7fd8 ffff88011aabddc0 ffff88004b568da8 ffff88002970a800
Jun 20 12:37:49 localhost kernel: 0000000000000000 ffff8800aaa2a2e0 ffff8800aaa2a350 ffff88002970ac60
Jun 20 12:37:49 localhost kernel: Call Trace:
Jun 20 12:37:49 localhost kernel: [<ffffffff811763fd>] ? log_one_extent.isra.24+0x441/0x58b
Jun 20 12:37:49 localhost kernel: [<ffffffff81043492>] ? finish_wait+0x60/0x60
Jun 20 12:37:49 localhost kernel: [<ffffffff81177ee9>] ? btrfs_log_inode+0x5d0/0x71e
Jun 20 12:37:49 localhost kernel: [<ffffffff811791f5>] ? btrfs_log_inode_parent+0x1f1/0x2dd
Jun 20 12:37:49 localhost kernel: [<ffffffff81179c34>] ? btrfs_log_dentry_safe+0x31/0x45
Jun 20 12:37:49 localhost kernel: [<ffffffff81158e69>] ? btrfs_sync_file+0x11a/0x1e2
Jun 20 12:37:49 localhost kernel: [<ffffffff810e8211>] ? do_fsync+0x2b/0x50
Jun 20 12:37:49 localhost kernel: [<ffffffff8148966c>] ? tracesys+0x7e/0xe2
Jun 20 12:37:49 localhost kernel: [<ffffffff810e8402>] ? SyS_fdatasync+0xa/0xd
Jun 20 12:37:49 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2

After disabling autodefrag it didn't happen again. The kernel is the latest git.

      parent reply	other threads:[~2013-06-21 15:04 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-11 15:43 hang on 3.9, 3.10-rc5 Sage Weil
2013-06-11 16:21 ` Chris Mason
     [not found]   ` <alpine.DEB.2.00.1306121317300.27596@cobra.newdream.net>
2013-06-18 15:56     ` Sage Weil
2013-06-18 16:36       ` Chris Mason
2013-06-18 16:37 ` Josef Bacik
2013-06-18 16:59   ` Chris Mason
2013-06-20  3:44     ` Sage Weil
2013-06-20 21:56       ` Sage Weil
2013-06-21  0:42         ` Chris Mason
2013-06-21  1:00           ` Sage Weil
2013-06-21  1:04             ` Chris Mason
2013-06-21  1:09               ` Sage Weil
2013-08-09 18:04           ` Sage Weil
2013-08-09 18:08             ` Chris Mason
2013-06-18 17:19   ` Jon Nelson
2013-06-21  1:11     ` Chris Mason
2013-06-21  1:46       ` Jon Nelson
2013-06-21  5:40         ` Chris Murphy
2013-06-21  6:42         ` Clemens Eisserer
2013-06-25 15:52           ` Liu Bo
2013-06-21 11:30         ` Chris Mason
2013-06-21 13:14           ` Jon Nelson
2013-06-21 14:54     ` Johannes Hirte [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130621165454.5a87e2cb@fem.tu-ilmenau.de \
    --to=johannes.hirte@fem.tu-ilmenau.de \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).