From: Jan Kara <jack@suse.cz>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: jack@suse.cz, linux-fsdevel@vger.kernel.org
Subject: Re: [2.6.32 ubuntu] I/O hang at start_this_handle
Date: Fri, 8 Apr 2011 20:29:16 +0200 [thread overview]
Message-ID: <20110408182916.GA16729@quack.suse.cz> (raw)
In-Reply-To: <201104082338.GCG69297.FQVLtSJOFHOMOF@I-love.SAKURA.ne.jp>
On Fri 08-04-11 23:38:55, Tetsuo Handa wrote:
> Jan Kara wrote:
> > > I waited for a few hours but the process did not resume. Thus, I gave up.
> > OK, so stuck forever ;). Interesting. So we probably missed a wakeup
> > somehow or j_barrier_count got corrupted. I suppose you are not able to
> > reproduce the hang, are you? Looking at the code, it looks safe and I have
> > no clue how it could happen. So unless you are able to see the issue again
> > (so that we can gather some more debug information), I'm not able to help...
>
> I don't know how to reproduce the hang, but I got below message (I think it was
> 2.6.18-238.5.1.el5) and CPU usage became 100% soon after I resumed a VM (which
> took a few *minutes* to resume) running on the VMware Workstation.
Hmm, but 2.6.18 is really ancient. I'm afraid I cannot help you much with
it. There have been numerous bugs fixed since then.
> INFO: task kjournald:543 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kjournald D 00000122 3072 543 9 573 530 (L-TLB)
> c193cf40 00000046 c37e5098 00000122 c1a41254 00000001 c193cf18 00000001
> f7c25aa0 c3807a0a 00000122 00022972 00000000 f7c25bac c1806f00 f7096e40
> f7c25aa0 00000000 c195ec50 c1803e00 f7096e40 c195eaa0 c193cf70 ffffffff
> Call Trace:
> [<f886609b>] journal_commit_transaction+0x137/0xefc [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<c042df7e>] try_to_del_timer_sync+0x65/0x6c
> [<f8869c21>] kjournald+0xa1/0x1c2 [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<f8869b80>] kjournald+0x0/0x1c2 [jbd]
> [<c0436af7>] kthread+0xc0/0xed
> [<c0436a37>] kthread+0x0/0xed
> [<c0405c87>] kernel_thread_helper+0x7/0x10
> =======================
> INFO: task login:2107 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> login D 00000151 2752 2107 1 2108 604 (NOTLB)
> c1940f38 00000086 ff6037b8 00000151 00000000 c1940f3c c0484602 0000000a
> f7c25000 ffd43070 00000151 0073f8b8 00000000 f7c2510c c1806f00 f49683c0
> f1eff380 00000000 c0475922 f7d6d8c0 f7679cc8 f1eff380 c1940f3c ffffffff
> Call Trace:
> [<c0484602>] may_open+0x125/0x203
> [<c0475922>] __dentry_open+0xea/0x1ab
> [<c06201a5>] __mutex_lock_slowpath+0x4d/0x7c
> [<c06201e3>] .text.lock.mutex+0xf/0x14
> [<c049b1f3>] inotify_inode_queue_event+0x45/0xc6
> [<c0477828>] generic_file_llseek+0x2a/0xd2
> [<c04777fe>] generic_file_llseek+0x0/0xd2
> [<c0476b41>] vfs_llseek+0x30/0x34
> [<c0477a31>] sys_llseek+0x43/0x84
> [<c04873a0>] sys_fcntl64+0x6f/0x80
> [<c0404f4b>] syscall_call+0x7/0xb
> =======================
> INFO: task syslogd:2633 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syslogd D 00000122 2412 2633 1 2636 2621 (NOTLB)
> f7138ed4 00000082 c35f4a61 00000122 0000000e 00000000 f7138f18 00000007
> c195eaa0 c37e5098 00000122 001f0637 00000000 c195ebac c1806f00 f7096e40
> c04975ff 00000000 00000282 c042deff c081ef80 f7138edc c18073bc c042e073
> Call Trace:
> [<c04975ff>] mpage_writepages+0x2b1/0x310
> [<c042deff>] lock_timer_base+0x15/0x2f
> [<c042e073>] __mod_timer+0xda/0xe4
> [<c061fe71>] schedule_timeout+0x71/0x8c
> [<c042d6ab>] process_timeout+0x0/0x5
> [<f886459f>] journal_stop+0xd3/0x1ba [jbd]
> [<c0495c42>] __writeback_single_inode+0x197/0x2a3
> [<c045edfc>] do_writepages+0x2b/0x32
> [<c045a950>] __filemap_fdatawrite_range+0x66/0x72
> [<c04962de>] sync_inode+0x19/0x24
> [<f88e8009>] ext3_sync_file+0xb1/0xdc [ext3]
> [<c0479529>] do_fsync+0x41/0x83
> [<c0479588>] __do_fsync+0x1d/0x2b
> [<c0404f4b>] syscall_call+0x7/0xb
> =======================
> INFO: task auditd:2688 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> auditd D 00000122 2396 2688 1 2940 2687 (NOTLB)
> f4cc1ed0 00000086 c34cb5c9 00000122 c0459b22 0000000e 00000000 00000009
> f77bcaa0 c35f4a61 00000122 00129498 00000000 f77bcbac c1806f00 f7096200
> c1a41268 00000000 00000000 f4cc1ecc c041ec40 00000000 f4cc1ed8 c1a41250
> Call Trace:
> [<c0459b22>] find_get_pages_tag+0x30/0x75
> [<c041ec40>] __wake_up+0x2a/0x3d
> [<c0436d07>] prepare_to_wait+0x24/0x46
> [<f88691ea>] log_wait_commit+0x80/0xc7 [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<f8864661>] journal_stop+0x195/0x1ba [jbd]
> [<c0495c42>] __writeback_single_inode+0x197/0x2a3
> [<c045edfc>] do_writepages+0x2b/0x32
> [<c045a950>] __filemap_fdatawrite_range+0x66/0x72
> [<c04962de>] sync_inode+0x19/0x24
> [<f88e8009>] ext3_sync_file+0xb1/0xdc [ext3]
> [<c0479529>] do_fsync+0x41/0x83
> [<c0479588>] __do_fsync+0x1d/0x2b
> [<c0404f4b>] syscall_call+0x7/0xb
> =======================
> INFO: task nmbd:3186 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> nmbd D 00000177 2572 3186 1 3198 3183 (NOTLB)
> f48c8e28 00200086 eb451402 00000177 00000010 c0637d00 00000000 00000007
> f7140000 eb4f644b 00000177 000a5049 00000000 f714010c c1806f00 f77233c0
> ffffffff 00000000 00000000 00000000 00000000 00000000 f48c8e40 ffffffff
> Call Trace:
> [<f8864d35>] start_this_handle+0x21b/0x30d [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<f8864ed3>] journal_start+0xac/0xdb [jbd]
> [<f88ec262>] ext3_dirty_inode+0x24/0x66 [ext3]
> [<c0496310>] __mark_inode_dirty+0x27/0x14f
> [<c048e6b1>] file_update_time+0x30/0xa1
> [<c0464d5b>] do_wp_page+0x5b7/0x5f1
> [<c04663e6>] __handle_mm_fault+0xa33/0xaac
> [<c048b509>] fcntl_setlk64+0x1f9/0x21f
> [<c0622048>] do_page_fault+0x23a/0x52d
> [<c0621e0e>] do_page_fault+0x0/0x52d
> [<c0405abd>] error_code+0x39/0x40
> =======================
> INFO: task login:3370 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> login D 00000122 2672 3370 1 3288 (NOTLB)
> eff1cdc4 00000086 c9528bc8 00000122 00a41000 c062c080 00000007 00000009
> f776f000 c9702c11 00000122 001da049 00000000 f776f10c c1806f00 f4a85900
> 00000000 00000000 00160057 ffffffff 00000000 ffffffff eff1cddc ffffffff
> Call Trace:
> [<f8864d35>] start_this_handle+0x21b/0x30d [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<f8864ed3>] journal_start+0xac/0xdb [jbd]
> [<f88ec262>] ext3_dirty_inode+0x24/0x66 [ext3]
> [<c0496310>] __mark_inode_dirty+0x27/0x14f
> [<c048e6b1>] file_update_time+0x30/0xa1
> [<c045b8dd>] __generic_file_aio_write_nolock+0x38d/0x52a
> [<c04c955b>] avc_has_perm+0x3c/0x46
> [<c045bad1>] generic_file_aio_write+0x57/0xaa
> [<f88e7e91>] ext3_file_write+0x19/0x83 [ext3]
> [<c0476c5e>] do_sync_write+0xb6/0xf1
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<c0476ba8>] do_sync_write+0x0/0xf1
> [<c04774e7>] vfs_write+0xa1/0x143
> [<c0477b11>] sys_write+0x3c/0x63
> [<c0404f4b>] syscall_call+0x7/0xb
> =======================
> INFO: task kjournald:543 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kjournald D 00000122 3072 543 9 573 530 (L-TLB)
> c193cf40 00000046 c37e5098 00000122 c1a41254 00000001 c193cf18 00000001
> f7c25aa0 c3807a0a 00000122 00022972 00000000 f7c25bac c1806f00 f7096e40
> f7c25aa0 00000000 c195ec50 c1803e00 f7096e40 c195eaa0 c193cf70 ffffffff
> Call Trace:
> [<f886609b>] journal_commit_transaction+0x137/0xefc [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<c042df7e>] try_to_del_timer_sync+0x65/0x6c
> [<f8869c21>] kjournald+0xa1/0x1c2 [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<f8869b80>] kjournald+0x0/0x1c2 [jbd]
> [<c0436af7>] kthread+0xc0/0xed
> [<c0436a37>] kthread+0x0/0xed
> [<c0405c87>] kernel_thread_helper+0x7/0x10
> =======================
> INFO: task login:2107 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> login D 00000151 2752 2107 1 2108 604 (NOTLB)
> c1940f38 00000086 ff6037b8 00000151 00000000 c1940f3c c0484602 0000000a
> f7c25000 ffd43070 00000151 0073f8b8 00000000 f7c2510c c1806f00 f49683c0
> f1eff380 00000000 c0475922 f7d6d8c0 f7679cc8 f1eff380 c1940f3c ffffffff
> Call Trace:
> [<c0484602>] may_open+0x125/0x203
> [<c0475922>] __dentry_open+0xea/0x1ab
> [<c06201a5>] __mutex_lock_slowpath+0x4d/0x7c
> [<c06201e3>] .text.lock.mutex+0xf/0x14
> [<c049b1f3>] inotify_inode_queue_event+0x45/0xc6
> [<c0477828>] generic_file_llseek+0x2a/0xd2
> [<c04777fe>] generic_file_llseek+0x0/0xd2
> [<c0476b41>] vfs_llseek+0x30/0x34
> [<c0477a31>] sys_llseek+0x43/0x84
> [<c04873a0>] sys_fcntl64+0x6f/0x80
> [<c0404f4b>] syscall_call+0x7/0xb
> =======================
> INFO: task syslogd:2633 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syslogd D 00000122 2412 2633 1 2636 2621 (NOTLB)
> f7138ed4 00000082 c35f4a61 00000122 0000000e 00000000 f7138f18 00000007
> c195eaa0 c37e5098 00000122 001f0637 00000000 c195ebac c1806f00 f7096e40
> c04975ff 00000000 00000282 c042deff c081ef80 f7138edc c18073bc c042e073
> Call Trace:
> [<c04975ff>] mpage_writepages+0x2b1/0x310
> [<c042deff>] lock_timer_base+0x15/0x2f
> [<c042e073>] __mod_timer+0xda/0xe4
> [<c061fe71>] schedule_timeout+0x71/0x8c
> [<c042d6ab>] process_timeout+0x0/0x5
> [<f886459f>] journal_stop+0xd3/0x1ba [jbd]
> [<c0495c42>] __writeback_single_inode+0x197/0x2a3
> [<c045edfc>] do_writepages+0x2b/0x32
> [<c045a950>] __filemap_fdatawrite_range+0x66/0x72
> [<c04962de>] sync_inode+0x19/0x24
> [<f88e8009>] ext3_sync_file+0xb1/0xdc [ext3]
> [<c0479529>] do_fsync+0x41/0x83
> [<c0479588>] __do_fsync+0x1d/0x2b
> [<c0404f4b>] syscall_call+0x7/0xb
> =======================
> INFO: task auditd:2688 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> auditd D 00000122 2396 2688 1 2940 2687 (NOTLB)
> f4cc1ed0 00000086 c34cb5c9 00000122 c0459b22 0000000e 00000000 00000009
> f77bcaa0 c35f4a61 00000122 00129498 00000000 f77bcbac c1806f00 f7096200
> c1a41268 00000000 00000000 f4cc1ecc c041ec40 00000000 f4cc1ed8 c1a41250
> Call Trace:
> [<c0459b22>] find_get_pages_tag+0x30/0x75
> [<c041ec40>] __wake_up+0x2a/0x3d
> [<c0436d07>] prepare_to_wait+0x24/0x46
> [<f88691ea>] log_wait_commit+0x80/0xc7 [jbd]
> [<c0436bbb>] autoremove_wake_function+0x0/0x2d
> [<f8864661>] journal_stop+0x195/0x1ba [jbd]
> [<c0495c42>] __writeback_single_inode+0x197/0x2a3
> [<c045edfc>] do_writepages+0x2b/0x32
> [<c045a950>] __filemap_fdatawrite_range+0x66/0x72
> [<c04962de>] sync_inode+0x19/0x24
> [<f88e8009>] ext3_sync_file+0xb1/0xdc [ext3]
> [<c0479529>] do_fsync+0x41/0x83
> [<c0479588>] __do_fsync+0x1d/0x2b
> [<c0404f4b>] syscall_call+0x7/0xb
> =======================
>
> If this comes from the same cause, it might be related with clock or timer event.
> I couldn't get more information because sysrq didn't work.
I see. From the traces it looks as if some processes are waiting in
journal_stop() in transaction batching code (which then blocks a
transaction commit). So indeed if the timer / wakeup code didn't work quite
well, it could cause situations like this.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2011-04-08 18:29 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-08 5:26 [2.6.32 ubuntu] I/O hang at start_this_handle Tetsuo Handa
2011-02-15 15:16 ` Jan Kara
2011-02-15 21:52 ` Tetsuo Handa
2011-02-16 15:53 ` Jan Kara
2011-02-17 8:13 ` Tetsuo Handa
2011-02-17 15:38 ` Jan Kara
2011-04-08 14:38 ` Tetsuo Handa
2011-04-08 18:29 ` Jan Kara [this message]
2011-04-09 2:21 ` Tetsuo Handa
2011-04-12 19:50 ` Jan Kara
2011-04-12 22:00 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110408182916.GA16729@quack.suse.cz \
--to=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).