From: Rick Boone <rick@buzz-media.com>
To: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: I/O related (?) domU crash on xen 4.0 + pv_ops
Date: Tue, 8 Jun 2010 13:16:05 -0700
Message-ID: <4C0EA505.8050500@buzz-media.com>
Hey,
I'm running into an issue with a pv_ops kernel (2.6.31.13) and Xen 4.0:
my domUs are continually locking up under heavy I/O load. My issue
seems similar to what these guys are reporting:
https://bugzilla.redhat.com/show_bug.cgi?id=551552
https://bugzilla.redhat.com/show_bug.cgi?id=526627
https://bugzilla.redhat.com/show_bug.cgi?id=550724
Any solutions/ideas that haven't been covered in those reports? I've
turned off the tickless kernel and have also set the guest I/O scheduler
to "noop", but the machines are still crashing. I'm using LVM-backed
block devices on the dom0. For a while, I didn't have the kernel set to
panic on a hung task, and from digging around while the kernel was still
up, I was able to determine that the device that seems to be causing
issues is one that sees a lot of I/O (it's receiving all of the logs on
a heavily used web server).
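To double-check that the "noop" setting actually took effect on every guest block device, something like the following could be run inside the domU. This is only a rough sketch: it assumes the usual sysfs layout (/sys/block/<dev>/queue/scheduler, with the active scheduler shown in square brackets), and the helper names active_scheduler and report are made up for illustration.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def active_scheduler(text: str) -> Optional[str]:
    """Return the bracketed (active) entry from a sysfs scheduler string.

    Example input: "noop anticipatory deadline [cfq]".
    Returns None when no entry is bracketed.
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def report(sysfs_root: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Map each block device under sysfs_root to its active scheduler."""
    result: Dict[str, Optional[str]] = {}
    # Assumed layout: <sysfs_root>/<device>/queue/scheduler
    for sched in sorted(Path(sysfs_root).glob("*/queue/scheduler")):
        device = sched.parent.parent.name
        result[device] = active_scheduler(sched.read_text())
    return result


if __name__ == "__main__":
    for dev, sched in report().items():
        print(f"{dev}: {sched}")
```

Any device that does not print "noop" here is still using another scheduler, which would be worth ruling out before comparing against the bug reports above.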
Here are a couple of my tracebacks:
1)
INFO: task pdflush:36 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ffff8801e963f9c0 0000000000000282 00000000e4f10f56 ffff8801e963f9d0
ffff8801eb7a31b0 ffff8801eb67c240 ffff8801eb7a3590 0000000103902b94
00000000e4f10f56 ffff8801e963fa70 ffff8801e963f9b0 ffffffff811f14ec
Call Trace:
[<ffffffff811f14ec>] ? blk_unplug+0x56/0x72
[<ffffffff813f1ee0>] io_schedule+0x37/0x59
[<ffffffff8112b1a8>] ? block_sync_page+0x5b/0x71
[<ffffffff810c2e77>] sync_page+0x5a/0x72
[<ffffffff813f2237>] __wait_on_bit_lock+0x55/0xb3
[<ffffffff810c2e1d>] ? sync_page+0x0/0x72
[<ffffffff810c2b0d>] ? find_get_pages_tag+0xf7/0x144
[<ffffffff810c2dce>] __lock_page+0x71/0x8c
[<ffffffff8107569f>] ? wake_bit_function+0x0/0x51
[<ffffffff810cafe4>] write_cache_pages+0x201/0x3bf
[<ffffffff810cac23>] ? __writepage+0x0/0x5a
[<ffffffff8100ef6c>] ? xen_force_evtchn_callback+0x20/0x36
[<ffffffff8100fa6f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff810cb1d7>] generic_writepages+0x35/0x4f
[<ffffffff810cb230>] do_writepages+0x3f/0x5e
[<ffffffff811261e5>] writeback_single_inode+0x161/0x2d7
[<ffffffff811267ab>] generic_sync_sb_inodes+0x1ef/0x355
[<ffffffff810cc726>] ? pdflush+0x0/0x286
[<ffffffff8112692d>] sync_sb_inodes+0x1c/0x32
[<ffffffff811269bc>] writeback_inodes+0x79/0xdf
[<ffffffff81107819>] ? sync_supers+0xb3/0xce
[<ffffffff810cc1f6>] wb_kupdate+0xb9/0x13a
[<ffffffff810cc84c>] ? pdflush+0x126/0x286
[<ffffffff810cc889>] pdflush+0x163/0x286
[<ffffffff810cc13d>] ? wb_kupdate+0x0/0x13a
[<ffffffff810cc726>] ? pdflush+0x0/0x286
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
1 lock held by pdflush/36:
#0: (&type->s_umount_key#23){......}, at: [<ffffffff811269a6>] writeback_inodes+0x63/0xdf
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 34, comm: khungtaskd Not tainted 2.6.31.13-xen-4.0.0 #4
Call Trace:
[<ffffffff8105c8f3>] panic+0xb2/0x168
[<ffffffff81085881>] ? print_lock+0x96/0xb1
[<ffffffff810861d5>] ? lockdep_print_held_locks+0xa5/0xc9
[<ffffffff8101907a>] ? show_stack+0x2a/0x40
[<ffffffff8102f025>] ? touch_nmi_watchdog+0x6c/0x87
[<ffffffff810862c7>] ? __debug_show_held_locks+0x33/0x49
[<ffffffff810b146c>] watchdog+0x209/0x258
[<ffffffff810b12d8>] ? watchdog+0x75/0x258
[<ffffffff8104a45f>] ? complete+0x52/0x71
[<ffffffff810b1263>] ? watchdog+0x0/0x258
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
---------------------------------------------
2)
INFO: task kjournald:951 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ffff8801c8993bf0 0000000000000282 00000000cf63a654 ffff8801c8993c00
ffff8801ca899090 ffff8801ccae31b0 ffff8801ca899470 0000000000000001
0000000000000001 0000000000000200 0000000000000001 000000000160015f
Call Trace:
[<ffffffff813f1ee0>] io_schedule+0x37/0x59
[<ffffffff811f1528>] ? blk_backing_dev_unplug+0x20/0x36
[<ffffffff8112cb73>] sync_buffer+0x51/0x69
[<ffffffff813f2387>] __wait_on_bit+0x54/0x9c
[<ffffffff8112cb22>] ? sync_buffer+0x0/0x69
[<ffffffff8112cb22>] ? sync_buffer+0x0/0x69
[<ffffffff813f244c>] out_of_line_wait_on_bit+0x7d/0x9e
[<ffffffff8107569f>] ? wake_bit_function+0x0/0x51
[<ffffffff8112ca8f>] __wait_on_buffer+0x32/0x48
[<ffffffffa005cf62>] journal_commit_transaction+0x684/0x12f2 [jbd]
[<ffffffff8100fa82>] ? check_events+0x12/0x20
[<ffffffff8100fa6f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff813f4ad7>] ? _spin_unlock_irqrestore+0x44/0x5f
[<ffffffff81067de4>] ? try_to_del_timer_sync+0x65/0x84
[<ffffffff81067e03>] ? del_timer_sync+0x0/0xa0
[<ffffffffa0061dd4>] kjournald+0x161/0x3ae [jbd]
[<ffffffff81075641>] ? autoremove_wake_function+0x0/0x5e
[<ffffffffa0061c73>] ? kjournald+0x0/0x3ae [jbd]
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
no locks held by kjournald/951.
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 34, comm: khungtaskd Not tainted 2.6.31.13-xen-4.0.0 #18
Call Trace:
[<ffffffff8105c8f3>] panic+0xb2/0x168
[<ffffffff81086176>] ? lockdep_print_held_locks+0x46/0xc9
[<ffffffff8101907a>] ? show_stack+0x2a/0x40
[<ffffffff8102f025>] ? touch_nmi_watchdog+0x6c/0x87
[<ffffffff810862c7>] ? __debug_show_held_locks+0x33/0x49
[<ffffffff810b146c>] watchdog+0x209/0x258
[<ffffffff810b12d8>] ? watchdog+0x75/0x258
[<ffffffff8104a45f>] ? complete+0x52/0x71
[<ffffffff810b1263>] ? watchdog+0x0/0x258
[<ffffffff810754ce>] kthread+0x9e/0xa8
[<ffffffff8101606a>] child_rip+0xa/0x20
[<ffffffff810151ac>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff810159e6>] ? retint_restore_args+0x5/0x6
[<ffffffff81016060>] ? child_rip+0x0/0x20
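For comparing reports like the two above, a small parser can pull out the blocked task and the frames of its call chain. The sketch below is illustrative only: the function name parse_hung_tasks and the regular expressions are assumptions fitted to the log format shown in this message, not part of any kernel tool. Frames prefixed with "?" are skipped, since the kernel marks those as addresses found on the stack that may not belong to the actual call chain. Frames that follow a "Kernel panic" line are also skipped, because they come from khungtaskd rather than the blocked task.

```python
import re
from typing import Any, Dict, List, Optional

# "INFO: task pdflush:36 blocked for more than 120 seconds."
BLOCKED_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[<ffffffff813f1ee0>] io_schedule+0x37/0x59", optionally with "? " before the symbol
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log: str) -> List[Dict[str, Any]]:
    """Return one entry per hung-task report with its reliable frames."""
    reports: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    for line in log.splitlines():
        blocked = BLOCKED_RE.search(line)
        if blocked:
            current = {
                "comm": blocked.group("comm"),
                "pid": int(blocked.group("pid")),
                "secs": int(blocked.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        if "Kernel panic" in line:
            # What follows is the panic trace from khungtaskd, not the blocked task.
            current = None
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame.group("guess"):
            current["frames"].append(frame.group("sym"))
    return reports


if __name__ == "__main__":
    import sys

    for entry in parse_hung_tasks(sys.stdin.read()):
        print(f"{entry['comm']}/{entry['pid']}: " + " <- ".join(entry["frames"]))
```

Feeding it a console log on standard input prints one line per blocked task, which makes it easier to see that both traces above end in io_schedule while waiting on a page or buffer.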
-- Rick Boone