From mboxrd@z Thu Jan 1 00:00:00 1970 From: RW Subject: Re: repeatable hang with loop mount and heavy IO in guest Date: Thu, 21 Jan 2010 21:08:38 +0100 Message-ID: <4B58B446.4080801@tauceti.net> References: <4B588E63.4060700@nagafix.co.uk> Reply-To: kvm@tauceti.net Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org To: Antoine Martin Return-path: Received: from tauceti.net ([62.245.250.166]:43315 "EHLO www.tauceti.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754703Ab0AUUE7 (ORCPT ); Thu, 21 Jan 2010 15:04:59 -0500 In-Reply-To: <4B588E63.4060700@nagafix.co.uk> Sender: kvm-owner@vger.kernel.org List-ID: Some months ago I also thought elevator=noop should be a good idea. But it isn't. It works good as long as you only do short IO requests. Try using deadline in host and guest. Robert On 01/21/10 18:26, Antoine Martin wrote: > I've tried various guests, including most recent Fedora12 kernels, > custom 2.6.32.x > All of them hang around the same point (~1GB written) when I do heavy IO > write inside the guest. > I have waited 30 minutes to see if the guest would recover, but it just > sits there, not writing back any data, not doing anything - but > certainly not allowing any new IO writes. The host has some load on it, > but nothing heavy enough to completely hand a guest for that long. > > mount -o loop some_image.fs ./somewhere bs=512 > dd if=/dev/zero of=/somewhere/zero > then after ~1GB: sync > > Host is running: 2.6.31.4 > QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88) > > Guests are booted with "elevator=noop" as the filesystems are stored as > files, accessed as virtio disks. > > > The "hung" backtraces always look similar to these: > [ 361.460136] INFO: task loop0:2097 blocked for more than 120 seconds. > [ 361.460139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 361.460142] loop0 D ffff88000b92c848 0 2097 2 > 0x00000080 > [ 361.460148] ffff88000b92c5d0 0000000000000046 ffff880008c1f810 > ffff880009829fd8 > [ 361.460153] ffff880009829fd8 ffff880009829fd8 ffff88000a21ee80 > ffff88000b92c5d0 > [ 361.460157] ffff880009829610 ffffffff8181b768 ffff880001af33b0 > 0000000000000002 > [ 361.460161] Call Trace: > [ 361.460216] [] ? sync_page+0x0/0x43 > [ 361.460253] [] ? io_schedule+0x2c/0x43 > [ 361.460257] [] ? sync_page+0x3e/0x43 > [ 361.460261] [] ? __wait_on_bit+0x41/0x71 > [ 361.460264] [] ? wait_on_page_bit+0x6a/0x70 > [ 361.460283] [] ? wake_bit_function+0x0/0x23 > [ 361.460287] [] ? shrink_page_list+0x3e5/0x61e > [ 361.460291] [] ? schedule_timeout+0xa3/0xbe > [ 361.460305] [] ? autoremove_wake_function+0x0/0x2e > [ 361.460308] [] ? shrink_zone+0x7e1/0xaf6 > [ 361.460310] [] ? determine_dirtyable_memory+0xd/0x17 > [ 361.460314] [] ? isolate_pages_global+0xa3/0x216 > [ 361.460316] [] ? mark_page_accessed+0x2a/0x39 > [ 361.460335] [] ? __find_get_block+0x13b/0x15c > [ 361.460337] [] ? try_to_free_pages+0x1ab/0x2c9 > [ 361.460340] [] ? isolate_pages_global+0x0/0x216 > [ 361.460343] [] ? __alloc_pages_nodemask+0x394/0x564 > [ 361.460350] [] ? __slab_alloc+0x137/0x44f > [ 361.460371] [] ? radix_tree_preload+0x1f/0x6a > [ 361.460374] [] ? kmem_cache_alloc+0x5d/0x88 > [ 361.460376] [] ? radix_tree_preload+0x1f/0x6a > [ 361.460379] [] ? add_to_page_cache_locked+0x1d/0xf1 > [ 361.460381] [] ? add_to_page_cache_lru+0x27/0x57 > [ 361.460384] [] ? > grab_cache_page_write_begin+0x7a/0xa0 > [ 361.460399] [] ? ext3_write_begin+0x7e/0x201 > [ 361.460417] [] ? do_lo_send_aops+0xa1/0x174 > [ 361.460420] [] ? virt_to_head_page+0x9/0x2a > [ 361.460422] [] ? loop_thread+0x309/0x48a > [ 361.460425] [] ? do_lo_send_aops+0x0/0x174 > [ 361.460427] [] ? autoremove_wake_function+0x0/0x2e > [ 361.460430] [] ? loop_thread+0x0/0x48a > [ 361.460432] [] ? kthread+0x78/0x80 > [ 361.460441] [] ? finish_task_switch+0x2b/0x78 > [ 361.460454] [] ? child_rip+0xa/0x20 > [ 361.460460] [] ? native_pax_close_kernel+0x0/0x32 > [ 361.460463] [] ? kthread+0x0/0x80 > [ 361.460469] [] ? child_rip+0x0/0x20 > [ 361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds. > [ 361.460473] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 361.460474] kjournald D ffff88000b92e558 0 2098 2 > 0x00000080 > [ 361.460477] ffff88000b92e2e0 0000000000000046 ffff88000aad9840 > ffff88000983ffd8 > [ 361.460480] ffff88000983ffd8 ffff88000983ffd8 ffffffff81808e00 > ffff88000b92e2e0 > [ 361.460483] ffff88000983fcf0 ffffffff8181b768 ffff880001af3c40 > 0000000000000002 > [ 361.460486] Call Trace: > [ 361.460488] [] ? sync_buffer+0x0/0x3c > [ 361.460491] [] ? io_schedule+0x2c/0x43 > [ 361.460494] [] ? sync_buffer+0x38/0x3c > [ 361.460496] [] ? __wait_on_bit+0x41/0x71 > [ 361.460499] [] ? sync_buffer+0x0/0x3c > [ 361.460501] [] ? out_of_line_wait_on_bit+0x6a/0x76 > [ 361.460504] [] ? wake_bit_function+0x0/0x23 > [ 361.460514] [] ? > journal_commit_transaction+0x769/0xbb8 > [ 361.460517] [] ? finish_task_switch+0x2b/0x78 > [ 361.460519] [] ? thread_return+0x40/0x79 > [ 361.460522] [] ? kjournald+0xc7/0x1cb > [ 361.460525] [] ? autoremove_wake_function+0x0/0x2e > [ 361.460527] [] ? kjournald+0x0/0x1cb > [ 361.460530] [] ? kthread+0x78/0x80 > [ 361.460532] [] ? finish_task_switch+0x2b/0x78 > [ 361.460534] [] ? child_rip+0xa/0x20 > [ 361.460537] [] ? native_pax_close_kernel+0x0/0x32 > [ 361.460540] [] ? kthread+0x0/0x80 > [ 361.460542] [] ? child_rip+0x0/0x20 > [ 361.460544] INFO: task dd:2132 blocked for more than 120 seconds. > [ 361.460546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 361.460547] dd D ffff88000a21f0f8 0 2132 2090 > 0x00000080 > [ 361.460550] ffff88000a21ee80 0000000000000082 ffff88000a21ee80 > ffff88000b3affd8 > [ 361.460553] ffff88000b3affd8 ffff88000b3affd8 ffffffff81808e00 > ffff880001af3510 > [ 361.460556] ffff88000b78eaf0 ffff88000b3daa00 ffff880008de6c40 > ffff88000ab44a80 > [ 361.460558] Call Trace: > [ 361.460561] [] ? do_get_write_access+0x1f5/0x3b6 > [ 361.460564] [] ? get_dirty_limits+0x1dc/0x210 > [ 361.460566] [] ? wake_bit_function+0x0/0x23 > [ 361.460569] [] ? __getblk+0x1c/0x26c > [ 361.460576] [] ? __func__.28446+0x0/0x20 > [ 361.460578] [] ? journal_get_write_access+0x22/0x34 > [ 361.460582] [] ? > __ext3_journal_get_write_access+0x1e/0x47 > [ 361.460584] [] ? ext3_reserve_inode_write+0x3e/0x75 > [ 361.460587] [] ? ext3_mark_inode_dirty+0x16/0x31 > [ 361.460589] [] ? ext3_dirty_inode+0x62/0x7a > [ 361.460592] [] ? __mark_inode_dirty+0x25/0x134 > [ 361.460595] [] ? file_update_time+0xd4/0xfb > [ 361.460598] [] ? __generic_file_aio_write+0x16c/0x290 > [ 361.460600] [] ? generic_file_aio_write+0x59/0x9f > [ 361.460603] [] ? do_sync_write+0xcd/0x112 > [ 361.460606] [] ? pvclock_clocksource_read+0x3a/0x70 > [ 361.460609] [] ? autoremove_wake_function+0x0/0x2e > [ 361.460612] [] ? __switch_to+0x177/0x255 > [ 361.460621] [] ? selinux_file_permission+0x4d/0xa3 > [ 361.460624] [] ? vfs_write+0xfc/0x138 > [ 361.460627] [] ? sys_write+0x45/0x6e > [ 361.460629] [] ? system_call_fastpath+0x16/0x1b > [ 361.460632] INFO: task sync:2164 blocked for more than 120 seconds. > [ 361.460633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 361.460639] sync D ffff88000ba11f88 0 2164 2136 > 0x00000080 > [ 361.460642] ffff88000ba11d10 0000000000000086 0000000100000246 > ffff88000b1e9fd8 > [ 361.460645] ffff88000b1e9fd8 ffff88000b1e9fd8 ffffffff81808e00 > ffff88000b3daa00 > [ 361.460648] 00000000000001cc ffff88000b1e9e68 ffff88000b1e9e80 > ffff88000b3daa78 > [ 361.460651] Call Trace: > [ 361.460653] [] ? log_wait_commit+0x9e/0xe0 > [ 361.460656] [] ? autoremove_wake_function+0x0/0x2e > [ 361.460659] [] ? ext3_sync_fs+0x42/0x4b > [ 361.460669] [] ? sync_quota_sb+0x45/0xf6 > [ 361.460672] [] ? __sync_filesystem+0x43/0x70 > [ 361.460675] [] ? sync_filesystems+0x87/0xbd > [ 361.460677] [] ? sys_sync+0x1c/0x2e > [ 361.460679] [] ? system_call_fastpath+0x16/0x1b > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html