Date: Fri, 03 Sep 2010 08:17:57 -0700
From: Jeremy Fitzhardinge
To: Pasi Kärkkäinen
CC: Doug Neal, linux-kernel@vger.kernel.org
Subject: Re: I/O scheduler deadlocks on Xen virtual block devices

On 09/03/2010 05:22 AM, Pasi Kärkkäinen wrote:
> On Wed, Sep 01, 2010 at 03:17:25PM +0100, Doug Neal wrote:
>> Hello all,
>>
>> I believe I have found a bug in the I/O scheduler which results in
>> access to a block device being blocked indefinitely.
>>
>> The setup:
>> * Xen dom0 version 3.4.2.
>> * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
>> * Paravirtual disks and network interfaces.
>> * Root filesystem on /dev/xvdb1
>> * A scratch filesystem for the purposes of my tests on /dev/xvda,
>>   mounted on /mnt
>> * Both filesystems are ext3, formatted and mounted with defaults
>> * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.
>>
> Hmm.. recently there were some patches to the Xen block device frontend
> driver (sent to lkml). Maybe you should try those and see if they make a
> difference?
>
> Jeremy: What do you think?
I think this is more likely to be related to the lost-interrupt bug I
sent fixes for last week ("xen: use percpu interrupts for IPIs and
VIRQs" and "xen: handle events as edge-triggered"). Doug, as a
workaround, try disabling irqbalanced.

Thanks,
    J

> -- Pasi
>
>> Activities leading up to the incident:
>>
>> To reproduce the bug, I run the VM on a Xen host which has a moderate
>> workload of other VMs. (It seems to manifest more readily there than
>> when the host is otherwise idle.)
>>
>> I repeatedly rsync the contents of a Linux install CD to an arbitrary
>> location on the scratch filesystem, e.g. /mnt/test, then rm -rf the
>> lot, and rsync again. It can sometimes take a few iterations before
>> the bug is triggered. Sometimes it's triggered on the rsync, sometimes
>> on the rm.
>>
>> At some point during either the rsync or the rm, all the processes
>> accessing /dev/xvda (rsync, kjournald, flush) become stuck in the D
>> state. After 120 seconds, warnings start to appear in dmesg:
>>
>> [ 840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
>> [ 840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>
>> Full dmesg output is below, including call traces which show that each
>> of these processes is stuck in io_schedule, or in sync_buffer called
>> from io_schedule.
>>
>> I believe I have eliminated:
>> * Problems with the underlying physical device
>>   - The same bug has manifested itself on two completely separate
>>     sets of hardware, with different servers, switches, and SANs.
>> * Problems with the host's iSCSI initiator
>>   - Other VMs depending on the same iSCSI session are unaffected
>>   - Other XVBDs within the same VM (in this case: /dev/xvdb1 mounted
>>     on /) are unaffected
>>
>> Things I've tried:
>> * Noop, deadline, cfq and anticipatory elevators.
>> * Reducing the VM's vCPU count to 1. I _wasn't_ able to trigger the
>>   bug with only one vCPU.
>>
>> Unfortunately the steps to reproduce are rather long-winded once you
>> factor in setting up the Xen dom0, which in this case is Citrix
>> XenServer, so if there is any way I can help with that please let me
>> know, e.g. by arranging remote access to my existing test environment
>> or by supplying a VM template image. However, I don't believe this
>> problem is limited to this environment, as I have seen quite a few
>> reports of similar or identical issues elsewhere (but not on LKML...):
>>
>> http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00590.html
>> - the same issue reported with a non-commercial and more recent Xen
>>   version than mine. Also links to three Red Hat bug reports showing
>>   similar behaviour.
>>
>> Thanks for your time,
>>
>> Doug
>>
>> [ 720.070103] INFO: task kjournald:1011 blocked for more than 120 seconds.
>> [ 720.070121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 720.070131] kjournald D ffff8800035bcbc0 0 1011 2 0x00000000
>> [ 720.070140] ffff880002dc1a30 0000000000000246 0000000000015bc0 0000000000015bc0
>> [ 720.070149] ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0 ffff8800574dade0
>> [ 720.070156] 0000000000015bc0 ffff880002dc1fd8 0000000000015bc0 ffff8800574db1a0
>> [ 720.070164] Call Trace:
>> [ 720.070177] [] io_schedule+0x47/0x70
>> [ 720.070184] [] get_request_wait+0xce/0x1a0
>> [ 720.070191] [] ? autoremove_wake_function+0x0/0x40
>> [ 720.070195] [] __make_request+0x95/0x4a0
>> [ 720.070201] [] ? mempool_alloc_slab+0x15/0x20
>> [ 720.070208] [] ? xen_restore_fl_direct_end+0x0/0x1
>> [ 720.070213] [] generic_make_request+0x1b1/0x4f0
>> [ 720.070218] [] ? mempool_alloc_slab+0x15/0x20
>> [ 720.070223] [] submit_bio+0x80/0x110
>> [ 720.070229] [] submit_bh+0xf9/0x140
>> [ 720.070235] [] journal_submit_data_buffers+0x24c/0x320
>> [ 720.070240] [] journal_commit_transaction+0x2d2/0xe40
>> [ 720.070245] [] ? xen_restore_fl_direct_end+0x0/0x1
>> [ 720.070250] [] ? _spin_unlock_irqrestore+0x1e/0x30
>> [ 720.070256] [] ? try_to_del_timer_sync+0x79/0xd0
>> [ 720.070261] [] ? xen_restore_fl_direct_end+0x0/0x1
>> [ 720.070265] [] ? _spin_unlock_irqrestore+0x1e/0x30
>> [ 720.070270] [] kjournald+0xed/0x250
>> [ 720.070275] [] ? autoremove_wake_function+0x0/0x40
>> [ 720.070279] [] ? kjournald+0x0/0x250
>> [ 720.070284] [] kthread+0x96/0xa0
>> [ 720.070289] [] child_rip+0xa/0x20
>> [ 720.070295] [] ? int_ret_from_sys_call+0x7/0x1b
>> [ 720.070299] [] ? retint_restore_args+0x5/0x6
>> [ 720.070313] [] ? child_rip+0x0/0x20
>> [ 720.070317] INFO: task rm:1455 blocked for more than 120 seconds.
>> [ 720.070323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 720.070329] rm D ffff8800035dabc0 0 1455 980 0x00000000
>> [ 720.070333] ffff880058bedb68 0000000000000286 0000000000015bc0 0000000000015bc0
>> [ 720.070338] ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0 ffff880002d396f0
>> [ 720.070343] 0000000000015bc0 ffff880058bedfd8 0000000000015bc0 ffff880002d39ab0
>> [ 720.070348] Call Trace:
>> [ 720.070351] [] ? sync_buffer+0x0/0x50
>> [ 720.070354] [] io_schedule+0x47/0x70
>> [ 720.070357] [] sync_buffer+0x45/0x50
>> [ 720.070360] [] __wait_on_bit_lock+0x5a/0xc0
>> [ 720.070363] [] ? sync_buffer+0x0/0x50
>> [ 720.070366] [] out_of_line_wait_on_bit_lock+0x78/0x90
>> [ 720.070369] [] ? wake_bit_function+0x0/0x40
>> [ 720.070373] [] ? try_to_free_buffers+0x79/0xc0
>> [ 720.070376] [] __lock_buffer+0x36/0x40
>> [ 720.070379] [] journal_invalidatepage+0xc6/0xd0
>> [ 720.070383] [] ext3_invalidatepage+0x3d/0x40
>> [ 720.070388] [] do_invalidatepage+0x25/0x30
>> [ 720.070390] [] truncate_inode_page+0x9a/0xb0
>> [ 720.070393] [] truncate_inode_pages_range+0x160/0x440
>> [ 720.070396] [] ? ext3_delete_inode+0x0/0xf0
>> [ 720.070399] [] truncate_inode_pages+0x15/0x20
>> [ 720.070402] [] ext3_delete_inode+0x1d/0xf0
>> [ 720.070405] [] ? ext3_delete_inode+0x0/0xf0
>> [ 720.070409] [] generic_delete_inode+0xe3/0x1c0
>> [ 720.070412] [] generic_drop_inode+0x65/0x80
>> [ 720.070415] [] iput+0x62/0x70
>> [ 720.070419] [] do_unlinkat+0x112/0x1d0
>> [ 720.070422] [] ? sys_getdents+0xb5/0xf0
>> [ 720.070425] [] sys_unlinkat+0x22/0x40
>> [ 720.070428] [] system_call_fastpath+0x16/0x1b
>> [ 840.070097] INFO: task flush-202:0:208 blocked for more than 120 seconds.
>> [ 840.070115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 840.070125] flush-202:0 D ffff88000359ebc0 0 208 2 0x00000000
>> [ 840.070132] ffff880002d698c0 0000000000000246 0000000000015bc0 0000000000015bc0
>> [ 840.070142] ffff88007d444890 ffff880002d69fd8 0000000000015bc0 ffff88007d4444d0
>> [ 840.070149] 0000000000015bc0 ffff880002d69fd8 0000000000015bc0 ffff88007d444890
>> [ 840.070156] Call Trace:
>> [ 840.070170] [] ? sync_buffer+0x0/0x50
>> [ 840.070177] [] io_schedule+0x47/0x70
>> [ 840.070182] [] sync_buffer+0x45/0x50
>> [ 840.070187] [] __wait_on_bit_lock+0x5a/0xc0
>> [ 840.070191] [] ? sync_buffer+0x0/0x50
>> [ 840.070196] [] ? end_buffer_async_write+0x0/0x180
>> [ 840.070201] [] out_of_line_wait_on_bit_lock+0x78/0x90
>> [ 840.070208] [] ? wake_bit_function+0x0/0x40
>> [ 840.070212] [] __lock_buffer+0x36/0x40
>> [ 840.070217] [] __block_write_full_page+0x383/0x3c0
>> [ 840.070222] [] ? end_buffer_async_write+0x0/0x180
>> [ 840.070227] [] block_write_full_page_endio+0xe0/0x120
>> [ 840.070232] [] ? buffer_unmapped+0x0/0x20
>> [ 840.070237] [] block_write_full_page+0x15/0x20
>> [ 840.070241] [] ext3_ordered_writepage+0x1dd/0x200
>> [ 840.070248] [] __writepage+0x17/0x40
>> [ 840.070254] [] write_cache_pages+0x21f/0x4d0
>> [ 840.070258] [] ? __writepage+0x0/0x40
>> [ 840.070263] [] generic_writepages+0x24/0x30
>> [ 840.070268] [] do_writepages+0x35/0x40
>> [ 840.070273] [] writeback_single_inode+0xf6/0x3d0
>> [ 840.070279] [] writeback_inodes_wb+0x40e/0x5e0
>> [ 840.070283] [] wb_writeback+0x10a/0x1d0
>> [ 840.070289] [] ? try_to_del_timer_sync+0x79/0xd0
>> [ 840.070294] [] ? schedule_timeout+0x19d/0x300
>> [ 840.070298] [] wb_do_writeback+0x18c/0x1a0
>> [ 840.070303] [] bdi_writeback_task+0x53/0xe0
>> [ 840.070308] [] bdi_start_fn+0x86/0x100
>> [ 840.070312] [] ? bdi_start_fn+0x0/0x100
>> [ 840.070317] [] kthread+0x96/0xa0
>> [ 840.070322] [] child_rip+0xa/0x20
>> [ 840.070329] [] ? int_ret_from_sys_call+0x7/0x1b
>> [ 840.070333] [] ? retint_restore_args+0x5/0x6
>> [ 840.070337] [] ? child_rip+0x0/0x20
>> [ 840.070346] INFO: task kjournald:1011 blocked for more than 120 seconds.
>> [ 840.070354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 840.070363] kjournald D ffff8800035bcbc0 0 1011 2 0x00000000
>> [ 840.070378] ffff880002dc1a30 0000000000000246 0000000000015bc0 0000000000015bc0
>> [ 840.070385] ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0 ffff8800574dade0
>> [ 840.070392] 0000000000015bc0 ffff880002dc1fd8 0000000000015bc0 ffff8800574db1a0
>> [ 840.070399] Call Trace:
>> [ 840.070404] [] io_schedule+0x47/0x70
>> [ 840.070409] [] get_request_wait+0xce/0x1a0
>> [ 840.070414] [] ? autoremove_wake_function+0x0/0x40
>> [ 840.070418] [] __make_request+0x95/0x4a0
>> [ 840.070423] [] ? mempool_alloc_slab+0x15/0x20
>> [ 840.070428] [] ? xen_restore_fl_direct_end+0x0/0x1
>> [ 840.070434] [] generic_make_request+0x1b1/0x4f0
>> [ 840.070438] [] ? mempool_alloc_slab+0x15/0x20
>> [ 840.070443] [] submit_bio+0x80/0x110
>> [ 840.070448] [] submit_bh+0xf9/0x140
>> [ 840.070463] [] journal_submit_data_buffers+0x24c/0x320
>> [ 840.070466] [] journal_commit_transaction+0x2d2/0xe40
>> [ 840.070470] [] ? xen_restore_fl_direct_end+0x0/0x1
>> [ 840.070473] [] ? _spin_unlock_irqrestore+0x1e/0x30
>> [ 840.070476] [] ? try_to_del_timer_sync+0x79/0xd0
>> [ 840.070479] [] ? xen_restore_fl_direct_end+0x0/0x1
>> [ 840.070482] [] ? _spin_unlock_irqrestore+0x1e/0x30
>> [ 840.070485] [] kjournald+0xed/0x250
>> [ 840.070488] [] ? autoremove_wake_function+0x0/0x40
>> [ 840.070491] [] ? kjournald+0x0/0x250
>> [ 840.070494] [] kthread+0x96/0xa0
>> [ 840.070497] [] child_rip+0xa/0x20
>> [ 840.070500] [] ? int_ret_from_sys_call+0x7/0x1b
>> [ 840.070503] [] ? retint_restore_args+0x5/0x6
>> [ 840.070506] [] ? child_rip+0x0/0x20
>> [ 840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
>> [ 840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 840.070519] rm D ffff8800035dabc0 0 1455 980 0x00000000
>> [ 840.070524] ffff880058bedb68 0000000000000286 0000000000015bc0 0000000000015bc0
>> [ 840.070528] ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0 ffff880002d396f0
>> [ 840.070533] 0000000000015bc0 ffff880058bedfd8 0000000000015bc0 ffff880002d39ab0
>> [ 840.070538] Call Trace:
>> [ 840.070541] [] ? sync_buffer+0x0/0x50
>> [ 840.070544] [] io_schedule+0x47/0x70
>> [ 840.070547] [] sync_buffer+0x45/0x50
>> [ 840.070550] [] __wait_on_bit_lock+0x5a/0xc0
>> [ 840.070553] [] ? sync_buffer+0x0/0x50
>> [ 840.070556] [] out_of_line_wait_on_bit_lock+0x78/0x90
>> [ 840.070560] [] ? wake_bit_function+0x0/0x40
>> [ 840.070563] [] ? try_to_free_buffers+0x79/0xc0
>> [ 840.070566] [] __lock_buffer+0x36/0x40
>> [ 840.070569] [] journal_invalidatepage+0xc6/0xd0
>> [ 840.070572] [] ext3_invalidatepage+0x3d/0x40
>> [ 840.070575] [] do_invalidatepage+0x25/0x30
>> [ 840.070578] [] truncate_inode_page+0x9a/0xb0
>> [ 840.070581] [] truncate_inode_pages_range+0x160/0x440
>> [ 840.070584] [] ? ext3_delete_inode+0x0/0xf0
>> [ 840.070587] [] truncate_inode_pages+0x15/0x20
>> [ 840.070590] [] ext3_delete_inode+0x1d/0xf0
>> [ 840.070593] [] ? ext3_delete_inode+0x0/0xf0
>> [ 840.070597] [] generic_delete_inode+0xe3/0x1c0
>> [ 840.070600] [] generic_drop_inode+0x65/0x80
>> [ 840.070603] [] iput+0x62/0x70
>> [ 840.070606] [] do_unlinkat+0x112/0x1d0
>> [ 840.070610] [] ? sys_getdents+0xb5/0xf0
>> [ 840.070613] [] sys_unlinkat+0x22/0x40
>> [ 840.070616] [] system_call_fastpath+0x16/0x1b
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
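For reference, Doug's reproduction loop (rsync an install CD onto the scratch filesystem, rm -rf it, repeat) can be sketched roughly as below. This is only a self-contained stand-in, not the exact commands from the report: throwaway mktemp directories replace the mounted install CD and /mnt/test, `cp -a` replaces `rsync -a`, and the iteration count is arbitrary.

```shell
#!/bin/sh
# Rough sketch of the reproduction loop described in the report.
# Illustrative stand-ins: a throwaway source directory instead of a
# mounted install CD, `cp -a` instead of `rsync -a`, and an arbitrary
# iteration count. On the real domU the target was /mnt/test on the
# scratch filesystem (/dev/xvda).
set -e
ITERATIONS=3
src=$(mktemp -d)
dst=$(mktemp -d)
echo "sample payload" > "$src/file"

i=1
while [ "$i" -le "$ITERATIONS" ]; do
    cp -a "$src" "$dst/test"   # stands in for: rsync -a <install-CD>/ /mnt/test/
    rm -rf "$dst/test"         # the rm step that sometimes triggered the hang
    i=$((i + 1))
done

rm -rf "$src" "$dst"
echo "completed $ITERATIONS iterations"
```

Per the report, a few iterations of the real loop on a moderately loaded host were enough to leave rsync, kjournald, and flush stuck in the D state.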