From: "Pasi Kärkkäinen" <pasik@iki.fi>
To: Doug Neal <dneallkml@gmail.com>
Cc: linux-kernel@vger.kernel.org, Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: I/O scheduler deadlocks on Xen virtual block devices
Date: Fri, 3 Sep 2010 15:22:53 +0300 [thread overview]
Message-ID: <20100903122253.GL2804@reaktio.net> (raw)
In-Reply-To: <AANLkTi=O6Tpiz8TRFFknEpW7MMTGBAtqCpr7zH3abO67@mail.gmail.com>
On Wed, Sep 01, 2010 at 03:17:25PM +0100, Doug Neal wrote:
> Hello all,
>
> I believe I have found a bug in the I/O scheduler which results in
> access to a block device being blocked indefinitely.
>
> The setup:
> * Xen dom0 version 3.4.2.
> * domU: Ubuntu 10.04, x86_64, with kernel 2.6.32.15+drm33.5.
> * Paravirtual disks and network interfaces.
> * Root filesystem on /dev/xvdb1
> * A scratch filesystem for the purposes of my tests on /dev/xvda
> mounted on /mnt
> * Both filesystems are ext3, formatted and mounted with defaults
> * XVBDs are backed by LVM on top of an iSCSI SAN in the dom0.
>
Hmm.. recently there was some patches to the Xen block device frontend driver
(sent to lkml). Maybe you should try if those make a difference?
Jeremy: What do you think?
-- Pasi
>
> Activities leading up to the incident:
>
> To reproduce the bug, I run the VM on a Xen host which has a moderate
> workload of other VMs. (It seems to manifest itself more readily than
> if the host is otherwise idle).
>
> I repeatedly rsync the contents of a Linux install CD to an arbitrary
> location on the scratch filesystem, e.g. /mnt/test, then rm -rf the
> lot, and rsync again. It can sometimes take a few iterations before
> the bug is triggered. Sometimes it's triggered on the rsync, sometimes
> on the rm.
>
> At some point during either the rsync or the rm, all the processes
> accessing /dev/xvda (rsync, kjournald, flush) become stuck in the D
> state. After 120 seconds the warnings start to appear in dmesg:
>
> [ 840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
> [ 840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
>
> Full dmesg output is below, including call traces which show that each
> of these processes is stuck in io_schedule or sync_buffer called from
> io_schedule.
>
>
> I believe I have eliminated:
> * Problems with the underlying physical device
> - The same bug has manifested itself on two completely separate
> sets of hardware, with different servers, switches, and SANs.
>
> * Problems with the host's iSCSI initiator
> - Other VMs depending on the same iSCSI session are unaffected
> - Other XVBDs within the same VM (in this case: /dev/xvdb1 mounted
> on /) are unaffected
>
> Things I've tried:
> * Noop, deadline, cfq and anticipatory elevators.
> * Reducing the VM's vCPU count to 1. I _wasn't_ able to trigger the
> bug under only one vCPU.
>
> Unfortunately the steps to reproduce are rather long winded once you
> factor in setting up the Xen dom0 which is in this case Citrix
> XenServer, so if there is any way I can help with that please let me
> know, e.g. by arranging remote access to my existing test environment
> or by supplying a VM template image. However I don't believe this
> problem is limited to this environment as I have seen quite a few
> reports of similar or identical issues elsewhere (but not on LKML...):
>
> http://lists.xensource.com/archives/html/xen-devel/2010-06/msg00590.html
> - same issue reported with a non-commercial and more recent Xen
> version than mine. Also links to three Redhat bug reports showing
> similar behaviour.
>
>
> Thanks for your time,
>
> Doug
>
>
>
> [ 720.070103] INFO: task kjournald:1011 blocked for more than 120 seconds.
> [ 720.070121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 720.070131] kjournald D ffff8800035bcbc0 0 1011 2 0x00000000
> [ 720.070140] ffff880002dc1a30 0000000000000246 0000000000015bc0
> 0000000000015bc0
> [ 720.070149] ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0
> ffff8800574dade0
> [ 720.070156] 0000000000015bc0 ffff880002dc1fd8 0000000000015bc0
> ffff8800574db1a0
> [ 720.070164] Call Trace:
> [ 720.070177] [<ffffffff81558bb7>] io_schedule+0x47/0x70
> [ 720.070184] [<ffffffff812a095e>] get_request_wait+0xce/0x1a0
> [ 720.070191] [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
> [ 720.070195] [<ffffffff812a1045>] __make_request+0x95/0x4a0
> [ 720.070201] [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
> [ 720.070208] [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
> [ 720.070213] [<ffffffff8129f7a1>] generic_make_request+0x1b1/0x4f0
> [ 720.070218] [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
> [ 720.070223] [<ffffffff8129fb60>] submit_bio+0x80/0x110
> [ 720.070229] [<ffffffff8116d399>] submit_bh+0xf9/0x140
> [ 720.070235] [<ffffffff8121495c>] journal_submit_data_buffers+0x24c/0x320
> [ 720.070240] [<ffffffff81214d02>] journal_commit_transaction+0x2d2/0xe40
> [ 720.070245] [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
> [ 720.070250] [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
> [ 720.070256] [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
> [ 720.070261] [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
> [ 720.070265] [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
> [ 720.070270] [<ffffffff81218c1d>] kjournald+0xed/0x250
> [ 720.070275] [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
> [ 720.070279] [<ffffffff81218b30>] ? kjournald+0x0/0x250
> [ 720.070284] [<ffffffff81084d16>] kthread+0x96/0xa0
> [ 720.070289] [<ffffffff810141ea>] child_rip+0xa/0x20
> [ 720.070295] [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
> [ 720.070299] [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
> [ 720.070313] [<ffffffff810141e0>] ? child_rip+0x0/0x20
> [ 720.070317] INFO: task rm:1455 blocked for more than 120 seconds.
> [ 720.070323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 720.070329] rm D ffff8800035dabc0 0 1455 980 0x00000000
> [ 720.070333] ffff880058bedb68 0000000000000286 0000000000015bc0
> 0000000000015bc0
> [ 720.070338] ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0
> ffff880002d396f0
> [ 720.070343] 0000000000015bc0 ffff880058bedfd8 0000000000015bc0
> ffff880002d39ab0
> [ 720.070348] Call Trace:
> [ 720.070351] [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
> [ 720.070354] [<ffffffff81558bb7>] io_schedule+0x47/0x70
> [ 720.070357] [<ffffffff8116e615>] sync_buffer+0x45/0x50
> [ 720.070360] [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
> [ 720.070363] [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
> [ 720.070366] [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
> [ 720.070369] [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
> [ 720.070373] [<ffffffff8116d259>] ? try_to_free_buffers+0x79/0xc0
> [ 720.070376] [<ffffffff8116e796>] __lock_buffer+0x36/0x40
> [ 720.070379] [<ffffffff812136d6>] journal_invalidatepage+0xc6/0xd0
> [ 720.070383] [<ffffffff811b7e5d>] ext3_invalidatepage+0x3d/0x40
> [ 720.070388] [<ffffffff810ff7a5>] do_invalidatepage+0x25/0x30
> [ 720.070390] [<ffffffff810ffa8a>] truncate_inode_page+0x9a/0xb0
> [ 720.070393] [<ffffffff810ffd80>] truncate_inode_pages_range+0x160/0x440
> [ 720.070396] [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
> [ 720.070399] [<ffffffff81100075>] truncate_inode_pages+0x15/0x20
> [ 720.070402] [<ffffffff811bb96d>] ext3_delete_inode+0x1d/0xf0
> [ 720.070405] [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
> [ 720.070409] [<ffffffff8115cac3>] generic_delete_inode+0xe3/0x1c0
> [ 720.070412] [<ffffffff8115cc05>] generic_drop_inode+0x65/0x80
> [ 720.070415] [<ffffffff8115b592>] iput+0x62/0x70
> [ 720.070419] [<ffffffff81151f82>] do_unlinkat+0x112/0x1d0
> [ 720.070422] [<ffffffff811555e5>] ? sys_getdents+0xb5/0xf0
> [ 720.070425] [<ffffffff811521b2>] sys_unlinkat+0x22/0x40
> [ 720.070428] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
> [ 840.070097] INFO: task flush-202:0:208 blocked for more than 120 seconds.
> [ 840.070115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 840.070125] flush-202:0 D ffff88000359ebc0 0 208 2 0x00000000
> [ 840.070132] ffff880002d698c0 0000000000000246 0000000000015bc0
> 0000000000015bc0
> [ 840.070142] ffff88007d444890 ffff880002d69fd8 0000000000015bc0
> ffff88007d4444d0
> [ 840.070149] 0000000000015bc0 ffff880002d69fd8 0000000000015bc0
> ffff88007d444890
> [ 840.070156] Call Trace:
> [ 840.070170] [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
> [ 840.070177] [<ffffffff81558bb7>] io_schedule+0x47/0x70
> [ 840.070182] [<ffffffff8116e615>] sync_buffer+0x45/0x50
> [ 840.070187] [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
> [ 840.070191] [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
> [ 840.070196] [<ffffffff8116eb60>] ? end_buffer_async_write+0x0/0x180
> [ 840.070201] [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
> [ 840.070208] [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
> [ 840.070212] [<ffffffff8116e796>] __lock_buffer+0x36/0x40
> [ 840.070217] [<ffffffff8116f473>] __block_write_full_page+0x383/0x3c0
> [ 840.070222] [<ffffffff8116eb60>] ? end_buffer_async_write+0x0/0x180
> [ 840.070227] [<ffffffff8116fde0>] block_write_full_page_endio+0xe0/0x120
> [ 840.070232] [<ffffffff811b70f0>] ? buffer_unmapped+0x0/0x20
> [ 840.070237] [<ffffffff8116fe35>] block_write_full_page+0x15/0x20
> [ 840.070241] [<ffffffff811b7bfd>] ext3_ordered_writepage+0x1dd/0x200
> [ 840.070248] [<ffffffff810fc687>] __writepage+0x17/0x40
> [ 840.070254] [<ffffffff810fd83f>] write_cache_pages+0x21f/0x4d0
> [ 840.070258] [<ffffffff810fc670>] ? __writepage+0x0/0x40
> [ 840.070263] [<ffffffff810fdb14>] generic_writepages+0x24/0x30
> [ 840.070268] [<ffffffff810fdb55>] do_writepages+0x35/0x40
> [ 840.070273] [<ffffffff81166976>] writeback_single_inode+0xf6/0x3d0
> [ 840.070279] [<ffffffff811675de>] writeback_inodes_wb+0x40e/0x5e0
> [ 840.070283] [<ffffffff811678ba>] wb_writeback+0x10a/0x1d0
> [ 840.070289] [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
> [ 840.070294] [<ffffffff81558e6d>] ? schedule_timeout+0x19d/0x300
> [ 840.070298] [<ffffffff81167bec>] wb_do_writeback+0x18c/0x1a0
> [ 840.070303] [<ffffffff81167c53>] bdi_writeback_task+0x53/0xe0
> [ 840.070308] [<ffffffff8110f546>] bdi_start_fn+0x86/0x100
> [ 840.070312] [<ffffffff8110f4c0>] ? bdi_start_fn+0x0/0x100
> [ 840.070317] [<ffffffff81084d16>] kthread+0x96/0xa0
> [ 840.070322] [<ffffffff810141ea>] child_rip+0xa/0x20
> [ 840.070329] [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
> [ 840.070333] [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
> [ 840.070337] [<ffffffff810141e0>] ? child_rip+0x0/0x20
> [ 840.070346] INFO: task kjournald:1011 blocked for more than 120 seconds.
> [ 840.070354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 840.070363] kjournald D ffff8800035bcbc0 0 1011 2 0x00000000
> [ 840.070378] ffff880002dc1a30 0000000000000246 0000000000015bc0
> 0000000000015bc0
> [ 840.070385] ffff8800574db1a0 ffff880002dc1fd8 0000000000015bc0
> ffff8800574dade0
> [ 840.070392] 0000000000015bc0 ffff880002dc1fd8 0000000000015bc0
> ffff8800574db1a0
> [ 840.070399] Call Trace:
> [ 840.070404] [<ffffffff81558bb7>] io_schedule+0x47/0x70
> [ 840.070409] [<ffffffff812a095e>] get_request_wait+0xce/0x1a0
> [ 840.070414] [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
> [ 840.070418] [<ffffffff812a1045>] __make_request+0x95/0x4a0
> [ 840.070423] [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
> [ 840.070428] [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
> [ 840.070434] [<ffffffff8129f7a1>] generic_make_request+0x1b1/0x4f0
> [ 840.070438] [<ffffffff810f6595>] ? mempool_alloc_slab+0x15/0x20
> [ 840.070443] [<ffffffff8129fb60>] submit_bio+0x80/0x110
> [ 840.070448] [<ffffffff8116d399>] submit_bh+0xf9/0x140
> [ 840.070463] [<ffffffff8121495c>] journal_submit_data_buffers+0x24c/0x320
> [ 840.070466] [<ffffffff81214d02>] journal_commit_transaction+0x2d2/0xe40
> [ 840.070470] [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
> [ 840.070473] [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
> [ 840.070476] [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
> [ 840.070479] [<ffffffff8100f37f>] ? xen_restore_fl_direct_end+0x0/0x1
> [ 840.070482] [<ffffffff8155ab5e>] ? _spin_unlock_irqrestore+0x1e/0x30
> [ 840.070485] [<ffffffff81218c1d>] kjournald+0xed/0x250
> [ 840.070488] [<ffffffff81085090>] ? autoremove_wake_function+0x0/0x40
> [ 840.070491] [<ffffffff81218b30>] ? kjournald+0x0/0x250
> [ 840.070494] [<ffffffff81084d16>] kthread+0x96/0xa0
> [ 840.070497] [<ffffffff810141ea>] child_rip+0xa/0x20
> [ 840.070500] [<ffffffff810133d1>] ? int_ret_from_sys_call+0x7/0x1b
> [ 840.070503] [<ffffffff81013b5d>] ? retint_restore_args+0x5/0x6
> [ 840.070506] [<ffffffff810141e0>] ? child_rip+0x0/0x20
> [ 840.070508] INFO: task rm:1455 blocked for more than 120 seconds.
> [ 840.070514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 840.070519] rm D ffff8800035dabc0 0 1455 980 0x00000000
> [ 840.070524] ffff880058bedb68 0000000000000286 0000000000015bc0
> 0000000000015bc0
> [ 840.070528] ffff880002d39ab0 ffff880058bedfd8 0000000000015bc0
> ffff880002d396f0
> [ 840.070533] 0000000000015bc0 ffff880058bedfd8 0000000000015bc0
> ffff880002d39ab0
> [ 840.070538] Call Trace:
> [ 840.070541] [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
> [ 840.070544] [<ffffffff81558bb7>] io_schedule+0x47/0x70
> [ 840.070547] [<ffffffff8116e615>] sync_buffer+0x45/0x50
> [ 840.070550] [<ffffffff8155908a>] __wait_on_bit_lock+0x5a/0xc0
> [ 840.070553] [<ffffffff8116e5d0>] ? sync_buffer+0x0/0x50
> [ 840.070556] [<ffffffff81559168>] out_of_line_wait_on_bit_lock+0x78/0x90
> [ 840.070560] [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
> [ 840.070563] [<ffffffff8116d259>] ? try_to_free_buffers+0x79/0xc0
> [ 840.070566] [<ffffffff8116e796>] __lock_buffer+0x36/0x40
> [ 840.070569] [<ffffffff812136d6>] journal_invalidatepage+0xc6/0xd0
> [ 840.070572] [<ffffffff811b7e5d>] ext3_invalidatepage+0x3d/0x40
> [ 840.070575] [<ffffffff810ff7a5>] do_invalidatepage+0x25/0x30
> [ 840.070578] [<ffffffff810ffa8a>] truncate_inode_page+0x9a/0xb0
> [ 840.070581] [<ffffffff810ffd80>] truncate_inode_pages_range+0x160/0x440
> [ 840.070584] [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
> [ 840.070587] [<ffffffff81100075>] truncate_inode_pages+0x15/0x20
> [ 840.070590] [<ffffffff811bb96d>] ext3_delete_inode+0x1d/0xf0
> [ 840.070593] [<ffffffff811bb950>] ? ext3_delete_inode+0x0/0xf0
> [ 840.070597] [<ffffffff8115cac3>] generic_delete_inode+0xe3/0x1c0
> [ 840.070600] [<ffffffff8115cc05>] generic_drop_inode+0x65/0x80
> [ 840.070603] [<ffffffff8115b592>] iput+0x62/0x70
> [ 840.070606] [<ffffffff81151f82>] do_unlinkat+0x112/0x1d0
> [ 840.070610] [<ffffffff811555e5>] ? sys_getdents+0xb5/0xf0
> [ 840.070613] [<ffffffff811521b2>] sys_unlinkat+0x22/0x40
> [ 840.070616] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2010-09-03 12:22 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-01 14:17 I/O scheduler deadlocks on Xen virtual block devices Doug Neal
2010-09-01 17:33 ` Jeff Moyer
2010-09-02 8:34 ` Doug Neal
2010-09-02 13:47 ` Jeff Moyer
2010-09-02 14:06 ` Doug Neal
2010-09-02 15:25 ` Jeff Moyer
2010-09-02 15:43 ` Doug Neal
2010-09-03 12:22 ` Pasi Kärkkäinen [this message]
2010-09-03 15:17 ` Jeremy Fitzhardinge
2010-09-06 12:44 ` Doug Neal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100903122253.GL2804@reaktio.net \
--to=pasik@iki.fi \
--cc=dneallkml@gmail.com \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.