From: Michael Tokarev <mjt@tls.msk.ru>
To: kvm@vger.kernel.org
Subject: Re: writes to a virtio block device hungs
Date: Tue, 23 Sep 2008 11:06:11 +0400
Message-ID: <48D89563.5050502@msgid.tls.msk.ru>
In-Reply-To: <48D76286.5090203@msgid.tls.msk.ru>

[Replying to my own email...]

Michael Tokarev wrote:
> Hello!  It's my first email to this list.. ;)
> 
> After experimenting for some time with KVM on Linux
> (both host and guests are Linux machines), I placed
> one virtual machine into production use, and almost
> immediately ran into issues.  Here's what it looks like
> from the guest:
> 
> Sep 21 10:35:52 hobbit kernel: INFO: task cleanup:20535 blocked for more than 120 seconds.
> Sep 21 10:35:52 hobbit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 21 10:35:52 hobbit kernel: cleanup       D 00000000     0 20535   1570
> Sep 21 10:35:52 hobbit kernel:        f73b39c0 00200086 00000000 00000000 c3a2ba48 00000000 f7022e00 00000000
> Sep 21 10:35:52 hobbit kernel:        dbc48ed4 f789c000 c0399080 c0157e48 0000000e 00000000 d05e1b80 d05e1ce4
> Sep 21 10:35:52 hobbit kernel:        00000002 00200286 c01322f7 d05e1ce4 c0131ef0 dbc48ec8 00200286 c0132486
> Sep 21 10:35:52 hobbit kernel: Call Trace:
> Sep 21 10:35:52 hobbit kernel:  [<c0157e48>] find_get_pages_tag+0x38/0x80
> Sep 21 10:35:52 hobbit kernel:  [<c01322f7>] lock_timer_base+0x27/0x60
> Sep 21 10:35:52 hobbit kernel:  [<c0131ef0>] process_timeout+0x0/0x10
> Sep 21 10:35:52 hobbit kernel:  [<c0132486>] __mod_timer+0x86/0xa0
> Sep 21 10:35:52 hobbit kernel:  [<c02c6408>] schedule_timeout+0x58/0xb0
> Sep 21 10:35:52 hobbit kernel:  [<c0131ef0>] process_timeout+0x0/0x10
> Sep 21 10:35:52 hobbit kernel:  [<f882db04>] journal_stop+0xa4/0x1b0 [jbd]
> Sep 21 10:35:52 hobbit kernel:  [<f882ece8>] journal_start+0x88/0xc0 [jbd]
> Sep 21 10:35:52 hobbit kernel:  [<f8860f20>] ext3_write_inode+0x0/0x40 [ext3]
> Sep 21 10:35:52 hobbit kernel:  [<f8860f20>] ext3_write_inode+0x0/0x40 [ext3]
> Sep 21 10:35:52 hobbit kernel:  [<c019d002>] __writeback_single_inode+0x282/0x390
> Sep 21 10:35:52 hobbit kernel:  [<c015f3c0>] generic_writepages+0x20/0x30
> Sep 21 10:35:52 hobbit kernel:  [<c015f419>] do_writepages+0x49/0x50
> Sep 21 10:35:52 hobbit kernel:  [<c0159151>] __filemap_fdatawrite_range+0x71/0x90
> Sep 21 10:35:52 hobbit kernel:  [<c019d131>] sync_inode+0x21/0x40
> Sep 21 10:35:52 hobbit kernel:  [<f885f88e>] ext3_sync_file+0x9e/0xc0 [ext3]
> Sep 21 10:35:52 hobbit kernel:  [<c01a065e>] do_fsync+0x6e/0xb0
> Sep 21 10:35:52 hobbit kernel:  [<c01a06c7>] __do_fsync+0x27/0x50
> Sep 21 10:35:52 hobbit kernel:  [<c01032f3>] sysenter_past_esp+0x78/0xb1
> Sep 21 10:35:52 hobbit kernel:  =======================
> 
> It's almost always after fsync, but I guess that's because the
> cleanup process (from Postfix) is the one that calls fsync most often.
> 
> After the first such message (after which the corresponding process
> sleeps forever), no write to that device succeeds: they all
> stall the same way.  It looks like kvm just "forgets" about each and
> every write, effectively turning the device into a black hole, but
> only for writes; reads are all OK.
> 
> Obviously the system will not reboot cleanly in that state; only a
> forced reboot (echo b > /proc/sysrq-trigger) or a "power-off" from
> the guest helps.
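
A minimal sketch, not part of the original report, of how the stuck
tasks could be listed from inside the guest while it is in that state.
It assumes a Linux /proc filesystem and only reads /proc/<pid>/stat;
the function names parse_stat and blocked_tasks are hypothetical.

```python
import os

def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]

def blocked_tasks(proc="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if state == "D":  # uninterruptible sleep, as in the trace above
            found.append((pid, comm))
    return found
```

Calling blocked_tasks() periodically and logging a non-empty result
would show which processes are waiting in uninterruptible sleep, though
a task can also be in state D briefly during normal I/O.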
> 
> The device in question is a virtio block device (vda), which is on top
> of a RAID1 device on the host (/dev/md_d5, partitioned).  The problem
> happens after some uptime, from several hours to 2 days, usually under
> heavy load.
> 
> The system is an Asus M3A-H/HDMI motherboard (AMD 780G/SB700 chipset),
> with an AMD Phenom 9750 CPU and 8GB of ECC memory.  Stock 2.6.26.5 kernel,
> with KVM optimizations (KVM_TIME etc) turned on in the guest.  kvm-72.
> 
> I'm running it with IDE emulation right now, to see whether that
> changes anything.

With IDE (as opposed to virtio) the situation is exactly the same:
switching if=virtio to if=ide didn't change anything at all.
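
Since the stall shows up on fsync regardless of the emulated bus, a
small probe run inside the guest could detect it early. This is only a
sketch under assumptions: the name fsync_completes, the probe file name
and the 30-second default are hypothetical. The write and fsync run in a
child process because a task stuck in uninterruptible sleep cannot be
killed, so the parent must never wait for it without a timeout.

```python
import os
import subprocess
import sys

# Child program: write a small file and fsync it, the call seen in the trace.
CHILD = r"""
import os, sys
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
try:
    os.write(fd, b"probe\n")
    os.fsync(fd)
finally:
    os.close(fd)
"""

def fsync_completes(directory, timeout=30.0):
    """Return True if a write+fsync in `directory` finishes within `timeout` seconds."""
    path = os.path.join(directory, ".fsync-probe.%d" % os.getpid())
    child = subprocess.Popen([sys.executable, "-c", CHILD, path])
    try:
        ok = child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # The child is probably stuck in the kernel; do not wait for it
        # and do not touch the file, since that could block as well.
        return False
    if os.path.exists(path):
        os.unlink(path)
    return ok
```

Calling fsync_completes() on a directory that lives on the affected
device, for example from cron, and alerting on a False result would flag
the hang without blocking the monitoring process itself.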

> The question is: should I try a later kvm (kernel and userspace)
> first?  The thing is that it's a production machine, so any downtime is
> not good...

I'm waiting for an opportunity to install a new kernel with a new kvm...
still hopeful.


Thank you!

/mjt


Thread overview: 3+ messages
2008-09-22  9:16 writes to a virtio block device hungs Michael Tokarev
2008-09-23  7:06 ` Michael Tokarev [this message]
     [not found]   ` <20080925230251.GB22929@dmt.cnet>
2008-09-26  8:28     ` Michael Tokarev
