From: Michael Tokarev <mjt@tls.msk.ru>
To: KVM list <kvm@vger.kernel.org>
Subject: sporadic virtio_blk errors and "vcpu not ready for apic_round_robin"
Date: Fri, 06 Feb 2009 11:00:12 +0300
Message-ID: <498BEE0C.600@msgid.tls.msk.ru>
Hello
For quite some time I have been seeing sporadic I/O errors in guests
running on top of virtio_blk devices. The information I have is
quite sparse; the guest usually shows something like:
Feb 6 02:47:34 hobbit kernel: end_request: I/O error, dev vda, sector 9786968
Feb 6 02:47:34 hobbit kernel: Buffer I/O error on device vda7, logical block 473367
Feb 6 02:47:34 hobbit kernel: lost page write due to I/O error on vda7
Feb 6 02:47:34 hobbit kernel: Aborting journal on device vda7.
Feb 6 02:47:35 hobbit kernel: ext3_abort called.
Feb 6 02:47:35 hobbit kernel: EXT3-fs error (device vda7): ext3_journal_start_sb: Detected aborted journal
Feb 6 02:47:35 hobbit kernel: Remounting filesystem read-only
After this point the system is still alive, but the corresponding
block device stops working. I can umount the device, but any
attempt to remount it fails with a message that the device is *busy*,
and using, say, cfdisk on it (just starting it, which only attempts to
READ the partition table) results in a kernel OOPS after about
2 minutes of inactivity. At that time the host displays a series of
vcpu not ready for apic_round_robin
messages (about 20 of them).
I'm trying to capture the OOPS right now. But obviously the problem
is elsewhere, since that OOPS comes long after the original issue (the
I/O errors).
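Since the first I/O error is the interesting event, a small script along
these lines could pull the earliest error per device out of a guest log so
its timestamp can be compared with the host messages. This is only a
sketch: the regular expression is modelled on the end_request line quoted
above, and the names IO_ERROR_RE and first_io_errors are made up for
illustration.

```python
import re

# Matches guest kernel lines such as:
#   Feb 6 02:47:34 hobbit kernel: end_request: I/O error, dev vda, sector 9786968
# The pattern is an assumption based on that one sample line.
IO_ERROR_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def first_io_errors(lines):
    """Return {device: (timestamp, sector)} for the first I/O error per device."""
    first = {}
    for line in lines:
        m = IO_ERROR_RE.match(line)
        # Keep only the earliest match for each device; later ones are follow-ups.
        if m and m.group("dev") not in first:
            first[m.group("dev")] = (m.group("stamp"), int(m.group("sector")))
    return first
```

It accepts any iterable of lines, so an open log file can be passed
directly; where the guest keeps its kernel log depends on the distribution.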
It happens sporadically: sometimes the guest runs for a week,
sometimes (as here) it crashes after several hours of uptime. It
does not seem related to system activity either, as far as I can see --
it happens on both heavily and lightly loaded systems, and may happen
on a mostly idle guest while another heavily loaded guest is
running at the same time.
The host is running 2.6.27.10 x86-64 on an AMD Phenom 9750 processor
with an AMD 780G/SB700 chipset, using the stock kvm modules. Userspace is
32-bit kvm-83. The guests are Linux systems running 2.6.27.10 or .14,
32-bit, uniprocessor.
After seeing this link -- https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/246175 ,
I disabled cpufreq on the host. But it didn't help.
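For anyone wanting to double-check that frequency scaling really is off on
a host, a minimal sketch like the following reads the per-CPU governor from
sysfs. It assumes the usual /sys/devices/system/cpu/cpuN/cpufreq layout;
the function name cpufreq_governors and its sysfs_root parameter are
hypothetical, and this is not necessarily how cpufreq was disabled here.

```python
import os
import re


def cpufreq_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each cpuN entry to its scaling governor, or None if none is exposed."""
    result = {}
    for name in sorted(os.listdir(sysfs_root)):
        if not re.fullmatch(r"cpu\d+", name):
            continue  # skip entries such as cpufreq/ or cpuidle/
        path = os.path.join(sysfs_root, name, "cpufreq", "scaling_governor")
        try:
            with open(path) as f:
                result[name] = f.read().strip()
        except OSError:
            # No readable governor file for this CPU.
            result[name] = None
    return result
```

A value of None for every CPU would suggest that no cpufreq driver is
active, while a governor name such as "ondemand" would mean scaling is
still enabled for that CPU.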
The issue has persisted for about a month or two (difficult to say, as the
problem is very sporadic). I *think* kvm-72 (for example) exhibited the
same problem on this host/guest combination, but I'm not sure.
Any pointers on how to debug the problem or, even better, word on whether
it's a known issue, are very welcome -- this is a production system and it
is becoming quite unstable.
Thanks!
/mjt