sporadic virtio_blk errors and "vcpu not ready for apic_round_robin"


From: Michael Tokarev <mjt@tls.msk.ru>
To: KVM list <kvm@vger.kernel.org>
Subject: sporadic virtio_blk errors and "vcpu not ready for apic_round_robin"
Date: Fri, 06 Feb 2009 11:00:12 +0300
Message-ID: <498BEE0C.600@msgid.tls.msk.ru>

Hello

For quite some time now I've been seeing sporadic I/O errors in guests
running on top of virtio_blk devices.  The information I have is
rather scarce: the guest usually shows something like:

Feb  6 02:47:34 hobbit kernel: end_request: I/O error, dev vda, sector 9786968
Feb  6 02:47:34 hobbit kernel: Buffer I/O error on device vda7, logical block 473367
Feb  6 02:47:34 hobbit kernel: lost page write due to I/O error on vda7
Feb  6 02:47:34 hobbit kernel: Aborting journal on device vda7.
Feb  6 02:47:35 hobbit kernel: ext3_abort called.
Feb  6 02:47:35 hobbit kernel: EXT3-fs error (device vda7): ext3_journal_start_sb: Detected aborted journal
Feb  6 02:47:35 hobbit kernel: Remounting filesystem read-only
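To collect these reports consistently across incidents, the first two
lines can be paired up mechanically.  Below is a minimal sketch, assuming
the 2.6.27-era message formats quoted above and a 4 KiB ext3 block size
(the helper names and the block size are hypothetical, not from any kvm
tool).  end_request reports a sector on the whole disk (vda), while the
buffer error reports a block relative to the partition (vda7), so their
difference gives the partition start implied *if* the failed request
began at that block.  Comparing it with the real value in
/sys/block/vda/vda7/start is one cheap sanity check that both messages
refer to the same spot.

```python
import re
from typing import NamedTuple, Optional

# Patterns modelled on the guest log lines quoted above (hypothetical helper).
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
BUFFER_ERR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


class IoError(NamedTuple):
    disk: str       # whole-disk name from end_request, e.g. "vda"
    sector: int     # 512-byte sector number on the whole disk
    partition: str  # partition name from the buffer error, e.g. "vda7"
    block: int      # filesystem block number, relative to the partition


def parse_io_error(lines: list[str]) -> Optional[IoError]:
    """Pair the first end_request line with the first Buffer I/O error line."""
    req = buf = None
    for line in lines:
        if req is None and (m := END_REQUEST.search(line)):
            req = (m.group(1), int(m.group(2)))
        elif buf is None and (m := BUFFER_ERR.search(line)):
            buf = (m.group(1), int(m.group(2)))
    if req is None or buf is None:
        return None
    return IoError(req[0], req[1], buf[0], buf[1])


def implied_partition_start(err: IoError, block_size: int = 4096) -> int:
    """Partition start sector implied if the request began at the reported block.

    ASSUMPTION: block_size is the ext3 block size (4096 is a guess); a merged
    request starting at a different block would make this value meaningless.
    """
    return err.sector - err.block * (block_size // 512)


if __name__ == "__main__":
    log = [
        "Feb  6 02:47:34 hobbit kernel: end_request: I/O error, dev vda, sector 9786968",
        "Feb  6 02:47:34 hobbit kernel: Buffer I/O error on device vda7, logical block 473367",
    ]
    err = parse_io_error(log)
    print(err)
    print(implied_partition_start(err))  # 9786968 - 473367 * 8 = 6000032
```

With 4 KiB blocks the quoted numbers imply a vda7 start of sector
6000032; if the actual start differs, either the block size guess is
wrong or the failing request did not begin at the reported block.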

After this point, the system is still alive but the corresponding
block device stops working.  I can unmount the device, but any
attempt to mount it again reports that the device is *busy*, and
using, say, cfdisk on it (just starting it, which only attempts to
READ the partition table) results in a kernel OOPS after about
2 minutes of inactivity.  At that point the host displays a series of

  vcpu not ready for apic_round_robin

messages (about 20 of them).

I'm trying to capture the OOPS right now.  But obviously the root
cause lies elsewhere, since that OOPS happens long after the original
issue (the I/O errors).

It happens sporadically: sometimes the guest runs for a week,
sometimes (as here) it crashes after several hours of uptime.  It
does not seem to correlate with system activity either, as far as I
can see -- it happens on both heavily and lightly loaded systems, and
may happen on a mostly idle guest while another, heavily loaded guest
is running at the same time.

The host is running 2.6.27.10 x86-64 on an AMD Phenom 9750 processor,
AMD 780G/SB700 chipset, using the stock kvm modules.  Userspace is
32-bit kvm-83.  Guests are Linux systems running 2.6.27.10 or .14,
32-bit, uniprocessor.

After seeing this link -- https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/246175 ,
I disabled cpufreq on the host.  But it didn't help.

The issue has persisted for about a month or two (difficult to say, as
the problem is very sporadic).  I *think* kvm-72 (for example) exhibited
the same problem on this host/guest combination, but I'm not sure.

Any pointers on how to debug the problem, or, even better, whether it's
a known issue, are very welcome -- this is a production system and it is
becoming quite... unstable.

Thanks!

/mjt
