From: Michael Tokarev <mjt@tls.msk.ru>
To: KVM list <kvm@vger.kernel.org>
Subject: sporadic virtio_blk errors and "vcpu not ready for apic_round_robin"
Date: Fri, 06 Feb 2009 11:00:12 +0300
Message-ID: <498BEE0C.600@msgid.tls.msk.ru>

Hello

For quite some time, I've been seeing sporadic I/O errors in guests
running on top of virtio_blk devices.  The information I have is
quite sparse: the guest usually shows something like:

Feb  6 02:47:34 hobbit kernel: end_request: I/O error, dev vda, sector 9786968
Feb  6 02:47:34 hobbit kernel: Buffer I/O error on device vda7, logical block 473367
Feb  6 02:47:34 hobbit kernel: lost page write due to I/O error on vda7
Feb  6 02:47:34 hobbit kernel: Aborting journal on device vda7.
Feb  6 02:47:35 hobbit kernel: ext3_abort called.
Feb  6 02:47:35 hobbit kernel: EXT3-fs error (device vda7): ext3_journal_start_sb: Detected aborted journal
Feb  6 02:47:35 hobbit kernel: Remounting filesystem read-only

After this point, the system is still alive but the corresponding
block device stops working.  I can umount the device, but any
attempt to remount it reports that the device is *busy*, and using,
say, cfdisk on it (just starting it, attempting to READ the partition
table) results in a kernel OOPS after about 2 minutes of inactivity.
At that point the host displays a series of

  vcpu not ready for apic_round_robin

messages (about 20 of them).

I'm trying to capture the OOPS right now.  But obviously the problem
is elsewhere, since that OOPS comes long after the original issue (the
I/O errors).

It happens sporadically: sometimes the guest runs for a week,
sometimes (as here) it crashes after several hours of uptime.  It
does not seem related to system activity either, as far as I can see --
it happens on both heavily and lightly loaded systems, and may happen
on a mostly idle guest while another heavily loaded guest is
running at the same time.

The host is running 2.6.27.10 x86-64 on an AMD Phenom 9750 processor,
AMD 780G/SB700 chipset.  Using stock kvm modules.  Userspace is
32-bit kvm-83.  Guests are Linux systems running 2.6.27.10 or .14,
32-bit, uniprocessor.

After seeing this link -- https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/246175 ,
I disabled cpufreq on the host.  But it didn't help.
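
For anyone wanting to double-check the cpufreq state on a host, here
is a rough sketch that reads the per-CPU scaling governor from sysfs.
It assumes the usual /sys/devices/system/cpu/cpuN/cpufreq layout; a
CPU with no cpufreq directory (e.g. driver not loaded) is reported as
None.  The function name cpufreq_governors is made up for illustration:

```python
import glob
import os


def cpufreq_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each cpuN directory to its scaling governor, or None if absent."""
    result = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/.
    for cpu_dir in sorted(glob.glob(os.path.join(sysfs_root, "cpu[0-9]*"))):
        gov_path = os.path.join(cpu_dir, "cpufreq", "scaling_governor")
        try:
            with open(gov_path) as f:
                governor = f.read().strip()
        except OSError:
            # No cpufreq directory or unreadable file: treat as disabled.
            governor = None
        result[os.path.basename(cpu_dir)] = governor
    return result


if __name__ == "__main__":
    for cpu, governor in cpufreq_governors().items():
        print(cpu, governor if governor is not None else "(no cpufreq)")
```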

The issue has persisted for about a month or two (difficult to say as the
problem is very sporadic).  I *think* kvm-72 (for example) exhibited the
same problem on this host/guest combination, but I'm not sure.

Any pointers on how to debug the problem, or, even better, whether it's a
known issue, are very welcome -- this is a production system and it is
becoming quite unstable.

Thanks!

/mjt
