* Disk I/O stuck with KVM - no clue how to solve that
@ 2010-11-05 17:16 Hermann Himmelbauer
2010-11-06 19:58 ` Stefan Hajnoczi
0 siblings, 1 reply; 5+ messages in thread
From: Hermann Himmelbauer @ 2010-11-05 17:16 UTC
To: kvm
Hi,
I already tried to get help with my problem on the KVM list but had no
success, so the problem may not be KVM-related at all. Maybe
someone here has an idea:
I am seeing strange disk I/O stalls on my Linux host and guests with KVM, which
make the system (especially the guests) almost unusable. The stalls occur
periodically, e.g. every 2 to 10 seconds, and last between 3 and sometimes
over 120 seconds, which triggers kernel messages like this (on host and/or
guest):
INFO: task postgres:2195 blocked for more than 120 seconds
If the stalls are shorter, no error messages appear in any log file
(neither on the host nor in the guest).
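
To see how often the long stalls are reported, the kernel log can be scanned for the warning quoted above. This is only a minimal sketch: the function name `find_hung_tasks` is made up for illustration, and the pattern assumes the message has exactly the form shown in this mail.

```python
import re

# Pattern for the hung-task warning in the form quoted above.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a (command, pid, seconds) tuple for every hung-task warning."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports
```

Feeding it the output of dmesg (or a saved kernel log) from both host and guest gives a quick list of which tasks were blocked; stalls shorter than the warning threshold will not show up this way.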
On the other hand, the system sometimes remains responsive for e.g. half an
hour before the stalls come back.
I have the following configuration:
Host:
Debian Lenny, Kernel 2.6.32-bpo and/or 2.6.36, qemu-kvm 0.12.5
The host has 6 SATA disks, arranged as follows:
md0/1/2: sda/sdc (WD Raptor)
md3: sdb/sdd (WD Caviar Green)
md4: sde/sdf (WD Caviar Green)
On top of the md devices I have LVM volumes.
The mainboard is an Asus Z8NR-D12 with 2 Xeon L5520 processors and 16 GB RAM.
The chipset is an i5500/ICH10R.
Currently I have the following 2 guests:
1) "vmUranos": Debian Lenny, Kernel 2.6.32-bpo with virtio-block, on an LVM
volume on /dev/md2
2) "galemo": Debian Lenny, Kernel 2.6.32-bpo with virtio-block, on a qemu file
on an LVM volume on /dev/md3
The KVM parameters are attached at the end of this mail in case they are
important.
I did extensive disk read I/O testing on the host without any guests started,
e.g. on the devices themselves (sda-sdf in parallel), on the md devices, and
on the LVM volumes, in parallel and in several combinations. The reads are all
very fast and stable, with no stalls and no problems, which leads me to the
conclusion that the hardware is o.k.
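
For anyone who wants to reproduce this kind of read test, here is a rough sketch of a sequential reader that records every read taking longer than a threshold. It is not the tool used for the tests described above; the function name, chunk size and threshold are assumptions chosen for illustration, and reads still go through the kernel page cache.

```python
import time


def find_slow_reads(path, chunk_size=1024 * 1024, threshold_s=1.0, max_bytes=None):
    """Read path sequentially and return (offset, seconds) for each slow read."""
    slow = []
    offset = 0
    # buffering=0 disables Python-level buffering only; the page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or offset < max_bytes:
            start = time.monotonic()
            data = f.read(chunk_size)
            elapsed = time.monotonic() - start
            if not data:
                break  # end of file or device
            if elapsed >= threshold_s:
                slow.append((offset, elapsed))
            offset += len(data)
    return slow
```

Running one such reader per device (e.g. `find_slow_reads("/dev/sdb", threshold_s=3.0)` as root, one process per disk) would show at which offsets and for how long reads stall.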
Next, I start a KVM guest while performing read tests on all devices
(sda-sdf). As soon as a guest is started, the stalls begin to appear. If I
start the virtual machine "galemo", which reads from /dev/md3, the read tests
on sdb and sdd begin to stall; if I start "vmUranos", the stalls happen on
sda/sdc.
The stalls can be seen both on the host and in the guest, although they seem
more severe in the guest.
If I shut down/destroy the guests while performing read tests, the stalls on the
host persist although the KVM process is gone, which leads me to the
conclusion that the problem may be kernel-related.
If I stop all read tests and wait for some time, I can restart the read tests
and the stalls are gone, so the system seems to have recovered.
My impression is that KVM (and/or virtio-block) affects the I/O
subsystem so that it gets confused, e.g. some
scheduler does not know how to distribute I/O reads, or something like that.
I have absolutely no clue how to solve the problem. My last idea would
be to change the mainboard, as my current one has the i5500 chipset instead
of the more common i5000 server chipset. However, this is costly and there's
no guarantee that it would solve the problem.
What's your opinion on this?
Any help is appreciated!
Best Regards,
Hermann
P.S.: Here are the KVM parameters, in case they are relevant:
/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 \
  -smp 2,sockets=2,cores=1,threads=1 -name vmUranos \
  -uuid 8e5139ce-c561-c52f-35e1-07db9bc5045b -nodefaults \
  -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/vmUranos.monitor,server,nowait \
  -mon chardev=monitor,mode=readline -rtc base=utc -boot c \
  -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on \
  -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
  -drive file=/dev/capella_raptor/UranosBase,if=none,id=drive-virtio-disk0,boot=on,cache=none \
  -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
  -device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:03:f4:ca,bus=pci.0,addr=0x5 \
  -net tap,fd=17,vlan=0,name=hostnet0 -chardev pty,id=serial0 \
  -device isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:0 -k de -vga cirrus \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
/usr/bin/kvm -S -M pc -enable-kvm -m 1024 \
  -smp 1,sockets=1,cores=1,threads=1 -name galemo \
  -uuid 171b4536-84ea-041d-d318-16b8fb20f855 -nodefaults \
  -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/galemo.monitor,server,nowait \
  -mon chardev=monitor,mode=readline -rtc base=utc -boot c \
  -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on \
  -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
  -drive file=/dev/capella_data1/galemo,if=none,id=drive-virtio-disk0,boot=on \
  -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
  -device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:45:9c:d9,bus=pci.0,addr=0x5 \
  -net tap,fd=18,vlan=0,name=hostnet0 -chardev pty,id=serial0 \
  -device isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:1 -k de -vga cirrus \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
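
One visible difference between the two command lines is the disk cache setting: the vmUranos drive has cache=none, while the galemo drive sets no cache option. A small sketch for pulling that out of such a command line follows; `drive_cache_modes` is a hypothetical helper, and it assumes the arguments contain no shell quoting, as is the case here.

```python
def drive_cache_modes(cmdline):
    """Map each non-CD-ROM -drive (keyed by its file= value) to its cache= value.

    None means the command line does not set cache= for that drive.
    """
    # Join backslash-continued lines, then split on whitespace.
    args = cmdline.replace("\\\n", " ").split()
    modes = {}
    for flag, value in zip(args, args[1:]):
        if flag != "-drive":
            continue
        opts = dict(item.split("=", 1) for item in value.split(",") if "=" in item)
        if opts.get("media") == "cdrom":
            continue  # skip the empty CD-ROM drive
        modes[opts.get("file", opts.get("id"))] = opts.get("cache")
    return modes


uranos = "-drive file=/dev/capella_raptor/UranosBase,if=none,id=drive-virtio-disk0,boot=on,cache=none"
galemo = "-drive file=/dev/capella_data1/galemo,if=none,id=drive-virtio-disk0,boot=on"
print(drive_cache_modes(uranos))  # {'/dev/capella_raptor/UranosBase': 'none'}
print(drive_cache_modes(galemo))  # {'/dev/capella_data1/galemo': None}
```

Whether this difference has anything to do with the stalls is not known; it is only listed so that the two guests can be compared under the same settings.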
--
hermann@qwer.tk
GPG key ID: 299893C7 (on keyservers)
FP: 0124 2584 8809 EF2A DBF9 4902 64B4 D16B 2998 93C7
* Re: Disk I/O stuck with KVM - no clue how to solve that
From: Stefan Hajnoczi @ 2010-11-06 19:58 UTC
To: Hermann Himmelbauer; +Cc: kvm
On Fri, Nov 5, 2010 at 5:16 PM, Hermann Himmelbauer <dusty@qwer.tk> wrote:
> I am seeing strange disk I/O stalls on my Linux host and guests with KVM, which
> make the system (especially the guests) almost unusable. The stalls occur
> periodically, e.g. every 2 to 10 seconds, and last between 3 and sometimes
> over 120 seconds, which triggers kernel messages like this (on host and/or
> guest):
>
> INFO: task postgres:2195 blocked for more than 120 seconds
The fact that this happens on the host too suggests there's an issue
with the host software/hardware and the VM is triggering it but not
the root cause.
Does dmesg display any other suspicious messages?
Stefan
* Re: Disk I/O stuck with KVM - no clue how to solve that
From: Hermann Himmelbauer @ 2010-11-07 16:07 UTC
To: Stefan Hajnoczi; +Cc: kvm
On Saturday 06 November 2010 20:58:12, Stefan Hajnoczi wrote:
> On Fri, Nov 5, 2010 at 5:16 PM, Hermann Himmelbauer <dusty@qwer.tk> wrote:
> > I am seeing strange disk I/O stalls on my Linux host and guests with KVM,
> > which make the system (especially the guests) almost unusable. The
> > stalls occur periodically, e.g. every 2 to 10 seconds, and last between 3
> > and sometimes over 120 seconds, which triggers kernel messages like this
> > (on host and/or guest):
> >
> > INFO: task postgres:2195 blocked for more than 120 seconds
>
> The fact that this happens on the host too suggests there's an issue
> with the host software/hardware and the VM is triggering it but not
> the root cause.
>
> Does dmesg display any other suspicious messages?
No, nothing shows up in dmesg. At first I suspected the
hardware, too. I can think of the following causes:
1) Broken SATA cable / hard disks - I changed some cables with no change, so this
is probably ruled out. I also can't see anything via S.M.A.R.T. Moreover, the
problem is not bound to a specific device; it happens on sda - sdd,
so I doubt it's hard-disk related.
2) Broken power supply / insufficient power - I'd expect either a complete
crash or some error messages in this case, so I'd rather rule that out.
3) Broken SATA controller - I cannot think of any way to check that, but I'd
also expect some crashes or kernel messages. I flashed the board to the
latest BIOS version, no change either.
However, no one except me seems to have this problem, so I'll buy a
new, similar but different mainboard (Intel instead of Asus); hopefully this
solves the problem.
What do you think, any better idea?
Anyway, thanks for your reply!
Best Regards,
Hermann
--
hermann@qwer.tk
GPG key ID: 299893C7 (on keyservers)
FP: 0124 2584 8809 EF2A DBF9 4902 64B4 D16B 2998 93C7
* Re: Disk I/O stuck with KVM - no clue how to solve that
From: Stefan Hajnoczi @ 2010-11-08 6:32 UTC
To: Hermann Himmelbauer; +Cc: kvm
On Sun, Nov 7, 2010 at 4:07 PM, Hermann Himmelbauer <dusty@qwer.tk> wrote:
> On Saturday 06 November 2010 20:58:12, Stefan Hajnoczi wrote:
>> On Fri, Nov 5, 2010 at 5:16 PM, Hermann Himmelbauer <dusty@qwer.tk> wrote:
>> > I am seeing strange disk I/O stalls on my Linux host and guests with KVM,
>> > which make the system (especially the guests) almost unusable. The
>> > stalls occur periodically, e.g. every 2 to 10 seconds, and last between 3
>> > and sometimes over 120 seconds, which triggers kernel messages like this
>> > (on host and/or guest):
>> >
>> > INFO: task postgres:2195 blocked for more than 120 seconds
>>
>> The fact that this happens on the host too suggests there's an issue
>> with the host software/hardware and the VM is triggering it but not
>> the root cause.
>>
>> Does dmesg display any other suspicious messages?
>
> No, nothing shows up in dmesg. At first I suspected the
> hardware, too. I can think of the following causes:
>
> 1) Broken SATA cable / hard disks - I changed some cables with no change, so this
> is probably ruled out. I also can't see anything via S.M.A.R.T. Moreover, the
> problem is not bound to a specific device; it happens on sda - sdd,
> so I doubt it's hard-disk related.
>
> 2) Broken power supply / insufficient power - I'd expect either a complete
> crash or some error messages in this case, so I'd rather rule that out.
>
> 3) Broken SATA controller - I cannot think of any way to check that, but I'd
> also expect some crashes or kernel messages. I flashed the board to the
> latest BIOS version, no change either.
>
> However, no one except me seems to have this problem, so I'll buy a
> new, similar but different mainboard (Intel instead of Asus); hopefully this
> solves the problem.
>
> What do you think, any better idea?
If you have the time, you can use perf probes to trace I/O requests in
the host kernel. Perhaps completion interrupts are being dropped.
You may wish to start by tracing requests issued and completed by the
SATA driver.
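
As a rough illustration of what to look for in such a trace: once issue and completion events have been collected, pairing them up shows both the latency of each request and any request that never completed. The sketch below assumes the events have already been parsed into (timestamp, kind, key) tuples; that format and the function name are hypothetical, and the events could come from any tracer, for example the block layer's block_rq_issue and block_rq_complete tracepoints if your kernel exposes them (those sit above the SATA driver, so they are a starting point rather than the driver-level trace itself).

```python
def match_requests(events):
    """Pair issue/complete events.

    events: iterable of (timestamp_s, kind, key) in time order, where kind is
    "issue" or "complete" and key identifies a request, e.g. (device, sector).
    Returns (latencies, unfinished): a list of (key, seconds) for completed
    requests and a dict {key: issue_time} for requests that never completed.
    """
    in_flight = {}
    latencies = []
    for timestamp, kind, key in events:
        if kind == "issue":
            in_flight[key] = timestamp
        elif kind == "complete" and key in in_flight:
            latencies.append((key, timestamp - in_flight.pop(key)))
    return latencies, in_flight
```

Requests left in the second return value at the end of a stall, or latencies of several seconds, would point towards lost or delayed completions.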
Stefan
* Re: Disk I/O stuck with KVM - no clue how to solve that
From: Hermann Himmelbauer @ 2010-11-08 14:05 UTC
To: Stefan Hajnoczi; +Cc: kvm
On Monday 08 November 2010 07:32:39, Stefan Hajnoczi wrote:
> If you have the time, you can use perf probes to trace I/O requests in
> the host kernel. Perhaps completion interrupts are being dropped.
> You may wish to start by tracing requests issued and completed by the
> SATA driver.
Fortunately, a poster on the lkml suggested switching the SATA controller in
the BIOS from IDE to AHCI, which I did, and *maybe* it solved the problem.
There are still some short stalls, but the overall read speed is stable (as can
be seen via iostat/iotop), so only the I/O read distribution between the
host and KVM has some hiccups, but that seems to be tolerable for now.
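
For reference, the per-device read speed can also be watched without iostat by sampling /proc/diskstats. This is a minimal sketch with made-up function names; it relies on the documented field layout (third field is the device name, sixth is sectors read) and on sectors there being counted in 512-byte units.

```python
import time


def sectors_read(diskstats_text):
    """Return {device_name: sectors_read} parsed from /proc/diskstats content."""
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        result[fields[2]] = int(fields[5])
    return result


def read_rate_mib_s(before, after, interval_s):
    """Per-device read throughput in MiB/s between two sectors_read() snapshots."""
    return {
        dev: (after[dev] - before[dev]) * 512 / (1024 * 1024) / interval_s
        for dev in after
        if dev in before
    }


def measure(interval_s=1.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and return the read rates."""
    with open(path) as f:
        before = sectors_read(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = sectors_read(f.read())
    return read_rate_mib_s(before, after, interval_s)
```

Calling `measure()` in a loop on the host while the guests are running shows whether the throughput on sda-sdf stays steady or drops to zero during a stall.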
But thanks for your idea; if the problem comes back, perf probes may come in
handy!
Best Regards,
Hermann
--
hermann@qwer.tk
GPG key ID: 299893C7 (on keyservers)
FP: 0124 2584 8809 EF2A DBF9 4902 64B4 D16B 2998 93C7