On 13/03/12 19:23, Stefan Hajnoczi wrote:
> On Tue, Mar 13, 2012 at 10:42 AM, Amos Kong wrote:
>> Booting a guest with 232 virtio-blk disks makes qemu abort because it
>> fails to allocate an ioeventfd. This patchset changes
>> kvm_has_many_ioeventfds() to check whether a free ioeventfd is
>> available. If not, virtio-pci falls back to userspace and does not
>> use ioeventfd for io notification.

Hi Stefan,

> Please explain how it fails with 232 devices. Where does it abort and why?

(gdb) bt
#0  0x00007ffff48c8885 in raise () from /lib64/libc.so.6
#1  0x00007ffff48ca065 in abort () from /lib64/libc.so.6
#2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add (section=0x7fffbfbf5610,
    match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
#3  0x00007ffff7e89b3f in kvm_eventfd_add (listener=0x7ffff82ebe80,
    section=0x7fffbfbf5610, match_data=true, data=0, fd=461)
    at /home/devel/qemu/kvm-all.c:802
#4  0x00007ffff7e9bcf7 in address_space_add_del_ioeventfds (as=0x7ffff8b278a0,
    fds_new=0x7fffb80106f0, fds_new_nb=201, fds_old=0x7fffb800db20,
    fds_old_nb=200) at /home/devel/qemu/memory.c:612
#5  0x00007ffff7e9c04f in address_space_update_ioeventfds (as=0x7ffff8b278a0)
    at /home/devel/qemu/memory.c:645
#6  0x00007ffff7e9caa0 in address_space_update_topology (as=0x7ffff8b278a0)
    at /home/devel/qemu/memory.c:726
#7  0x00007ffff7e9cb95 in memory_region_update_topology (mr=0x7fffdeb179b0)
    at /home/devel/qemu/memory.c:746
#8  0x00007ffff7e9e802 in memory_region_add_eventfd (mr=0x7fffdeb179b0,
    addr=16, size=2, match_data=true, data=0, fd=461)
    at /home/devel/qemu/memory.c:1220
#9  0x00007ffff7d9e832 in virtio_pci_set_host_notifier_internal
    (proxy=0x7fffdeb175a0, n=0, assign=true)
    at /home/devel/qemu/hw/virtio-pci.c:175
#10 0x00007ffff7d9ea5f in virtio_pci_start_ioeventfd (proxy=0x7fffdeb175a0)
    at /home/devel/qemu/hw/virtio-pci.c:230
#11 0x00007ffff7d9ee51 in virtio_ioport_write (opaque=0x7fffdeb175a0,
    addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:325
#12 0x00007ffff7d9f37b in virtio_pci_config_writeb (opaque=0x7fffdeb175a0,
    addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:457
#13 0x00007ffff7e9ac23 in memory_region_iorange_write (iorange=0x7fffb8005cc0,
    offset=18, width=1, data=7) at /home/devel/qemu/memory.c:427
#14 0x00007ffff7e857e2 in ioport_writeb_thunk (opaque=0x7fffb8005cc0,
    addr=61970, data=7) at /home/devel/qemu/ioport.c:212
#15 0x00007ffff7e85197 in ioport_write (index=0, address=61970, data=7)
    at /home/devel/qemu/ioport.c:83
#16 0x00007ffff7e85d9a in cpu_outb (addr=61970, val=7 '\a')
    at /home/devel/qemu/ioport.c:289
#17 0x00007ffff7e8a70a in kvm_handle_io (port=61970, data=0x7ffff7c11000,
    direction=1, size=1, count=1) at /home/devel/qemu/kvm-all.c:1123
#18 0x00007ffff7e8ad0a in kvm_cpu_exec (env=0x7fffc1688010)
    at /home/devel/qemu/kvm-all.c:1271
#19 0x00007ffff7e595fc in qemu_kvm_cpu_thread_fn (arg=0x7fffc1688010)
    at /home/devel/qemu/cpus.c:733
#20 0x00007ffff63687f1 in start_thread () from /lib64/libpthread.so.0
#21 0x00007ffff497b92d in clone () from /lib64/libc.so.6

(gdb) frame 2
#2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add (section=0x7fffbfbf5610,
    match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
778         abort();
(gdb) l
773         assert(match_data && section->size == 2);
774
775         r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space,
776                                        data, true);
777         if (r < 0) {
778             abort();
779         }
780     }
781
782     static void kvm_io_ioeventfd_del(MemoryRegionSection *section,
(gdb) p r
$1 = -28

-28 -> -ENOSPC

Older kernels limited the number of iobus devices to 200; the limit was
increased to 300 by commit 2b3c246a ("KVM: Make coalesced mmio use a
device per zone"):

include/linux/kvm_host.h:
    struct kvm_io_bus {
        ...
    #define NR_IOBUS_DEVS 300
        struct kvm_io_range range[NR_IOBUS_DEVS];

I hit this problem with a kernel that has 200 iobus devs (qemu cmdline
attached). One virtio-blk device allocates one ioeventfd for io
notification.
When I start a guest with 232 multiple-function disks, qemu aborts with
this ENOSPC error.

> hw/virtio-pci.c:virtio_pci_start_ioeventfd() fails "gracefully" when
> virtio_pci_set_host_notifier_internal()'s event_notifier_init() call
> fails. (This might be because we've hit our file descriptor rlimit.)
>
> Perhaps the problem is that we've exceeded the kvm.ko io device limit?

Yes. Actually I had already increased the limit to 1000 in kvm upstream:
http://git.kernel.org/?p=virt/kvm/kvm.git;a=commitdiff;h=29f3ec59a0d175d1b2976131feb7553ec4baa678

But if we use a pci-bridge, that limit can also be exceeded, so we need
to handle the ENOSPC condition; falling back to userspace is better
than aborting.

> I guess that is now handled by the new memory region API and we need
> to handle failure gracefully there too.

> Either way, I don't think that using kvm_has_many_ioeventfds() is the
> right answer.

-- 
Amos.