From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Alex Lyakas"
Subject: pci_enable_msix() fails with ENOMEM/EINVAL
Date: Mon, 19 Nov 2012 17:18:19 +0200
Message-ID: <190FFE9600054603A543CCDA4408AC89@alyakaslap>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
To:
Return-path:
Received: from mail-la0-f46.google.com ([209.85.215.46]:48559 "EHLO
    mail-la0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1752949Ab2KSPSW (ORCPT ); Mon, 19 Nov 2012 10:18:22 -0500
Received: by mail-la0-f46.google.com with SMTP id p5so1165781lag.19 for ;
    Mon, 19 Nov 2012 07:18:20 -0800 (PST)
Sender: kvm-owner@vger.kernel.org
List-ID:

Greetings all,

I am running Ubuntu Precise, 3.2.0-29-generic #46, with stock KVM ("QEMU emulator version 1.0 (qemu-kvm-1.0)") on a Dell R510 server. I have one dual-port Intel 82599 NIC, and I create 32 VFs on each port. I spawn virtual machines with KVM; each VM has 4 VFs attached (two from each PF).

Once in a while, in particular when I spawn multiple VMs in parallel, one of the VFs does not get an IRQ assigned to it. I check this in /proc/interrupts, looking for entries like "kvm:0000:03:14.6". In some cases the entry for a particular VF is missing, and as a result that VF is non-functional inside the VM.

I debugged this further by adding prints to the kvm.ko code. The failure happens in the kvm_vm_ioctl_assigned_device/KVM_ASSIGN_DEV_IRQ path, which calls assigned_device_enable_host_msix(), which in turn calls pci_enable_msix(), and that call fails with EINVAL or ENOMEM. This path is called twice for each VF. In the ENOMEM case, I see that pci_enable_msix() first returns -12, and when kvm_vm_ioctl_set_msix_nr() is then called again, it sees that adev->entries_nr != 0 and fails the call with EINVAL.

I can reproduce it only when spawning 8 or 10 VMs in parallel, and even then it does not happen every time.
So this does not look like a resource shortage, but rather like a race somewhere. I tested this with several versions of the ixgbe driver, including the in-tree version that comes with Precise, and it reproduces with all of them.

Can anybody advise on how to proceed with debugging this issue?

Thanks,
Alex.
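
P.S. The /proc/interrupts check described above could be scripted along these lines. This is only a rough Python sketch: the function name vfs_missing_irq is made up for illustration, and it assumes every assigned VF shows up on the host with a "kvm:<domain>:<bus>:<dev>.<fn>" label, as in the "kvm:0000:03:14.6" entry mentioned above.

```python
import re

# Label of an assigned device's interrupt line in /proc/interrupts,
# e.g. "kvm:0000:03:14.6". The capture group is the PCI address only.
KVM_IRQ_RE = re.compile(
    r"kvm:([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def vfs_missing_irq(interrupts_text, expected_vfs):
    """Return the expected VF addresses that have no kvm: entry.

    interrupts_text: contents of /proc/interrupts as one string.
    expected_vfs:    PCI addresses of the VFs handed to the VMs
                     (hypothetical input; the caller has to supply it).
    """
    seen = {m.group(1).lower() for m in KVM_IRQ_RE.finditer(interrupts_text)}
    return sorted(vf for vf in expected_vfs if vf.lower() not in seen)


def read_proc_interrupts(path="/proc/interrupts"):
    """Read the interrupt table from the host (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the host it would be called as vfs_missing_irq(read_proc_interrupts(), [...]) with the list of VFs attached to the running VMs; any address it returns is a VF whose IRQ assignment failed.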