From: Chris Wright
Subject: Re: pci-stub error and MSI-X for KVM guest
Date: Thu, 7 Jan 2010 16:50:03 -0800
Message-ID: <20100108005003.GA20720@sequoia.sous-sol.org>
In-Reply-To: <0199E0D51A61344794750DC57738F58E6D723AB1EC@GVW1118EXC.americas.hpqcorp.net>
To: "Fischer, Anna"
Cc: Chris Wright, "kvm@vger.kernel.org", libvir-list@redhat.com

* Fischer, Anna (anna.fischer@hp.com) wrote:
> So, when setting a breakpoint for the exit() call I'm getting a bit closer to figuring where it kills my guest.

Thanks, this helps clarify what is happening.

> Breakpoint 1, exit (status=1) at exit.c:99
> 99 {
> Current language: auto
> The current source language is "auto; currently c".
> (gdb) bt
> #0 exit (status=1) at exit.c:99
> #1 0x0000000000470c6e in assigned_dev_pci_read_config (d=0x259c6f0, address=64, len=4)

assigned_dev_pci_read_config(..., 64, 4)
                                  ^^

This is a libvirt issue. When you use virt-manager it has libvirtd fork/exec qemu-kvm. libvirtd will drop privileges and run qemu-kvm as user qemu (or perhaps root if you've edited qemu.conf). Regardless of the user, it clears capabilities.
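A quick way to confirm that a running qemu-kvm really has had its capabilities cleared is to look at the CapEff mask in /proc/<pid>/status. The following is only a minimal sketch in Python; the helper names are made up for illustration, and the one hard fact it relies on is that CAP_SYS_ADMIN is bit 21 in <linux/capability.h>.

```python
CAP_SYS_ADMIN = 21  # bit number defined in <linux/capability.h>


def effective_caps(status_text):
    """Return the CapEff bitmask parsed from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hex string without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap_sys_admin(status_text):
    """True if the CAP_SYS_ADMIN bit is set in the effective set."""
    return bool(effective_caps(status_text) & (1 << CAP_SYS_ADMIN))


def read_status(pid="self"):
    """Hypothetical convenience wrapper: read /proc/<pid>/status as text."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

Calling has_cap_sys_admin(read_status(pid)) with the pid of the qemu-kvm process started by libvirtd should come back False if the capabilities were cleared as described above.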
Reading PCI config space beyond just the header requires CAP_SYS_ADMIN. The above is reading the first 4 bytes of device-dependent config space (offset 64, just past the 64-byte standard header), and the kernel is returning 0 bytes because qemu doesn't have CAP_SYS_ADMIN.

Basically, this means that device assignment w/ libvirt will break MSI/MSI-X because qemu will never be able to see that the host device has those PCI capabilities. This, in turn, renders VF device assignment useless (since a VF is required to support MSI and/or MSI-X).

Granting CAP_SYS_ADMIN to each qemu instance that does device assignment would render the privilege reduction useless (CAP_SYS_ADMIN is the kitchen-sink catchall of the Linux capability system). Hmmph...
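To make the failure mode concrete, here is a rough Python sketch of a PCI capability-list walk over a config space image. It is not the qemu code, just an illustration: the function names are hypothetical, while the register offsets and capability IDs are the standard PCI ones (status at 0x06, capability pointer at 0x34, MSI = 0x05, MSI-X = 0x11). When only the first 64 bytes are readable, the list head points past the end of the data and no capabilities are found.

```python
PCI_STATUS = 0x06            # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10   # status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_MSI = 0x05
PCI_CAP_ID_MSIX = 0x11


def find_capabilities(config):
    """Walk the capability list; return {capability ID: offset}."""
    caps = {}
    if len(config) < 0x40:
        return caps
    status = int.from_bytes(config[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return caps
    pos = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    # Capabilities live in device-dependent space, at offset 0x40 or above.
    # Stop if the pointer leaves the readable data or loops back on itself.
    while pos >= 0x40 and pos + 1 < len(config) and pos not in seen:
        seen.add(pos)
        caps[config[pos]] = pos           # byte 0: capability ID
        pos = config[pos + 1] & 0xFC      # byte 1: next pointer
    return caps


def read_config(bdf):
    """Read whatever the kernel exposes of a device's config space.

    bdf is a PCI address string such as "0000:01:00.0" (example value only).
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read()
```

Run with CAP_SYS_ADMIN, find_capabilities(read_config(bdf)) on a VF would be expected to list MSI and/or MSI-X; run from a process with capabilities cleared, the same call sees only the header and returns an empty dict.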