From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Xen pci-passthrough problem with pci-detach and pci-assignable-remove
Date: Wed, 16 Apr 2014 11:30:57 -0400
Message-ID: <20140416153057.GA16539@phenom.dumpdata.com>
References: <1087166993.20140110165729@eikelenboom.it>
 <20140110161248.GE21360@phenom.dumpdata.com>
 <1010658460.20140110171623@eikelenboom.it>
 <20140110173809.GA19423@pegasus.dumpdata.com>
 <1889333978.20140124143602@eikelenboom.it>
 <20140124174806.GA15571@phenom.dumpdata.com>
 <1142136480.20140220095359@eikelenboom.it>
 <929649832.20140220171846@eikelenboom.it>
 <20140401161309.GA10072@phenom.dumpdata.com>
 <1551452773.20140402124312@eikelenboom.it>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1WaRno-0000ql-PJ for xen-devel@lists.xenproject.org;
 Wed, 16 Apr 2014 15:31:09 +0000
Content-Disposition: inline
In-Reply-To: <1551452773.20140402124312@eikelenboom.it>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Sander Eikelenboom
Cc: xen-devel , Ian Campbell
List-Id: xen-devel@lists.xenproject.org

On Wed, Apr 02, 2014 at 12:43:12PM +0200, Sander Eikelenboom wrote:
>
> Tuesday, April 1, 2014, 6:13:09 PM, you wrote:
>
> > On Thu, Feb 20, 2014 at 05:18:46PM +0100, Sander Eikelenboom wrote:
> >>
> >> Thursday, February 20, 2014, 9:53:59 AM, you wrote:
> >>
> >>
> >> > Friday, January 24, 2014, 6:48:06 PM, you wrote:
> >>
> >> >> On Fri, Jan 24, 2014 at 02:36:02PM +0100, Sander Eikelenboom wrote:
> >> >>>
> >> >>> Friday, January 10, 2014, 6:38:10 PM, you wrote:
> >> >>>
> >> >>> >> > Wow. You just walked in a pile of bugs didn't you? And on Friday
> >> >>> >> > nonethless.
> >> >>> >>
> >> >>> >> As usual ;-)
> >> >>>
> >> >>> > Ha!
> >> >>> > ..snip..
> >> >>> >> >> [ 489.082358] [] ? mutex_spin_on_owner+0x38/0x45
> >> >>> >> >> [ 489.106272] [] ? schedule_preempt_disabled+0x6/0x9
> >> >>> >> >> [ 489.130158] [] ? __mutex_lock_slowpath+0x159/0x1b5
> >> >>> >> >> [ 489.154147] [] ? mutex_lock+0x16/0x25
> >> >>> >> >> [ 489.177890] [] ? pci_reset_function+0x26/0x4e
> >> >>> >>
> >> >>> >> > Yeah, that bug my RFC patchset (the one that does the slot/bus reset) should also fix.
> >> >>> >> > I totally forgot about it !
> >> >>> >>
> >> >>> >> Got a link to that patchset ?
> >> >>>
> >> >>> > https://lkml.org/lkml/2013/12/13/315
> >> >>>
> >> >>> >> I at least could give it a spin .. you never know when fortune is on your side :-)
> >> >>>
> >> >>> > It is also at this git tree:
> >> >>>
> >> >>> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git and the
> >> >>> > branch name is "devel/xen-pciback.slot_and_bus.v0". You will likely
> >> >>> > want to merge it in your current Linus tree.
> >> >>>
> >> >>> > Thank you!
> >> >>>
> >> >>>
> >> >>> Hi Konrad,
> >> >>>
> >> >>> Just got time to test this some more, when merging this branch *except* the last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd)
> >> >>> seems to help with my problem,i'm no capable of using:
> >> >>> - xl pci-detach
> >> >>> - xl pci-assignable-remove
> >> >>> - echo "BDF" > /sys/bus/pci/drivers//bind
> >> >>>
> >> >>> to remove a pci device from a running HVM guest and rebinding it to a driver in dom0 without those nasty stacktraces :-)
> >> >>> So the first 4 seem to be an improvement.
> >> >>>
> >> >>> That last commit (9599a5ad38a3bb250e996ccb2cdaab6fb68aaacd) seems to give troubles of it's own.
> >>
> >> >> Could you email me your lspci output and also which devices you move/switch etc?
> >>
> >> > Hi Konrad,
> >>
> >> > At the moment i found some time to figure out what goes wrong with the xl pci-detach and xl pci-assignable-remove, i have been
> >> > able to narrow it down a bit:
> >>
> >> > The problem only occurs when you:
> >> > - passthrough 2 (or more?) pci devices assigned to a guest ..
> >> > - and only remove 1 of those devices with "xl pci-detach" followed by a "xl pci-assignable-remove"
> >> > - when you first detach both devices with "xl pci-detach" before doing the "xl pci-assignable-remove" it works ok.
> >>
> >> > In my case i'm passingthrough 2 devices (02:00.0 and 00:19.0)
> >>
> >> > I added some printk's and what i found out is that:
> >> > - after doing the pci-detach of 02:00.0, it doesn't call pcistub_put_pci_dev for that device ...
> >> > - but when i subsequently pci-detach the second (and last) device 00:19.0 .. it does call it for both 02:00.0 and 00:19.0 ...
> >> > - so somehow that call for the first detached device gets deferred .. but since it are different devices and not functions of the same device i don't
> >> > see any reason for it to wait until all other devices would have been detached ...
> >>
> >>
> >> > I tried to capture the console output but some how that didn't work out, so i attached a screenshot of what happens when:
> >> > - doing a xl pci-list for the guest
> >> > - doing a xl pci-assignable-list
> >>
> >> > - doing the xl pci-detach for 02:00.0
> >>
> >> > - doing a xl pci-list for the guest
> >> > - doing a xl pci-assignable-list
> >>
> >> > - waiting some time ...
> >>
> >> > - doing the xl pci-detach for 00:19.0
> >>
> >> > - doing a xl pci-list for the guest
> >> > - doing a xl pci-assignable-list
> >>
> >> > There you can see this strange sequence of events :-)
> >>
> >> > But i haven't been able to spot the culprit
> >>
> >> Enabled some extra debugging and added some more printk's .. (see new screenshot)
> >>
> >> From what it seems .. the frontend state for the first device isn't changed on the first pci-detach ..
> >>
> >> Is the signaling on pci-detach the guests (pcifront) responsibility or the toolstacks (libxl) ?
>
> > It usually is pcifront. And in the screenshot I see:
> > .. frontend is gone! unregister device
> > which should trigger the process. And it does look to do that.
> > Hm, I am wondering what the toolstack is waiting for.
> > Time to debug.
>
> Ok thx :-)

Just to make sure - you are not using the xen-pciback.hide parameter, right?
Just doing the sysfs dance of 'echo BDF >' to various places?
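
For reference, a minimal sketch of what such a sysfs sequence typically looks like, written as a dry run that only prints the writes instead of performing them. The BDF 0000:02:00.0 (the 02:00.0 from this thread, with PCI domain 0000 assumed) and the dom0 driver name e1000e are placeholders, not details confirmed anywhere in this thread. The pciback nodes used (new_slot, bind, unbind, remove_slot) are the usual ones under /sys/bus/pci/drivers/pciback.

```shell
#!/bin/sh
# Dry-run sketch: prints the sysfs writes rather than executing them.
# BDF and DRIVER are illustrative placeholders (assumptions, not from the thread).
BDF="0000:02:00.0"
DRIVER="e1000e"

sysfs_write() {
    # A real run would do: echo "$1" > "$2"
    printf 'echo %s > %s\n' "$1" "$2"
}

# Take the device away from its dom0 driver and hand it to pciback.
to_pciback() {
    sysfs_write "$BDF" "/sys/bus/pci/devices/$BDF/driver/unbind"
    sysfs_write "$BDF" "/sys/bus/pci/drivers/pciback/new_slot"
    sysfs_write "$BDF" "/sys/bus/pci/drivers/pciback/bind"
}

# Release the device from pciback and rebind it to the dom0 driver.
to_dom0() {
    sysfs_write "$BDF" "/sys/bus/pci/drivers/pciback/unbind"
    sysfs_write "$BDF" "/sys/bus/pci/drivers/pciback/remove_slot"
    sysfs_write "$BDF" "/sys/bus/pci/drivers/$DRIVER/bind"
}

to_pciback
to_dom0
```

To actually perform the writes, sysfs_write would do the redirect instead of printing it, and the script would need to run as root in dom0.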