From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hannes Reinecke Subject: Re: PCI device not properly reset after VFIO Date: Thu, 18 Oct 2012 17:06:32 +0200 Message-ID: <50801AF8.7040305@suse.de> References: <507FA602.9050801@suse.de> <1350571230.2112.344.camel@bling.home> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: kvm-devel , Linux Virtualization , Alexander Graf To: Alex Williamson Return-path: Received: from cantor2.suse.de ([195.135.220.15]:41514 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756340Ab2JRPGe (ORCPT ); Thu, 18 Oct 2012 11:06:34 -0400 In-Reply-To: <1350571230.2112.344.camel@bling.home> Sender: kvm-owner@vger.kernel.org List-ID: On 10/18/2012 04:40 PM, Alex Williamson wrote: > Hi Hannes, > > Thanks for testing vfio > > On Thu, 2012-10-18 at 08:47 +0200, Hannes Reinecke wrote: >> Hi Alex, >> >> I've been playing around with VFIO and megasas (of course). >> What I did now was switching between VFIO and 'normal' operation, ie >> emulated access. >> >> megasas is happily running under VFIO, but when I do an emergency >> stop like killing the Qemu session the PCI device is not properly re= set. >> IE when I load 'megaraid_sas' after unbinding the vfio_pci module >> the driver cannot initialize the card and waits forever for the >> firmware state to change. >> >> I need to do a proper pci reset via >> echo 1 > /sys/bus/pci/device/XXXX/reset >> to get it into a working state again. >> >> Looking at vfio_pci_disable() pci reset is called before the config >> state and BARs are restored. >> Seeing that vfio_pci_enable() calls pci reset right at the start, >> too, before modifying anything I do wonder whether the pci reset is >> at the correct location for disable. >> >> I would have expected to call pci reset in vfio_pci_disable() >> _after_ we have restored the configuration, to ensure a sane state >> after reset. >> And, as experience show, we do need to call it there. >> >> So what is the rationale for the pci reset? >> Can we move it to the end of vfio_pci_disable() or do we need to >> call pci reset twice? > > I believe the rationale was that by resetting the device before we > restore the state we stop anything that the device was doing. Restor= ing > the saved state on a running device seems like it could cause problem= s, > so you may be right and we actually need to do reset, load, restore, > reset. Does adding another call to pci_reset_function in the > pci_restore_state (as below) solve the problem? Traditional KVM devi= ce > assignment has a nearly identical path, does it have this same bug? It's actually the first time I've been able to test this (the=20 hardware is a bit tricky to setup ...), so I cannot tell (yet) if KVM exhibited the same thing. > Thanks, > > Alex > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.= c > index 6c11994..d07a45c 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -107,9 +107,10 @@ static void vfio_pci_disable(struct vfio_pci_dev= ice *vdev) > pci_reset_function(vdev->pdev); > > if (pci_load_and_free_saved_state(vdev->pdev, > - &vdev->pci_saved_state) =3D= =3D 0) > + &vdev->pci_saved_state) =3D= =3D 0) { > pci_restore_state(vdev->pdev); > - else > + pci_reset_function(vdev->pdev); > + } else > pr_info("%s: Couldn't reload %s saved state\n", > __func__, dev_name(&vdev->pdev->dev)); > > > I would have called reset after unmapping the BARs; the HBA I'm=20 working with does need to access the BARs, so the content of them=20 might be relevant, too. But then I'm not really a PCI expert. Maybe we should ask Tony Luck or Bjorn Helgaas. Cheers, Hannes --=20 Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=C3=BCrnberg GF: J. Hawn, J. Guild, F. Imend=C3=B6rffer, HRB 16746 (AG N=C3=BCrnberg= )