From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56213) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aB7fk-0003w1-CM for qemu-devel@nongnu.org; Mon, 21 Dec 2015 16:07:13 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aB7fg-0007Tt-AF for qemu-devel@nongnu.org; Mon, 21 Dec 2015 16:07:12 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51464) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aB7fg-0007Tn-0T for qemu-devel@nongnu.org; Mon, 21 Dec 2015 16:07:08 -0500 Message-ID: <1450732026.3781.89.camel@redhat.com> From: Alex Williamson Date: Mon, 21 Dec 2015 14:07:06 -0700 In-Reply-To: <56737D85.3030305@cn.fujitsu.com> References: <35d75210aef922bd049d93e0c8dbe13a2b5b1dd3.1447748073.git.chen.fan.fnst@cn.fujitsu.com> <1450384318.2674.110.camel@redhat.com> <56737D85.3030305@cn.fujitsu.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v14 Resend 10/13] pci: add pci device pre-post reset callbacks for host bus reset List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Chen Fan , Cao jin , qemu-devel@nongnu.org Cc: mst@redhat.com On Fri, 2015-12-18 at 11:29 +0800, Chen Fan wrote: > On 12/18/2015 04:31 AM, Alex Williamson wrote: > > On Thu, 2015-12-17 at 09:41 +0800, Cao jin wrote: > > > From: Chen Fan > > >=20 > > > Particularly, For vfio devices, Once need to recovery devices > > > by bus reset such as AER, we always need to reset the host bus > > > to recovery the devices under the bus, so we need to add pci > > > device > > > callbacks to specify to do host bus reset. > > >=20 > > > Signed-off-by: Chen Fan > > > Reviewed-by: Michael S. Tsirkin > > > --- > > > =C2=A0 hw/pci/pci.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0| 18 ++++++++++++++++++ > > > =C2=A0 hw/pci/pci_bridge.c=C2=A0=C2=A0|=C2=A0=C2=A09 +++++++++ > > > =C2=A0 hw/vfio/pci.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= | 26 ++++++++++++++++++++++++++ > > > =C2=A0 hw/vfio/pci.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= |=C2=A0=C2=A02 ++ > > > =C2=A0 include/hw/pci/pci.h |=C2=A0=C2=A07 +++++++ > > > =C2=A0 5 files changed, 62 insertions(+) > > >=20 > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > > index f6ca6ef..64fa2cc 100644 > > > --- a/hw/pci/pci.c > > > +++ b/hw/pci/pci.c > > > @@ -247,6 +247,24 @@ static void pci_do_device_reset(PCIDevice > > > *dev) > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0msix_reset(dev); > > > =C2=A0 } > > > =C2=A0=C2=A0 > > > +void pci_device_pre_reset(PCIBus *bus, PCIDevice *dev, void > > > *unused) > > > +{ > > > +=C2=A0=C2=A0=C2=A0=C2=A0PCIDeviceClass *dc =3D PCI_DEVICE_GET_CLAS= S(dev); > > > + > > > +=C2=A0=C2=A0=C2=A0=C2=A0if (dc->pre_reset) { > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0dc->pre_reset(dev)= ; > > > +=C2=A0=C2=A0=C2=A0=C2=A0} > > > +} > > > + > > > +void pci_device_post_reset(PCIBus *bus, PCIDevice *dev, void > > > *unused) > > > +{ > > > +=C2=A0=C2=A0=C2=A0=C2=A0PCIDeviceClass *dc =3D PCI_DEVICE_GET_CLAS= S(dev); > > > + > > > +=C2=A0=C2=A0=C2=A0=C2=A0if (dc->post_reset) { > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0dc->post_reset(dev= ); > > > +=C2=A0=C2=A0=C2=A0=C2=A0} > > > +} > > > + > > > =C2=A0 /* > > > =C2=A0=C2=A0=C2=A0* This function is called on #RST and FLR. > > > =C2=A0=C2=A0=C2=A0* FLR if PCI_EXP_DEVCTL_BCR_FLR is set > > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c > > > index 40c97b1..ddb76ab 100644 > > > --- a/hw/pci/pci_bridge.c > > > +++ b/hw/pci/pci_bridge.c > > > @@ -267,8 +267,17 @@ void pci_bridge_write_config(PCIDevice *d, > > > =C2=A0=C2=A0 > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0newctl =3D pci_get_word(d->conf= ig + PCI_BRIDGE_CONTROL); > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (~oldctl & newctl & PCI_BRID= GE_CTL_BUS_RESET) { > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* Notify all= vfio-pci devices under the bus > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* should do = physical bus reset. > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0PCIBus *sec_bus =3D= pci_bridge_get_sec_bus(s); > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0pci_for_each_devic= e(sec_bus, pci_bus_num(sec_bus), > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0pci_device_pre_reset, NULL); > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Trig= ger hot reset on 0->1 transition. */ > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0qbus_re= set_all(&s->sec_bus.qbus); > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0pci_for_each_devic= e(sec_bus, pci_bus_num(sec_bus), > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0pci_device_post_reset, NULL); > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} > > > =C2=A0 } > > > =C2=A0=C2=A0 > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > > index e17dc89..df32618 100644 > > > --- a/hw/vfio/pci.c > > > +++ b/hw/vfio/pci.c > > > @@ -39,6 +39,7 @@ > > > =C2=A0=C2=A0 > > > =C2=A0 static void vfio_disable_interrupts(VFIOPCIDevice *vdev); > > > =C2=A0 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool > > > enabled); > > > +static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single); > > > =C2=A0=C2=A0 > > > =C2=A0 /* > > > =C2=A0=C2=A0=C2=A0* Disabling BAR mmaping can be slow, but toggling= it around > > > INTx > > > can > > > @@ -1888,6 +1889,8 @@ static int > > > vfio_check_host_bus_reset(VFIOPCIDevice *vdev) > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* List all affected devices by= bus reset */ > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0devices =3D &info->devices[0]; > > > =C2=A0=C2=A0 > > > +=C2=A0=C2=A0=C2=A0=C2=A0vdev->single_depend_dev =3D (info->count =3D= =3D 1); > > > + > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Verify that we have all the = groups required */ > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0for (i =3D 0; i < info->count; = i++) { > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0PCIHost= DeviceAddress host; > > > @@ -2029,10 +2032,26 @@ static int > > > vfio_check_bus_reset(NotifierWithReturn *n, void *opaque) > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return vfio_check_host_bus_rese= t(vdev); > > > =C2=A0 } > > > =C2=A0=C2=A0 > > > +static void vfio_aer_pre_reset(PCIDevice *pdev) > > > +{ > > > +=C2=A0=C2=A0=C2=A0=C2=A0VFIOPCIDevice *vdev =3D DO_UPCAST(VFIOPCID= evice, pdev, pdev); > > > + > > > +=C2=A0=C2=A0=C2=A0=C2=A0vdev->aer_reset =3D true; > > > +=C2=A0=C2=A0=C2=A0=C2=A0vfio_pci_hot_reset(vdev, vdev->single_depe= nd_dev); > > > +} > > Doesn't this lead to multiple host bus resets per guest bus reset > > in > > many cases?=C2=A0=C2=A0It looks like we'll do it once per vfio-pci de= vice, > > even > > if those devices are on the same host bus.=C2=A0=C2=A0That's a 1 seco= nd > > operation > > per device.=C2=A0=C2=A0Can we avoid that?=C2=A0=C2=A0Maybe some sort = of sequence ID > > could > > help a device figure out whether it's already been reset as part of > > a > > dependent device for this particular guest bus reset.=C2=A0=C2=A0Than= ks, > That's right, I missed this case, but I don't understand the scenario > how to > use a sequence ID to mark the device if been reset. can you detail it > ? I don't really have a concrete idea for a sequence ID, it was just a thought that maybe if each bus reset had a sequence ID then devices could know whether they've already been reset for that sequence ID. =C2=A0The basic problem we have is that reset callbacks are per device an= d it's difficult to infer which individual resets are part of that bus reset. =C2=A0In fact, do we propagate resets correctly down secondary bridges? =C2=A0We're triggering off a VM write of the bridge control bus reset bit triggering from 0->1 and we then call qbus_reset_all() on that qbus, which I think is just going to call=C2=A0pci_bridge_reset() fo= r any other bridges, which doesn't do anything about resetting deeper subordinate buses. =C2=A0I think that means that if we had a root port wi= th a switch below it and endpoints below that, if the VM triggered a secondary bus reset at the root port, the endpoints would never see it, which is not how real hardware works. > additional, there was a mechanism to compute device whether need to > be reset > by hot reset. so I simply modify the code as the following: >=20 > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > index a9bc67e..42774ca 100644 > --- a/hw/vfio/pci.c > +++ b/hw/vfio/pci.c > @@ -2063,13 +2063,19 @@ static void vfio_aer_pre_reset(PCIDevice > *pdev) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0VFIOPCIDevice *vdev =3D DO_UPCAST(V= FIOPCIDevice, pdev, pdev); >=20 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0vdev->aer_reset =3D true; > -=C2=A0=C2=A0=C2=A0=C2=A0vfio_pci_hot_reset(vdev, vdev->single_depend_d= ev); > +=C2=A0=C2=A0=C2=A0=C2=A0vdev->vbasedev.needs_reset =3D true; > =C2=A0 } >=20 > =C2=A0 static void vfio_aer_post_reset(PCIDevice *pdev) > =C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0VFIOPCIDevice *vdev =3D DO_UPCAST(V= FIOPCIDevice, pdev, pdev); >=20 > +=C2=A0=C2=A0=C2=A0=C2=A0if (!vdev->single_depend_dev && vdev->vbasedev= .needs_reset) { > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0vfio_pci_hot_reset(vde= v, false); > +=C2=A0=C2=A0=C2=A0=C2=A0} else { > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0vfio_pci_hot_reset(vde= v, true); > +=C2=A0=C2=A0=C2=A0=C2=A0} > + > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0vdev->aer_reset =3D false; > =C2=A0 } >=20 > what do you think of this ? I think it might be a bigger problem than that subtle change. =C2=A0I won= der if we really need a better model of the reset line through the subordinate buses. =C2=A0When reset is asserted, we'd set a bus_in_reset flag on the bus and trigger downstream bridges to do the same. =C2=A0Then when the user de-asserts reset, we'd call qbus_reset_all() and propagate it through to downstream buses. =C2=A0That way the per device reset callback could check to see if the bus is in reset and aer devices can then know to do a bus reset. =C2=A0Finally, the bus_in_reset flag would be cleared on all the affected buses. =C2=A0I'm sure there are numerous details missing there, but it seems like it might be a reasonable approach. =C2=A0Thanks, Alex