From: Jan Kiszka
Subject: Re: [PATCH] kvm: deassign irqs in reset path
Date: Fri, 30 Mar 2012 21:29:23 +0200
Message-ID: <4F760993.304@web.de>
References: <201203301918.q2UJI63c005908@int-mx02.intmail.prod.int.phx2.redhat.com>
In-Reply-To: <201203301918.q2UJI63c005908@int-mx02.intmail.prod.int.phx2.redhat.com>
To: Jason Baron
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, mst@redhat.com, alex.williamson@redhat.com

On 2012-03-30 21:18, Jason Baron wrote:
> We've hit a kernel host panic, when issuing a 'system_reset' with a 82576 nic
> assigned and a Windows guest. Host system is a PowerEdge R815.
>
> [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32993
> [Hardware Error]: APEI generic hardware error status
> [Hardware Error]: severity: 1, fatal
> [Hardware Error]: section: 0, severity: 1, fatal
> [Hardware Error]: flags: 0x01
> [Hardware Error]: primary
> [Hardware Error]: section_type: PCIe error
> [Hardware Error]: port_type: 0, PCIe end point
> [Hardware Error]: version: 1.0
> [Hardware Error]: command: 0x0000, status: 0x0010
> [Hardware Error]: device_id: 0000:08:00.0
> [Hardware Error]: slot: 1
> [Hardware Error]: secondary_bus: 0x00
> [Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9
> [Hardware Error]: class_code: 000002
> [Hardware Error]: aer_status: 0x00100000, aer_mask: 0x00018000
> [Hardware Error]: Unsupported Request
> [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID
> [Hardware Error]: aer_uncor_severity: 0x00067011
> [Hardware Error]: aer_tlp_header: 40001001 0020000f edbf800c 01000000
> [Hardware Error]: section: 1, severity: 1, fatal
> [Hardware Error]: flags: 0x01
> [Hardware Error]: primary
> [Hardware Error]: section_type: PCIe error
> [Hardware Error]: port_type: 0, PCIe end point
> [Hardware Error]: version: 1.0
> [Hardware Error]: command: 0x0000, status: 0x0010
> [Hardware Error]: device_id: 0000:08:00.0
> [Hardware Error]: slot: 1
> [Hardware Error]: secondary_bus: 0x00
> [Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9
> [Hardware Error]: class_code: 000002
> [Hardware Error]: aer_status: 0x00100000, aer_mask: 0x00018000
> [Hardware Error]: Unsupported Request
> [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID
> [Hardware Error]: aer_uncor_severity: 0x00067011
> [Hardware Error]: aer_tlp_header: 40001001 0020000f edbf800c 01000000
> Kernel panic - not syncing: Fatal hardware error!
> Pid: 0, comm: swapper Not tainted 2.6.32-242.el6.x86_64 #1
> Call Trace:
> [] ? panic+0xa0/0x168
> [] ? ghes_notify_nmi+0x17c/0x180
> [] ? notifier_call_chain+0x55/0x80
> [] ? atomic_notifier_call_chain+0x1a/0x20
> [] ? notify_die+0x2e/0x30
> [] ? do_nmi+0x1a1/0x2b0
> [] ? nmi+0x20/0x30
> [] ? native_safe_halt+0xb/0x10
> <> [] ? default_idle+0x4d/0xb0
> [] ? cpu_idle+0xb6/0x110
> [] ? rest_init+0x7a/0x80
> [] ? start_kernel+0x424/0x430
> [] ? x86_64_start_reservations+0x125/0x129
> [] ? x86_64_start_kernel+0xfa/0x109
>
> The root cause of the problem is that the 'reset_assigned_device()' code
> first writes a 0 to the command register. Then, when qemu subsequently does
> a kvm_deassign_irq() (called by assign_irq(), in the system_reset path),
> the kernel ends up calling '__msix_mask_irq()', which performs a write to
> the memory mapped msi vector space. Since, we've explicitly told the device
> to disallow mmio access (via the 0 write to the command register), we end
> up with the above 'Unsupported Request'.
>
> The fix here is to first call kvm_deassign_irq(), before doing the reset,

s/fix/workaround/. This is a kernel bug if userspace can crash the system
like this, no? Let's fix the kernel first and then look at what needs to
be changed here.

Jan

> and then calling assign_irq() to put the device in an INTx mode. In this
> way, the device is a known state after reset (INTx mode), and we avoid touching
> msi memory mapped space on any subsequent 'kvm_deassign_irq()', since we're
> in INTx mode.
>
> Thanks to Michael S. Tsirkin for help in understanding what was going on here.
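For reference, the aer_tlp_header words quoted in the log can be decoded by hand, and the result appears consistent with the root-cause description above. What follows is only a sketch: the struct and function names are invented for illustration, the bit layout is the standard 3DW memory request header, and treating the low address bits as an offset into a 16-byte MSI-X table entry assumes the table base is 16-byte aligned, which the log does not confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: selected fields of a 3DW memory request header. */
struct tlp_mem_req {
    unsigned fmt;           /* DW0 bits 31:29 */
    unsigned type;          /* DW0 bits 28:24 */
    unsigned length_dw;     /* DW0 bits 9:0, payload length in DWORDs */
    uint16_t requester_id;  /* DW1 bits 31:16, bus/device/function */
    uint32_t address;       /* DW2 bits 31:2, low two bits cleared */
};

/* Split the four logged header words into fields (3DW layout assumed). */
static struct tlp_mem_req decode_tlp(const uint32_t hdr[4])
{
    struct tlp_mem_req r;

    assert(hdr != NULL);
    r.fmt = (hdr[0] >> 29) & 0x7;
    r.type = (hdr[0] >> 24) & 0x1f;
    r.length_dw = hdr[0] & 0x3ff;
    r.requester_id = (uint16_t)(hdr[1] >> 16);
    r.address = hdr[2] & ~(uint32_t)0x3;
    return r;
}

/* Fmt 010b (3DW header with data) plus Type 00000b is a 32-bit memory write. */
static bool is_mem_write32(const struct tlp_mem_req *r)
{
    return r->fmt == 0x2 && r->type == 0x00;
}

/*
 * Offset inside a 16-byte MSI-X table entry. Offset 12 is the Vector
 * Control word that holds the per-vector mask bit.
 * Assumption: the MSI-X table base is 16-byte aligned.
 */
static unsigned msix_entry_offset(uint32_t address)
{
    return address & 0xf;
}
```

Fed the words 40001001 0020000f edbf800c 01000000, this gives a one-DWORD 32-bit memory write to 0xedbf800c from requester ID 0x0020. Under the alignment assumption, the offset 0xc is the Vector Control word of an MSI-X entry, which is where a mask operation would write.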
>
> Signed-off-by: Jason Baron
> Signed-off-by: Alex Williamson
> ---
>  hw/device-assignment.c |   27 +++++++++++++++++++++++++++
>  1 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 89823f1..31aed17 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1609,10 +1609,32 @@ static void reset_assigned_device(DeviceState *dev)
>  {
>      PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev);
>      AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
> +    struct kvm_assigned_irq assigned_irq_data;
>      char reset_file[64];
>      const char reset[] = "1";
>      int fd, ret;
>  
> +    /*
> +     * Make sure the irq for the device is set to a consistent state of INTx
> +     * on reset. This also ensures that a subsequent deassign_irq/assign_irq
> +     * sequence (such as during 'system_reset'), does not touch memory
> +     * mapped msi space, since we are about to disallow that access via a
> +     * 0 write to the command register. In addition, the 'kvm_deassign_irq()'
> +     * clears the msi enable bit, thus preventing any unexpected MSIs.
> +     */
> +    memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
> +    assigned_irq_data.assigned_dev_id =
> +        calc_assigned_dev_id(adev);
> +    assigned_irq_data.flags = adev->irq_requested_type;
> +    free_dev_irq_entries(adev);
> +    ret = kvm_deassign_irq(kvm_state, &assigned_irq_data);
> +    /* -ENXIO means no assigned irq */
> +    if (ret && ret != -ENXIO) {
> +        perror("reset_assigned_device: deassign irq");
> +    }
> +
> +    adev->irq_requested_type = 0;
> +
>      snprintf(reset_file, sizeof(reset_file),
>               "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
>               adev->host.seg, adev->host.bus, adev->host.dev, adev->host.func);
> @@ -1635,6 +1657,11 @@ static void reset_assigned_device(DeviceState *dev)
>       * disconnected from the PCI bus. This avoids further DMA transfers.
>       */
>      assigned_dev_pci_write_config(pci_dev, PCI_COMMAND, 0, 2);
> +
> +    ret = assign_irq(adev);
> +    if (ret) {
> +        perror("reset_assigned_device: assign irq");
> +    }
>  }
>  
>  static int assigned_initfn(struct PCIDevice *pci_dev)
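The ordering issue described in the patch can be illustrated with a small toy model. This is a sketch under stated assumptions, not QEMU or kernel code: every toy_* name is hypothetical, the "device" is a plain struct, and an Unsupported Request is modelled as a counter that increments whenever an MSI-X mask write happens while the Memory Space bit of the command register is clear. It shows only why the order of the steps matters and says nothing about where the real fix belongs.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CMD_MEMORY 0x2 /* PCI command register bit 1: Memory Space Enable */

enum toy_irq_mode { TOY_IRQ_NONE, TOY_IRQ_INTX, TOY_IRQ_MSIX };

/* Hypothetical stand-in for an assigned device. */
struct toy_dev {
    uint16_t command;           /* modelled PCI command register */
    enum toy_irq_mode irq_mode; /* currently assigned interrupt type */
    int unsupported_requests;   /* MMIO writes made while memory decode is off */
};

/* Deassigning in MSI-X mode masks the vector, which is an MMIO write. */
static void toy_deassign_irq(struct toy_dev *d)
{
    if (d->irq_mode == TOY_IRQ_MSIX && !(d->command & TOY_CMD_MEMORY)) {
        d->unsupported_requests++;
    }
    d->irq_mode = TOY_IRQ_NONE;
}

/* As in the description, assigning first deassigns, then ends in INTx mode. */
static void toy_assign_irq(struct toy_dev *d)
{
    toy_deassign_irq(d);
    d->irq_mode = TOY_IRQ_INTX;
}

/* Original order: the command register is cleared with MSI-X still assigned. */
static void toy_reset_unpatched(struct toy_dev *d)
{
    d->command = 0;
}

/* Patched order: deassign while MMIO still works, clear, then go to INTx. */
static void toy_reset_patched(struct toy_dev *d)
{
    toy_deassign_irq(d);
    d->command = 0;
    toy_assign_irq(d);
}

/* Device reset followed by the later assign_irq() of the system_reset path. */
static int toy_system_reset(struct toy_dev *d, bool patched)
{
    if (patched) {
        toy_reset_patched(d);
    } else {
        toy_reset_unpatched(d);
    }
    toy_assign_irq(d);
    return d->unsupported_requests;
}
```

Starting from a device with memory decode enabled and MSI-X assigned, the unpatched order records one faulting write in this model, while the patched order records none, because every later deassign finds the device already in INTx mode.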