From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Guthro
Subject: Re: S3 crash with VTD Queue Invalidation enabled
Date: Wed, 5 Jun 2013 19:53:43 -0400
References: <51ACECEB.9030904@citrix.com> <51ADC77402000078000DAF95@nat28.tlf.novell.com> <51AE0F6602000078000DB1F4@nat28.tlf.novell.com> <51AF11F102000078000DB589@nat28.tlf.novell.com> <51AF71E902000078000DB8B7@nat28.tlf.novell.com> <51AF777F02000078000DB944@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Andrew Cooper, xiantao.zhang@intel.com, xen-devel
List-Id: xen-devel@lists.xenproject.org

On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro wrote:
> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich wrote:
>>>>> On 05.06.13 at 17:25, Ben Guthro wrote:
>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich wrote:
>>>> Depending on whether ATS is in use, more than one invalidation
>>>> can be done in the processing here - could you therefore check
>>>> whether there's any sign of ATS use ("iommu=verbose" should
>>>> make you see respective messages), and if so see whether
>>>> disabling it ("ats=off") makes a difference?
>>>
>>> ATS does not appear to be running:
>>>
>>> (XEN) [VT-D]dmar.c:737: Host address width 36
>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>> (XEN) [VT-D]dmar.c:412: dmaru->address = fed90000
>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:02.0
>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>> (XEN) [VT-D]dmar.c:412: dmaru->address = fed91000
>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>>> (XEN) [VT-D]dmar.c:354: IOAPIC: 0000:f0:1f.0
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.0
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.1
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.2
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.3
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.4
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.5
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.6
>>> (XEN) [VT-D]dmar.c:332: MSI HPET: 0000:00:0f.7
>>> (XEN) [VT-D]dmar.c:426: flags: INCLUDE_ALL
>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:1d.0
>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:1a.0
>>> (XEN) [VT-D]dmar.c:625: RMRR region: base_addr ba8d5000 end_address ba8ebfff
>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>> (XEN) [VT-D]dmar.c:338: endpoint: 0000:00:02.0
>>> (XEN) [VT-D]dmar.c:625: RMRR region: base_addr bb800000 end_address bf9fffff
>>>
>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>>> was found.
>>
>> Right. So one less variable.
>
> Some more info.
> Ross Philipson provided me with a handy utility to dump a bunch more
> info about the DMAR tables, and with some more trace, this appears to
> be tied to the IGD.
>
> Early in the boot process, I see queue_invalidate_wait() called for
> DRHD unit 0, and 1
> (unit 0 is wired up to the IGD, unit 1 is everything else)
>
> Up until i915 does the following, I see that unit being flushed with
> queue_invalidate_wait() :
>
> [ 0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> [ 0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> [ 1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
> [ 2.253551] fbcon: inteldrmfb (fb0) is primary device
> [ 3.111838] Console: switching to colour frame buffer device 170x48
> [ 3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
> [ 3.171634] i915 0000:00:02.0: registered panic notifier
> [ 3.173339] acpi device:00: registered as cooling_device1
> [ 3.173401] ACPI: Video Device [VID] (multi-head: yes rom: no post: no)
> [ 3.173962] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> [ 3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
> [ 3.174258] ahci 0000:00:1f.2: version 3.0
> [ 3.174270] xen: registering gsi 19 triggering 0 polarity 1
> [ 3.174274] Already setup the GSI :19
>
>
> After that - the unit never seems to be flushed.
>
> ...until we enter into the S3 hypercall, which loops over all DRHD
> units, and explicitly flushes all of them via iommu_flush_all()
>
> It is at that point that it hangs up when talking to the device that
> the IGD is plumbed up to.
>
>
> Does this point to something in the i915 driver doing something that
> is incompatible with Xen?

I actually separated it from the S3 hypercall by adding a new debug
key 'F' that just calls iommu_flush_all(). I can crash it on demand
with this.

Booting with "i915.modeset=0 single" (to prevent both KMS and Xorg),
it does not occur.
So, that pretty much narrows it down to the IGD, in my mind.
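
For anyone wanting to check the same pattern in their own console log,
here is a minimal sketch that tallies the flush trace lines per DRHD
unit and reports where each unit was last seen being flushed. It
assumes the ad-hoc trace format quoted above
("(XEN) XXX queue_invalidate_wait:<line> CPU<n> DRHD<n> ret=<n>"), which
comes from local debug printks and not from stock Xen. The names
summarize_flushes and SAMPLE are illustrative only.

```python
import re

# Matches the local debug trace format quoted in this thread, e.g.
# "(XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0".
# This format is an assumption: it is not emitted by an unmodified Xen.
TRACE_RE = re.compile(
    r"\(XEN\) XXX queue_invalidate_wait:\d+ CPU(\d+) DRHD(\d+) ret=(-?\d+)"
)


def summarize_flushes(log_text):
    """Return {drhd_unit: {"count", "last_line", "nonzero_ret"}}.

    "last_line" is the 1-based line number of the last trace entry seen
    for that unit, which shows where flushing of the unit stopped.
    """
    summary = {}
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = TRACE_RE.search(line)
        if match is None:
            continue
        unit = int(match.group(2))
        ret = int(match.group(3))
        entry = summary.setdefault(
            unit, {"count": 0, "last_line": None, "nonzero_ret": 0}
        )
        entry["count"] += 1
        entry["last_line"] = lineno
        if ret != 0:
            entry["nonzero_ret"] += 1
    return summary


# A shortened excerpt of the log quoted above, used only as sample input.
SAMPLE = """\
[ 0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
(XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
(XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
[ 2.253551] fbcon: inteldrmfb (fb0) is primary device
"""

if __name__ == "__main__":
    for unit, info in sorted(summarize_flushes(SAMPLE).items()):
        print(f"DRHD{unit}: {info['count']} flushes, "
              f"last at log line {info['last_line']}, "
              f"{info['nonzero_ret']} with nonzero ret")
```

Run against a full serial log, a unit whose last flush sits well before
the point of interest (here, the i915 initialization) would match the
"never seems to be flushed" observation above.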