Re: S3 crash with VTD Queue Invalidation enabled

xen-devel.lists.xenproject.org archive mirror

From: Ben Guthro <ben.guthro@gmail.com>
To: "Zhang, Xiantao" <xiantao.zhang@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Ben Guthro <ben@guthro.net>, Jan Beulich <JBeulich@suse.com>,
	xen-devel <xen-devel@lists.xen.org>
Subject: Re: S3 crash with VTD Queue Invalidation enabled
Date: Thu, 6 Jun 2013 11:17:01 -0400
Message-ID: <-4872575961693725132@unknownmsgid>
In-Reply-To: <B6C2EB9186482D47BD0C5A9A4834564404CF3456@SHSMSX104.ccr.corp.intel.com>

On Jun 6, 2013, at 11:13 AM, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

>
>
>> -----Original Message-----
>> From: Ben Guthro [mailto:ben.guthro@gmail.com]
>> Sent: Thursday, June 06, 2013 11:08 PM
>> To: Zhang, Xiantao
>> Cc: Jan Beulich; Ben Guthro; Andrew Cooper; xen-devel
>> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
>>
>> On Jun 6, 2013, at 11:06 AM, "Zhang, Xiantao" <xiantao.zhang@intel.com>
>> wrote:
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Sent: Thursday, June 06, 2013 2:59 PM
>>>> To: Ben Guthro
>>>> Cc: Andrew Cooper; Zhang, Xiantao; xen-devel
>>>> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
>>>>
>>>>>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
>>>>> On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@guthro.net> wrote:
>>>>>> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
>>>>>>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> Depending on whether ATS is in use, more than one invalidation
>>>>>>>>> can be done in the processing here - could you therefore check
>>>>>>>>> whether there's any sign of ATS use ("iommu=verbose" should
>>>>>>>>> make you see respective messages), and if so see whether
>>>>>>>>> disabling it ("ats=off") makes a difference?
>>>>>>>>
>>>>>>>> ATS does not appear to be running:
>>>>>>>>
>>>>>>>> (XEN) [VT-D]dmar.c:737: Host address width 36
>>>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
>>>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
>>>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
>>>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
>>>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>>>>>>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
>>>>>>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
>>>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
>>>>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000 end_address ba8ebfff
>>>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000 end_address bf9fffff
>>>>>>>>
>>>>>>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>>>>>>>> was found.
>>>>>>>
>>>>>>> Right. So one less variable.
>>>>>>
>>>>>> Some more info.
>>>>>> Ross Philipson provided me with a handy utility to dump a bunch more
>>>>>> info about the DMAR tables, and with some more tracing, this appears
>>>>>> to be tied to the IGD.
>>>>>>
>>>>>> Early in the boot process, I see queue_invalidate_wait() called for
>>>>>> DRHD units 0 and 1
>>>>>> (unit 0 is wired up to the IGD; unit 1 covers everything else).
>>>>>>
>>>>>> Up until i915 does the following, I see that unit being flushed with
>>>>>> queue_invalidate_wait():
>>>>>>
>>>>>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>>>>>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>>>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>>>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
>>>>>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
>>>>>> [    3.111838] Console: switching to colour frame buffer device 170x48
>>>>>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>>>>>> [    3.171634] i915 0000:00:02.0: registered panic notifier
>>>>>> [    3.173339] acpi device:00: registered as cooling_device1
>>>>>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
>>>>>> [    3.173962] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>>>>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>>>> [    3.174258] ahci 0000:00:1f.2: version 3.0
>>>>>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
>>>>>> [    3.174274] Already setup the GSI :19
>>>>>>
>>>>>>
>>>>>> After that, the unit never seems to be flushed.
>>>>>>
>>>>>> ...until we enter the S3 hypercall, which loops over all DRHD
>>>>>> units and explicitly flushes all of them via iommu_flush_all().
>>>>>>
>>>>>> It is at that point that it hangs while flushing the DRHD unit the
>>>>>> IGD is attached to.
>>>>>>
>>>>>>
>>>>>> Does this point to something in the i915 driver doing something that
>>>>>> is incompatible with Xen?
>>>>>
>>>>> I actually separated it from the S3 hypercall by adding a new debug key,
>>>>> 'F', that just calls iommu_flush_all().
>>>>> I can crash it on demand with this.
>>>>>
>>>>> When booting with "i915.modeset=0 single" (to prevent both KMS and Xorg),
>>>>> it does not occur.
>>>>> So, that pretty much narrows it down to the IGD, in my mind.
>>>>
>>>> Indeed, I agree. Yet I can't in any way comment on what or why.
>>>> Xiantao (perhaps it would be good to Cc some graphics person
>>>> here too)?
>>> Hi, Jan/Ben
>>> Thanks for your analysis! Could you try enabling "snb_igd_quirk"? Thanks!
>>> Xiantao
>>
>>
>> Thanks for your reply. I tried this param yesterday, but it did not
>> change the behavior.
> Okay, I recall that a bug in the IGD i915 driver was found recently; it may cause errors with VT-d and should be fixed in the latest kernel. Could you try the latest kernel, 3.9.4 or 3.10-rcx?
> Xiantao

It may have dropped off the top of this thread, but I sent out the
versions I have tested with, and this was one of them.

Testing 3.10 did not change this behavior.

Did you have a particular changeset in mind?


Thread overview: 26+ messages
2013-06-03 18:29 S3 crash with VTD Queue Invalidation enabled Ben Guthro
2013-06-03 19:22 ` Andrew Cooper
2013-06-04  8:54   ` Jan Beulich
2013-06-04 12:25     ` Ben Guthro
2013-06-04 14:01       ` Jan Beulich
2013-06-04 19:20         ` Ben Guthro
2013-06-04 19:49           ` Ben Guthro
2013-06-04 21:09             ` Ben Guthro
2013-06-05  8:24               ` Jan Beulich
2013-06-05 13:54                 ` Ben Guthro
2013-06-05 15:14                   ` Jan Beulich
2013-06-05 15:25                     ` Ben Guthro
2013-06-05 15:38                       ` Jan Beulich
2013-06-05 20:27                         ` Ben Guthro
2013-06-05 23:53                           ` Ben Guthro
2013-06-06  6:58                             ` Jan Beulich
2013-06-06 15:06                               ` Zhang, Xiantao
2013-06-06 15:07                                 ` Ben Guthro
2013-06-06 15:13                                   ` Zhang, Xiantao
2013-06-06 15:17                                     ` Ben Guthro [this message]
2013-06-07  1:33                                       ` Zhang, Xiantao
2013-06-07 15:52                                         ` Ben Guthro
2013-06-14  8:38                             ` Jan Beulich
2013-06-14 17:01                               ` Ben Guthro
2013-06-14 18:27                                 ` Ben Guthro
2013-06-17  7:23                                   ` Jan Beulich
