From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Sander Eikelenboom <linux@eikelenboom.it>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622: "x86 don't change affinity with interrupt unmasked", APCI errors and assorted pci trouble
Date: Mon, 30 Mar 2015 12:04:26 +0100
Message-ID: <55192DBA.2070702@citrix.com>
In-Reply-To: <1687342964.20150328211022@eikelenboom.it>

On 28/03/15 20:10, Sander Eikelenboom wrote:
> Saturday, March 28, 2015, 6:30:39 PM, you wrote:
>
>> On 28/03/15 15:34, Sander Eikelenboom wrote:
>>> Hi Jan,
>>>
>>> Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622:
>>> "x86 don't change affinity with interrupt unmasked",
>>> gives trouble on my AMD box, symptoms:
>>> - APIC errors in xl dmesg that weren't previously there:
>>>   (XEN) [2015-03-26 20:35:37.085] IOAPIC[0]: Set PCI routing entry (6-13 -> 0x88 -> IRQ 13 Mode:0 Active:0)
>>>   (XEN) [2015-03-26 20:35:37.101] PCI: Using MCFG for segment 0000 bus 00-ff
>>>   (XEN) [2015-03-26 20:35:37.097] IOAPIC[0]: Set PCI routing entry (6-8 -> 0x58 -> IRQ 8 Mode:0 Active:0)
>>>   (XEN) [2015-03-26 20:35:37.112] IOAPIC[0]: Set PCI routing entry (6-18 -> 0xb8 -> IRQ 18 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.189] IOAPIC[0]: Set PCI routing entry (6-17 -> 0xc0 -> IRQ 17 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-29 -> 0xc8 -> IRQ 53 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-24 -> 0xd0 -> IRQ 48 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-30 -> 0xd8 -> IRQ 54 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-12 -> 0x21 -> IRQ 36 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-13 -> 0x29 -> IRQ 37 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.421] IOAPIC[1]: Set PCI routing entry (7-16 -> 0x31 -> IRQ 40 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.495] IOAPIC[1]: Set PCI routing entry (7-28 -> 0x39 -> IRQ 52 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.498] IOAPIC[0]: Set PCI routing entry (6-16 -> 0x89 -> IRQ 16 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.498] IOAPIC[1]: Set PCI routing entry (7-14 -> 0xa9 -> IRQ 38 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:37.548] IOAPIC[0]: Set PCI routing entry (6-22 -> 0xb9 -> IRQ 22 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:39.620] IOAPIC[1]: Set PCI routing entry (7-9 -> 0xc1 -> IRQ 33 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:39.646] IOAPIC[1]: Set PCI routing entry (7-8 -> 0xc9 -> IRQ 32 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:39.647] IOAPIC[1]: Set PCI routing entry (7-23 -> 0xd1 -> IRQ 47 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:41.732] IOAPIC[1]: Set PCI routing entry (7-5 -> 0xd9 -> IRQ 29 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:41.779] IOAPIC[1]: Set PCI routing entry (7-4 -> 0x22 -> IRQ 28 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:41.803] mm.c:803: d0: Forcing read-only access to MFN fed00
>>>   (XEN) [2015-03-26 20:35:41.894] IOAPIC[0]: Set PCI routing entry (6-19 -> 0x2a -> IRQ 19 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:42.057] IOAPIC[1]: Set PCI routing entry (7-22 -> 0x72 -> IRQ 46 Mode:1 Active:1)
>>>   (XEN) [2015-03-26 20:35:42.093] IOAPIC[1]: Set PCI routing entry (7-27 -> 0x8a -> IRQ 51 Mode:1 Active:1)
>>>
>>>   these:
>>>   (XEN) [2015-03-26 20:35:42.205] APIC error on CPU0: 00(40)
>>>   (XEN) [2015-03-26 20:35:42.372] APIC error on CPU0: 40(40)
>>>
>>>   (XEN) [2015-03-26 20:35:42.691] d0 attempted to change d0v1's CR4 flags 00000660 -> 00000760
>>>   (XEN) [2015-03-26 20:35:42.691] IOAPIC[1]: Set PCI routing entry (7-1 -> 0x9a -> IRQ 25 Mode:1 Active:1)
>>>
>>>   and this one:
>>>   (XEN) [2015-03-26 20:35:42.707] APIC error on CPU0: 40(40)
>>>   (XEN) [2015-03-26 20:35:43.958] d0 attempted to change d0v0's CR4 flags 00000660 -> 00000760
>>>   (XEN) [2015-03-26 20:35:43.970] d0 attempted to change d0v2's CR4 flags 00000660 -> 00000760
>>>   (XEN) [2015-03-26 20:35:43.988] d0 attempted to change d0v3's CR4 flags 00000660 -> 00000760
>>>   (XEN) [2015-03-26 20:35:43.992] d0 attempted to change d0v4's CR4 flags 00000660 -> 00000760
>>>   (XEN) [2015-03-26 20:35:43.996] d0 attempted to change d0v5's CR4 flags 00000660 -> 00000760
>>>   (d1) [2015-03-26 20:40:42.220] mapping kernel into physical memory
>>>   (d1) [2015-03-26 20:40:42.220] about to get started...
>>>
>>>
>>> - random failures on dom0 SATA devices; the SATA controller uses multiple MSI
>>>   interrupts.
>>>
>>> - failures on XHCI controllers passed through to an HVM guest, which use MSI-X
>>>   interrupts, leading to these in the guest dmesg:
>>>   [  350.246548] xhci_hcd 0000:00:05.0: Looking for event-dma 000000003cdf7140 trb-start 000000003cdf7240 trb-end 000000003cdf7240 seg-start 000000003cdf7000 seg-end 000000003cdf73f0
>>>   [  350.246548] xhci_hcd 0000:00:05.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
>>>   [  350.246548] xhci_hcd 0000:00:05.0: Looking for event-dma 000000003cdf7150 trb-start 000000003cdf7240 trb-end 000000003cdf7240 seg-start 000000003cdf7000 seg-end 000000003cdf73f0
>>>   [  350.246548] xhci_hcd 0000:00:05.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
>>>
>>>
>>> Reverting this specific commit makes all the trouble go away.
>> That is unfortunate, as conceptually the identified patch definitely
>> fixes a bug.
>> The "APIC error" messages have bit 6 set, which is "Receive Illegal
>> Vector", i.e. a device has attempted to deliver an interrupt with a
>> vector field less than 16.  I presume that this means that the device is
>> ending up with a malformed data field programmed into it.
>> Can you identify the PCI SBDFs of the problematic devices, and collect
>> debug-keys Q, M and i on a working system so I can identify precisely
>> which of the MSI interrupt drivers is in use (Xen has several, depending
>> on exact hardware circumstance).  If you can, the same debug-keys with
>> the problematic changeset present might also be interesting.
>> ~Andrew
>
> Hi Andrew,
>
> The passed through xhci is 08:00.0
> The SATA controller is 00:11.0
>
> The clearest failure is on the xHCI controller.
>
> The working and non-working configs differ only in the revert of the
> mentioned commit.
>
> Attached are:
>
> - lspci in dom0 of the working config
> - serial log of the working config (with debug-keys Q, M and i after full
>   boot and guest start)
> - serial log of the non-working config (with debug-keys Q, M and i after
>   full boot and guest start)

Thanks.

As an utter long shot, can you give this patch a try?  Could you also
capture an lspci in dom0 while the bad situation is manifesting itself?

~Andrew

diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
index c1b76fb..439ba05 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -529,10 +529,12 @@ int amd_iommu_msi_msg_update_ire(
     } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) );
 
     if ( !rc )
+    {
         for ( i = 1; i < nr; ++i )
             msi_desc[i].remap_index = msi_desc->remap_index + i;
+        msg->data = data;
+    }
 
-    msg->data = data;
     return rc;
 }


Thread overview: 15+ messages
2015-03-28 15:34 Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622: "x86 don't change affinity with interrupt unmasked", APCI errors and assorted pci trouble Sander Eikelenboom
2015-03-28 17:30 ` Andrew Cooper
2015-03-28 20:10   ` Sander Eikelenboom
2015-03-30 11:04     ` Andrew Cooper [this message]
2015-03-30 13:26       ` Sander Eikelenboom
2015-04-01 14:43         ` Andrew Cooper
2015-04-01 14:47           ` Sander Eikelenboom
2015-04-14 12:46           ` Sander Eikelenboom
2015-04-15 14:15             ` Jan Beulich
2015-04-17 11:43             ` Jan Beulich
2015-04-17 14:13               ` Sander Eikelenboom
2015-04-17 15:11               ` Sander Eikelenboom
2015-04-17 15:20                 ` Jan Beulich
2015-04-17 15:46                   ` Andrew Cooper
2015-04-17 15:53                     ` Sander Eikelenboom
