public inbox for linux-kernel@vger.kernel.org
* [PATCH] irq: add quirk for broken interrupt remapping on 55XX chipsets
@ 2013-03-01 17:17 Neil Horman
  2013-03-01 18:20 ` Yinghai Lu
                   ` (11 more replies)
  0 siblings, 12 replies; 62+ messages in thread
From: Neil Horman @ 2013-03-01 17:17 UTC
  To: linux-kernel
  Cc: Neil Horman, Prarit Bhargava, Don Zickus, Don Dutile,
	Bjorn Helgaas, Asit Mallick, linux-pci

A few years back Intel published a spec update for the 5520 and 5500 chipsets:
http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf

It contains an erratum (specifically erratum 53) noting that these chipsets
can't properly do interrupt remapping, and as a result Intel recommends that
interrupt remapping be disabled in the BIOS.  While many vendors have a BIOS
update to do exactly that, not all do, and of course not all users update their
BIOS to a level that corrects the problem.  As a result, interrupts can
occasionally arrive at a CPU even after affinity for that interrupt has been
moved, leading to lost or spurious interrupts, usually characterized by the
message:
kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)

There have been several recent reports of people seeing this error, and
investigation has shown that their systems run a BIOS level in which this
feature was not properly turned off.  As such, it would be good to give them a
reminder that their systems are vulnerable to this problem.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Prarit Bhargava <prarit@redhat.com>
CC: Don Zickus <dzickus@redhat.com>
CC: Don Dutile <ddutile@redhat.com>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Asit Mallick <asit.k.mallick@intel.com>
CC: linux-pci@vger.kernel.org
---
 drivers/iommu/intel_irq_remapping.c | 20 ++++++++++++++++++++
 include/linux/pci_ids.h             |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index f3b8f23..9bfb6c2 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1113,3 +1113,23 @@ struct irq_remap_ops intel_irq_remap_ops = {
 	.msi_setup_irq		= intel_msi_setup_irq,
 	.setup_hpet_msi		= intel_setup_hpet_msi,
 };
+
+
+static void intel_remapping_check(struct pci_dev *dev)
+{
+	u8 revision;
+
+	pci_read_config_byte(dev, PCI_REVISION_ID, &revision);
+
+	if ((revision == 0x13) && irq_remapping_enabled) {
+		pr_warn("WARNING WARNING WARNING WARNING WARNING WARNING\n"
+			"This system BIOS has enabled interrupt remapping\n"
+			"on a chipset that contains an erratum making that\n"
+			"feature unstable.  Please reboot with nointremap\n"
+			"added to the kernel command line and contact\n"
+			"your BIOS vendor for an update\n");
+	}
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check);
+
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 31717bd..54027a6 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2732,6 +2732,8 @@
 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2  0x2db2
 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV2    0x2db3
 #define PCI_DEVICE_ID_INTEL_82855PM_HB	0x3340
+#define PCI_DEVICE_ID_INTEL_5500_IOHUB	0x3403
+#define PCI_DEVICE_ID_INTEL_5520_IOHUB	0x3406
 #define PCI_DEVICE_ID_INTEL_IOAT_TBG4	0x3429
 #define PCI_DEVICE_ID_INTEL_IOAT_TBG5	0x342a
 #define PCI_DEVICE_ID_INTEL_IOAT_TBG6	0x342b
-- 
1.7.11.7
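
For readers who want to experiment with the condition outside the kernel, here
is a minimal userspace sketch of the check this patch performs.  The device IDs
and revision 0x13 come from the diff above, and 0x8086 is Intel's PCI vendor
ID; `struct fake_pci_dev` and `remapping_warning_needed()` are hypothetical
names made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the patch above; 0x8086 is Intel's PCI vendor ID. */
#define VENDOR_INTEL      0x8086
#define DEVICE_5500_IOHUB 0x3403
#define DEVICE_5520_IOHUB 0x3406
#define AFFECTED_REVISION 0x13

/* Hypothetical stand-in for the fields a PCI fixup would look at. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint8_t revision;
};

/*
 * Returns true when the warning in the patch would be printed:
 * a 5500/5520 I/O hub at revision 0x13 with interrupt remapping enabled.
 */
static bool remapping_warning_needed(const struct fake_pci_dev *dev,
				     bool irq_remapping_enabled)
{
	assert(dev != NULL);

	bool is_55xx_hub = dev->vendor == VENDOR_INTEL &&
			   (dev->device == DEVICE_5500_IOHUB ||
			    dev->device == DEVICE_5520_IOHUB);

	return is_55xx_hub && dev->revision == AFFECTED_REVISION &&
	       irq_remapping_enabled;
}
```

In the real patch the vendor/device match is done by the
DECLARE_PCI_FIXUP_EARLY() table rather than inside the function; the sketch
folds both steps into one predicate only so it can be called directly.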


* Re: [PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets
@ 2013-04-06  1:25 Neil Horman
  2013-04-06  2:32 ` Yinghai Lu
  0 siblings, 1 reply; 62+ messages in thread
From: Neil Horman @ 2013-04-06  1:25 UTC
  To: Yinghai Lu
  Cc: Linux Kernel Mailing List, Prarit Bhargava, Don Zickus,
	Don Dutile, Bjorn Helgaas, Asit Mallick, David Woodhouse,
	 linux-pci@vger.kernel.org

I'm sorry, I forgot to change the wording of the error for the new model that I'm following here, although the message is mostly right, as the BIOS is responsible for setting and clearing the IRQ remapping feature bit in the chip's capabilities register.

I'll fix and repost on Monday.

Neil

Yinghai Lu <yinghai@kernel.org> wrote:

>On Fri, Apr 5, 2013 at 12:31 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>> A few years back Intel published a spec update for the 5520 and 5500 chipsets:
>> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>>
>> It contains an erratum (specifically erratum 53) noting that these chipsets
>> can't properly do interrupt remapping, and as a result Intel recommends that
>> interrupt remapping be disabled in the BIOS.  While many vendors have a BIOS
>> update to do exactly that, not all do, and of course not all users update their
>> BIOS to a level that corrects the problem.  As a result, interrupts can
>> occasionally arrive at a CPU even after affinity for that interrupt has been
>> moved, leading to lost or spurious interrupts, usually characterized by the
>> message:
>> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>>
>> There have been several recent reports of people seeing this error, and
>> investigation has shown that their systems run a BIOS level in which this
>> feature was not properly turned off.  As such, it would be good to give them a
>> reminder that their systems are vulnerable to this problem.
>>
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> CC: Prarit Bhargava <prarit@redhat.com>
>> CC: Don Zickus <dzickus@redhat.com>
>> CC: Don Dutile <ddutile@redhat.com>
>> CC: Bjorn Helgaas <bhelgaas@google.com>
>> CC: Asit Mallick <asit.k.mallick@intel.com>
>> CC: David Woodhouse <dwmw2@infradead.org>
>> CC: linux-pci@vger.kernel.org
>> ---
>>
>> Change notes:
>>
>> v2)
>>
>> * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
>> chipset series is x86 only.  I decided however to keep the quirk as a regular
>> quirk, not an early_quirk.  Early quirks currently have no way to determine if
>> the BIOS has properly disabled the feature in the iommu, at least not without
>> significant hacking, and since it's quite possible this will be a short-lived
>> quirk, should Don Z's workaround code prove successful (and it looks like it may
>> well), I don't think that is necessary.
>>
>> * Removed the WARNING banner from the quirk, and added the HW_ERR token to the
>> string.  I opted to leave the newlines in place however, as I really couldn't
>> find a way to keep the text on a single line that is still legible from a code
>> perspective.  I think there's enough language in there that using cscope on just
>> about any substring will turn it up, and again, this may be a short-lived
>> quirk.
>>
>> v3)
>>
>> * Removed defines from pci_ids.h, and used direct id values as per request from
>> Bjorn.
>>
>> v4)
>>
>> * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David
>> Woodhouse
>>
>> v5)
>>
>> * Moved check to an early quirk, and flagged the broken chip, so we could
>> reasonably disable irq remapping during bootup.
>>
>> v6)
>> * Clean up of stupid extra thrash in quirks.c
>> ---
>>  arch/x86/kernel/early-quirks.c | 25 +++++++++++++++++++++++++
>>  drivers/iommu/irq_remapping.c  | 12 ++++++++++++
>>  drivers/iommu/irq_remapping.h  |  1 +
>>  3 files changed, 38 insertions(+)
>>
>> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
>> index 3755ef4..bfa3139 100644
>> --- a/arch/x86/kernel/early-quirks.c
>> +++ b/arch/x86/kernel/early-quirks.c
>> @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, int func)
>>  }
>>  #endif
>>
>> +#ifdef CONFIG_IRQ_REMAP
>> +static void __init intel_remapping_check(int num, int slot, int func)
>> +{
>> +       u8 revision;
>> +
>> +       revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID);
>> +
>> +       /*
>> +        * Revision 0x13 of this chipset supports irq remapping
>> +        * but has an erratum that breaks its behavior, flag it as such
>> +        */
>> +       if (revision == 0x13)
>> +               irq_remap_broken = 1;
>> +
>> +}
>> +#else
>> +static void __init intel_remapping_check(int num, int slot, int func)
>> +{
>> +}
>> +#endif
>> +
>>  #define QFLAG_APPLY_ONCE       0x1
>>  #define QFLAG_APPLIED          0x2
>>  #define QFLAG_DONE             (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
>> @@ -221,6 +242,10 @@ static struct chipset early_qrk[] __initdata = {
>>           PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
>>         { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
>>           PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
>> +       { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
>> +         PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
>> +       { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
>> +         PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
>>         {}
>>  };
>>
>> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
>> index d56f8c1..2b56e92 100644
>> --- a/drivers/iommu/irq_remapping.c
>> +++ b/drivers/iommu/irq_remapping.c
>> @@ -19,6 +19,7 @@
>>  int irq_remapping_enabled;
>>
>>  int disable_irq_remap;
>> +int irq_remap_broken;
>>  int disable_sourceid_checking;
>>  int no_x2apic_optout;
>>
>> @@ -216,6 +217,17 @@ int irq_remapping_supported(void)
>>         if (disable_irq_remap)
>>                 return 0;
>>
>> +       if (irq_remap_broken) {
>> +               WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND,
>> +                          "This system BIOS has enabled interrupt remapping\n"
>> +                          "on a chipset that contains an erratum making that\n"
>> +                          "feature unstable.  Please reboot with nointremap\n"
>> +                          "added to the kernel command line and contact\n"
>> +                          "your BIOS vendor for an update");
>
>What do you mean by "This system BIOS has enabled interrupt remapping"?
>Does the BIOS have interrupt remapping pre-enabled, or does the BIOS just
>provide the DMAR table?
>
>Why do you need "Please reboot with nointremap"?
>
>Thanks
>
>Yinghai
>
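
As a rough illustration of the v5/v6 model quoted above (an early quirk flags
the chipset, and the later support check consults that flag), here is a small
userspace sketch.  The field names mirror the globals in the quoted diff;
`struct remap_state`, `enum remap_decision`, `early_remapping_check()` and
`classify_remapping()` are hypothetical names used only for this example.
Because the quoted hunk is cut off right after the WARN_TAINT() call, the
sketch only reports which branch would be taken and makes no claim about what
irq_remapping_supported() finally returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names mirror the globals in the quoted diff. */
struct remap_state {
	int disable_irq_remap;	/* assumed to be what "nointremap" sets */
	int irq_remap_broken;	/* set by the early chipset check */
};

/* Which branch of the support check would be taken. */
enum remap_decision {
	REMAP_DISABLED_BY_USER,	/* user opted out on the command line */
	REMAP_FLAGGED_BROKEN,	/* early quirk marked the chipset */
	REMAP_ASK_HARDWARE,	/* fall through to the normal checks */
};

/* Phase 1 (early boot): flag the chipset when the revision matches. */
static void early_remapping_check(struct remap_state *state, uint8_t revision)
{
	assert(state != NULL);

	if (revision == 0x13)
		state->irq_remap_broken = 1;
}

/* Phase 2 (IOMMU setup): the user opt-out is tested before the quirk flag. */
static enum remap_decision classify_remapping(const struct remap_state *state)
{
	assert(state != NULL);

	if (state->disable_irq_remap)
		return REMAP_DISABLED_BY_USER;
	if (state->irq_remap_broken)
		return REMAP_FLAGGED_BROKEN;
	return REMAP_ASK_HARDWARE;
}
```

The ordering matters for the questions above: with the opt-out tested first, a
system already booted with nointremap never reaches the warning branch.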


Thread overview: 62+ messages
2013-03-01 17:17 [PATCH] irq: add quirk for broken interrupt remapping on 55XX chipsets Neil Horman
2013-03-01 18:20 ` Yinghai Lu
2013-03-01 19:29   ` Neil Horman
2013-03-02  2:28   ` Jiang Liu
2013-03-02 15:59 ` Andreas Mohr
2013-03-04 13:24   ` Don Dutile
2013-03-10  1:11     ` Prarit Bhargava
2013-03-02 16:21 ` Prarit Bhargava
2013-03-02 20:13   ` Neil Horman
2013-03-04 19:04 ` [PATCH v2] " Neil Horman
2013-03-09 20:49   ` Neil Horman
2013-03-09 22:20     ` Myron Stowe
2013-03-11  1:31       ` Don Dutile
2013-03-11 11:25       ` Neil Horman
2013-03-11 12:17         ` Prarit Bhargava
2013-04-03 23:53   ` Bjorn Helgaas
2013-04-04 11:17     ` Neil Horman
2013-04-04 14:27     ` David Woodhouse
2013-04-04 14:50       ` Neil Horman
2013-04-04 14:57         ` Bjorn Helgaas
2013-04-04 15:39           ` Neil Horman
2013-04-04 17:14             ` Bjorn Helgaas
2013-04-04 17:51               ` Neil Horman
2013-04-04 18:41                 ` Bjorn Helgaas
2013-04-04 20:02                   ` Neil Horman
2013-04-04 13:54 ` [PATCH v3] " Neil Horman
2013-04-04 15:08 ` [PATCH v4] " Neil Horman
2013-04-04 16:16   ` Yinghai Lu
2013-04-04 17:27     ` Don Dutile
2013-04-04 17:40       ` Yinghai Lu
2013-04-04 20:04         ` Neil Horman
2013-04-04 20:33           ` Bjorn Helgaas
2013-04-04 21:11             ` Yinghai Lu
2013-04-05  0:24               ` Neil Horman
2013-04-05 19:25 ` [PATCH v5] " Neil Horman
2013-04-05 19:29   ` Neil Horman
2013-04-05 19:31 ` [PATCH v6] " Neil Horman
2013-04-05 23:37   ` Yinghai Lu
2013-04-06  1:55   ` Bjorn Helgaas
2013-04-08 15:29     ` Don Dutile
2013-04-08 17:17       ` Bjorn Helgaas
2013-04-08 17:42         ` Neil Horman
2013-04-09 10:08           ` Joerg Roedel
2013-04-15 11:18 ` [PATCH v7] " Neil Horman
2013-04-15 15:30   ` Bjorn Helgaas
2013-04-15 16:28     ` Neil Horman
2013-04-15 16:28 ` [PATCH v8] " Neil Horman
2013-04-15 22:41 ` [PATCH v9] " Neil Horman
2013-04-15 23:02   ` Yinghai Lu
2013-04-16  0:43     ` Neil Horman
2013-04-16  6:20   ` Arkadiusz Miskiewicz
2013-04-16 10:24   ` Joerg Roedel
2013-04-16 13:07     ` Neil Horman
2013-04-16 13:35     ` Neil Horman
2013-04-16 16:37       ` Joerg Roedel
2013-04-16 17:25         ` Neil Horman
2013-04-16 20:38 ` [PATCH v10] " Neil Horman
2013-04-16 22:08   ` Don Dutile
2013-04-18 15:02   ` Joerg Roedel
2013-04-18 17:00     ` Neil Horman
  -- strict thread matches above, loose matches on Subject: below --
2013-04-06  1:25 [PATCH v6] " Neil Horman
2013-04-06  2:32 ` Yinghai Lu
