linux-rt-users.vger.kernel.org archive mirror
* IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
@ 2011-11-23 12:39 Chris Edwards
  2011-11-23 13:52 ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Edwards @ 2011-11-23 12:39 UTC (permalink / raw)
  To: linux-rt-users

Hi all,

Problem:
IRQ-related "nobody cared" kernel call traces not long after bootup on 
Linux 3.0.10-rt27.  I thought I'd try a -rt kernel to see if it would 
resolve the audio drop-outs on my new Firewire audio interface.

System:
Dual Opteron 244 processors on an Iwill DK8N (AMD 8131 and nForce3 250Gb)
Ubuntu 10.04.3 LTS

`uname -a` returns:
Linux zaphod 3.0.10-rt27 #1 SMP PREEMPT RT Thu Nov 24 00:50:56 NZDT 2011 x86_64 GNU/Linux

`dmesg` snippet:
[   86.720241] irq 17: nobody cared (try booting with the "irqpoll" option)
[   86.720248] Pid: 553, comm: irq/17-firewire Not tainted 3.0.10-rt27 #1
[   86.720251] Call Trace:
[   86.720261]  [<ffffffff810c315a>] __report_bad_irq+0x3a/0xd0
[   86.720266]  [<ffffffff810c3332>] note_interrupt+0x142/0x1f0
[   86.720270]  [<ffffffff810c2e16>] irq_thread+0x1b6/0x1e0
[   86.720273]  [<ffffffff810c2f10>] ? exit_irq_thread+0x80/0x80
[   86.720277]  [<ffffffff810c2c60>] ? irq_finalize_oneshot+0x130/0x130
[   86.720281]  [<ffffffff810c2c60>] ? irq_finalize_oneshot+0x130/0x130
[   86.720286]  [<ffffffff81081876>] kthread+0x96/0xa0
[   86.720290]  [<ffffffff8104dda2>] ? finish_task_switch+0x52/0x100
[   86.720296]  [<ffffffff815f09e4>] kernel_thread_helper+0x4/0x10
[   86.720300]  [<ffffffff810817e0>] ? kthreadd+0x170/0x170
[   86.720304]  [<ffffffff815f09e0>] ? gs_change+0x13/0x13
[   86.720306] handlers:
[   86.720310] [<ffffffff810c1760>] irq_default_primary_handler threaded [<ffffffffa010bd70>] irq_handler
[   86.720319] Disabling IRQ #17
[   87.905515] irq 18: nobody cared (try booting with the "irqpoll" option)
[   87.905521] Pid: 470, comm: irq/18-Gina24 Not tainted 3.0.10-rt27 #1
[   87.905523] Call Trace:
[   87.905528]  [<ffffffff810c315a>] __report_bad_irq+0x3a/0xd0
[   87.905531]  [<ffffffff810c3332>] note_interrupt+0x142/0x1f0
[   87.905535]  [<ffffffff810c2e16>] irq_thread+0x1b6/0x1e0
[   87.905538]  [<ffffffff810c2f10>] ? exit_irq_thread+0x80/0x80
[   87.905542]  [<ffffffff810c2c60>] ? irq_finalize_oneshot+0x130/0x130
[   87.905545]  [<ffffffff810c2c60>] ? irq_finalize_oneshot+0x130/0x130
[   87.905550]  [<ffffffff81081876>] kthread+0x96/0xa0
[   87.905553]  [<ffffffff8104dda2>] ? finish_task_switch+0x52/0x100
[   87.905557]  [<ffffffff815f09e4>] kernel_thread_helper+0x4/0x10
[   87.905562]  [<ffffffff810817e0>] ? kthreadd+0x170/0x170
[   87.905566]  [<ffffffff815f09e0>] ? gs_change+0x13/0x13
[   87.905568] handlers:
[   87.905571] [<ffffffff810c1760>] irq_default_primary_handler threaded [<ffffffffa02fe7b0>] snd_echo_interrupt
[   87.905579] Disabling IRQ #18

I don't see these messages with the non-rt version of the same kernel.  
These seem to be the only IRQs affected.
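
For anyone wanting to check a longer log for the same pattern, here is a minimal Python sketch that pulls the affected IRQs out of dmesg-style text. The sample lines are copied from the snippet above; the function name and the regular expressions are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Sample lines copied from the dmesg snippet above.
SAMPLE = """\
[   86.720241] irq 17: nobody cared (try booting with the "irqpoll" option)
[   86.720248] Pid: 553, comm: irq/17-firewire Not tainted 3.0.10-rt27 #1
[   86.720319] Disabling IRQ #17
[   87.905515] irq 18: nobody cared (try booting with the "irqpoll" option)
[   87.905521] Pid: 470, comm: irq/18-Gina24 Not tainted 3.0.10-rt27 #1
[   87.905579] Disabling IRQ #18
"""

# Hypothetical patterns, written to match the message formats shown above.
NOBODY_CARED = re.compile(r"\[\s*(\d+\.\d+)\] irq (\d+): nobody cared")
DISABLING = re.compile(r"\[\s*(\d+\.\d+)\] Disabling IRQ #(\d+)")
THREAD = re.compile(r"comm: irq/(\d+)-(\S+)")


def summarize(log_text):
    """Return {irq: {"reported": t, "disabled": t, "thread": name}}."""
    result = {}
    for line in log_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            result.setdefault(int(m.group(2)), {})["reported"] = float(m.group(1))
            continue
        m = DISABLING.search(line)
        if m:
            result.setdefault(int(m.group(2)), {})["disabled"] = float(m.group(1))
            continue
        m = THREAD.search(line)
        if m:
            result.setdefault(int(m.group(1)), {})["thread"] = m.group(2)
    return result


summary = summarize(SAMPLE)
for irq in sorted(summary):
    print(irq, summary[irq])
```

To run it against a live system, the output of `dmesg` could be passed to summarize() in place of SAMPLE.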

I did try the "irqpoll" kernel command-line option as suggested in the 
error messages, but the kernel messages say that it's not supported in -rt.

`cat /proc/interrupts` reports:
            CPU0       CPU1
   0:        135          0   IO-APIC-edge      timer
   1:          0          2   IO-APIC-edge      i8042
   3:          0          2   IO-APIC-edge
   4:          0          3   IO-APIC-edge
   6:          0          5   IO-APIC-edge      floppy
   7:          1          0   IO-APIC-edge
   8:          0          0   IO-APIC-edge      rtc0
   9:          0          0   IO-APIC-fasteoi   acpi
  12:          0          4   IO-APIC-edge      i8042
  14:          0          0   IO-APIC-edge      pata_amd
  15:          0          0   IO-APIC-edge      pata_amd
  17:     199111        890   IO-APIC-fasteoi   firewire_ohci
  18:     196249       3752   IO-APIC-fasteoi   Gina24
  20:       4511        193   IO-APIC-fasteoi   ohci_hcd:usb2
  21:       2375       3144   IO-APIC-fasteoi   ehci_hcd:usb1
  22:          0          0   IO-APIC-fasteoi   sata_nv, ohci_hcd:usb3
  25:       6138        482   IO-APIC-fasteoi   aic7xxx
  26:       5520         91   IO-APIC-fasteoi   eth1, aic7xxx
  27:      42997       6943   IO-APIC-fasteoi   sata_sil
  29:          1        375   IO-APIC-fasteoi   megaraid
NMI:         17         16   Non-maskable interrupts
LOC:     254097     230128   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:         17         16   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:      26350      25414   Rescheduling interrupts
CAL:       2825      32855   Function call interrupts
TLB:       1697       1346   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        117        117   Machine check polls
ERR:          1
MIS:          0


Needless to say, the Firewire audio is a no-go.

I'm also having trouble getting dual-head X to work, but that's another 
issue, most likely.  I'm also wondering if I can get the HPET timer 
enabled: AFAICT (from looking at other BIOSes) the nForce3 has one, but 
this system's ACPI tables don't advertise it.

Any help or suggestions gratefully received. :^)
--
Chris



* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-11-23 12:39 IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system Chris Edwards
@ 2011-11-23 13:52 ` Steven Rostedt
  2011-11-23 23:12   ` Chris Edwards
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2011-11-23 13:52 UTC (permalink / raw)
  To: Chris Edwards; +Cc: linux-rt-users

On Thu, 2011-11-24 at 01:39 +1300, Chris Edwards wrote:
> Hi all,
> 
> Problem:
> IRQ-related "nobody cared" kernel call traces not long after bootup on 
> Linux 3.0.10-rt27.  I thought I'd try a -rt kernel to see if it would 
> resolve the audio drop-outs on my new Firewire audio interface.

I wonder if this is another bad irq chipset. Does it go away if you boot
with noapic in the kernel command line?

> 
> System:
> Dual Opteron 244 processors on an Iwill DK8N (AMD 8131 and nForce3 250Gb)
> Ubuntu 10.04.3 LTS
> 

> I did try the "irqpoll" kernel command-line option as suggested in the 
> error messages, but the kernel messages say that it's not supported in -rt.

Right. On some chipsets, running interrupts as threads causes the
chipset to fall back to legacy mode, so that interrupts coming in on
one APIC line appear on another line. There is a workaround for it
(I believe), and we blacklist these chipsets, but as with all
blacklists, we only list those that we know about. Yours may be a new one.


> 
> Needless to say, the Firewire audio is a no-go.
> 
> I'm also having trouble getting dual-head X to work, but that's another 
> issue, most likely.  I'm also wondering if I can get the HPET timer 
> enabled: AFAICT (from looking at other BIOSes) the nForce3 has one, but 
> this system's ACPI tables don't advertise it.

Keep us posted. Thanks!

-- Steve




* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-11-23 13:52 ` Steven Rostedt
@ 2011-11-23 23:12   ` Chris Edwards
  2011-11-29  2:25     ` Chris Edwards
  2011-11-30 22:10     ` Steven Rostedt
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Edwards @ 2011-11-23 23:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users

On 24/11/11 02:52, Steven Rostedt wrote:
> On Thu, 2011-11-24 at 01:39 +1300, Chris Edwards wrote:
>> Hi all,
>>
>> Problem:
>> IRQ-related "nobody cared" kernel call traces not long after bootup on
>> Linux 3.0.10-rt27.  I thought I'd try a -rt kernel to see if it would
>> resolve the audio drop-outs on my new Firewire audio interface.
> I wonder if this is another bad irq chipset. Does it go away if you boot
> with noapic in the kernel command line?
>

Thanks for the quick reply, Steven.  Booting with "noapic" does seem to 
avoid the problem with IRQs 17 and 18, and the Firewire audio now works, 
but the "nobody cared" error now appears for IRQ 7:

[  752.450644] irq 7: nobody cared (try booting with the "irqpoll" option)
[  752.450653] Pid: 74, comm: irq/7-ohci_hcd: Not tainted 3.0.10-rt27 #1
[  752.450657] Call Trace:
[  752.450671]  [<ffffffff810c315a>] __report_bad_irq+0x3a/0xd0
[  752.450676]  [<ffffffff810c3332>] note_interrupt+0x142/0x1f0
[  752.450680]  [<ffffffff810c2e16>] irq_thread+0x1b6/0x1e0
[  752.450684]  [<ffffffff810c2f10>] ? exit_irq_thread+0x80/0x80
[  752.450688]  [<ffffffff810c2c60>] ? irq_finalize_oneshot+0x130/0x130
[  752.450692]  [<ffffffff810c2c60>] ? irq_finalize_oneshot+0x130/0x130
[  752.450699]  [<ffffffff81081876>] kthread+0x96/0xa0
[  752.450703]  [<ffffffff8104dda2>] ? finish_task_switch+0x52/0x100
[  752.450711]  [<ffffffff815f09e4>] kernel_thread_helper+0x4/0x10
[  752.450715]  [<ffffffff810817e0>] ? kthreadd+0x170/0x170
[  752.450719]  [<ffffffff815f09e0>] ? gs_change+0x13/0x13
[  752.450721] handlers:
[  752.450726] [<ffffffff810c1760>] irq_default_primary_handler threaded [<ffffffff81464e60>] usb_hcd_irq
[  752.450733] Disabling IRQ #7


Here is the interrupt arrangement now:

cat /proc/interrupts
            CPU0       CPU1
   0:        125          0    XT-PIC-XT-PIC    timer
   1:          2          0    XT-PIC-XT-PIC    i8042
   2:          0          0    XT-PIC-XT-PIC    cascade
   3:          1          0    XT-PIC-XT-PIC
   4:          1          0    XT-PIC-XT-PIC
   5:       4794        279    XT-PIC-XT-PIC    ehci_hcd:usb1
   6:          4          1    XT-PIC-XT-PIC    floppy
   7:       2071      16855    XT-PIC-XT-PIC    ohci_hcd:usb3
   8:          0          0    XT-PIC-XT-PIC    rtc0
   9:        295         18    XT-PIC-XT-PIC    acpi, sata_nv, ohci_hcd:usb2
  10:       9995       1467    XT-PIC-XT-PIC    sata_sil, Gina24
  11:       1758        305    XT-PIC-XT-PIC    megaraid, aic7xxx, firewire_ohci, eth1, aic7xxx
  12:          4          0    XT-PIC-XT-PIC    i8042
  14:          0          0    XT-PIC-XT-PIC    pata_amd
  15:          0          0    XT-PIC-XT-PIC    pata_amd
NMI:          2          2   Non-maskable interrupts
LOC:      39580      37870   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          2          2   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:      12595      17142   Rescheduling interrupts
CAL:       4611       2901   Function call interrupts
TLB:        264        285   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         21         21   Machine check polls
ERR:         34
MIS:          0

Looking at the interrupt rates while running jackd with different 
settings, it looks a lot like the Firewire interrupts are being 
duplicated on IRQ 7.  Also (and I'm not sure if this is significant) the 
interrupt rates seem to be swapped with respect to the CPUs:

IRQ 7 on CPU 0 (ohci_hcd:usb3) shows essentially the same rate as IRQ 11 
on CPU 1 (megaraid, aic7xxx, firewire_ohci, eth1, aic7xxx).

IRQ 7 on CPU 1 (ohci_hcd:usb3) ditto WRT IRQ 11 on CPU 0 (megaraid, 
aic7xxx, firewire_ohci, eth1, aic7xxx)

Here are the interrupt stats after that testing - you can see the 
relationship between IRQs 7 and 11 and the ERR (and, I just noticed, 
also the RES) counts.

cat /proc/interrupts
            CPU0       CPU1
   0:        125          0    XT-PIC-XT-PIC    timer
   1:          2          0    XT-PIC-XT-PIC    i8042
   2:          0          0    XT-PIC-XT-PIC    cascade
   3:          1          0    XT-PIC-XT-PIC
   4:          1          0    XT-PIC-XT-PIC
   5:       4794        279    XT-PIC-XT-PIC    ehci_hcd:usb1
   6:          4          1    XT-PIC-XT-PIC    floppy
   7:    1110109    5323192    XT-PIC-XT-PIC    ohci_hcd:usb3
   8:          0          0    XT-PIC-XT-PIC    rtc0
   9:      28893       5605    XT-PIC-XT-PIC    acpi, sata_nv, ohci_hcd:usb2
  10:      25475       2831    XT-PIC-XT-PIC    sata_sil, Gina24
  11:    5264022    1101395    XT-PIC-XT-PIC    megaraid, aic7xxx, firewire_ohci, eth1, aic7xxx
  12:          4          0    XT-PIC-XT-PIC    i8042
  14:          0          0    XT-PIC-XT-PIC    pata_amd
  15:          0          0    XT-PIC-XT-PIC    pata_amd
NMI:         81         78   Non-maskable interrupts
LOC:    2386834    1935552   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:         81         78   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:    1343954    5673656   Rescheduling interrupts
CAL:      10113      12523   Function call interrupts
TLB:        859        674   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        493        493   Machine check polls
ERR:    6233341
MIS:          0
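
The relationship between IRQs 7 and 11 can be checked with a little arithmetic on the two listings. The following is a minimal Python sketch; the counts are copied from the two noapic /proc/interrupts listings above, and the variable names are purely illustrative.

```python
# Per-CPU counts (CPU0, CPU1) copied from the two /proc/interrupts listings
# above: BEFORE is the first noapic listing, AFTER the second.
BEFORE = {"7": (2071, 16855), "11": (1758, 305)}
AFTER = {"7": (1110109, 5323192), "11": (5264022, 1101395)}
ERR_BEFORE, ERR_AFTER = 34, 6233341


def delta(irq):
    """Interrupts counted on each CPU between the two listings."""
    return tuple(a - b for a, b in zip(AFTER[irq], BEFORE[irq]))


d7 = delta("7")
d11 = delta("11")
err_delta = ERR_AFTER - ERR_BEFORE

# Compare IRQ 7 on one CPU against IRQ 11 on the *other* CPU.
cross_ratios = (d7[0] / d11[1], d7[1] / d11[0])

print("IRQ 7 delta :", d7)
print("IRQ 11 delta:", d11)
print("cross ratios:", [round(r, 3) for r in cross_ratios])
print("ERR delta   :", err_delta, "vs IRQ 7 total", sum(d7))
```

Both cross ratios come out within about 1% of 1.0, which is consistent with the swapped-CPU observation described above. The ERR increase is also of the same order as the total increase on IRQ 7.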

BTW, the Firewire audio is now working flawlessly, even with very small 
buffers/low latency (up to the point I run out of CPU). :)

Thanks again,
Chris



* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-11-23 23:12   ` Chris Edwards
@ 2011-11-29  2:25     ` Chris Edwards
  2011-11-30 22:10     ` Steven Rostedt
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Edwards @ 2011-11-29  2:25 UTC (permalink / raw)
  To: linux-rt-users

Further to my previous posts...

I don't know much about ACPI and how interrupt routing is supposed to 
work (especially on more complex systems such as this).  AFAICT, this 
board has three PCI expansion buses (not including the AGP port), 
arranged like so (ASCII art at left is supposed to be the physical 
layout of the slots on the board):

   ====|==|==    AGP
=======|==      PCI 32-bit 33 MHz, PCI bus 01 (on nForce3),  BIOS slot # 1
==|=======|===  PCI-X,             PCI bus 04 (on AMD 8131), BIOS slot # 4
==|=======|===  PCI-X,             PCI bus 04 (on AMD 8131), BIOS slot # 5
==|=======|===  PCI 64-bit 66 MHz, PCI bus 05 (on AMD 8131), BIOS slot # 2
==|=======|===  PCI 64-bit 66 MHz, PCI bus 05 (on AMD 8131), BIOS slot # 3

Onboard Firewire/IEEE 1394 controller is also on PCI bus 01.
Onboard SiI 3114 SATA controller is also on PCI bus 05.

(This is based on what `biosdecode` and `lspci` report.  Actually, the 
bus numbering changes depending on what cards are installed: I have a 
RAID card with a PCI bridge on it that bumps the higher bus numbers around.)


# biosdecode
# biosdecode 2.9
BIOS32 Service Directory present.
     Revision: 0
     Calling Interface Address: 0x000F0010
PCI Interrupt Routing 1.0 present.
     Router ID: 00:01.0
     Exclusive IRQs: None
     Compatible Router: 10de:00e0
     Slot Entry 1: ID 00:01, on-board
     Slot Entry 2: ID 00:02, on-board
     Slot Entry 3: ID 00:05, on-board
     Slot Entry 4: ID 00:06, on-board
     Slot Entry 5: ID 00:0b, on-board
     Slot Entry 6: ID 00:09, on-board
     Slot Entry 7: ID 00:0a, on-board
     Slot Entry 8: ID 01:07, slot number 1
     Slot Entry 9: ID 01:06, on-board
     Slot Entry 10: ID 05:01, slot number 2
     Slot Entry 11: ID 05:02, slot number 3
     Slot Entry 12: ID 04:01, slot number 4
     Slot Entry 13: ID 04:02, slot number 5
     Slot Entry 14: ID 05:03, on-board
PNP BIOS 1.0 present.
     Event Notification: Not Supported
     Real Mode 16-bit Code Address: F000:57D2
     Real Mode 16-bit Data Address: F000:0000
     16-bit Protected Mode Code Address: 0x000F57FA
     16-bit Protected Mode Data Address: 0x000F0000
ACPI 2.0 present.
     OEM Identifier: ACPIAM
     RSD Table 32-bit Address: 0xBFF40000
     XSD Table 64-bit Address: 0x00000000BFF40100
SMBIOS 2.3 present.
     Structure Table Length: 1933 bytes
     Structure Table Address: 0x000FA2F0
     Number Of Structures: 47
     Maximum Structure Size: 182 bytes


# lspci
00:00.0 Host bridge: nVidia Corporation nForce3 250Gb Host Bridge (rev a1)
00:01.0 ISA bridge: nVidia Corporation nForce3 250Gb LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation nForce 250Gb PCI System Management (rev a1)
00:02.0 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.2 USB Controller: nVidia Corporation nForce3 EHCI USB 2.0 Controller (rev a2)
00:05.0 Ethernet controller: nVidia Corporation CK8S Ethernet Controller (rev a2)
00:08.0 IDE interface: nVidia Corporation CK8S Parallel ATA Controller (v2.5) (rev a2)
00:0a.0 IDE interface: nVidia Corporation nForce3 Serial ATA Controller (rev a2)
00:0b.0 PCI bridge: nVidia Corporation nForce3 250Gb AGP Host to PCI Bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation nForce3 250Gb PCI-to-PCI Bridge (rev a2)
00:17.0 Communication controller: nVidia Corporation Device 00ec (rev a1)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
01:07.0 Multimedia controller: Motorola DSP56361 Digital Signal Processor
02:00.0 VGA compatible controller: ATI Technologies Inc R420 JP [Radeon X800XT]
02:00.1 Display controller: ATI Technologies Inc R420 [X800XT-PE] (Secondary)
03:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
03:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
03:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
03:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
05:02.0 Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)
05:03.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)


There seem to be 3 IOAPICs: one on the nForce3, I presume, and two on 
the AMD 8131.

# dmesg | grep -i ioapic
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: IOAPIC (id[0x03] address[0xfebfe000] gsi_base[24])
[    0.000000] IOAPIC[1]: apic_id 3, version 17, address 0xfebfe000, GSI 24-27
[    0.000000] ACPI: IOAPIC (id[0x04] address[0xfebff000] gsi_base[28])
[    0.000000] IOAPIC[2]: apic_id 4, version 17, address 0xfebff000, GSI 28-31
[    0.143685] ACPI: Using IOAPIC for interrupt routing

So it seems the problematic IRQs are both on the nForce3: the onboard 
Firewire controller and the Gina24 sound card (IRQs 17 and 18) are both 
on the nForce3's 32-bit 33 MHz PCI bus, and both experience those "irq 
... nobody cared" errors.  I can't move the Gina24 sound card, as the 
other slots are keyed for a different voltage, and moving the on-board 
Firewire controller is obviously not an option either. :)
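
The reasoning above (which IOAPIC serves which IRQ) can be sketched in a few lines of Python. The GSI ranges come from the `dmesg | grep -i ioapic` output above and the IRQ-to-device table from the first /proc/interrupts listing in this thread. Which physical chip each IOAPIC belongs to is only presumed, as noted above, so the sketch reports IOAPIC indices rather than chip names.

```python
# GSI ranges as reported in the "IOAPIC[n]: ... GSI a-b" lines above.
IOAPICS = [
    ("IOAPIC[0]", 0, 23),
    ("IOAPIC[1]", 24, 27),
    ("IOAPIC[2]", 28, 31),
]

# IRQ -> device, taken from the APIC-mode /proc/interrupts listing.
DEVICES = {
    17: "firewire_ohci",
    18: "Gina24",
    25: "aic7xxx",
    26: "eth1, aic7xxx",
    27: "sata_sil",
    29: "megaraid",
}


def ioapic_for(gsi):
    """Return the name of the IOAPIC whose GSI range contains gsi, or None."""
    for name, first, last in IOAPICS:
        if first <= gsi <= last:
            return name
    return None


for irq in sorted(DEVICES):
    print(f"IRQ {irq:2d} ({DEVICES[irq]}): {ioapic_for(irq)}")
```

With these ranges, IRQs 17 and 18 both fall on IOAPIC[0], while the other PCI devices listed fall on IOAPIC[1] or IOAPIC[2].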

I also noticed the following groups of kernel messages, which might be 
of use to someone:

[    0.165850] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.166337] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
[    0.184280] ACPI: PCI Interrupt Routing Table [\_SB_.PCIB.GOLA._PRT]
[    0.184550] ACPI: PCI Interrupt Routing Table [\_SB_.PCIB.GOLB._PRT]

[    0.185842] ACPI: PCI Interrupt Link [LNKA] (IRQs 16 17 18 19) *0, disabled.
[    0.186111] ACPI: PCI Interrupt Link [LNKB] (IRQs 16 17 18 19) *0, disabled.
[    0.186366] ACPI: PCI Interrupt Link [LNKC] (IRQs 16 17 18 19) *11
[    0.186621] ACPI: PCI Interrupt Link [LNKD] (IRQs 16 17 18 19) *9
[    0.186876] ACPI: PCI Interrupt Link [LNKE] (IRQs 16 17 18 19) *11
[    0.187127] ACPI: PCI Interrupt Link [LUS0] (IRQs 20 21 22) *11
[    0.187364] ACPI: PCI Interrupt Link [LUS1] (IRQs 20 21 22) *7
[    0.187601] ACPI: PCI Interrupt Link [LUS2] (IRQs 20 21 22) *5
[    0.187844] ACPI: PCI Interrupt Link [LKLN] (IRQs 20 21 22) *9
[    0.188099] ACPI: PCI Interrupt Link [LAUI] (IRQs 20 21 22) *0, disabled.
[    0.188338] ACPI: PCI Interrupt Link [LKMO] (IRQs 20 21 22) *0, disabled.
[    0.188578] ACPI: PCI Interrupt Link [LKSM] (IRQs 20 21 22) *9
[    0.188819] ACPI: PCI Interrupt Link [LTID] (IRQs 20 21 22) *0
[    0.189084] ACPI: PCI Interrupt Link [LTIE] (IRQs 20 21 22) *0, disabled.
[    0.189385] ACPI: PCI Interrupt Link [LATA] (IRQs 20 21 22) *14

[    3.076267] ACPI: PCI Interrupt Link [LTID] BIOS reported IRQ 0, using IRQ 22
[    3.076270] ACPI: PCI Interrupt Link [LTID] enabled at IRQ 22
[    3.087843] ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 21
[    3.095754] ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 20
[    3.149622] ACPI: PCI Interrupt Link [LUS1] enabled at IRQ 22
[    5.116414] ACPI: PCI Interrupt Link [LKLN] enabled at IRQ 21
[    6.803745] ACPI: PCI Interrupt Link [LNKE] enabled at IRQ 19
[    7.073385] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 18
[    7.433420] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 17

[    3.076289] sata_nv 0000:00:0a.0: PCI INT A -> Link[LTID] -> GSI 22 (level, low) -> IRQ 22
[    3.078327] sata_sil 0000:05:03.0: PCI INT A -> GSI 27 (level, low) -> IRQ 27
[    3.087859] ehci_hcd 0000:00:02.2: PCI INT C -> Link[LUS2] -> GSI 21 (level, low) -> IRQ 21
[    3.095766] ohci_hcd 0000:00:02.0: PCI INT A -> Link[LUS0] -> GSI 20 (level, low) -> IRQ 20
[    3.149626] ohci_hcd 0000:00:02.1: PCI INT B -> Link[LUS1] -> GSI 22 (level, low) -> IRQ 22
[    5.116423] forcedeth 0000:00:05.0: PCI INT A -> Link[LKLN] -> GSI 21 (level, low) -> IRQ 21
[    5.167149] e1000 0000:05:02.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26
[    6.803787] pci 0000:02:00.0: PCI INT A -> Link[LNKE] -> GSI 19 (level, low) -> IRQ 19
[    7.073410] Echoaudio Gina24 0000:01:07.0: PCI INT A -> Link[LNKD] -> GSI 18 (level, low) -> IRQ 18
[    7.433444] firewire_ohci 0000:01:06.0: PCI INT A -> Link[LNKC] -> GSI 17 (level, low) -> IRQ 17
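
To make the routing in the "PCI INT" messages above easier to compare, here is a minimal Python sketch that parses a few of them, written one message per line. The regular expression is an assumption fitted to the message format shown in this log, not an official format definition.

```python
import re

# Four of the "PCI INT" messages from the log above, one per line.
SAMPLE = """\
[    3.078327] sata_sil 0000:05:03.0: PCI INT A -> GSI 27 (level, low) -> IRQ 27
[    5.167149] e1000 0000:05:02.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26
[    7.073410] Echoaudio Gina24 0000:01:07.0: PCI INT A -> Link[LNKD] -> GSI 18 (level, low) -> IRQ 18
[    7.433444] firewire_ohci 0000:01:06.0: PCI INT A -> Link[LNKC] -> GSI 17 (level, low) -> IRQ 17
"""

# Hypothetical pattern; the "Link[...]" part is optional because some
# devices are routed straight to a GSI.
ROUTE = re.compile(
    r"\]\s+(?P<driver>.+?) "
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCI INT (?P<pin>[A-D]) -> "
    r"(?:Link\[(?P<link>\w+)\] -> )?"
    r"GSI (?P<gsi>\d+) \((?P<trigger>\w+), (?P<polarity>\w+)\) -> "
    r"IRQ (?P<irq>\d+)"
)


def parse_routes(text):
    """Return one dict per matching line, with gsi and irq as integers."""
    routes = []
    for line in text.splitlines():
        m = ROUTE.search(line)
        if m:
            entry = m.groupdict()
            entry["gsi"] = int(entry["gsi"])
            entry["irq"] = int(entry["irq"])
            routes.append(entry)
    return routes


routes = parse_routes(SAMPLE)
for r in routes:
    via = r["link"] or "direct"
    print(f'{r["dev"]} {r["driver"]:<18} INT {r["pin"]} via {via:<6} -> IRQ {r["irq"]}')
```

In this sample, the two devices on bus 01 are routed through ACPI link devices (LNKC, LNKD), while the two on bus 05 are routed directly to a GSI.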


Is the kernel "pirq" command-line parameter worth trying?  I'm not 
exactly sure how it works - it seems you specify sequences of numbers in 
groups of 4 corresponding to the IRQs that you want the kernel to use 
for each PCI IRQ (PIRQ).  Does the ordering of these quads correspond to 
the PCI bus numbering?  (In my case, I have PCI buses 00, 01, 02, 03, 04 
and 05, but would bus 00 (nForce3 host bridge), 02 (AGP) and 03 (AMD 
8131 bridges) be excluded?)  And how would I know what system IRQ number 
to specify at each position?  Should they be chosen to match the BIOS 
IRQ numbers reported at POST?

Also, are there any disadvantages to running with "noapic" as a 
permanent fix?  Performance? Increased IRQ sharing?

Thanks again,
Chris



* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-11-23 23:12   ` Chris Edwards
  2011-11-29  2:25     ` Chris Edwards
@ 2011-11-30 22:10     ` Steven Rostedt
  2011-12-03  9:41       ` Chris Edwards
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2011-11-30 22:10 UTC (permalink / raw)
  To: Chris Edwards; +Cc: linux-rt-users, Thomas Gleixner

On Thu, 2011-11-24 at 12:12 +1300, Chris Edwards wrote:
> On 24/11/11 02:52, Steven Rostedt wrote:
> > On Thu, 2011-11-24 at 01:39 +1300, Chris Edwards wrote:
> >> Hi all,
> >>
> >> Problem:
> >> IRQ-related "nobody cared" kernel call traces not long after bootup on
> >> Linux 3.0.10-rt27.  I thought I'd try a -rt kernel to see if it would
> >> resolve the audio drop-outs on my new Firewire audio interface.
> > I wonder if this is another bad irq chipset. Does it go away if you boot
> > with noapic in the kernel command line?
> >
> 
> Thanks for the quick reply, Steven.  Booting with "noapic" does seem to 
> avoid the problem with IRQs 17 and 18, and the Firewire audio now works, 
> but the "nobody cared" error now appears for IRQ 7:

A couple of things:

Could you also try mainline, with "threadirqs" on the command line and
see if it gives you the same issue. It should also tell you if it is a
chipset problem or not. Try v3.0, and then v3.2.

Could you also apply the below patch to 3.0-rt. Thomas pointed me to the
following commits. The patch below is a back port of them.

 commit 52553ddff
 commit c75d720f

Oh, and remove the noapic from the command line when you do all of this.

Thanks,

-- Steve

diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index e57f1b3..d09e0f5 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -84,7 +84,9 @@ static int try_one_irq(int irq, struct irq_desc *desc, bool force)
 	 */
 	action = desc->action;
 	if (!action || !(action->flags & IRQF_SHARED) ||
-	    (action->flags & __IRQF_TIMER) || !action->next)
+	    (action->flags & __IRQF_TIMER) ||
+	    (action->handler(irq, action->dev_id) == IRQ_HANDLED) ||
+	    !action->next)
 		goto out;
 
 	/* Already running on another processor */
@@ -115,7 +117,7 @@ static int misrouted_irq(int irq)
 	struct irq_desc *desc;
 	int i, ok = 0;
 
-	if (atomic_inc_return(&irq_poll_active) == 1)
+	if (atomic_inc_return(&irq_poll_active) != 1)
 		goto out;
 
 	irq_poll_cpu = smp_processor_id();
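
The effect of the second hunk can be illustrated with a toy model in Python. This is only a single-threaded sketch of the counter test, assuming that the function's exit path decrements the counter again; PollGuard and the function below are hypothetical stand-ins, not kernel code.

```python
class PollGuard:
    """Toy stand-in for the irq_poll_active counter (not kernel code)."""

    def __init__(self, active=0):
        self.active = active

    def inc_return(self):
        self.active += 1
        return self.active

    def dec(self):
        self.active -= 1


def misrouted_irq_model(guard, fixed):
    """Return True if this caller gets as far as polling other IRQs.

    fixed=False models the removed '== 1' test,
    fixed=True models the added '!= 1' test.
    """
    value = guard.inc_return()
    bail_out = (value != 1) if fixed else (value == 1)
    polled = not bail_out
    guard.dec()  # assumed exit path: the counter is dropped again
    return polled


# A lone caller (counter starts at 0):
print("old test, lone caller polls:", misrouted_irq_model(PollGuard(), fixed=False))
print("new test, lone caller polls:", misrouted_irq_model(PollGuard(), fixed=True))

# A caller arriving while another poll is already active (counter at 1):
print("new test, second caller polls:", misrouted_irq_model(PollGuard(active=1), fixed=True))
```

In this model, the old test makes the first and only caller bail out, so no polling ever happens, whereas the new test lets exactly that caller proceed and turns away anyone arriving while a poll is in progress.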




* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-11-30 22:10     ` Steven Rostedt
@ 2011-12-03  9:41       ` Chris Edwards
  2011-12-03 10:42         ` Chris Edwards
  2011-12-03 16:29         ` Thomas Gleixner
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Edwards @ 2011-12-03  9:41 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users, Thomas Gleixner

On 01/12/11 11:10, Steven Rostedt wrote:
> On Thu, 2011-11-24 at 12:12 +1300, Chris Edwards wrote:
>> Thanks for the quick reply, Steven.  Booting with "noapic" does seem to
>> avoid the problem with IRQs 17 and 18, and the Firewire audio now works,
>> but the "nobody cared" error now appears for IRQ 7:
> A couple of things:
>
> Could you also try mainline, with "threadirqs" on the command line and
> see if it gives you the same issue. It should also tell you if it is a
> chipset problem or not. Try v3.0, and then v3.2.
>


Thanks - and sorry it's taken a while to finish testing these!  I had 
some problems building and running some of the 3.2 versions.

(BTW, one (possibly useful) thing I found during testing was that the 
burst of interrupt activity on the Firewire IRQ (IRQ 17 when running in 
APIC mode) coincided with network activity, with interrupts on IRQ 26 
apparently being duplicated on IRQ 17.  The Ethernet controller is an 
Intel 82545GM card, on the 64-bit PCI (not the PCI-X) bus on the AMD 8131.)


Here are my testing results:


linux-3.0.9 mainline (no "noapic" or "threadirqs"):

     IRQs:
         17    Firewire
         18    Radeon
         19    Gina24

     Almost no interrupts on IRQ 17 (only 126 in total after 4 minutes 
uptime).

     JACK/FFADO runs OK, ~3200 interrupts/s on IRQ 17 (--rate 96000 
--period 128 --nperiods 3).  DSP load 30%.

     No CAL interrupt storms on GTK text redrawing (which I'd seen on 
2.6), or "irq ... nobody cared" errors.

     Ardour plays nicely.



linux-3.0.9 mainline, with "threadirqs":

     IRQs:
         17    Firewire
         18    Radeon
         19    Gina24

     IRQ 17 is definitely noisier now, with ~500,000 interrupts during 4 
mins uptime without JACK running, but no "nobody cared" errors observed.

     JACK/FFADO runs OK, ~3200 interrupts/s on IRQ 17 (--rate 96000 
--period 128 --nperiods 3).  DSP load 30%.

     PROBLEM: Audio drop-outs and JACK XRUNs while playing back in 
Ardour.  These seem to coincide with extra bursts of interrupt activity 
on IRQ 17 (4,000-10,000 per second, above the 3,200 baseline).  Same 
problem observed using mplayer with JACK playback.



linux-3.2.0-rc4:

     Seemingly OK; same behaviour as for 3.0.9 without "threadirqs".



linux-3.2.0-rc4 with "threadirqs":

     ~40,000 interrupts on IRQ 17 (Firewire) in 4 minutes.

     "nobody cared" message after running NetPIPE to exercise the 
Ethernet controller, having finally identified that as a/the source of 
extraneous IRQs.
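
For comparison, the totals quoted above can be turned into average rates. The following is a minimal Python sketch; it assumes that each "4 minutes" figure covers a 240-second window, and it uses the usual nperiods * period / rate estimate for the JACK buffer latency. The dictionary keys are just labels.

```python
WINDOW_S = 4 * 60  # "4 minutes" as quoted above (assumed to be the full window)

# Approximate IRQ 17 totals quoted in the test results above.
totals_4min = {
    "3.0.9": 126,
    "3.0.9 threadirqs": 500_000,
    "3.2.0-rc4 threadirqs": 40_000,
}
avg_rates = {kernel: total / WINDOW_S for kernel, total in totals_4min.items()}

# JACK settings quoted above: --rate 96000 --period 128 --nperiods 3
RATE_HZ, PERIOD_FRAMES, NPERIODS = 96000, 128, 3
periods_per_second = RATE_HZ / PERIOD_FRAMES
buffer_latency_ms = 1000 * NPERIODS * PERIOD_FRAMES / RATE_HZ

for kernel, rate in avg_rates.items():
    print(f"{kernel:<22} {rate:10.1f} interrupts/s on IRQ 17")
print(f"JACK periods per second: {periods_per_second:.0f}")
print(f"JACK buffer latency:     {buffer_latency_ms:.1f} ms")
```

On these figures, 3.0.9 with "threadirqs" averages roughly 2,000 interrupts per second on IRQ 17 without JACK running, compared with under one per second without "threadirqs".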



linux-3.0.12-rt30-rc1:

     This looks to have the back-ported changes for 3.0-rt that you sent 
as a patch already applied.

     This one seems to have fixed things!  No sign of spurious interrupt 
activity on IRQ 17, and no "nobody cared" messages, even when running 
NetPIPE and gtkperf.  JACK/FFADO and Ardour run pretty reliably.

     I do still have some problems with scratchy audio and excessive CPU 
use when running Pure Data when its graphics get busy, but it seems 
likely this is a Pure Data (or maybe Tk) issue.

     I also fired up LatencyTop to see what sorts of figures it was 
reporting.  The one unusual thing it showed was latencies on the order 
of 50-90 ms for "drm_mode_cursor_ioctl" in Xorg - similar behaviour to 
that described here:

     
http://lists.linuxfoundation.org/pipermail/bugme-new/2011-August/027897.html

     I may have to do some more testing to identify if that's an -rt 
thing or a kernel version thing...


Hope this is helpful. :)  It is for me - I have a pretty usable system 
now. :)

Thanks again,
Chris


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-03  9:41       ` Chris Edwards
@ 2011-12-03 10:42         ` Chris Edwards
  2011-12-03 16:29         ` Thomas Gleixner
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Edwards @ 2011-12-03 10:42 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-rt-users, Thomas Gleixner

On 03/12/11 22:41, Chris Edwards wrote:
>
> linux-3.0.12-rt30-rc1:
>
>     This looks to have the back-ported changes for 3.0-rt that you 
> sent as a patch already applied.
>
>     This one seems to have fixed things!

Sorry, scratch that - I'd patched the source and built it without 
realising I didn't have CONFIG_PREEMPT_RT_FULL=y!  With that enabled, I 
do still see the spurious interrupts cascading onto the Firewire/IRQ 17 
line from the Ethernet controller on IRQ 26, and the "nobody cared" 
errors.  :^(
--
Chris


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-03  9:41       ` Chris Edwards
  2011-12-03 10:42         ` Chris Edwards
@ 2011-12-03 16:29         ` Thomas Gleixner
       [not found]           ` <4EDAAEFD.9060209@ripples.dyndns.org>
  1 sibling, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2011-12-03 16:29 UTC (permalink / raw)
  To: Chris Edwards; +Cc: Steven Rostedt, linux-rt-users

On Sat, 3 Dec 2011, Chris Edwards wrote:
> linux-3.0.9 mainline, with "threadirqs":
> 
>     IRQs:
>         17    Firewire
>         18    Radeon
>         19    Gina24
> 
>     IRQ 17 is definitely noisier now, with ~500,000 interrupts during 4 mins
> uptime without JACK running, but no "nobody cared" errors observed.
> 
>     JACK/FFADO runs OK, ~3200 interrupts/s on IRQ 17 (--rate 96000 --period
> 128 --nperiods 3).  DSP load 30%.
> 
>     PROBLEM: Audio drop-outs and JACK XRUNs while playing back in Ardour.
> These seem to coincide with extra bursts of interrupt activity on IRQ 17
> (4,000-10,000 per second, above the 3,200 baseline).  Same problem observed
> using mplayer with JACK playback.

Ok, that tells us something. So there is something unhappy in your
system about the way the threaded irq handling works. Can you
please provide the output of lspci -vvv and a full boot log (any
3.0/3.2 kernel you have handy)?

Thanks,

	tglx


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
       [not found]           ` <4EDAAEFD.9060209@ripples.dyndns.org>
@ 2011-12-04 13:32             ` Thomas Gleixner
  2011-12-05 13:39               ` Chris Edwards
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2011-12-04 13:32 UTC (permalink / raw)
  To: Chris Edwards; +Cc: Steven Rostedt, linux-rt-users

On Sun, 4 Dec 2011, Chris Edwards wrote:
> On 04/12/11 05:29, Thomas Gleixner wrote:
> > Ok, that tells us something. So there is something unhappy in your
> > system about the way the threaded irq handling works. Can you
> > please provide the output of lspci -vvv and a full boot log (any
> > 3.0/3.2 kernel you have handy)?
> 
> Attached. :)

Could you disable the e1000 for a test? Just boot up and bring the
interface down.

Does that change the situation?

Thanks,

	tglx


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-04 13:32             ` Thomas Gleixner
@ 2011-12-05 13:39               ` Chris Edwards
  2011-12-05 16:56                 ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Edwards @ 2011-12-05 13:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Steven Rostedt, linux-rt-users

On 05/12/11 02:32, Thomas Gleixner wrote:
> On Sun, 4 Dec 2011, Chris Edwards wrote:
>> On 04/12/11 05:29, Thomas Gleixner wrote:
>>> Ok, that tells us something. So there is something unhappy in your
>>> system about the way the threaded irq handling works. Can you
>>> please provide the output of lspci -vvv and a full boot log (any
>>> 3.0/3.2 kernel you have handy)?
>> Attached. :)
> Could you disable the e1000 for a test? Just boot up and bring the
> interface down.
>
> Does that change the situation?

Yes - I tested with 3.2.0-rc4-rt5 (and it actually is an RT kernel this 
time - see below!) and with the Ethernet interface down, it seems to be 
working properly.  Even Pure Data didn't cause any crackling or 
stuttering (other than when starting up).

[    0.000000] Linux version 3.2.0-rc4-rt5 (root@babelfish) (gcc version 
4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #1 SMP PREEMPT RT Tue Dec 6 01:22:11 
NZDT 2011
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.2.0-rc4-rt5 
root=UUID=ca21e0bf-b7f8-45c3-8fc9-066c4dd6052e ro quiet splash

What next?  Should I try moving the Ethernet card to other slots and see 
if anything changes?

Thanks,
Chris


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-05 13:39               ` Chris Edwards
@ 2011-12-05 16:56                 ` Thomas Gleixner
  2011-12-05 18:14                   ` Borislav Petkov
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2011-12-05 16:56 UTC (permalink / raw)
  To: Chris Edwards; +Cc: Steven Rostedt, linux-rt-users, Borislav Petkov

On Tue, 6 Dec 2011, Chris Edwards wrote:

> On 05/12/11 02:32, Thomas Gleixner wrote:
> > On Sun, 4 Dec 2011, Chris Edwards wrote:
> > > On 04/12/11 05:29, Thomas Gleixner wrote:
> > > > Ok, that tells us something. So there is something unhappy in your
> > > > system about the way the threaded irq handling works. Can you
> > > > please provide the output of lspci -vvv and a full boot log (any
> > > > 3.0/3.2 kernel you have handy)?
> > > Attached. :)
> > Could you disable the e1000 for a test? Just boot up and bring the
> > interface down.
> > 
> > Does that change the situation?
> 
> Yes - I tested with 3.2.0-rc4-rt5 (and it actually is an RT kernel this time -
> see below!) and with the Ethernet interface down, it seems to be working
> properly.  Even Pure Data didn't cause any crackling or stuttering (other than
> when starting up).
> 
> [    0.000000] Linux version 3.2.0-rc4-rt5 (root@babelfish) (gcc version 4.4.3
> (Ubuntu 4.4.3-4ubuntu5) ) #1 SMP PREEMPT RT Tue Dec 6 01:22:11 NZDT 2011
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.2.0-rc4-rt5
> root=UUID=ca21e0bf-b7f8-45c3-8fc9-066c4dd6052e ro quiet splash
> 
> What next?  Should I try moving the Ethernet card to other slots and see if
> anything changes?

That card hangs on the AMD bridge and that bridge has nasty interrupt
related errata. Your "feature" is undocumented so far. It looks like
it sends interrupts which are masked, but pending over and over to a
different interrupt line :( We've seen this before. It's a legacy mode
feature, but your chip is excluded from the fixup.

Boris, any idea ?

You could try the following patch. Be aware that it might not work at
all, but I don't expect that you need a fire extinguisher :)

Thanks,

	tglx
---
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1791,8 +1791,7 @@ static void quirk_disable_amd_813x_boot_interrupt(struct pci_dev *dev)
 
 	if (noioapicquirk)
 		return;
-	if ((dev->revision == AMD_813X_REV_B1) ||
-	    (dev->revision == AMD_813X_REV_B2))
+	if (dev->revision == AMD_813X_REV_B2)
 		return;
 
 	pci_read_config_dword(dev, AMD_813X_MISC, &pci_config_dword);



* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-05 16:56                 ` Thomas Gleixner
@ 2011-12-05 18:14                   ` Borislav Petkov
  2011-12-05 21:02                     ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2011-12-05 18:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Chris Edwards, Steven Rostedt, linux-rt-users, Borislav Petkov

On Mon, Dec 05, 2011 at 05:56:01PM +0100, Thomas Gleixner wrote:
> On Tue, 6 Dec 2011, Chris Edwards wrote:
> 
> > On 05/12/11 02:32, Thomas Gleixner wrote:
> > > On Sun, 4 Dec 2011, Chris Edwards wrote:
> > > > On 04/12/11 05:29, Thomas Gleixner wrote:
> > > > > Ok, that tells us something. So there is something unhappy in your
> > > > > system about the way the threaded irq handling works. Can you
> > > > > please provide the output of lspci -vvv and a full boot log (any
> > > > > 3.0/3.2 kernel you have handy)?
> > > > Attached. :)
> > > Could you disable the e1000 for a test? Just boot up and bring the
> > > interface down.
> > > 
> > > Does that change the situation?
> > 
> > Yes - I tested with 3.2.0-rc4-rt5 (and it actually is an RT kernel this time -
> > see below!) and with the Ethernet interface down, it seems to be working
> > properly.  Even Pure Data didn't cause any crackling or stuttering (other than
> > when starting up).
> > 
> > [    0.000000] Linux version 3.2.0-rc4-rt5 (root@babelfish) (gcc version 4.4.3
> > (Ubuntu 4.4.3-4ubuntu5) ) #1 SMP PREEMPT RT Tue Dec 6 01:22:11 NZDT 2011
> > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.2.0-rc4-rt5
> > root=UUID=ca21e0bf-b7f8-45c3-8fc9-066c4dd6052e ro quiet splash
> > 
> > What next?  Should I try moving the Ethernet card to other slots and see if
> > anything changes?
> 
> That card hangs on the AMD bridge and that bridge has nasty interrupt
> related errata. Your "feature" is undocumented so far. It looks like
> it sends interrupts which are masked, but pending over and over to a
> different interrupt line :( We've seen this before. It's a legacy mode
> feature, but your chip is excluded from the fixup.
> 
> Boris, any idea ?

Hmm, that's the old 8131 chipset, correct?

> You could try the following patch. Be aware that it might not work at
> all, but I don't expect that you need a fire extinguisher :)
> 
> Thanks,
> 
> 	tglx
> ---
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1791,8 +1791,7 @@ static void quirk_disable_amd_813x_boot_interrupt(struct pci_dev *dev)
>  
>  	if (noioapicquirk)
>  		return;
> -	if ((dev->revision == AMD_813X_REV_B1) ||
> -	    (dev->revision == AMD_813X_REV_B2))
> +	if (dev->revision == AMD_813X_REV_B2)
>  		return;

Ok, according to my docs, this erratum you're addressing here is
supposed to be fixed in revision B1 of the chipset but your test patch
enables the quirk for B1 too.

What is dev->revision on that board, 0x12?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-05 18:14                   ` Borislav Petkov
@ 2011-12-05 21:02                     ` Thomas Gleixner
  2011-12-06  2:51                       ` Chris Edwards
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2011-12-05 21:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Chris Edwards, Steven Rostedt, linux-rt-users, Borislav Petkov

On Mon, 5 Dec 2011, Borislav Petkov wrote:
> On Mon, Dec 05, 2011 at 05:56:01PM +0100, Thomas Gleixner wrote:
> > That card hangs on the AMD bridge and that bridge has nasty interrupt
> > related errata. Your "feature" is undocumented so far. It looks like
> > it sends interrupts which are masked, but pending over and over to a
> > different interrupt line :( We've seen this before. It's a legacy mode
> > feature, but your chip is excluded from the fixup.
> > 
> > Boris, any idea ?
> 
> Hmm, that's the old 8131 chipset, correct?
> 
> > You could try the following patch. Be aware that it might not work at
> > all, but I don't expect that you need a fire extinguisher :)
> > 
> > Thanks,
> > 
> > 	tglx
> > ---
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -1791,8 +1791,7 @@ static void quirk_disable_amd_813x_boot_interrupt(struct pci_dev *dev)
> >  
> >  	if (noioapicquirk)
> >  		return;
> > -	if ((dev->revision == AMD_813X_REV_B1) ||
> > -	    (dev->revision == AMD_813X_REV_B2))
> > +	if (dev->revision == AMD_813X_REV_B2)
> >  		return;
> 
> Ok, according to my docs, this erratum you're addressing here is
> supposed to be fixed in revision B1 of the chipset but your test patch
> enables the quirk for B1 too.
> 
> What is dev->revision on that board, 0x12?

Yep:

[    0.716126] pci 0000:03:01.0: AMD8131 rev 12 detected; disabling PCI-X MMRBC
[    0.716138] pci 0000:03:02.0: AMD8131 rev 12 detected; disabling PCI-X MMRBC

But that interrupt behaviour is exactly the same as what we saw with the
other pre rev 12 versions. So I just wanted Chris to try that quirk
and see whether it solves his issues.

Thanks,

	tglx


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-05 21:02                     ` Thomas Gleixner
@ 2011-12-06  2:51                       ` Chris Edwards
  2011-12-06 11:17                         ` Borislav Petkov
                                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Chris Edwards @ 2011-12-06  2:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, Steven Rostedt, linux-rt-users, Borislav Petkov

On 06/12/11 10:02, Thomas Gleixner wrote:
> On Mon, 5 Dec 2011, Borislav Petkov wrote:
>> On Mon, Dec 05, 2011 at 05:56:01PM +0100, Thomas Gleixner wrote:
>>
>>> You could try the following patch. Be aware that it might not work at
>>> all, but I don't expect that you need a fire extinguisher :)

No smoke or flames noted. :)  And that appears to have solved the 
problem.  Firewire audio works, and even after running NetPIPE a couple 
of times over that Ethernet interface, there's only a handful of 
interrupts on the Firewire IRQ (as seems to be normal):

$ cat /proc/interrupts
            CPU0       CPU1
   0:        134          0   IO-APIC-edge      timer
   1:          0          2   IO-APIC-edge      i8042
   3:          0          2   IO-APIC-edge
   4:          0          2   IO-APIC-edge
   6:          0          5   IO-APIC-edge      floppy
   7:          1          0   IO-APIC-edge
   8:          0          0   IO-APIC-edge      rtc0
   9:          0          0   IO-APIC-fasteoi   acpi
  12:          0          4   IO-APIC-edge      i8042
  14:          0          0   IO-APIC-edge      pata_amd
  15:          0          0   IO-APIC-edge      pata_amd
  17:          9        120   IO-APIC-fasteoi   firewire_ohci
  18:       2645       2932   IO-APIC-fasteoi   radeon
  19:          0         46   IO-APIC-fasteoi   snd_gina24
  20:       2590       2357   IO-APIC-fasteoi   ohci_hcd:usb2
  21:        234       9128   IO-APIC-fasteoi   ehci_hcd:usb1, eth0
  22:       1346      10720   IO-APIC-fasteoi   sata_nv, ohci_hcd:usb3
  26:     471010     475803   IO-APIC-fasteoi   eth1
  27:          0          0   IO-APIC-fasteoi   sata_sil
NMI:          7          7   Non-maskable interrupts
LOC:     111328     106458   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          7          7   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:      27286       7746   Rescheduling interrupts
CAL:       1051       5180   Function call interrupts
TLB:        224        229   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         62         62   Machine check polls
ERR:          1
MIS:          0

Confirming kernel version and options:
[    0.000000] Linux version 3.2.0-rc4-rt5 (root@babelfish) (gcc version 
4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #2 SMP PREEMPT RT Tue Dec 6 14:32:44 
NZDT 2011
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.2.0-rc4-rt5 
root=UUID=ca21e0bf-b7f8-45c3-8fc9-066c4dd6052e ro quiet splash

I'll try reinstalling my other expansion cards and check that all 
remains well.

Thank you very much for the help.  Einfach Klasse!
--
Chris


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-06  2:51                       ` Chris Edwards
@ 2011-12-06 11:17                         ` Borislav Petkov
  2011-12-07  0:32                           ` Thomas Gleixner
  2011-12-06 19:42                         ` Borislav Petkov
  2011-12-07  0:37                         ` Thomas Gleixner
  2 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2011-12-06 11:17 UTC (permalink / raw)
  To: Chris Edwards
  Cc: Thomas Gleixner, Borislav Petkov, Steven Rostedt, linux-rt-users

On Tue, Dec 06, 2011 at 03:51:00PM +1300, Chris Edwards wrote:
> On 06/12/11 10:02, Thomas Gleixner wrote:
> >On Mon, 5 Dec 2011, Borislav Petkov wrote:
> >>On Mon, Dec 05, 2011 at 05:56:01PM +0100, Thomas Gleixner wrote:
> >>
> >>>You could try the following patch. Be aware that it might not work at
> >>>all, but I don't expect that you need a fire extinguisher :)
> 
> No smoke or flames noted. :)  And that appears to have solved the
> problem.  Firewire audio works, and even after running NetPIPE a
> couple of times over that Ethernet interface, there's only a handful
> of interrupts on the Firewire IRQ (as seems to be normal):

Oh ok, good.

I started poking at hw people for comments, let's see what happens. Please let
us know should there be other issues.

@tglx: You could give me a couple of days to confirm the workaround but
this is old stuff so don't hold your breath for too long :).

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-06  2:51                       ` Chris Edwards
  2011-12-06 11:17                         ` Borislav Petkov
@ 2011-12-06 19:42                         ` Borislav Petkov
  2011-12-07  0:37                         ` Thomas Gleixner
  2 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2011-12-06 19:42 UTC (permalink / raw)
  To: Chris Edwards
  Cc: Thomas Gleixner, Borislav Petkov, Steven Rostedt, linux-rt-users

On Tue, Dec 06, 2011 at 03:51:00PM +1300, Chris Edwards wrote:
> On 06/12/11 10:02, Thomas Gleixner wrote:
> >On Mon, 5 Dec 2011, Borislav Petkov wrote:
> >>On Mon, Dec 05, 2011 at 05:56:01PM +0100, Thomas Gleixner wrote:
> >>
> >>>You could try the following patch. Be aware that it might not work at
> >>>all, but I don't expect that you need a fire extinguisher :)
> 
> No smoke or flames noted. :)  And that appears to have solved the
> problem.  Firewire audio works, and even after running NetPIPE a
> couple of times over that Ethernet interface, there's only a handful
> of interrupts on the Firewire IRQ (as seems to be normal):

Btw,

can you send me

lspci -nn -vvvv

output from the box?

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-06 11:17                         ` Borislav Petkov
@ 2011-12-07  0:32                           ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2011-12-07  0:32 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Chris Edwards, Steven Rostedt, linux-rt-users

On Tue, 6 Dec 2011, Borislav Petkov wrote:

> On Tue, Dec 06, 2011 at 03:51:00PM +1300, Chris Edwards wrote:
> > On 06/12/11 10:02, Thomas Gleixner wrote:
> > >On Mon, 5 Dec 2011, Borislav Petkov wrote:
> > >>On Mon, Dec 05, 2011 at 05:56:01PM +0100, Thomas Gleixner wrote:
> > >>
> > >>>You could try the following patch. Be aware that it might not work at
> > >>>all, but I don't expect that you need a fire extinguisher :)
> > 
> > No smoke or flames noted. :)  And that appears to have solved the
> > problem.  Firewire audio works, and even after running NetPIPE a
> > couple of times over that Ethernet interface, there's only a handful
> > of interrupts on the Firewire IRQ (as seems to be normal):
> 
> Oh ok, good.
> 
> I started poking at hw people for comments, let's see what happens. Please let
> us know should there be other issues.
> 
> @tglx: You could give me a couple of days to confirm the workaround but
> this is old stuff so don't hold your breath for too long :).

Sure, I'm not in a hurry. It seems only Chris has one of these oddballs :)


* Re: IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system
  2011-12-06  2:51                       ` Chris Edwards
  2011-12-06 11:17                         ` Borislav Petkov
  2011-12-06 19:42                         ` Borislav Petkov
@ 2011-12-07  0:37                         ` Thomas Gleixner
  2 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2011-12-07  0:37 UTC (permalink / raw)
  To: Chris Edwards
  Cc: Borislav Petkov, Steven Rostedt, linux-rt-users, Borislav Petkov

On Tue, 6 Dec 2011, Chris Edwards wrote:
> On 06/12/11 10:02, Thomas Gleixner wrote:
> > On Mon, 5 Dec 2011, Borislav Petkov wrote:
> > > On Mon, Dec 05, 2011 at 05:56:01PM +0100, Thomas Gleixner wrote:
> > > 
> > > > You could try the following patch. Be aware that it might not work at
> > > > all, but I don't expect that you need a fire extinguisher :)
> 
> No smoke or flames noted. :)  And that appears to have solved the problem.
> Firewire audio works, and even after running NetPIPE a couple of times over
> that Ethernet interface, there's only a handful of interrupts on the Firewire
> IRQ (as seems to be normal):
> 
> I'll try reinstalling my other expansion cards and check that all remains
> well.
> 
> Thank you very much for the help.  Einfach Klasse!

Thank you very much for your persistence and providing all the info I
asked for! These kinds of problems are extremely hard to decode without
the help of a competent reporter. I hope that Boris can figure
something out with his hardware folks.

Vielen Dank,

       Thomas


Thread overview: 18+ messages
2011-11-23 12:39 IRQ "nobody cared...Disabling" errors on linux-3.0.10-rt27 on SMP AMD64 system Chris Edwards
2011-11-23 13:52 ` Steven Rostedt
2011-11-23 23:12   ` Chris Edwards
2011-11-29  2:25     ` Chris Edwards
2011-11-30 22:10     ` Steven Rostedt
2011-12-03  9:41       ` Chris Edwards
2011-12-03 10:42         ` Chris Edwards
2011-12-03 16:29         ` Thomas Gleixner
     [not found]           ` <4EDAAEFD.9060209@ripples.dyndns.org>
2011-12-04 13:32             ` Thomas Gleixner
2011-12-05 13:39               ` Chris Edwards
2011-12-05 16:56                 ` Thomas Gleixner
2011-12-05 18:14                   ` Borislav Petkov
2011-12-05 21:02                     ` Thomas Gleixner
2011-12-06  2:51                       ` Chris Edwards
2011-12-06 11:17                         ` Borislav Petkov
2011-12-07  0:32                           ` Thomas Gleixner
2011-12-06 19:42                         ` Borislav Petkov
2011-12-07  0:37                         ` Thomas Gleixner
