public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* qstor driver -> irq 193: nobody cared
@ 2006-11-03 18:02 Alberto Alonso
  2006-11-03 19:00 ` Sergey Vlasov
  2006-11-04 18:39 ` Mark Lord
  0 siblings, 2 replies; 20+ messages in thread
From: Alberto Alonso @ 2006-11-03 18:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: mlord

I have a Pacific Digital qstor card on irq 193. I am using kernel
2.6.17.13 SMP

The error happens every now and then. I have not been able to
figure out any triggers and I can not reproduce it on demand. Today
it happened 3 times within a 40 minutes period. 

All disks connected to the card are disabled and I can't do anything
other than a reboot to get them back.

It is reported as follows:

irq 193: nobody cared (try booting with the "irqpoll" option)
 <c013e19a> __report_bad_irq+0x2a/0xa0  <c013d970> handle_IRQ_event
+0x30/0x70
 <c013e2b0> note_interrupt+0x80/0xf0  <c013da8c> __do_IRQ+0xdc/0xf0
 <c0105799> do_IRQ+0x19/0x30  <c010391a> common_interrupt+0x1a/0x20
 <c0100d91> default_idle+0x41/0x70  <c0100e60> cpu_idle+0x80/0x90
 <c046699d> start_kernel+0x18d/0x1d0  <c0466330> unknown_bootoption
+0x0/0x1d0
handlers:
[<c0301300>] (qs_intr+0x0/0x220)
Disabling IRQ #193

The PCI info for the card is:

0000:01:03.0 0106: Pacific Digital Corp: Unknown device 2068 (rev 01)
        Subsystem: Pacific Digital Corp: Unknown device 2068
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 193
        I/O ports at eff0 [size=8]
        I/O ports at efe0 [size=8]
        I/O ports at efa8 [size=8]
        I/O ports at efa0 [size=8]
        Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at febe0000 [disabled] [size=64K]
        Capabilities: [40] Message Signalled Interrupts: 64bit+
Queue=0/0 Enable-
        Capabilities: [50] Power Management version 2
        Capabilities: [58] PCI-X non-bridge device.


This is really creating problems for me now. If anybody knows
of a dumb (ie. non-raid) PCI SATA card with 8 ports with good
Linux support for a couple hundred backs I would appreciate
it.

If there is any other info that I should provide to help 
troubleshoot please let me know.

Thanks,

Alberto


-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-03 18:02 qstor driver -> irq 193: nobody cared Alberto Alonso
@ 2006-11-03 19:00 ` Sergey Vlasov
  2006-11-03 20:22   ` Alberto Alonso
  2006-11-04 18:39 ` Mark Lord
  1 sibling, 1 reply; 20+ messages in thread
From: Sergey Vlasov @ 2006-11-03 19:00 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: mlord, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1510 bytes --]

On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:

> I have a Pacific Digital qstor card on irq 193. I am using kernel
> 2.6.17.13 SMP
> 
> The error happens every now and then. I have not been able to
> figure out any triggers and I can not reproduce it on demand. Today
> it happened 3 times within a 40 minutes period. 
> 
> All disks connected to the card are disabled and I can't do anything
> other than a reboot to get them back.
> 
> It is reported as follows:
> 
> irq 193: nobody cared (try booting with the "irqpoll" option)

Did you try this option?  It may decrease performance, but in some cases
IRQ routing is so screwed that only irqpoll helps.

[...]
> handlers:
> [<c0301300>] (qs_intr+0x0/0x220)
> Disabling IRQ #193
[..]
> If there is any other info that I should provide to help 
> troubleshoot please let me know.

The "nobody cared" error is often caused by some other device which
shares the same interrupt, but Linux does not know about it (either due
to broken IRQ routing tables in BIOS, or because the driver for that
device is not loaded, but the device really is active and asserts its
IRQ line - sometimes this also happens due to a broken BIOS).

Please post complete /proc/interrupts and lspci -v output, and also
information about the motherboard model and BIOS version.

If your motherboard has a VIA chipset, you may also try the patch from
http://lkml.org/lkml/2006/9/7/235 which attempts to fix IRQ routing on
these chipsets.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-03 19:00 ` Sergey Vlasov
@ 2006-11-03 20:22   ` Alberto Alonso
  2006-11-03 20:52     ` Sergey Vlasov
  0 siblings, 1 reply; 20+ messages in thread
From: Alberto Alonso @ 2006-11-03 20:22 UTC (permalink / raw)
  To: Sergey Vlasov; +Cc: linux-kernel

On Fri, 2006-11-03 at 22:00 +0300, Sergey Vlasov wrote: 
> On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:
> 
> > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > 2.6.17.13 SMP
> > 
> > 
> > irq 193: nobody cared (try booting with the "irqpoll" option)
> 
> Did you try this option?  It may decrease performance, but in some cases
> IRQ routing is so screwed that only irqpoll helps.

I have now switched to using that option. Will kick in after I reboot.

> [...]
> > handlers:
> > [<c0301300>] (qs_intr+0x0/0x220)
> > Disabling IRQ #193
> [..]
> > If there is any other info that I should provide to help 
> > troubleshoot please let me know.
> 
> The "nobody cared" error is often caused by some other device which
> shares the same interrupt, but Linux does not know about it (either due
> to broken IRQ routing tables in BIOS, or because the driver for that
> device is not loaded, but the device really is active and asserts its
> IRQ line - sometimes this also happens due to a broken BIOS).
> 
> Please post complete /proc/interrupts and lspci -v output, and also
> information about the motherboard model and BIOS version.
> 

The system is an old Gateway 6400 server w/ dual P3 1 GHz PCUs. It is
my NFS/samba file server.

Here is the rest of the requested info:

w100:/usr/src/linux-2.6.17.13# cat /proc/interrupts
           CPU0       CPU1
  0:    1398199    1412518    IO-APIC-edge  timer
  1:          1          8    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          3          1    IO-APIC-edge  rtc
 14:       8163       8710    IO-APIC-edge  ide0
 15:          5          7    IO-APIC-edge  ide1
145:          0          0   IO-APIC-level  ivtv0
153:      94715     121999   IO-APIC-level  ide2, ide3, ide4, ide5
161:       6909      18301   IO-APIC-level  ide8, ide9
169:       8323      17288   IO-APIC-level  ide6, ide7
185:    1216892         92   IO-APIC-level  eth0
193:       2281       1954   IO-APIC-level  libata
NMI:          0          0
LOC:    2810842    2810841
ERR:          0
MIS:          0
w100:/usr/src/linux-2.6.17.13# lspci -v
0000:00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
        Flags: bus master, medium devsel, latency 32

0000:00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 06)
        Flags: bus master, medium devsel, latency 16

0000:00:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
        Subsystem: Intel Corp. 82559 Fast Ethernet LAN on Motherboard
        Flags: bus master, medium devsel, latency 64, IRQ 177
        Memory at feaff000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at b000 [size=64]
        Memory at fe800000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at fe700000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2

0000:00:03.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder (rev 01)
        Subsystem: Hauppauge computer works Inc.: Unknown device 0001
        Flags: bus master, medium devsel, latency 64, IRQ 145
        Memory at f8000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [44] Power Management version 2

0000:00:04.0 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
        Subsystem: Triones Technologies, Inc.: Unknown device 0001
        Flags: bus master, 66MHz, medium devsel, latency 120, IRQ 153
        I/O ports at bf60 [size=8]
        I/O ports at bf7c [size=4]
        I/O ports at bf58 [size=8]
        I/O ports at bf54 [size=4]
        I/O ports at b400 [size=256]
        Expansion ROM at fe960000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2

0000:00:04.1 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
        Subsystem: Triones Technologies, Inc.: Unknown device 0001
        Flags: bus master, 66MHz, medium devsel, latency 120, IRQ 153
        I/O ports at bf88 [size=8]
        I/O ports at bf84 [size=4]
        I/O ports at bf68 [size=8]
        I/O ports at bf80 [size=4]
        I/O ports at b800 [size=256]
        Capabilities: [60] Power Management version 2

0000:00:05.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 03)
        Subsystem: Triones Technologies, Inc. HPT370 UDMA100
        Flags: bus master, medium devsel, latency 120, IRQ 161
        I/O ports at bfa0 [size=8]
        I/O ports at bf9c [size=4]
        I/O ports at bf90 [size=8]
        I/O ports at bf98 [size=4]
        I/O ports at c000 [size=256]
        Expansion ROM at fe980000 [disabled] [size=128K]

0000:00:06.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 03)
        Subsystem: Triones Technologies, Inc. HPT370 UDMA100
        Flags: bus master, medium devsel, latency 120, IRQ 169
        I/O ports at bff0 [size=8]
        I/O ports at bfe4 [size=4]
        I/O ports at bfa8 [size=8]
        I/O ports at bfe0 [size=4]
        I/O ports at c400 [size=256]
        Expansion ROM at fe9a0000 [disabled] [size=128K]

0000:00:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: Gateway 2000: Unknown device 6400
        Flags: bus master, stepping, medium devsel, latency 64
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        I/O ports at c800 [size=256]
        Memory at fe9ff000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at fe9c0000 [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2

0000:00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 50)
        Subsystem: ServerWorks OSB4 South Bridge
        Flags: bus master, medium devsel, latency 0

0000:00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
        Flags: bus master, medium devsel, latency 64
        I/O ports at ffa0 [size=16]

0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
        Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
        Flags: bus master, medium devsel, latency 64, IRQ 10
        Memory at fe9fe000 (32-bit, non-prefetchable) [size=4K]

0000:01:02.0 Ethernet controller: Intel Corp. 82543GC Gigabit Ethernet Controller (Copper) (rev 02)
        Subsystem: Intel Corp. PRO/1000 T Server Adapter
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 185
        Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
        Memory at febb0000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at feba0000 [disabled] [size=64K]
        Capabilities: [dc] Power Management version 2

0000:01:03.0 0106: Pacific Digital Corp: Unknown device 2068 (rev 01)
        Subsystem: Pacific Digital Corp: Unknown device 2068
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 193
        I/O ports at eff0 [size=8]
        I/O ports at efe0 [size=8]
        I/O ports at efa8 [size=8]
        I/O ports at efa0 [size=8]
        Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at febe0000 [disabled] [size=64K]
        Capabilities: [40] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [50] Power Management version 2
        Capabilities: [58] PCI-X non-bridge device.

0000:01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
        Subsystem: Gateway 2000: Unknown device 1040
        Flags: medium devsel
        I/O ports at 1000 [disabled] [size=256]
        Memory at <ignored> (64-bit, non-prefetchable) [disabled]
        Memory at <ignored> (64-bit, non-prefetchable) [disabled]
        Capabilities: [40] Power Management version 2

0000:01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
        Subsystem: Gateway 2000: Unknown device 1040
        Flags: medium devsel
        I/O ports at 1400 [disabled] [size=256]
        Memory at <ignored> (64-bit, non-prefetchable) [disabled]
        Memory at <ignored> (64-bit, non-prefetchable) [disabled]
        Capabilities: [40] Power Management version 2


Thanks,

Alberto

-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-03 20:22   ` Alberto Alonso
@ 2006-11-03 20:52     ` Sergey Vlasov
  2006-11-03 23:21       ` Alberto Alonso
  0 siblings, 1 reply; 20+ messages in thread
From: Sergey Vlasov @ 2006-11-03 20:52 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3005 bytes --]

On Fri, Nov 03, 2006 at 02:22:07PM -0600, Alberto Alonso wrote:
> On Fri, 2006-11-03 at 22:00 +0300, Sergey Vlasov wrote: 
> > On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:
> > 
> > > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > > 2.6.17.13 SMP
> > > 
> > > 
> > > irq 193: nobody cared (try booting with the "irqpoll" option)
> > 
> > Did you try this option?  It may decrease performance, but in some cases
> > IRQ routing is so screwed that only irqpoll helps.
> 
> I have now switched to using that option. Will kick in after I reboot.

Actually, it might be better to wait with this - it is the last resort
option, if there is nothing else to try.

> 0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
>         Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
>         Flags: bus master, medium devsel, latency 64, IRQ 10
>         Memory at fe9fe000 (32-bit, non-prefetchable) [size=4K]

Can you try with ohci-hcd loaded (and without irqpoll)?  USB
controllers might cause such problems, because BIOSen often do weird
things with them for legacy emulation; however, loading a proper
driver should make BIOS stop messing with the hardware behind the back
of the OS.

After loading ohci-hcd look which IRQ does the USB controller use - if
it is the same IRQ as the qstor card, you should keep ohci-hcd loaded
to avoid problems.  (If it is on some other IRQ, loading ohci-hcd
might still help - the code for enabling the PCI device calls ACPI
BIOS, which might change interrupt routing setup in the chipset.)

Also look for something like "USB Keyboard Support" and maybe "USB
Mouse Support" in BIOS setup - some BIOS versions have severe problems
with this emulation, so it may be better to disable it unless you
really need it.

Disabling USB in the BIOS setup if you really don't need might be
another option.

> 0000:01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
>         Subsystem: Gateway 2000: Unknown device 1040
>         Flags: medium devsel
>         I/O ports at 1000 [disabled] [size=256]
>         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
>         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
>         Capabilities: [40] Power Management version 2
> 
> 0000:01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
>         Subsystem: Gateway 2000: Unknown device 1040
>         Flags: medium devsel
>         I/O ports at 1400 [disabled] [size=256]
>         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
>         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
>         Capabilities: [40] Power Management version 2

These devices seem to be disabled, but if loading ohci-hcd does not
help, you can try to load the sym53c8xx driver and look whether it
ends up on the same IRQ as qstor.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-03 20:52     ` Sergey Vlasov
@ 2006-11-03 23:21       ` Alberto Alonso
  0 siblings, 0 replies; 20+ messages in thread
From: Alberto Alonso @ 2006-11-03 23:21 UTC (permalink / raw)
  To: Sergey Vlasov; +Cc: linux-kernel

On Fri, 2006-11-03 at 23:52 +0300, Sergey Vlasov wrote:
> On Fri, Nov 03, 2006 at 02:22:07PM -0600, Alberto Alonso wrote:
> > On Fri, 2006-11-03 at 22:00 +0300, Sergey Vlasov wrote: 
> > > On Fri, 03 Nov 2006 12:02:53 -0600 Alberto Alonso wrote:
> > > 
> > > > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > > > 2.6.17.13 SMP
> > > > 
> > > > 
> > > > irq 193: nobody cared (try booting with the "irqpoll" option)
> > > 
> > > Did you try this option?  It may decrease performance, but in some cases
> > > IRQ routing is so screwed that only irqpoll helps.
> > 
> > I have now switched to using that option. Will kick in after I reboot.
> 
> Actually, it might be better to wait with this - it is the last resort
> option, if there is nothing else to try.


I'll wait then. This is a file server with 6 active HD controllers and 
14 drives. For what I can figure out reading LKML irqpoll would really
have a bad effect.

> 
> > 0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
> >         Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
> >         Flags: bus master, medium devsel, latency 64, IRQ 10
> >         Memory at fe9fe000 (32-bit, non-prefetchable) [size=4K]
> 
> Can you try with ohci-hcd loaded (and without irqpoll)?  USB
> controllers might cause such problems, because BIOSen often do weird
> things with them for legacy emulation; however, loading a proper
> driver should make BIOS stop messing with the hardware behind the back
> of the OS.
> 
> After loading ohci-hcd look which IRQ does the USB controller use - if
> it is the same IRQ as the qstor card, you should keep ohci-hcd loaded
> to avoid problems.  (If it is on some other IRQ, loading ohci-hcd
> might still help - the code for enabling the PCI device calls ACPI
> BIOS, which might change interrupt routing setup in the chipset.)

I have now compiled and enabled the ohci-hcd module. It ended up 
in IRQ10

> 
> Also look for something like "USB Keyboard Support" and maybe "USB
> Mouse Support" in BIOS setup - some BIOS versions have severe problems
> with this emulation, so it may be better to disable it unless you
> really need it.
> 
> Disabling USB in the BIOS setup if you really don't need might be
> another option.
> 
> > 0000:01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
> >         Subsystem: Gateway 2000: Unknown device 1040
> >         Flags: medium devsel
> >         I/O ports at 1000 [disabled] [size=256]
> >         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> >         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> >         Capabilities: [40] Power Management version 2
> > 
> > 0000:01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
> >         Subsystem: Gateway 2000: Unknown device 1040
> >         Flags: medium devsel
> >         I/O ports at 1400 [disabled] [size=256]
> >         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> >         Memory at <ignored> (64-bit, non-prefetchable) [disabled]
> >         Capabilities: [40] Power Management version 2
> 
> These devices seem to be disabled, but if loading ohci-hcd does not
> help, you can try to load the sym53c8xx driver and look whether it
> ends up on the same IRQ as qstor.


I will go in tonight and disable all USB and symbios SCSI controllers
on the MB via the BIOS. I don't use SCSI disks on that server anymore,
and I can get by without the USB ports.

The really bad thing is that I can't reproduce it on purpose. Some times
I've gone for 6 weeks without any problems. This morning got really bad,
but it hasn't happened again since.

Thanks for all your help,

Alberto


-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-03 18:02 qstor driver -> irq 193: nobody cared Alberto Alonso
  2006-11-03 19:00 ` Sergey Vlasov
@ 2006-11-04 18:39 ` Mark Lord
  2006-11-06  4:04   ` Alberto Alonso
  2006-11-10 17:36   ` Alberto Alonso
  1 sibling, 2 replies; 20+ messages in thread
From: Mark Lord @ 2006-11-04 18:39 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1281 bytes --]

Alberto Alonso wrote:
> I have a Pacific Digital qstor card on irq 193. I am using kernel
> 2.6.17.13 SMP
> 
> The error happens every now and then. I have not been able to
> figure out any triggers and I can not reproduce it on demand. Today
> it happened 3 times within a 40 minutes period. 
> 
> All disks connected to the card are disabled and I can't do anything
> other than a reboot to get them back.
> 
> It is reported as follows:
> 
> irq 193: nobody cared (try booting with the "irqpoll" option)
>  <c013e19a> __report_bad_irq+0x2a/0xa0  <c013d970> handle_IRQ_event
> +0x30/0x70
>  <c013e2b0> note_interrupt+0x80/0xf0  <c013da8c> __do_IRQ+0xdc/0xf0
>  <c0105799> do_IRQ+0x19/0x30  <c010391a> common_interrupt+0x1a/0x20
>  <c0100d91> default_idle+0x41/0x70  <c0100e60> cpu_idle+0x80/0x90
>  <c046699d> start_kernel+0x18d/0x1d0  <c0466330> unknown_bootoption
> +0x0/0x1d0
> handlers:
> [<c0301300>] (qs_intr+0x0/0x220)
> Disabling IRQ #193

What other devices are routed to that same interrupt line?

The sata_qstor driver is very rigorous in acknowledging *only* it's own
interrupts, to prevent other devices sharing the same IRQ from losing theirs.

Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
Try this patch (attached) and report back.

Thanks.



[-- Attachment #2: qstor_spurious.patch --]
[-- Type: text/x-patch, Size: 833 bytes --]

--- linux/drivers/scsi/sata_qstor.c.orig	2006-09-19 23:42:06.000000000 -0400
+++ linux/drivers/scsi/sata_qstor.c	2006-11-04 13:35:45.000000000 -0500
@@ -459,6 +459,7 @@
 {
 	struct ata_host_set *host_set = dev_instance;
 	unsigned int handled = 0;
+	static int spurious = 0;
 
 	VPRINTK("ENTER\n");
 
@@ -466,6 +467,19 @@
 	handled  = qs_intr_pkt(host_set) | qs_intr_mmio(host_set);
 	spin_unlock(&host_set->lock);
 
+	/* There are reports of at least one system with glitchy interrupts,
+	 * and if we return !handled here for those then the kernel may totally
+	 * disable our own IRQ line (duh!).
+	 * So, try and be tolerant, but not *too* tolerant.
+	 */
+	if (!handled) {
+		if (spurious < 10) {
+			++spurious;
+			handled = 1;
+		}
+	} else if (spurious)
+		--spurious;
+
 	VPRINTK("EXIT\n");
 
 	return IRQ_RETVAL(handled);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-04 18:39 ` Mark Lord
@ 2006-11-06  4:04   ` Alberto Alonso
  2006-11-06 14:51     ` Mark Lord
  2006-11-10 17:36   ` Alberto Alonso
  1 sibling, 1 reply; 20+ messages in thread
From: Alberto Alonso @ 2006-11-06  4:04 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

I have patched the system. I'll let you know if I get the
same problem. Since I can't reproduce it on demand I may not see
the problem for months. I don't know what happened the other day
when it just started to happen repeatedly, I haven't had the issue
since.

Thanks,

Alberto


On Sat, 2006-11-04 at 13:39 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > 2.6.17.13 SMP
> > 
> > The error happens every now and then. I have not been able to
> > figure out any triggers and I can not reproduce it on demand. Today
> > it happened 3 times within a 40 minutes period. 
> > 
> > All disks connected to the card are disabled and I can't do anything
> > other than a reboot to get them back.
> > 
> > It is reported as follows:
> > 
> > irq 193: nobody cared (try booting with the "irqpoll" option)
> >  <c013e19a> __report_bad_irq+0x2a/0xa0  <c013d970> handle_IRQ_event
> > +0x30/0x70
> >  <c013e2b0> note_interrupt+0x80/0xf0  <c013da8c> __do_IRQ+0xdc/0xf0
> >  <c0105799> do_IRQ+0x19/0x30  <c010391a> common_interrupt+0x1a/0x20
> >  <c0100d91> default_idle+0x41/0x70  <c0100e60> cpu_idle+0x80/0x90
> >  <c046699d> start_kernel+0x18d/0x1d0  <c0466330> unknown_bootoption
> > +0x0/0x1d0
> > handlers:
> > [<c0301300>] (qs_intr+0x0/0x220)
> > Disabling IRQ #193
> 
> What other devices are routed to that same interrupt line?
> 
> The sata_qstor driver is very rigorous in acknowledging *only* it's own
> interrupts, to prevent other devices sharing the same IRQ from losing theirs.
> 
> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> Try this patch (attached) and report back.
> 
> Thanks.
> 
> 
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-06  4:04   ` Alberto Alonso
@ 2006-11-06 14:51     ` Mark Lord
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Lord @ 2006-11-06 14:51 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]

Alberto Alonso wrote:
> I have patched the system. I'll let you know if I get the
> same problem. Since I can't reproduce it on demand I may not see
> the problem for months. I don't know what happened the other day
> when it just started to happen repeatedly, I haven't had the issue
> since.

Ahh. Well, in that case please add a printk() line so that we'll know
if it *ever* does anything!  Updated patch attached.

Cheers


[-- Attachment #2: qstor_spurious2.patch --]
[-- Type: text/x-patch, Size: 877 bytes --]

--- old/drivers/scsi/sata_qstor.c	2006-09-19 23:42:06.000000000 -0400
+++ linux/drivers/scsi/sata_qstor.c	2006-11-06 09:50:02.000000000 -0500
@@ -459,6 +459,7 @@
 {
 	struct ata_host_set *host_set = dev_instance;
 	unsigned int handled = 0;
+	static int spurious = 0;
 
 	VPRINTK("ENTER\n");
 
@@ -466,6 +467,20 @@
 	handled  = qs_intr_pkt(host_set) | qs_intr_mmio(host_set);
 	spin_unlock(&host_set->lock);
 
+	/* There are reports of at least one system with glitchy interrupts,
+	 * and if we return !handled here for those then the kernel may totally
+	 * disable our own IRQ line (duh!).
+	 * So, try and be tolerant, but not *too* tolerant.
+	 */
+	if (!handled) {
+		if (spurious < 10) {
+			printk("sata_qstor: spurious=%d\n", spurious);
+			++spurious;
+			handled = 1;
+		}
+	} else if (spurious)
+		--spurious;
+
 	VPRINTK("EXIT\n");
 
 	return IRQ_RETVAL(handled);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-04 18:39 ` Mark Lord
  2006-11-06  4:04   ` Alberto Alonso
@ 2006-11-10 17:36   ` Alberto Alonso
  2006-11-12  5:09     ` Mark Lord
  1 sibling, 1 reply; 20+ messages in thread
From: Alberto Alonso @ 2006-11-10 17:36 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

The saga continues. It happened again this morning even with the
patch:

irq 193: nobody cared (try booting with the "irqpoll" option)
 <c013e19a> __report_bad_irq+0x2a/0xa0  <c013d970> handle_IRQ_event
+0x30/0x70
 <c013e2b0> note_interrupt+0x80/0xf0  <c013da8c> __do_IRQ+0xdc/0xf0
 <c0105799> do_IRQ+0x19/0x30  <c010391a> common_interrupt+0x1a/0x20
 <c0100d91> default_idle+0x41/0x70  <c0100e60> cpu_idle+0x80/0x90
 <c046699d> start_kernel+0x18d/0x1d0  <c0466330> unknown_bootoption
+0x0/0x1d0
handlers:
[<c0301300>] (qs_intr+0x0/0x240)
Disabling IRQ #193
ata4: command 0xea timeout, stat 0xff host_stat 0x0
ata1: command 0xea timeout, stat 0xff host_stat 0x0

[...]

The drives on this controller went offline.

This is my interrupts table:

cat /proc/interrupts
           CPU0       CPU1
  0:   48405419   48520371    IO-APIC-edge  timer
  1:         60        146    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          1          3    IO-APIC-edge  rtc
 10:         13         13   IO-APIC-level  ohci_hcd:usb1
 14:     228219     237351    IO-APIC-edge  ide0
 15:          3          9    IO-APIC-edge  ide1
145:          0          0   IO-APIC-level  ivtv0
153:    2272523    2393822   IO-APIC-level  ide2, ide3, ide4, ide5
161:       5416      23455   IO-APIC-level  ide8, ide9
169:       7148      22129   IO-APIC-level  ide6, ide7
185:    3680365         86   IO-APIC-level  eth0
193:     337154     162846   IO-APIC-level  libata
NMI:          0          0
LOC:   96924026   96924027
ERR:          0
MIS:          0

lspci -v shows this for the controller. No other device (at least
pci device) is using IRQ 193.


0000:01:03.0 0106: Pacific Digital Corp: Unknown device 2068 (rev 01)
        Subsystem: Pacific Digital Corp: Unknown device 2068
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 193
        I/O ports at eff0 [size=8]
        I/O ports at efe0 [size=8]
        I/O ports at efa8 [size=8]
        I/O ports at efa0 [size=8]
        Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at febe0000 [disabled] [size=64K]
        Capabilities: <available only to root>


Any other ideas I can try to look for? Is there any other type
of device that may be using the IRQ but not listed with the above
commands?

Thanks,

Alberto


On Sat, 2006-11-04 at 13:39 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > I have a Pacific Digital qstor card on irq 193. I am using kernel
> > 2.6.17.13 SMP
> > 
> > The error happens every now and then. I have not been able to
> > figure out any triggers and I can not reproduce it on demand. Today
> > it happened 3 times within a 40 minutes period. 
> > 
> > All disks connected to the card are disabled and I can't do anything
> > other than a reboot to get them back.
> > 
> > It is reported as follows:
> > 
> > irq 193: nobody cared (try booting with the "irqpoll" option)
> >  <c013e19a> __report_bad_irq+0x2a/0xa0  <c013d970> handle_IRQ_event
> > +0x30/0x70
> >  <c013e2b0> note_interrupt+0x80/0xf0  <c013da8c> __do_IRQ+0xdc/0xf0
> >  <c0105799> do_IRQ+0x19/0x30  <c010391a> common_interrupt+0x1a/0x20
> >  <c0100d91> default_idle+0x41/0x70  <c0100e60> cpu_idle+0x80/0x90
> >  <c046699d> start_kernel+0x18d/0x1d0  <c0466330> unknown_bootoption
> > +0x0/0x1d0
> > handlers:
> > [<c0301300>] (qs_intr+0x0/0x220)
> > Disabling IRQ #193
> 
> What other devices are routed to that same interrupt line?
> 
> The sata_qstor driver is very rigorous in acknowledging *only* it's own
> interrupts, to prevent other devices sharing the same IRQ from losing theirs.
> 
> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> Try this patch (attached) and report back.
> 
> Thanks.
> 
> 
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-10 17:36   ` Alberto Alonso
@ 2006-11-12  5:09     ` Mark Lord
  2006-11-12  5:17       ` Alberto Alonso
  2006-11-12 20:31       ` Alberto Alonso
  0 siblings, 2 replies; 20+ messages in thread
From: Mark Lord @ 2006-11-12  5:09 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

Alberto Alonso wrote:
> The saga continues. It happened again this morning even with the
> patch:
..
>> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
>> Try this patch (attached) and report back.

Did you add the printk() to the patch, as suggested?
If so, did it log anything ?

Thanks

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-12  5:09     ` Mark Lord
@ 2006-11-12  5:17       ` Alberto Alonso
  2006-11-12 20:31       ` Alberto Alonso
  1 sibling, 0 replies; 20+ messages in thread
From: Alberto Alonso @ 2006-11-12  5:17 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

Sorry, I missed that post. I will add the printk line right
now. I'll let you know the results as soon as it happens
again.

Thanks,


Alberto

On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > The saga continues. It happened again this morning even with the
> > patch:
> ..
> >> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> >> Try this patch (attached) and report back.
> 
> Did you add the printk() to the patch, as suggested?
> If so, did it log anything ?
> 
> Thanks
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-12  5:09     ` Mark Lord
  2006-11-12  5:17       ` Alberto Alonso
@ 2006-11-12 20:31       ` Alberto Alonso
  2006-11-13 14:29         ` Mark Lord
  1 sibling, 1 reply; 20+ messages in thread
From: Alberto Alonso @ 2006-11-12 20:31 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

OK, after adding the printk line I can start seeing
results.

I guess it has been close to 10 on quite a few
occasions.

Thanks,

Alberto

# grep qstor /var/log/messages
Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=1
Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=1
Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=2
Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=0
Nov 12 07:03:35 w100 kernel: sata_qstor: spurious=0
Nov 12 07:34:44 w100 kernel: sata_qstor: spurious=0
Nov 12 07:35:08 w100 kernel: sata_qstor: spurious=1
Nov 12 07:35:08 w100 kernel: sata_qstor: spurious=1
Nov 12 07:35:08 w100 kernel: sata_qstor: spurious=0
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=0
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=1
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=0
Nov 12 09:11:17 w100 kernel: sata_qstor: spurious=1
Nov 12 09:11:18 w100 kernel: sata_qstor: spurious=0
Nov 12 09:36:34 w100 kernel: sata_qstor: spurious=0
Nov 12 09:38:26 w100 kernel: sata_qstor: spurious=0
Nov 12 09:38:26 w100 kernel: sata_qstor: spurious=0
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=0
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=1
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=2
Nov 12 09:53:06 w100 kernel: sata_qstor: spurious=0
Nov 12 09:53:09 w100 kernel: sata_qstor: spurious=1
Nov 12 10:09:03 w100 kernel: sata_qstor: spurious=0
Nov 12 10:09:04 w100 kernel: sata_qstor: spurious=1
Nov 12 10:09:05 w100 kernel: sata_qstor: spurious=0
Nov 12 10:09:05 w100 kernel: sata_qstor: spurious=0
Nov 12 10:11:52 w100 kernel: sata_qstor: spurious=0
Nov 12 10:11:53 w100 kernel: sata_qstor: spurious=1
Nov 12 10:11:53 w100 kernel: sata_qstor: spurious=2
Nov 12 10:11:53 w100 kernel: sata_qstor: spurious=3
Nov 12 10:11:54 w100 kernel: sata_qstor: spurious=0
Nov 12 10:11:55 w100 kernel: sata_qstor: spurious=1
Nov 12 10:11:55 w100 kernel: sata_qstor: spurious=0
Nov 12 10:12:35 w100 kernel: sata_qstor: spurious=1
Nov 12 10:12:37 w100 kernel: sata_qstor: spurious=0
Nov 12 10:12:37 w100 kernel: sata_qstor: spurious=1
Nov 12 10:12:37 w100 kernel: sata_qstor: spurious=0
Nov 12 10:30:51 w100 kernel: sata_qstor: spurious=0
Nov 12 10:30:51 w100 kernel: sata_qstor: spurious=1
Nov 12 10:30:51 w100 kernel: sata_qstor: spurious=2
Nov 12 10:30:52 w100 kernel: sata_qstor: spurious=0
Nov 12 10:30:52 w100 kernel: sata_qstor: spurious=0
Nov 12 10:40:08 w100 kernel: sata_qstor: spurious=0
Nov 12 10:40:08 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=0
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:34 w100 kernel: sata_qstor: spurious=0
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=0
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=1
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=2
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=3
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=4
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=5
Nov 12 10:57:37 w100 kernel: sata_qstor: spurious=3
Nov 12 11:01:18 w100 kernel: sata_qstor: spurious=0
Nov 12 11:01:19 w100 kernel: sata_qstor: spurious=1
Nov 12 11:18:35 w100 kernel: sata_qstor: spurious=0
Nov 12 11:18:39 w100 kernel: sata_qstor: spurious=1
Nov 12 11:18:40 w100 kernel: sata_qstor: spurious=0
Nov 12 11:18:40 w100 kernel: sata_qstor: spurious=0
Nov 12 11:21:42 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:11 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:11 w100 kernel: sata_qstor: spurious=1
Nov 12 11:35:11 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=0
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=1
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=2
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=3
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=4
Nov 12 11:35:12 w100 kernel: sata_qstor: spurious=2
Nov 12 11:38:01 w100 kernel: sata_qstor: spurious=0
Nov 12 11:38:01 w100 kernel: sata_qstor: spurious=1
Nov 12 11:38:01 w100 kernel: sata_qstor: spurious=0
Nov 12 11:42:53 w100 kernel: sata_qstor: spurious=0
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=0
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=1
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=2
Nov 12 11:50:26 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=1
Nov 12 12:07:03 w100 kernel: sata_qstor: spurious=2
Nov 12 12:07:11 w100 kernel: sata_qstor: spurious=0
Nov 12 12:07:11 w100 kernel: sata_qstor: spurious=1
Nov 12 12:07:11 w100 kernel: sata_qstor: spurious=0
Nov 12 12:15:08 w100 kernel: sata_qstor: spurious=0
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=1
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=0
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=1
Nov 12 12:15:31 w100 kernel: sata_qstor: spurious=2
Nov 12 12:41:11 w100 kernel: sata_qstor: spurious=0
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=1
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=2
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=3
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=4
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=5
Nov 12 12:41:28 w100 kernel: sata_qstor: spurious=3
Nov 12 13:05:42 w100 kernel: sata_qstor: spurious=0
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=1
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=2
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=3
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=4
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=5
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:47 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=0
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=1
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=2
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=3
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=4
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=5
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=6
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=7
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=8
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=9
Nov 12 13:05:52 w100 kernel: sata_qstor: spurious=0
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=1
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=2
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=3
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=4
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=5
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=0
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=1
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=0
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=1
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=2
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=3
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=4
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=5
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=4
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=5
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=6
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=7
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=8
Nov 12 13:06:01 w100 kernel: sata_qstor: spurious=9
Nov 12 13:20:15 w100 kernel: sata_qstor: spurious=0
Nov 12 13:20:15 w100 kernel: sata_qstor: spurious=1
Nov 12 13:20:15 w100 kernel: sata_qstor: spurious=2
Nov 12 13:20:17 w100 kernel: sata_qstor: spurious=0
Nov 12 13:20:17 w100 kernel: sata_qstor: spurious=1
Nov 12 13:20:17 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=2
Nov 12 13:23:30 w100 kernel: sata_qstor: spurious=3
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=2
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:23:36 w100 kernel: sata_qstor: spurious=2
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=0
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=1
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=1
Nov 12 13:24:08 w100 kernel: sata_qstor: spurious=0
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=0
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=1
Nov 12 13:26:36 w100 kernel: sata_qstor: spurious=2
Nov 12 13:31:30 w100 kernel: sata_qstor: spurious=0
Nov 12 13:31:33 w100 kernel: sata_qstor: spurious=1
Nov 12 13:31:33 w100 kernel: sata_qstor: spurious=1
Nov 12 13:32:18 w100 kernel: sata_qstor: spurious=0
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=1
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=2
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=3
Nov 12 13:32:43 w100 kernel: sata_qstor: spurious=3
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=0
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=1
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=2
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=3
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=4
Nov 12 13:33:07 w100 kernel: sata_qstor: spurious=2
Nov 12 13:33:08 w100 kernel: sata_qstor: spurious=0
Nov 12 13:33:39 w100 kernel: sata_qstor: spurious=0
Nov 12 13:34:53 w100 kernel: sata_qstor: spurious=0
Nov 12 13:34:53 w100 kernel: sata_qstor: spurious=1
Nov 12 13:34:53 w100 kernel: sata_qstor: spurious=0
Nov 12 13:41:15 w100 kernel: sata_qstor: spurious=0
Nov 12 13:41:15 w100 kernel: sata_qstor: spurious=1
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=0
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=1
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=2
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=3
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=4
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=5
Nov 12 13:41:19 w100 kernel: sata_qstor: spurious=3
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=1
Nov 12 13:49:43 w100 kernel: sata_qstor: spurious=2
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=0
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=1
Nov 12 13:49:49 w100 kernel: sata_qstor: spurious=2
Nov 12 13:53:04 w100 kernel: sata_qstor: spurious=0
Nov 12 13:53:32 w100 kernel: sata_qstor: spurious=0
Nov 12 13:53:32 w100 kernel: sata_qstor: spurious=1
Nov 12 13:53:32 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=1
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=2
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=1
Nov 12 13:56:11 w100 kernel: sata_qstor: spurious=2
Nov 12 13:56:13 w100 kernel: sata_qstor: spurious=0
Nov 12 13:56:13 w100 kernel: sata_qstor: spurious=0
Nov 12 14:10:49 w100 kernel: sata_qstor: spurious=0
Nov 12 14:10:49 w100 kernel: sata_qstor: spurious=1
Nov 12 14:10:49 w100 kernel: sata_qstor: spurious=0
Nov 12 14:10:50 w100 kernel: sata_qstor: spurious=1
Nov 12 14:10:50 w100 kernel: sata_qstor: spurious=1
Nov 12 14:10:50 w100 kernel: sata_qstor: spurious=0
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=0
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=1
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=2
Nov 12 14:11:40 w100 kernel: sata_qstor: spurious=3
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=0
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=0
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=1
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=1
Nov 12 14:14:32 w100 kernel: sata_qstor: spurious=0
Nov 12 14:22:18 w100 kernel: sata_qstor: spurious=0
Nov 12 14:22:22 w100 kernel: sata_qstor: spurious=1
Nov 12 14:22:22 w100 kernel: sata_qstor: spurious=2
Nov 12 14:22:22 w100 kernel: sata_qstor: spurious=3
Nov 12 14:26:51 w100 kernel: sata_qstor: spurious=0
Nov 12 14:26:51 w100 kernel: sata_qstor: spurious=1
Nov 12 14:26:51 w100 kernel: sata_qstor: spurious=0



On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > The saga continues. It happened again this morning even with the
> > patch:
> ..
> >> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> >> Try this patch (attached) and report back.
> 
> Did you add the printk() to the patch, as suggested?
> If so, did it log anything ?
> 
> Thanks
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-12 20:31       ` Alberto Alonso
@ 2006-11-13 14:29         ` Mark Lord
  2006-11-14  4:50           ` Alberto Alonso
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Lord @ 2006-11-13 14:29 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1226 bytes --]

Alberto Alonso wrote:
> OK, after adding the printk line I can start seeing
> results.
> 
> I guess it has been close to 10 on quite a few
> occasions.
..
> # grep qstor /var/log/messages
> Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=1
> Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=1
> Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=2
..
> On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
>> Alberto Alonso wrote:
>>> The saga continues. It happened again this morning even with the
>>> patch:
>> ..
>>>> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
>>>> Try this patch (attached) and report back.
>> Did you add the printk() to the patch, as suggested?
..

Excellent!

So, either we have a very noisy bit of hardware in there,
or something is wrong with sata_qstor.c.

The device has two methods for dealing with commands.
Regular R/W uses the driver's host queue "packet" interface,
and all other commands pass through the legacy MMIO mechanism.

I'm betting on some bug/interaction with the latter.

Try this patch and see what happens, on top of the printk patch.

Thanks




[-- Attachment #2: qstor_check_status.patch --]
[-- Type: text/x-patch, Size: 703 bytes --]

--- linux/drivers/scsi/sata_qstor.c.printk	2006-11-06 09:50:02.000000000 -0500
+++ linux/drivers/scsi/sata_qstor.c	2006-11-13 09:25:49.000000000 -0500
@@ -431,6 +431,7 @@
 		if (ap &&
 		    !(ap->flags & ATA_FLAG_DISABLED)) {
 			struct ata_queued_cmd *qc;
+			u8 status = ata_check_status(ap);
 			struct qs_port_priv *pp = ap->private_data;
 			if (!pp || pp->state != qs_state_mmio)
 				continue;
@@ -438,7 +439,7 @@
 			if (qc && (!(qc->tf.flags & ATA_TFLAG_POLLING))) {
 
 				/* check main status, clearing INTRQ */
-				u8 status = ata_check_status(ap);
+				//u8 status = ata_check_status(ap);
 				if ((status & ATA_BUSY))
 					continue;
 				DPRINTK("ata%u: protocol %d (dev_stat 0x%X)\n",

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-13 14:29         ` Mark Lord
@ 2006-11-14  4:50           ` Alberto Alonso
  2006-11-14 16:46             ` Mark Lord
  0 siblings, 1 reply; 20+ messages in thread
From: Alberto Alonso @ 2006-11-14  4:50 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

Things are improving, after the latest patch I can still
see spurious messages, but the count stays at 0.

Sorry for including all the lines, but I think it may help
to see if a time pattern is visible. Here is the output:

Nov 13 14:01:08 w100 kernel: scsi0 : sata_qstor
Nov 13 14:01:08 w100 kernel: scsi1 : sata_qstor
Nov 13 14:01:08 w100 kernel: scsi2 : sata_qstor
Nov 13 14:01:08 w100 kernel: scsi3 : sata_qstor
Nov 13 14:05:15 w100 kernel: sata_qstor: spurious=0
Nov 13 14:08:29 w100 kernel: sata_qstor: spurious=0
Nov 13 14:11:01 w100 kernel: sata_qstor: spurious=0
Nov 13 14:11:06 w100 kernel: sata_qstor: spurious=0
Nov 13 14:15:03 w100 kernel: sata_qstor: spurious=0
Nov 13 14:20:28 w100 kernel: sata_qstor: spurious=0
Nov 13 14:28:44 w100 kernel: sata_qstor: spurious=0
Nov 13 14:40:40 w100 kernel: sata_qstor: spurious=0
Nov 13 14:46:53 w100 kernel: sata_qstor: spurious=0
Nov 13 15:00:15 w100 kernel: sata_qstor: spurious=0
Nov 13 15:01:52 w100 kernel: sata_qstor: spurious=0
Nov 13 15:13:18 w100 kernel: sata_qstor: spurious=0
Nov 13 15:18:25 w100 kernel: sata_qstor: spurious=0
Nov 13 15:19:57 w100 kernel: sata_qstor: spurious=0
Nov 13 15:24:56 w100 kernel: sata_qstor: spurious=0
Nov 13 15:32:52 w100 kernel: sata_qstor: spurious=0
Nov 13 15:32:53 w100 kernel: sata_qstor: spurious=0
Nov 13 15:37:47 w100 kernel: sata_qstor: spurious=0
Nov 13 15:42:30 w100 kernel: sata_qstor: spurious=0
Nov 13 15:42:33 w100 kernel: sata_qstor: spurious=0
Nov 13 15:45:50 w100 kernel: sata_qstor: spurious=0
Nov 13 15:54:53 w100 kernel: sata_qstor: spurious=0
Nov 13 15:59:32 w100 kernel: sata_qstor: spurious=0
Nov 13 16:04:07 w100 kernel: sata_qstor: spurious=0
Nov 13 16:06:01 w100 kernel: sata_qstor: spurious=0
Nov 13 16:07:24 w100 kernel: sata_qstor: spurious=0
Nov 13 16:19:54 w100 kernel: sata_qstor: spurious=0
Nov 13 16:19:55 w100 kernel: sata_qstor: spurious=0
Nov 13 16:27:34 w100 kernel: sata_qstor: spurious=0
Nov 13 16:33:21 w100 kernel: sata_qstor: spurious=0
Nov 13 16:39:57 w100 kernel: sata_qstor: spurious=0
Nov 13 16:42:38 w100 kernel: sata_qstor: spurious=0
Nov 13 16:43:40 w100 kernel: sata_qstor: spurious=0
Nov 13 16:50:36 w100 kernel: sata_qstor: spurious=0
Nov 13 16:52:11 w100 kernel: sata_qstor: spurious=0
Nov 13 16:53:41 w100 kernel: sata_qstor: spurious=0
Nov 13 16:58:30 w100 kernel: sata_qstor: spurious=0
Nov 13 17:18:40 w100 kernel: sata_qstor: spurious=0
Nov 13 17:20:21 w100 kernel: sata_qstor: spurious=0
Nov 13 17:20:46 w100 kernel: sata_qstor: spurious=0
Nov 13 17:24:02 w100 kernel: sata_qstor: spurious=0
Nov 13 17:27:26 w100 kernel: sata_qstor: spurious=0
Nov 13 17:35:40 w100 kernel: sata_qstor: spurious=0
Nov 13 17:39:01 w100 kernel: sata_qstor: spurious=0
Nov 13 17:40:23 w100 kernel: sata_qstor: spurious=0
Nov 13 18:03:57 w100 kernel: sata_qstor: spurious=0
Nov 13 18:08:16 w100 kernel: sata_qstor: spurious=0
Nov 13 18:11:20 w100 kernel: sata_qstor: spurious=0
Nov 13 18:16:32 w100 kernel: sata_qstor: spurious=0
Nov 13 18:16:32 w100 kernel: sata_qstor: spurious=0
Nov 13 18:21:28 w100 kernel: sata_qstor: spurious=0
Nov 13 18:22:57 w100 kernel: sata_qstor: spurious=0
Nov 13 18:26:36 w100 kernel: sata_qstor: spurious=0
Nov 13 18:29:21 w100 kernel: sata_qstor: spurious=0
Nov 13 18:32:34 w100 kernel: sata_qstor: spurious=0
Nov 13 18:34:39 w100 kernel: sata_qstor: spurious=0
Nov 13 18:36:17 w100 kernel: sata_qstor: spurious=0
Nov 13 18:38:07 w100 kernel: sata_qstor: spurious=0
Nov 13 18:42:58 w100 kernel: sata_qstor: spurious=0
Nov 13 18:45:19 w100 kernel: sata_qstor: spurious=0
Nov 13 18:47:14 w100 kernel: sata_qstor: spurious=0
Nov 13 18:49:42 w100 kernel: sata_qstor: spurious=0
Nov 13 18:52:22 w100 kernel: sata_qstor: spurious=0
Nov 13 18:55:28 w100 kernel: sata_qstor: spurious=0
Nov 13 18:55:42 w100 kernel: sata_qstor: spurious=0
Nov 13 19:00:12 w100 kernel: sata_qstor: spurious=0
Nov 13 19:04:15 w100 kernel: sata_qstor: spurious=0
Nov 13 19:10:41 w100 kernel: sata_qstor: spurious=0
Nov 13 19:12:11 w100 kernel: sata_qstor: spurious=0
Nov 13 19:15:51 w100 kernel: sata_qstor: spurious=0
Nov 13 19:20:12 w100 kernel: sata_qstor: spurious=0
Nov 13 19:24:27 w100 kernel: sata_qstor: spurious=0
Nov 13 19:26:40 w100 kernel: sata_qstor: spurious=0
Nov 13 19:30:41 w100 kernel: sata_qstor: spurious=0
Nov 13 19:38:02 w100 kernel: sata_qstor: spurious=0
Nov 13 19:41:07 w100 kernel: sata_qstor: spurious=0
Nov 13 19:42:42 w100 kernel: sata_qstor: spurious=0
Nov 13 19:48:06 w100 kernel: sata_qstor: spurious=0
Nov 13 19:55:05 w100 kernel: sata_qstor: spurious=0
Nov 13 19:55:07 w100 kernel: sata_qstor: spurious=0
Nov 13 19:58:00 w100 kernel: sata_qstor: spurious=0
Nov 13 19:58:07 w100 kernel: sata_qstor: spurious=0
Nov 13 20:05:51 w100 kernel: sata_qstor: spurious=0
Nov 13 20:08:32 w100 kernel: sata_qstor: spurious=0
Nov 13 20:09:39 w100 kernel: sata_qstor: spurious=0
Nov 13 20:11:54 w100 kernel: sata_qstor: spurious=0
Nov 13 20:17:19 w100 kernel: sata_qstor: spurious=0
Nov 13 20:17:57 w100 kernel: sata_qstor: spurious=0
Nov 13 20:23:48 w100 kernel: sata_qstor: spurious=0
Nov 13 20:25:14 w100 kernel: sata_qstor: spurious=0
Nov 13 20:25:16 w100 kernel: sata_qstor: spurious=0
Nov 13 20:28:56 w100 kernel: sata_qstor: spurious=0
Nov 13 20:29:00 w100 kernel: sata_qstor: spurious=0
Nov 13 20:32:39 w100 kernel: sata_qstor: spurious=0
Nov 13 20:35:06 w100 kernel: sata_qstor: spurious=0
Nov 13 20:38:20 w100 kernel: sata_qstor: spurious=0
Nov 13 20:41:30 w100 kernel: sata_qstor: spurious=0
Nov 13 20:41:33 w100 kernel: sata_qstor: spurious=0
Nov 13 20:45:00 w100 kernel: sata_qstor: spurious=0
Nov 13 21:08:22 w100 kernel: sata_qstor: spurious=0
Nov 13 21:10:28 w100 kernel: sata_qstor: spurious=0
Nov 13 21:13:26 w100 kernel: sata_qstor: spurious=0
Nov 13 21:15:22 w100 kernel: sata_qstor: spurious=0
Nov 13 21:23:32 w100 kernel: sata_qstor: spurious=0
Nov 13 21:24:11 w100 kernel: sata_qstor: spurious=0
Nov 13 21:26:40 w100 kernel: sata_qstor: spurious=0
Nov 13 21:33:14 w100 kernel: sata_qstor: spurious=0
Nov 13 21:45:55 w100 kernel: sata_qstor: spurious=0
Nov 13 21:45:57 w100 kernel: sata_qstor: spurious=0
Nov 13 21:56:52 w100 kernel: sata_qstor: spurious=0
Nov 13 21:56:56 w100 kernel: sata_qstor: spurious=0
Nov 13 22:01:53 w100 kernel: sata_qstor: spurious=0
Nov 13 22:22:46 w100 kernel: sata_qstor: spurious=0
Nov 13 22:30:44 w100 kernel: sata_qstor: spurious=0
Nov 13 22:38:52 w100 kernel: sata_qstor: spurious=0
Nov 13 22:41:48 w100 kernel: sata_qstor: spurious=0
Nov 13 22:43:20 w100 kernel: sata_qstor: spurious=0
Nov 13 22:44:58 w100 kernel: sata_qstor: spurious=0


Thanks,

Alberto

On Mon, 2006-11-13 at 09:29 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > OK, after adding the printk line I can start seeing
> > results.
> > 
> > I guess it has been close to 10 on quite a few
> > occasions.
> ..
> > # grep qstor /var/log/messages
> > Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> > Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=1
> > Nov 12 07:00:53 w100 kernel: sata_qstor: spurious=0
> > Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=1
> > Nov 12 07:00:56 w100 kernel: sata_qstor: spurious=2
> ..
> > On Sun, 2006-11-12 at 00:09 -0500, Mark Lord wrote:
> >> Alberto Alonso wrote:
> >>> The saga continues. It happened again this morning even with the
> >>> patch:
> >> ..
> >>>> Mmm.. We could apply a bit of fuzzy tolerance for the odd glitch.
> >>>> Try this patch (attached) and report back.
> >> Did you add the printk() to the patch, as suggested?
> ..
> 
> Excellent!
> 
> So, either we have a very noisy bit of hardware in there,
> or something is wrong with sata_qstor.c.
> 
> The device has two methods for dealing with commands.
> Regular R/W uses the driver's host queue "packet" interface,
> and all other commands pass through the legacy MMIO mechanism.
> 
> I'm betting on some bug/interaction with the latter.
> 
> Try this patch and see what happens, on top of the printk patch.
> 
> Thanks
> 
> 
> 
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-14  4:50           ` Alberto Alonso
@ 2006-11-14 16:46             ` Mark Lord
  2006-11-14 18:17               ` Alberto Alonso
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Lord @ 2006-11-14 16:46 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

Alberto Alonso wrote:
> Things are improving, after the latest patch I can still
> see spurious messages, but the count stays at 0.

Mmm.. Okay, so we have a kludge fix (just get rid of the printk's we added).

But I would like to find out more about what is going on.
We seem to be getting lots of "leftover interrupts".
I'll look through my full qstor block driver (the high-performance
queuing driver, out-of-tree), and see if we missed an IRQ-mask bit
someplace in the simple sata_qstor.c re-implementation.

Cheers

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-14 16:46             ` Mark Lord
@ 2006-11-14 18:17               ` Alberto Alonso
  2006-11-14 18:23                 ` Mark Lord
  0 siblings, 1 reply; 20+ messages in thread
From: Alberto Alonso @ 2006-11-14 18:17 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

Sounds good, let me know if you need me to do any
testing or help on the programming. I've been wanting
to get into kernel drivers, but never did find the
right time/place to do it.

Thanks,

Alberto

On Tue, 2006-11-14 at 11:46 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > Things are improving, after the latest patch I can still
> > see spurious messages, but the count stays at 0.
> 
> Mmm.. Okay, so we have a kludge fix (just get rid of the printk's we added).
> 
> But I would like to find out more about what is going on.
> We seem to be getting lots of "leftover interrupts".
> I'll look through my full qstor block driver (the high-performance
> queuing driver, out-of-tree), and see if we missed an IRQ-mask bit
> someplace in the simple sata_qstor.c re-implementation.
> 
> Cheers
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-14 18:17               ` Alberto Alonso
@ 2006-11-14 18:23                 ` Mark Lord
  2006-11-16  6:35                   ` Alberto Alonso
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Lord @ 2006-11-14 18:23 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 289 bytes --]

Alberto Alonso wrote:
> Sounds good, let me know if you need me to do any
> testing or help on the programming. 

Okay, try this (untested) and see if it cleans things up.
I'm also including the old "printk" patch, so that we can
tell whether the "fix" is actually working or not.

Thanks

[-- Attachment #2: qstor_spurious2.patch --]
[-- Type: text/x-patch, Size: 877 bytes --]

--- old/drivers/scsi/sata_qstor.c	2006-09-19 23:42:06.000000000 -0400
+++ linux/drivers/scsi/sata_qstor.c	2006-11-06 09:50:02.000000000 -0500
@@ -459,6 +459,7 @@
 {
 	struct ata_host_set *host_set = dev_instance;
 	unsigned int handled = 0;
+	static int spurious = 0;
 
 	VPRINTK("ENTER\n");
 
@@ -466,6 +467,20 @@
 	handled  = qs_intr_pkt(host_set) | qs_intr_mmio(host_set);
 	spin_unlock(&host_set->lock);
 
+	/* There are reports of at least one system with glitchy interrupts,
+	 * and if we return !handled here for those then the kernel may totally
+	 * disable our own IRQ line (duh!).
+	 * So, try and be tolerant, but not *too* tolerant.
+	 */
+	if (!handled) {
+		if (spurious < 10) {
+			printk("sata_qstor: spurious=%d\n", spurious);
+			++spurious;
+			handled = 1;
+		}
+	} else if (spurious)
+		--spurious;
+
 	VPRINTK("EXIT\n");
 
 	return IRQ_RETVAL(handled);

[-- Attachment #3: qstor_nien_fix.patch --]
[-- Type: text/x-patch, Size: 994 bytes --]

--- old/drivers/scsi/sata_qstor.c	2006-11-14 11:30:37.000000000 -0500
+++ linux/drivers/scsi/sata_qstor.c	2006-11-14 13:22:05.000000000 -0500
@@ -299,6 +299,19 @@
 	return nelem;
 }
 
+static void qs_set_nien (struct ata_port *ap, int new_nien)
+{
+	struct ata_ioports *ioaddr = &ap->ioaddr;
+
+	if ((ap->ctl & ATA_NIEN) != new_nien) {
+		ap->ctl = (ap->ctl & ~ATA_NIEN) | new_nien;
+		ap->last_ctl = ap->ctl;
+		writeb(ap->ctl, (void __iomem *)ioaddr->ctl_addr);
+		wmb();
+		ata_check_status(ap);
+	}
+}
+
 static void qs_qc_prep(struct ata_queued_cmd *qc)
 {
 	struct qs_port_priv *pp = qc->ap->private_data;
@@ -357,6 +370,8 @@
 
 	switch (qc->tf.protocol) {
 	case ATA_PROT_DMA:
+		/* packet mode requires that the NIEN bit be turned on */
+		qs_set_nien(qc->ap, ATA_NIEN);
 
 		pp->state = qs_state_pkt;
 		qs_packet_start(qc);
@@ -370,6 +385,7 @@
 		break;
 	}
 
+	qs_set_nien(qc->ap, 0); // FIXME: is this necessary here?
 	pp->state = qs_state_mmio;
 	return ata_qc_issue_prot(qc);
 }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-14 18:23                 ` Mark Lord
@ 2006-11-16  6:35                   ` Alberto Alonso
  2006-11-16 13:53                     ` Mark Lord
  0 siblings, 1 reply; 20+ messages in thread
From: Alberto Alonso @ 2006-11-16  6:35 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

Sorry for the long delay, I've been called on too
many issues at work this week.

Anyway, the patch basically made the drives not usable.
Here is the log from the boot process:


libata version 1.20 loaded.
sata_qstor 0000:01:03.0: version 0.05
ata1: SATA max UDMA/133 cmd 0xF8820400 ctl 0xF8820440 bmdma 0x0 irq 185
ata2: SATA max UDMA/133 cmd 0xF8824400 ctl 0xF8824440 bmdma 0x0 irq 185
ata3: SATA max UDMA/133 cmd 0xF8828400 ctl 0xF8828440 bmdma 0x0 irq 185
ata4: SATA max UDMA/133 cmd 0xF882C400 ctl 0xF882C440 bmdma 0x0 irq 185
ata1: SATA link up 3.0 Gbps (SStatus 123)
ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata1: dev 0 ATA-7, max UDMA/133, 976773168 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : sata_qstor
ata2: SATA link up 3.0 Gbps (SStatus 123)
ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata2: dev 0 ATA-7, max UDMA/133, 976773168 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : sata_qstor
ata3: SATA link up 3.0 Gbps (SStatus 123)
ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata3: dev 0 ATA-7, max UDMA/133, 976771055 sectors: LBA48
ata3: dev 0 configured for UDMA/133
scsi2 : sata_qstor
ata4: SATA link up 3.0 Gbps (SStatus 123)
ata4: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4023 85:3469 86:3c01 87:4023
88:407f
ata4: dev 0 ATA-7, max UDMA/133, 976773168 sectors: LBA48
ata4: dev 0 configured for UDMA/133
scsi3 : sata_qstor
  Vendor: ATA       Model: ST3500641AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: ST3500641AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: ST3500641AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: ST3500641AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: unknown partition table
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
 sdb: unknown partition table
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 976771055 512-byte hdwr sectors (500107 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
SCSI device sdc: 976771055 512-byte hdwr sectors (500107 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
 sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
 sdd: sdd1
sd 3:0:0:0: Attached scsi disk sdd
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 3:0:0:0: Attached scsi generic sg3 type 0

[...]

md: md3 stopped.
ata1: command 0x25 timeout, stat 0xff host_stat 0x0
ata1: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xff { Busy }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sda, sector 976772992
Buffer I/O error on device sda, logical block 122096624
ata1: command 0x25 timeout, stat 0xff host_stat 0x0
ata1: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xff { Busy }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sda, sector 976772992
Buffer I/O error on device sda, logical block 122096624
ata2: command 0x25 timeout, stat 0xff host_stat 0x0
ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0xff { Busy }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdb, sector 976772992
Buffer I/O error on device sdb, logical block 122096624
ata2: command 0x25 timeout, stat 0xff host_stat 0x0
ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0xff { Busy }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdb, sector 976772992
Buffer I/O error on device sdb, logical block 122096624
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770816
Buffer I/O error on device sdc, logical block 976770816
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770817
Buffer I/O error on device sdc, logical block 976770817
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770818
Buffer I/O error on device sdc, logical block 976770818
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770819
Buffer I/O error on device sdc, logical block 976770819
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770820
Buffer I/O error on device sdc, logical block 976770820
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770821
Buffer I/O error on device sdc, logical block 976770821
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770822
Buffer I/O error on device sdc, logical block 976770822
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770823
Buffer I/O error on device sdc, logical block 976770823
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770816
Buffer I/O error on device sdc, logical block 976770816
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770817
Buffer I/O error on device sdc, logical block 976770817
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770818
Buffer I/O error on device sdc, logical block 976770818
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770819
Buffer I/O error on device sdc, logical block 976770819
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770820
Buffer I/O error on device sdc, logical block 976770820
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770821
Buffer I/O error on device sdc, logical block 976770821
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770822
Buffer I/O error on device sdc, logical block 976770822
ata3: command 0x25 timeout, stat 0xff host_stat 0x0
ata3: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata3: status=0xff { Busy }
sd 2:0:0:0: SCSI error: return code = 0x8000002
sdc: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdc, sector 976770823
Buffer I/O error on device sdc, logical block 976770823
ata4: command 0x25 timeout, stat 0xff host_stat 0x0
ata4: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata4: status=0xff { Busy }
sd 3:0:0:0: SCSI error: return code = 0x8000002
sdd: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdd, sector 976772992
Buffer I/O error on device sdd, logical block 122096624
ata4: command 0x25 timeout, stat 0xff host_stat 0x0
ata4: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata4: status=0xff { Busy }
sd 3:0:0:0: SCSI error: return code = 0x8000002
sdd: Current: sense key=0xb
    ASC=0x47 ASCQ=0x0
end_request: I/O error, dev sdd, sector 976772992
Buffer I/O error on device sdd, logical block 122096624

[...]

md3 is software RAID 5 composed of sda through sdd, which are
the disks attached to the qstor.

I went back to the old kernel to be able to use this
array.

Any ideas?


Thanks,

Alberto


On Tue, 2006-11-14 at 13:23 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > Sounds good, let me know if you need me to do any
> > testing or help on the programming. 
> 
> Okay, try this (untested) and see if it cleans things up.
> I'm also including the old "printk" patch, so that we can
> tell whether the "fix" is actually working or not.
> 
> Thanks
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-16  6:35                   ` Alberto Alonso
@ 2006-11-16 13:53                     ` Mark Lord
  2006-11-17 21:42                       ` Alberto Alonso
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Lord @ 2006-11-16 13:53 UTC (permalink / raw)
  To: Alberto Alonso; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 878 bytes --]

Alberto Alonso wrote:
> Sorry for the long delay, I've been called on too
> many issues at work this week.
> 
> Anyway, the patch basically made the drives not usable.

Mmm.. Okay, thanks for helping track this down.

It appears that this got broken when the ATA_TFLAG_POLLING
got introduced into libata, replacing previous checks of ATA_NIEN.
Or maybe even before that.  Not many of us have qstor cards!

Speaking of which, I'll dig my own qstor card out of mothballs soon,
and work out a proper fix for it soon-ish.

In the meanwhile, could you take a clean kernel, and apply the first
attached patch (qstor_spurious_1.patch), and see if it fixes things.

If not, then you can instead apply the second patch (qstor_spurious_kludge.patch)
and your problems should disappear.  But I cannot actually push that rubbish
upstream, so a "proper" fix will have to come later.

Cheers

[-- Attachment #2: qstor_spurious_1.patch --]
[-- Type: text/x-patch, Size: 458 bytes --]

--- linux/drivers/scsi/sata_qstor.c.orig	2006-09-19 23:42:06.000000000 -0400
+++ linux/drivers/scsi/sata_qstor.c	2006-11-16 08:46:43.000000000 -0500
@@ -399,6 +399,7 @@
 			if (ap && !(ap->flags & ATA_FLAG_DISABLED)) {
 				struct ata_queued_cmd *qc;
 				struct qs_port_priv *pp = ap->private_data;
+				ata_check_status(ap); /* kill spurious ints */
 				if (!pp || pp->state != qs_state_pkt)
 					continue;
 				qc = ata_qc_from_tag(ap, ap->active_tag);

[-- Attachment #3: qstor_spurious_kludge.patch --]
[-- Type: text/x-patch, Size: 934 bytes --]

--- linux/drivers/scsi/sata_qstor.c.orig	2006-09-19 23:42:06.000000000 -0400
+++ linux/drivers/scsi/sata_qstor.c	2006-11-16 08:49:57.000000000 -0500
@@ -423,7 +423,7 @@
 
 static inline unsigned int qs_intr_mmio(struct ata_host_set *host_set)
 {
-	unsigned int handled = 0, port_no;
+	unsigned int handled = 1, port_no;
 
 	for (port_no = 0; port_no < host_set->n_ports; ++port_no) {
 		struct ata_port *ap;
@@ -432,13 +432,13 @@
 		    !(ap->flags & ATA_FLAG_DISABLED)) {
 			struct ata_queued_cmd *qc;
 			struct qs_port_priv *pp = ap->private_data;
+			u8 status = ata_check_status(ap);
 			if (!pp || pp->state != qs_state_mmio)
 				continue;
 			qc = ata_qc_from_tag(ap, ap->active_tag);
 			if (qc && (!(qc->tf.flags & ATA_TFLAG_POLLING))) {
 
 				/* check main status, clearing INTRQ */
-				u8 status = ata_check_status(ap);
 				if ((status & ATA_BUSY))
 					continue;
 				DPRINTK("ata%u: protocol %d (dev_stat 0x%X)\n",

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: qstor driver -> irq 193: nobody cared
  2006-11-16 13:53                     ` Mark Lord
@ 2006-11-17 21:42                       ` Alberto Alonso
  0 siblings, 0 replies; 20+ messages in thread
From: Alberto Alonso @ 2006-11-17 21:42 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-kernel

FYI: trying with just the first patch didn't work. I got
the usual:

irq 185: nobody cared (try booting with the "irqpoll" option)
 <c013e19a> __report_bad_irq+0x2a/0xa0  <c013d970> handle_IRQ_event
+0x30/0x70
 <c013e2b0> note_interrupt+0x80/0xf0  <c013da8c> __do_IRQ+0xdc/0xf0
 <c0105799> do_IRQ+0x19/0x30  <c010391a> common_interrupt+0x1a/0x20
 <c0100d91> default_idle+0x41/0x70  <c0100e60> cpu_idle+0x80/0x90
handlers:
[<c0301300>] (qs_intr+0x0/0x230)
Disabling IRQ #185

Interestingly enough this is the first time I see it outside
of irq 193, not that it matter though.

I am going to try now with the second patch instead.

Alberto


On Thu, 2006-11-16 at 08:53 -0500, Mark Lord wrote:
> Alberto Alonso wrote:
> > Sorry for the long delay, I've been called on too
> > many issues at work this week.
> > 
> > Anyway, the patch basically made the drives not usable.
> 
> Mmm.. Okay, thanks for helping track this down.
> 
> It appears that this got broken when the ATA_TFLAG_POLLING
> got introduced into libata, replacing previous checks of ATA_NIEN.
> Or maybe even before that.  Not many of us have qstor cards!
> 
> Speaking of which, I'll dig my own qstor card out of mothballs soon,
> and work out a proper fix for it soon-ish.
> 
> In the meanwhile, could you take a clean kernel, and apply the first
> attached patch (qstor_spurious_1.patch), and see if it fixes things.
> 
> If not, then you can instead apply the second patch (qstor_spurious_kludge.patch)
> and your problems should disappear.  But I cannot actually push that rubbish
> upstream, so a "proper" fix will have to come later.
> 
> Cheers
-- 
Alberto Alonso                        Global Gate Systems LLC.
(512) 351-7233                        http://www.ggsys.net
Hardware, consulting, sysadmin, monitoring and remote backups


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2006-11-17 21:42 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-03 18:02 qstor driver -> irq 193: nobody cared Alberto Alonso
2006-11-03 19:00 ` Sergey Vlasov
2006-11-03 20:22   ` Alberto Alonso
2006-11-03 20:52     ` Sergey Vlasov
2006-11-03 23:21       ` Alberto Alonso
2006-11-04 18:39 ` Mark Lord
2006-11-06  4:04   ` Alberto Alonso
2006-11-06 14:51     ` Mark Lord
2006-11-10 17:36   ` Alberto Alonso
2006-11-12  5:09     ` Mark Lord
2006-11-12  5:17       ` Alberto Alonso
2006-11-12 20:31       ` Alberto Alonso
2006-11-13 14:29         ` Mark Lord
2006-11-14  4:50           ` Alberto Alonso
2006-11-14 16:46             ` Mark Lord
2006-11-14 18:17               ` Alberto Alonso
2006-11-14 18:23                 ` Mark Lord
2006-11-16  6:35                   ` Alberto Alonso
2006-11-16 13:53                     ` Mark Lord
2006-11-17 21:42                       ` Alberto Alonso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox