* IRQ problem with sata_sil
@ 2006-05-08 10:00 Nicolas STRANSKY
2006-05-08 10:19 ` Paul M.
2006-05-10 4:36 ` Tejun Heo
0 siblings, 2 replies; 5+ messages in thread
From: Nicolas STRANSKY @ 2006-05-08 10:00 UTC (permalink / raw)
To: linux-ide
Hi all,
I've encountered a problem when trying to use a SATA card with a Silicon
Image SIL 3112 host controller chip. When I insert the module, the
kernel logs this error:
> May 7 21:15:18 aneto kernel: sata_sil 0000:02:0c.0: version 1.0
> May 7 21:15:18 aneto kernel: PCI: Found IRQ 10 for device 0000:02:0c.0
> May 7 21:15:18 aneto kernel: PCI: Sharing IRQ 10 with 0000:02:05.0
> May 7 21:15:18 aneto kernel: ata1: SATA max UDMA/100 cmd 0xF8FBC080 ctl 0xF8FBC08A bmdma 0xF8FBC000 irq 10
> May 7 21:15:18 aneto kernel: ata2: SATA max UDMA/100 cmd 0xF8FBC0C0 ctl 0xF8FBC0CA bmdma 0xF8FBC008 irq 10
> May 7 21:15:18 aneto kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> May 7 21:15:18 aneto kernel: irq 10: nobody cared (try booting with the "irqpoll" option)
> May 7 21:15:18 aneto kernel: <c0131ad3> __report_bad_irq+0x2b/0x69 <c0131cd8> note_interrupt+0x1c7/0x1f7
> May 7 21:15:18 aneto kernel: <c0131660> __do_IRQ+0x8d/0xcd <c01043ad> do_IRQ+0x1d/0x28
> May 7 21:15:18 aneto kernel: <c0102cc2> common_interrupt+0x1a/0x20 <c0118237> __do_softirq+0x2c/0x7d
> May 7 21:15:18 aneto kernel: <c01182aa> do_softirq+0x22/0x26 <c011837f> irq_exit+0x29/0x34
> May 7 21:15:18 aneto kernel: <c01043b2> do_IRQ+0x22/0x28 <c0102cc2> common_interrupt+0x1a/0x20
> May 7 21:15:18 aneto kernel: <c016034c> __d_lookup+0x62/0x123 <c0157fa4> do_lookup+0x25/0x13f
> May 7 21:15:18 aneto kernel: <c015839e> __link_path_walk+0x2e0/0xbf6 <c01767b1> proc_delete_inode+0x22/0x75
> May 7 21:15:18 aneto kernel: <c0158cfd> link_path_walk+0x49/0xbb <c015919b> do_path_lookup+0x1a4/0x1d5
> May 7 21:15:18 aneto kernel: <c01593db> do_unlinkat+0x2f/0xff <c0357ca7> syscall_call+0x7/0xb
> May 7 21:15:18 aneto kernel: <c035007b> xfrm_aalg_get_byid+0x27/0x39
> May 7 21:15:18 aneto kernel: handlers:
> May 7 21:15:18 aneto kernel: [<c02d2c90>] (ata_interrupt+0x0/0x13f)
> May 7 21:15:18 aneto kernel: Disabling IRQ #10
> May 7 21:15:18 aneto kernel: ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:207f
> May 7 21:15:18 aneto kernel: ata1: dev 0 ATA-6, max UDMA/133, 312581808 sectors: LBA48
> May 7 21:15:18 aneto kernel: ata1: dev 0 configured for UDMA/100
> May 7 21:15:18 aneto kernel: scsi1 : sata_sil
> May 7 21:15:18 aneto kernel: ata2: SATA link down (SStatus 0 SControl 310)
> May 7 21:15:18 aneto kernel: scsi2 : sata_sil
> May 7 21:15:18 aneto kernel: Vendor: ATA Model: ST3160827AS Rev: 3.42
> May 7 21:15:18 aneto kernel: Type: Direct-Access ANSI SCSI revision: 05
> May 7 21:15:18 aneto kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
> May 7 21:15:18 aneto kernel: sda: Write Protect is off
> May 7 21:15:18 aneto kernel: sda: Mode Sense: 00 3a 00 00
> May 7 21:15:18 aneto kernel: SCSI device sda: drive cache: write back
> May 7 21:15:18 aneto kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
> May 7 21:15:18 aneto kernel: sda: Write Protect is off
> May 7 21:15:18 aneto kernel: sda: Mode Sense: 00 3a 00 00
I've tried both 2.6.16 and 2.6.17-rc3-mm1, each with and without the
"irqpoll" boot option, and every combination ends with the same error.
Please let me know if I can provide any additional information.
Thanks for your support,
--
Nico
Our wits sometimes serve us to do foolish things boldly.
-+- François de La Rochefoucauld (1613-1680), Maximes 415 -+-
* Re: IRQ problem with sata_sil
From: Paul M. @ 2006-05-08 10:19 UTC (permalink / raw)
To: Nicolas STRANSKY; +Cc: linux-ide
I had a similar error with my system's onboard Silicon Image controller
while accessing a 3ware controller at high speeds. Personally, I'm
probably going to avoid the whole problem and try to boot from the 3ware
array.
-Paul
(My email to the Xen list)
To: xen-users@lists.xensource.com
Date: Apr 29, 2006 9:23 AM
Subject: SATA Controller Losing Interrupt During High Speed Access To
RAID Array
A server I've been setting up has been experiencing "lockups" during
sustained high speed access to its hardware RAID controller. The
"lockup" is actually the system losing access to the SATA disk that the
system partitions reside on. The error message reported to the console
leads me to believe that the SATA disk/controller "lost" its
interrupt.
I'm fairly sure that this isn't power or hardware related, and I'm
betting that 3ware's Linux drivers on amd64 are stable. But I won't be
able to rule these out for sure until I run a few more tests. Once
those are finished, I'll capture the error message if it still looks to
be Xen related.
I'm posting this now to see if this sort of thing has been seen before.
Hardware:
  1x dual-core Opteron 275
  4GB RAM
  Silicon Image SATA controller
    w/ 1x SATA drive
  3ware Escalade 9550SX-4LP SATA
    w/ 2x SATA drives
Software:
  Xen 3.0.1 & 3.0.2
  dom0 Debian 3.1r0a
  Using a custom compiled kernel
I run into problems when I use dd to either read or write a file of 8GB
or larger on the drive. Light disk access does not cause problems. When
I ran the array in RAID1 (~75MB/s sustained write) I didn't get any
errors. When I switched it to RAID0 (~150MB/s sustained write) the
errors came back.
Lemme know if you have any thoughts on this. I'll repost when I have
more information.
-Paul
* Re: IRQ problem with sata_sil
From: Nicolas STRANSKY @ 2006-05-09 5:39 UTC (permalink / raw)
To: linux-ide
On 05/08/2006 12:19 PM, Paul M. wrote:
> I had a similar error with my system's on board SI controller while
> accessing a 3ware controller at high speeds. Personally I'm probably
> going to avoid the whole problem and try to boot from the 3ware array.
IMO the two problems are unrelated: here the error appears as soon as
the module is inserted and stems from an IRQ mix-up, not from large
file transfers.
--
Nico
Nodding "yes" is the sign of a man falling asleep;
waking up, on the contrary, shakes its head and says no.
-+- Émile Chartier, dit Alain (1868-1951) -+-
* Re: IRQ problem with sata_sil
From: jason @ 2006-05-10 1:11 UTC (permalink / raw)
To: Nicolas STRANSKY; +Cc: linux-ide
Hi Nicolas,
You should also provide the following information.
# lspci
# lspci -vv -s 02:0c.0
# lspci -vv -s 02:05.0
# cat /proc/interrupts
BTW: Have you tried your SATA card in any other PCI slot? A BIOS update
or DSDT update might also solve the problem.
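To go with the commands above, here is a minimal Python sketch of how one might scan `/proc/interrupts` output for lines that have more than one handler attached. The sample text and the handler names `eth0` and `libata` are invented for illustration (they are not taken from Nicolas's machine), and the parser assumes the 2.6-era layout: one count column per CPU, then a single controller-type token, then comma-separated handler names.

```python
# Hypothetical /proc/interrupts content; real output will differ.
SAMPLE = """\
           CPU0
  0:     523411    XT-PIC  timer
  1:       1042    XT-PIC  i8042
 10:     100000    XT-PIC  eth0, libata
 14:      20311    XT-PIC  ide0
NMI:          0
"""


def shared_irqs(text):
    """Return {irq_number: [handler names]} for IRQs with 2+ handlers."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue                       # skip NMI, LOC, ERR and friends
        # ncpus count columns, one controller token, then the names.
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue                       # no handler registered
        names = [n.strip() for n in fields[ncpus + 1].split(",")]
        if len(names) > 1:
            shared[int(label)] = names
    return shared


if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a live system one would pass `open("/proc/interrupts").read()` instead of `SAMPLE`; newer kernels print extra columns, so the single-token assumption may need adjusting there.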
Yours,
jason
* Re: IRQ problem with sata_sil
From: Tejun Heo @ 2006-05-10 4:36 UTC (permalink / raw)
To: Nicolas STRANSKY; +Cc: linux-ide
Nicolas STRANSKY wrote:
> Hi all,
>
> I've encountered a problem when trying to use a SATA card with a Silicon
> Image SIL 3112 host controller chip. When inserting the module, the
> kernel made this error:
>
Hello, Nicolas.
One of the most annoying problems with the traditional IDE interface is
that the controller doesn't have a 'hey! I'm the needy one!' flag. So,
when an IRQ is raised, there is no way to tell whether the controller is
actually raising the interrupt or not. libata (and probably other IDE
drivers too) works around this by determining when to expect an
interrupt and handling it according to the HSM (host state machine).
Under normal circumstances this usually works, but every now and then
something weird happens and the controller raises an IRQ that libata
wasn't expecting. The result is a stuck IRQ, and the Linux IRQ layer
saves its ass by disabling the IRQ, which is a good thing; otherwise,
the whole system would hang completely.
In your case, the IRQ was already raised before you loaded the module.
While the module loads, the IRQ gets unmasked. As libata hasn't issued
any command yet, it isn't expecting interrupts, so it concludes that the
interrupt doesn't belong to it. Hence the stuck IRQ and your symptom.
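The mechanism described above can be illustrated with a toy Python model. It is not kernel code: the class and function names are made up, and the two thresholds (a window of 100,000 interrupts, shutdown when more than 99,900 of them went unclaimed) are assumptions based on how the IRQ layer's spurious-interrupt accounting is commonly described. The point is only that a level-triggered line nobody acknowledges keeps firing until the IRQ layer gives up.

```python
IRQ_NONE, IRQ_HANDLED = 0, 1


class IrqLine:
    """Toy stand-in for one IRQ line and its accounting (names invented)."""

    WINDOW = 100_000         # interrupts per accounting window (assumed)
    MAX_UNHANDLED = 99_900   # unclaimed interrupts that trigger shutdown (assumed)

    def __init__(self, handlers):
        self.handlers = handlers
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def deliver(self):
        """Run every registered handler once and update the bookkeeping."""
        if self.disabled:
            return
        results = [handler() for handler in self.handlers]
        if IRQ_HANDLED not in results:
            self.unhandled += 1            # "nobody cared" about this one
        self.count += 1
        if self.count < self.WINDOW:
            return
        self.count = 0
        if self.unhandled > self.MAX_UNHANDLED:
            self.disabled = True           # the "Disabling IRQ #10" moment
        self.unhandled = 0


def not_expecting():
    """A handler with no command in flight: it never claims the interrupt."""
    return IRQ_NONE


def storm(line, limit=1_000_000):
    """Keep the line asserted until it is disabled; return delivery count."""
    deliveries = 0
    while not line.disabled and deliveries < limit:
        line.deliver()
        deliveries += 1
    return deliveries


if __name__ == "__main__":
    stuck = IrqLine([not_expecting])
    print(storm(stuck), stuck.disabled)
```

If any handler on the shared line does claim the interrupts, the unhandled counter never crosses the threshold and the line stays enabled, which is why the problem only shows up when the asserting device is the one whose driver is not listening.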
libata is in the process of getting a new EH (error handling)/probing
mechanism, which will resolve the problem. The new EH/probing does two
things to prevent such situations.
* freeze (mask interrupt) the port before requesting the IRQ and thaw
(unmask) only after the controller and drives are reset and in a known
good state.
* the sil3112 family of controllers (and most other modern SATA
controllers too) actually has the needy flag. The new version of
sata_sil will have a customized IRQ handler that properly detects and
handles unexpected/spurious interrupts.
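The two measures can be sketched as a small Python model. This is an illustration under stated assumptions, not the actual sata_sil or libata code: `Port`, `expectation_handler`, `flag_handler` and `probe` are hypothetical names, and `irq_pending` stands in for the controller's needy flag.

```python
IRQ_NONE, IRQ_HANDLED = 0, 1


class Port:
    """Toy controller port (hypothetical, for illustration only)."""

    def __init__(self, stale_irq=False):
        self.irq_pending = stale_irq   # the "needy" status flag
        self.irq_masked = False
        self.command_active = False

    def line_asserted(self):
        # The IRQ line is driven only if pending and not masked.
        return self.irq_pending and not self.irq_masked

    def freeze(self):
        self.irq_masked = True

    def thaw(self):
        self.irq_masked = False

    def reset(self):
        # Bring the port to a known good state, dropping stale interrupts.
        self.irq_pending = False
        self.command_active = False


def expectation_handler(port):
    """Old approach: claim the interrupt only if a command is in flight."""
    if not port.command_active:
        return IRQ_NONE                # stale interrupt is left asserted
    port.irq_pending = False
    port.command_active = False
    return IRQ_HANDLED


def flag_handler(port):
    """New approach: trust the needy flag, ack even unexpected interrupts."""
    if not port.irq_pending:
        return IRQ_NONE                # genuinely someone else's interrupt
    port.irq_pending = False
    port.command_active = False
    return IRQ_HANDLED


def probe(port):
    """Freeze before the IRQ is requested, thaw only after reset."""
    port.freeze()
    # ... the IRQ handler would be registered at this point ...
    port.reset()
    port.thaw()


if __name__ == "__main__":
    old = Port(stale_irq=True)
    print(expectation_handler(old), old.line_asserted())
    new = Port(stale_irq=True)
    print(flag_handler(new), new.line_asserted())
```

Either measure alone clears the stale interrupt in this model; doing both means the port is quiet while it is being set up and still recovers if a spurious interrupt arrives later.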
The new EH/probing will probably be included in 2.6.18, but I intend to
make patches against the stable series. So, if this problem bugs you,
stay tuned.
--
tejun