* 2.6.20-rc3 IRQ race upon resume? => killing SATA IRQ
@ 2007-01-03 23:26 Bjorn Wesen
2007-01-04 1:55 ` Bjorn Wesen
0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Wesen @ 2007-01-03 23:26 UTC
To: linux-ide
Hi folks,
There is only one thing keeping suspend-to-ram on my Sony Vaio SZ2 laptop
from working currently: most of the time when I suspend/resume, the SATA HD
becomes blocked. Looking closer, this is because an IRQ occurs during resume
and reaches ata_interrupt, which does not handle it, and Linux responds by
disabling that IRQ permanently (the dreaded "irq X: nobody cared").
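For readers unfamiliar with the "nobody cared" mechanism referred to above, here is a minimal user-space model of it, not kernel code. The window of 100000 interrupts and the threshold of 99900 unhandled ones are as recalled from kernel/irq/spurious.c of this era and should be treated as assumptions; the names irq_model, note_interrupt_model and storm_disables_line are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Thresholds as recalled from kernel/irq/spurious.c; treat as assumptions. */
#define IRQ_WINDOW      100000u  /* interrupts per accounting window */
#define UNHANDLED_LIMIT  99900u  /* more unhandled than this => disable */

struct irq_model {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many no handler claimed */
    bool disabled;                /* true once the line has been shut off */
};

/* Account for one interrupt; 'handled' is what the handler chain reported. */
void note_interrupt_model(struct irq_model *m, bool handled)
{
    assert(!m->disabled);  /* a disabled line delivers nothing further */

    if (!handled)
        m->irqs_unhandled++;

    if (++m->irq_count < IRQ_WINDOW)
        return;

    /* End of window: decide, then start a fresh window. */
    m->irq_count = 0;
    if (m->irqs_unhandled > UNHANDLED_LIMIT)
        m->disabled = true;  /* the "irq X: nobody cared" case */
    m->irqs_unhandled = 0;
}

/* Feed 'n' interrupts with the same outcome; stop early if the line dies. */
bool storm_disables_line(unsigned int n, bool handled)
{
    struct irq_model m = { 0, 0, false };
    unsigned int i;

    for (i = 0; i < n && !m.disabled; i++)
        note_interrupt_model(&m, handled);
    return m.disabled;
}
```

In this model a source that stays asserted and is never claimed fills a whole window with unhandled interrupts, so the line is disabled at the end of that window, while a handler that claims each interrupt keeps it alive.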
The SZ2 has a Core Duo CPU and an ICH7, and SATA is handled by the ata_piix
driver. No other PCI device is mapped to or enabled on the same interrupt.
I can't help thinking this looks like a race, because it resumes correctly
sometimes and then the HD works perfectly fine. Perhaps the SATA driver
hasn't recovered itself at the time the first irq occurs and thus it feels
it shouldn't handle it ?
I can't paste in the dmesg because the HD locks up when it happens, so the
log never gets written to it. Essentially, the "nobody cared" message comes
after the second CPU core is brought up, then the IRQ is disabled, then the
ATA driver tries to reconfigure the devices, and then it locks up.
I can debug it but I'd need some pointers on where to start. For example,
one strategy could be to force an acknowledge of the interrupt somehow at
the bottom of ata_interrupt when it finds it can't handle an interrupt
(I've only programmed embedded Linux before, not i386, so I don't know where
to ack such an IRQ: in the PCI bridge itself, or in the ATA device? :).
Another strategy could be to find the reason why the interrupt is not
handled by enabling some debug or something which I haven't really looked
into yet...
Any ideas ?
Regards,
Bjorn
* Re: 2.6.20-rc3 IRQ race upon resume? => killing SATA IRQ
2007-01-03 23:26 2.6.20-rc3 IRQ race upon resume? => killing SATA IRQ Bjorn Wesen
@ 2007-01-04 1:55 ` Bjorn Wesen
2007-01-04 4:37 ` Tejun Heo
2007-01-04 10:13 ` Alan
0 siblings, 2 replies; 4+ messages in thread
From: Bjorn Wesen @ 2007-01-04 1:55 UTC
To: linux-ide
Just adding some info here!
I added this to the bottom of ata_interrupt in libata-core.c which fixed the
problem:
    if(!handled) {
        printk("ata_interrupt nobody cared. Trying to clear irq src\n");
        for (i = 0; i < host->n_ports; i++) {
            struct ata_port *ap;

            ap = host->ports[i];

            ata_bmdma_irq_clear(ap);
        }
        handled = 1;
    }
The result was that the above message comes 3 times in a row during resume,
then it silences and everything works. Also, I noticed that ata_host_intr is
not called in these cases, so when the interrupt reaches the driver after
the resume, it ignores it probably because it thinks it has no QC active
(correctly probably). Question is, where is the irq coming from then.
Obviously this is a horribly wrong fix, since if the interrupt is shared, we
will shadow the other interrupt so it never gets run (and corrupt our own
BM DMA operations).
It is a bit troubling that it happens 3 times in a row, so something simple
like clearing the BM IRQ status bit during the first stage of resume is
perhaps not enough (and I guess the driver already does that when
re-initializing?).
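To make the "BM IRQ status bit" concrete, here is a small user-space model of a write-1-to-clear status register and of a per-port clear loop. It is a sketch under assumptions, not driver code: the bit values mirror libata's ATA_DMA_ACTIVE, ATA_DMA_ERR and ATA_DMA_INTR as I recall them, fake_port, fake_status_write, fake_irq_clear and clear_stale_irqs are hypothetical names, and the model ignores every register bit outside the two write-1-to-clear ones. Unlike the hack above, the loop claims the interrupt only when some port really had the interrupt bit latched. Whether the ICH7 latches that bit in the failing case is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bus-master status bits (values assumed to mirror libata's ATA_DMA_*). */
#define DMA_ACTIVE 0x01u  /* not writable in this model */
#define DMA_ERR    0x02u  /* write 1 to clear */
#define DMA_INTR   0x04u  /* write 1 to clear */
#define W1C_MASK   (DMA_ERR | DMA_INTR)

#define MAX_PORTS 2

struct fake_port {
    uint8_t bmdma_status;  /* stands in for the hardware register */
};

/* Hypothetical register write: W1C bits clear wherever 'val' has a 1. */
void fake_status_write(struct fake_port *p, uint8_t val)
{
    p->bmdma_status &= (uint8_t)~(val & W1C_MASK);
}

/*
 * Read the status and write it back, which clears whichever W1C bits
 * were set. Returns true if the interrupt bit was pending.
 */
bool fake_irq_clear(struct fake_port *p)
{
    uint8_t status = p->bmdma_status;

    fake_status_write(p, status);
    return (status & DMA_INTR) != 0;
}

/* Clear every port; claim the IRQ only if some port had INTR latched. */
bool clear_stale_irqs(struct fake_port *ports, int n_ports)
{
    bool claimed = false;
    int i;

    assert(n_ports >= 0 && n_ports <= MAX_PORTS);
    for (i = 0; i < n_ports; i++)
        if (fake_irq_clear(&ports[i]))
            claimed = true;
    return claimed;
}
```

A second call right after the first one finds nothing latched and returns false, so a truly foreign interrupt on a shared line would still be reported as unhandled.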
For reference, I found this message from November about ICH7 and spurious BM
interrupts and a (not optimal) solution:
http://marc.theaimsgroup.com/?l=linux-ide&m=116373296023279&w=2
/Bjorn
* Re: 2.6.20-rc3 IRQ race upon resume? => killing SATA IRQ
2007-01-04 1:55 ` Bjorn Wesen
@ 2007-01-04 4:37 ` Tejun Heo
2007-01-04 10:13 ` Alan
1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2007-01-04 4:37 UTC
To: Bjorn Wesen; +Cc: linux-ide
Bjorn Wesen wrote:
> Just adding some info here!
>
> I added this to the bottom of ata_interrupt in libata-core.c which fixed
> the problem:
>
>     if(!handled) {
>         printk("ata_interrupt nobody cared. Trying to clear irq src\n");
>         for (i = 0; i < host->n_ports; i++) {
>             struct ata_port *ap;
>
>             ap = host->ports[i];
>
>             ata_bmdma_irq_clear(ap);
>         }
>         handled = 1;
>     }
>
> The result was that the above message comes 3 times in a row during
> resume, then it silences and everything works. Also, I noticed that
> ata_host_intr is not called in these cases, so when the interrupt
> reaches the driver after the resume, it ignores it probably because it
> thinks it has no QC active (correctly probably). Question is, where is
> the irq coming from then.
>
> Obviously this is a horribly wrong fix, since if the interrupt is
> shared, we will shadow the other interrupt so it never gets run (and
> corrupt our own BM DMA operations).
It is not that horribly wrong. The only problem you'll cause by
returning a spurious non-zero handled is defeating the 'nobody cared'
detection logic. So, if another device sharing the interrupt were raising
spurious interrupts (not likely; most modern controllers have an irq pending
bit and drivers can reliably tell whether the device is raising an interrupt
or not, the IDE interface is just ooooold), the machine will lock up hard.
I'll investigate deeper when I get some spare time but for the time
being, your solution should suffice. Another solution would be switching
the controller into ahci mode, which is much better.
--
tejun
* Re: 2.6.20-rc3 IRQ race upon resume? => killing SATA IRQ
2007-01-04 1:55 ` Bjorn Wesen
2007-01-04 4:37 ` Tejun Heo
@ 2007-01-04 10:13 ` Alan
1 sibling, 0 replies; 4+ messages in thread
From: Alan @ 2007-01-04 10:13 UTC
To: Bjorn Wesen; +Cc: linux-ide
> (correctly probably). Question is, where is the irq coming from then.
>
> Obviously this is a horribly wrong fix, since if the interrupt is shared, we
> will shadow the other interrupt so it never gets run (and corrupt our own
> BM DMA operations).
I wonder if it's left over from the resume I/O completing or the chip
coming back up in a stupid state. What occurs if the bit is cleared
during the early resume before IRQs are turned back on ?
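The ordering being asked about can be pictured with a toy user-space model. It assumes, purely for illustration, that the controller wakes up with a stale interrupt bit latched and that the line is level-triggered, so the interrupt is re-delivered for as long as the bit stays set; neither is confirmed in this thread. resume_model, run_irqs and simulate_resume are hypothetical names, and 'budget' only bounds the loop so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

struct resume_model {
    bool intr_latched;  /* stale interrupt bit left set by the controller */
    bool irq_enabled;   /* whether the CPU is taking this interrupt */
    int  unhandled;     /* deliveries that no handler claimed */
};

/*
 * Deliver the interrupt up to 'budget' times while it stays asserted.
 * The handler modelled here has no command in flight, so it declines
 * every delivery and clears nothing.
 */
void run_irqs(struct resume_model *m, int budget)
{
    while (budget-- > 0 && m->irq_enabled && m->intr_latched)
        m->unhandled++;
}

/* Returns how many unhandled interrupts the resume sequence produced. */
int simulate_resume(bool clear_before_enable, int budget)
{
    struct resume_model m = { true, false, 0 };  /* wakes with stale bit */

    assert(budget >= 0);
    if (clear_before_enable)
        m.intr_latched = false;  /* early-resume clear, IRQs still off */
    m.irq_enabled = true;        /* IRQs turned back on */
    run_irqs(&m, budget);
    return m.unhandled;
}
```

In the model, clearing before IRQs are turned back on yields no unhandled deliveries, while skipping the clear yields one per delivery until the budget runs out. It does not capture a controller that re-latches the bit after the clear, which is one possible reading of the three messages reported earlier.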