* [git patches] libata fixes for .26
@ 2008-07-04 13:10 Jeff Garzik
2008-07-04 17:09 ` Linus Torvalds
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Garzik @ 2008-07-04 13:10 UTC (permalink / raw)
To: Andrew Morton, Linus Torvalds; +Cc: linux-ide, LKML
The libata-sff change is longer than I'd like for 2.6.26-rc, but it's
all printk changes/additions. No behavior changes, just improved
diagnostics upon error, something we really need in that area.
Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git upstream-linus
to receive the following updates:
drivers/ata/ahci.c | 9 +++------
drivers/ata/libata-sff.c | 30 +++++++++++++++++++++---------
drivers/ata/sata_mv.c | 2 +-
drivers/ata/sata_sil24.c | 1 +
drivers/ata/sata_uli.c | 1 +
5 files changed, 27 insertions(+), 16 deletions(-)
Mark Lord (1):
sata_mv: safer logic for limit_warnings
Tejun Heo (4):
sata_uli: hardreset is broken
sata_sil24: add DID for another adaptec flavor
ahci: always clear all bits in irq_stat
libata-sff: improve HSM violation reporting
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 6a4a2a2..061817a 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1777,7 +1777,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
struct ahci_host_priv *hpriv;
unsigned int i, handled = 0;
void __iomem *mmio;
- u32 irq_stat, irq_ack = 0;
+ u32 irq_stat;
VPRINTK("ENTER\n");
@@ -1809,14 +1809,11 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
"interrupt on disabled port %u\n", i);
}
- irq_ack |= (1 << i);
- }
-
- if (irq_ack) {
- writel(irq_ack, mmio + HOST_IRQ_STAT);
handled = 1;
}
+ writel(irq_stat, mmio + HOST_IRQ_STAT);
+
spin_unlock(&host->lock);
VPRINTK("EXIT\n");
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 215d186..c0908c2 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -1094,6 +1094,7 @@ static void ata_hsm_qc_complete(struct ata_queued_cmd *qc, int in_wq)
int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc,
u8 status, int in_wq)
{
+ struct ata_eh_info *ehi = &ap->link.eh_info;
unsigned long flags = 0;
int poll_next;
@@ -1125,9 +1126,12 @@ fsm_start:
if (likely(status & (ATA_ERR | ATA_DF)))
/* device stops HSM for abort/error */
qc->err_mask |= AC_ERR_DEV;
- else
+ else {
/* HSM violation. Let EH handle this */
+ ata_ehi_push_desc(ehi,
+ "ST_FIRST: !(DRQ|ERR|DF)");
qc->err_mask |= AC_ERR_HSM;
+ }
ap->hsm_task_state = HSM_ST_ERR;
goto fsm_start;
@@ -1146,9 +1150,9 @@ fsm_start:
* the CDB.
*/
if (!(qc->dev->horkage & ATA_HORKAGE_STUCK_ERR)) {
- ata_port_printk(ap, KERN_WARNING,
- "DRQ=1 with device error, "
- "dev_stat 0x%X\n", status);
+ ata_ehi_push_desc(ehi, "ST_FIRST: "
+ "DRQ=1 with device error, "
+ "dev_stat 0x%X", status);
qc->err_mask |= AC_ERR_HSM;
ap->hsm_task_state = HSM_ST_ERR;
goto fsm_start;
@@ -1205,9 +1209,9 @@ fsm_start:
* let the EH abort the command or reset the device.
*/
if (unlikely(status & (ATA_ERR | ATA_DF))) {
- ata_port_printk(ap, KERN_WARNING, "DRQ=1 with "
- "device error, dev_stat 0x%X\n",
- status);
+ ata_ehi_push_desc(ehi, "ST-ATAPI: "
+ "DRQ=1 with device error, "
+ "dev_stat 0x%X", status);
qc->err_mask |= AC_ERR_HSM;
ap->hsm_task_state = HSM_ST_ERR;
goto fsm_start;
@@ -1226,13 +1230,17 @@ fsm_start:
if (likely(status & (ATA_ERR | ATA_DF)))
/* device stops HSM for abort/error */
qc->err_mask |= AC_ERR_DEV;
- else
+ else {
/* HSM violation. Let EH handle this.
* Phantom devices also trigger this
* condition. Mark hint.
*/
+ ata_ehi_push_desc(ehi, "ST-ATA: "
+ "DRQ=1 with device error, "
+ "dev_stat 0x%X", status);
qc->err_mask |= AC_ERR_HSM |
AC_ERR_NODEV_HINT;
+ }
ap->hsm_task_state = HSM_ST_ERR;
goto fsm_start;
@@ -1257,8 +1265,12 @@ fsm_start:
status = ata_wait_idle(ap);
}
- if (status & (ATA_BUSY | ATA_DRQ))
+ if (status & (ATA_BUSY | ATA_DRQ)) {
+ ata_ehi_push_desc(ehi, "ST-ATA: "
+ "BUSY|DRQ persists on ERR|DF, "
+ "dev_stat 0x%X", status);
qc->err_mask |= AC_ERR_HSM;
+ }
/* ata_pio_sectors() might change the
* state to HSM_ST_LAST. so, the state
diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
index 28092bc..ad169ff 100644
--- a/drivers/ata/sata_mv.c
+++ b/drivers/ata/sata_mv.c
@@ -1607,7 +1607,7 @@ static unsigned int mv_qc_issue(struct ata_queued_cmd *qc)
* Much of the time, this could just work regardless.
* So for now, just log the incident, and allow the attempt.
*/
- if (limit_warnings && (qc->nbytes / qc->sect_size) > 1) {
+ if (limit_warnings > 0 && (qc->nbytes / qc->sect_size) > 1) {
--limit_warnings;
ata_link_printk(qc->dev->link, KERN_WARNING, DRV_NAME
": attempting PIO w/multiple DRQ: "
diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
index 8ee6b5b..84ffcc2 100644
--- a/drivers/ata/sata_sil24.c
+++ b/drivers/ata/sata_sil24.c
@@ -370,6 +370,7 @@ static const struct pci_device_id sil24_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, 0x3124), BID_SIL3124 },
{ PCI_VDEVICE(CMD, 0x3132), BID_SIL3132 },
{ PCI_VDEVICE(CMD, 0x0242), BID_SIL3132 },
+ { PCI_VDEVICE(CMD, 0x0244), BID_SIL3132 },
{ PCI_VDEVICE(CMD, 0x3131), BID_SIL3131 },
{ PCI_VDEVICE(CMD, 0x3531), BID_SIL3131 },
diff --git a/drivers/ata/sata_uli.c b/drivers/ata/sata_uli.c
index f277cea..db529b8 100644
--- a/drivers/ata/sata_uli.c
+++ b/drivers/ata/sata_uli.c
@@ -83,6 +83,7 @@ static struct ata_port_operations uli_ops = {
.inherits = &ata_bmdma_port_ops,
.scr_read = uli_scr_read,
.scr_write = uli_scr_write,
+ .hardreset = ATA_OP_NULL,
};
static const struct ata_port_info uli_port_info = {
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [git patches] libata fixes for .26
2008-07-04 13:10 [git patches] libata fixes for .26 Jeff Garzik
@ 2008-07-04 17:09 ` Linus Torvalds
2008-07-05 3:13 ` Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2008-07-04 17:09 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-ide, LKML, Tejun Heo
On Fri, 4 Jul 2008, Jeff Garzik wrote:
>
> The libata-sff change is longer than I'd like for 2.6.26-rc, but it's
> all printk changes/additions. No behavior changes, just improved
> diagnostics upon error, something we really need in that area.
Hmm..
Looking at the AHCI change, I think it's still potentially buggy.
I think it is potentially buggy for two separate reasons:
- if the interrupt happens because of some port that we don't handle, we
should still ACK it, in order to get rid of it. I don't think Tejun's
patch fixed anything at all, since it still did a binary 'and' with
hpriv->port_map on the bits, so it would never ACK anything that didn't
have a bit set, and the (bogus) interrupt would keep screaming.
- I also wonder if / suspect that the IRQ ACK should happen _before_ we
handle the source of the interrupt, because otherwise if one port ends
up having two events in close succession (can this happen? I think so),
then we end up perhaps getting the irq for the first one, and handle
that first event, but then the second event happens immediately
afterwards, and before we do the writel() to ACK it, so now the
_hardware_ thinks we have handled both of them, even though we never
actually reacted to the second event.
So I think the appended (TOTALLY UNTESTED!) patch - based on top of the
pull that I already did - might be a good idea.
NOTE! I _really_ didn't test it. I do not know how AHCI works at a low
level, and maybe there is some reason why the IRQ ACK writel() actually
has to happen after you've handled the event (to avoid getting a new
interrupt immediately. But based on other controllers I've worked with,
this is the correct way to not lose irq events.
[ And yes, the race for the irq ack issue is small. And yes, the
likelihood of a bogus interrupt triggering is probably small too. And
see above about my lack of specific knowledge about AHCI.
So I'm sure as heck not going to commit this patch, I'm just sending it
out as a "Are you sure you shouldn't do it like this?" RFC patch.. ]
Hmm?
Linus
---
drivers/ata/ahci.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 061817a..5cfee74 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1786,12 +1786,17 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
/* sigh. 0xffffffff is a valid return from h/w */
irq_stat = readl(mmio + HOST_IRQ_STAT);
- irq_stat &= hpriv->port_map;
if (!irq_stat)
return IRQ_NONE;
spin_lock(&host->lock);
+ /* Ack _all_ sources of interrupts.. */
+ writel(irq_stat, mmio + HOST_IRQ_STAT);
+
+ /* ..but only care (and report as handled) about the ones we can handle */
+ irq_stat &= hpriv->port_map;
+
for (i = 0; i < host->n_ports; i++) {
struct ata_port *ap;
@@ -1811,9 +1816,6 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
handled = 1;
}
-
- writel(irq_stat, mmio + HOST_IRQ_STAT);
-
spin_unlock(&host->lock);
VPRINTK("EXIT\n");
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [git patches] libata fixes for .26
2008-07-04 17:09 ` Linus Torvalds
@ 2008-07-05 3:13 ` Tejun Heo
2008-07-05 3:18 ` Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2008-07-05 3:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff Garzik, Andrew Morton, linux-ide, LKML
Hello, Linus.
Linus Torvalds wrote:
> Looking at the AHCI change, I think it's still potentially buggy.
>
> I think it is potentially buggy for two separate reasons:
>
> - if the interrupt happens because of some port that we don't handle, we
> should still ACK it, in order to get rid of it. I don't think Tejun's
> patch fixed anything at all, since it still did a binary 'and' with
> hpriv->port_map on the bits, so it would never ACK anything that didn't
> have a bit set, and the (bogus) interrupt would keep screaming.
Strange. Yeah, the AND should go. The original patch was...
http://htj.dyndns.org/export/testing/head-x86_64-bug390937_dbg0/0001-ahci-interrupt-debug.patch
And the reporter verified debug option 2 (the posted patch) and 3
(always write 0xffffffff) fixed the problem. The report was complete
w/ boot log showing which debug option was active. Ah... weird. The
debug option 2 shouldn't have made any difference. :-(
Anyways, will send a patch to drop the AND.
> - I also wonder if / suspect that the IRQ ACK should happen _before_ we
> handle the source of the interrupt, because otherwise if one port ends
> up having two events in close succession (can this happen? I think so),
> then we end up perhaps getting the irq for the first one, and handle
> that first event, but then the second event happens immediately
> afterwards, and before we do the writel() to ACK it, so now the
> _hardware_ thinks we have handled both of them, even though we never
> actually reacted to the second event.
>
> So I think the appended (TOTALLY UNTESTED!) patch - based on top of the
> pull that I already did - might be a good idea.
>
> NOTE! I _really_ didn't test it. I do not know how AHCI works at a low
> level, and maybe there is some reason why the IRQ ACK writel() actually
> has to happen after you've handled the event (to avoid getting a new
> interrupt immediately. But based on other controllers I've worked with,
> this is the correct way to not lose irq events.
>
> [ And yes, the race for the irq ack issue is small. And yes, the
> likelihood of a bogus interrupt triggering is probably small too. And
> see above about my lack of specific knowledge about AHCI.
>
> So I'm sure as heck not going to commit this patch, I'm just sending it
> out as a "Are you sure you shouldn't do it like this?" RFC patch.. ]
The current code is correct. From 10.6.2.1 of ahci 1.1,
In order to clear an interrupt, software must first clear the event
from the PxIS register, then clear the interrupt event from the IS
register. If software clears IS register only, leaving the level of
the virtual wire from the PxIS register set, the resulting level of
1 shall cause the IS register bit to be re-set.
so, it basically behaves like level triggered IRQ latch for ports,
which also creates an easy-to-miss requirement for MSI implementation
where the controller should generate a new MSI message when the driver
clears some of the IRQ pending bits but not all.
I do agree it's strange. Whenever I come back to ahci_interrupt()
after enough time has passed, that clearing code always makes me look
up the spec. I guess it's about time to add some comment there.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [git patches] libata fixes for .26
2008-07-05 3:13 ` Tejun Heo
@ 2008-07-05 3:18 ` Tejun Heo
2008-07-05 4:10 ` [PATCH #upstream-fixes] ahci: give another shot at clearing all bits in irq_stat Tejun Heo
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2008-07-05 3:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff Garzik, Andrew Morton, linux-ide, LKML
Tejun Heo wrote:
> so, it basically behaves like level triggered IRQ latch for ports,
> which also creates an easy-to-miss requirement for MSI implementation
> where the controller should generate a new MSI message when the driver
> clears some of the IRQ pending bits but not all.
Oops, confused. The easy-to-miss requirement is the controller needing
to resend the message if the irq_stat is cleared but port IRQ is still
pending. Resending on partial clear is always necessary.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH #upstream-fixes] ahci: give another shot at clearing all bits in irq_stat
2008-07-05 3:18 ` Tejun Heo
@ 2008-07-05 4:10 ` Tejun Heo
2008-07-06 13:45 ` Jeff Garzik
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2008-07-05 4:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jeff Garzik, Andrew Morton, linux-ide, LKML
Commit ea0c62f7cf70f13a67830471b613337bd0c9a62e tried to clear all
bits in irq_stat but it didn't actually achieve that as irq_stat was
anded with port_map right after read. This patch makes ahci driver
always use the unmasked value to clear irq_status.
While at it, add explanation on the peculiarities of ahci IRQ
clearing.
This was spotted by Linus Torvalds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
The original report requires further investigation regarding why the
bogus change fixed the problem but this change is the correct thing to
do regardless.
Thanks.
drivers/ata/ahci.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 061817a..5e6468a 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1777,7 +1777,7 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
struct ahci_host_priv *hpriv;
unsigned int i, handled = 0;
void __iomem *mmio;
- u32 irq_stat;
+ u32 irq_stat, irq_masked;
VPRINTK("ENTER\n");
@@ -1786,16 +1786,17 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
/* sigh. 0xffffffff is a valid return from h/w */
irq_stat = readl(mmio + HOST_IRQ_STAT);
- irq_stat &= hpriv->port_map;
if (!irq_stat)
return IRQ_NONE;
+ irq_masked = irq_stat & hpriv->port_map;
+
spin_lock(&host->lock);
for (i = 0; i < host->n_ports; i++) {
struct ata_port *ap;
- if (!(irq_stat & (1 << i)))
+ if (!(irq_masked & (1 << i)))
continue;
ap = host->ports[i];
@@ -1812,6 +1813,15 @@ static irqreturn_t ahci_interrupt(int irq, void *dev_instance)
handled = 1;
}
+ /* HOST_IRQ_STAT behaves as level triggered latch meaning that
+ * it should be cleared after all the port events are cleared;
+ * otherwise, it will raise a spurious interrupt after each
+ * valid one. Please read section 10.6.2 of ahci 1.1 for more
+ * information.
+ *
+ * Also, use the unmasked value to clear interrupt as spurious
+ * pending event on a dummy port might cause screaming IRQ.
+ */
writel(irq_stat, mmio + HOST_IRQ_STAT);
spin_unlock(&host->lock);
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH #upstream-fixes] ahci: give another shot at clearing all bits in irq_stat
2008-07-05 4:10 ` [PATCH #upstream-fixes] ahci: give another shot at clearing all bits in irq_stat Tejun Heo
@ 2008-07-06 13:45 ` Jeff Garzik
0 siblings, 0 replies; 6+ messages in thread
From: Jeff Garzik @ 2008-07-06 13:45 UTC (permalink / raw)
To: Tejun Heo, Linus Torvalds; +Cc: Andrew Morton, linux-ide, LKML
Tejun Heo wrote:
> Commit ea0c62f7cf70f13a67830471b613337bd0c9a62e tried to clear all
> bits in irq_stat but it didn't actually achieve that as irq_stat was
> anded with port_map right after read. This patch makes ahci driver
> always use the unmasked value to clear irq_status.
>
> While at it, add explanation on the peculiarities of ahci IRQ
> clearing.
>
> This was spotted by Linus Torvalds.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> ---
> The original report requires further investigation regarding why the
> bogus change fixed the problem but this change is the correct thing to
> do regardless.
>
> Thanks.
Belated post-holiday ACK (since it's already in).
Thanks for taking care of this, and good catch too.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-07-06 13:46 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-04 13:10 [git patches] libata fixes for .26 Jeff Garzik
2008-07-04 17:09 ` Linus Torvalds
2008-07-05 3:13 ` Tejun Heo
2008-07-05 3:18 ` Tejun Heo
2008-07-05 4:10 ` [PATCH #upstream-fixes] ahci: give another shot at clearing all bits in irq_stat Tejun Heo
2008-07-06 13:45 ` Jeff Garzik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).