linux-ide.vger.kernel.org archive mirror
* patch "libata-sff: Fix oops for pnp devices with no ctl" causes regression
@ 2008-06-05  5:09 Nick Piggin
  2008-06-05  9:24 ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2008-06-05  5:09 UTC
  To: alan, jgarzik, linux-ide

Hi,

I bisected a SATA regression to commit a57c1bade5a0ee5cd8b74502db9cbebb7f5780b2.

The details are as follows:

The system is a PowerPC (G5).

CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
# CONFIG_SATA_PMP is not set
# CONFIG_SATA_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=y

lspci:
0000:00:0b.0 PCI bridge: Apple Computer Inc. Device 005b
0000:0a:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600] (rev a2)
0001:00:00.0 Host bridge: Apple Computer Inc. U4 HT Bridge
0001:00:01.0 PCI bridge: Broadcom BCM5780 [HT2000] PCI-X bridge (rev a3)
0001:00:02.0 PCI bridge: Broadcom BCM5780 [HT2000] PCI-X bridge (rev a3)
0001:00:03.0 PCI bridge: Broadcom BCM5780 [HT2000] PCI-Express Bridge (rev a3)
0001:00:04.0 PCI bridge: Broadcom BCM5780 [HT2000] PCI-Express Bridge (rev a3)
0001:00:05.0 PCI bridge: Broadcom BCM5780 [HT2000] PCI-Express Bridge (rev a3)
0001:00:06.0 PCI bridge: Broadcom BCM5780 [HT2000] PCI-Express Bridge (rev a3)
0001:00:07.0 PCI bridge: Apple Computer Inc. Shasta PCI Bridge
0001:00:08.0 PCI bridge: Apple Computer Inc. Shasta PCI Bridge
0001:00:09.0 PCI bridge: Apple Computer Inc. Shasta PCI Bridge
0001:01:07.0 Class ff00: Apple Computer Inc. Shasta Mac I/O
0001:01:0b.0 USB Controller: NEC Corporation USB (rev 43)
0001:01:0b.1 USB Controller: NEC Corporation USB (rev 43)
0001:01:0b.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
0001:03:0c.0 IDE interface: Broadcom K2 SATA
0001:03:0d.0 Class ff00: Apple Computer Inc. Shasta IDE
0001:03:0e.0 FireWire (IEEE 1394): Apple Computer Inc. Shasta Firewire
0001:05:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5780 Gigabit Ethernet (rev 03)
0001:05:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5780 Gigabit Ethernet (rev 03)

dmesg (from the kernel just before the commit in question):
sata_svw 0001:03:0c.0: version 2.3
scsi0 : sata_svw
scsi1 : sata_svw
scsi2 : sata_svw
scsi3 : sata_svw
ata1: SATA max UDMA/133 mmio m8192@0xfa402000 port 0xfa402000 irq 18
ata2: SATA max UDMA/133 mmio m8192@0xfa402000 port 0xfa402100 irq 18
ata3: SATA max UDMA/133 mmio m8192@0xfa402000 port 0xfa402200 irq 18
ata4: SATA max UDMA/133 mmio m8192@0xfa402000 port 0xfa402300 irq 18
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: Maxtor 7Y250M0, YAR51HW0, max UDMA/133
ata1.00: 490234752 sectors, multi 0: LBA48
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: Maxtor 7Y250M0, YAR51HW0, max UDMA/133
ata2.00: 490234752 sectors, multi 0: LBA48
ata2.00: configured for UDMA/133
ata3: SATA link down (SStatus 0 SControl 0)
ata4: SATA link down (SStatus 0 SControl 0)
scsi 0:0:0:0: Direct-Access     ATA      Maxtor 7Y250M0   YAR5 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: [mac] sda1 sda2

And when booting with the commit applied, I instead get a whole lot of
messages like this (this is the first one, copied by hand):

ata2.00: exception Emask 0x0 SAct 0x0 Serr 0x10000000 action 0x6 frozen
ata2: SError: { }
ata2.00: cmd c8/00:02:42:08:20/00:00:00:00:00/e0 tag 0 dma 1024 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
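
A minimal sketch of how the cmd line above can be decoded, assuming the
field order libata uses when it prints a taskfile
(command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device).
The struct and the helpers parse_tf() and tf_lba28() are hypothetical names
for illustration, not kernel API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the twelve bytes printed on a "cmd" line. */
struct tf_fields {
	uint8_t command, feature, nsect, lbal, lbam, lbah;
	uint8_t hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
	uint8_t device;
};

/* Parse e.g. "c8/00:02:42:08:20/00:00:00:00:00/e0"; returns 0 on success. */
int parse_tf(const char *s, struct tf_fields *tf)
{
	unsigned int v[12];
	int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
		       &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
		       &v[6], &v[7], &v[8], &v[9], &v[10], &v[11]);

	if (n != 12)
		return -1;

	tf->command = (uint8_t)v[0];
	tf->feature = (uint8_t)v[1];
	tf->nsect = (uint8_t)v[2];
	tf->lbal = (uint8_t)v[3];
	tf->lbam = (uint8_t)v[4];
	tf->lbah = (uint8_t)v[5];
	tf->hob_feature = (uint8_t)v[6];
	tf->hob_nsect = (uint8_t)v[7];
	tf->hob_lbal = (uint8_t)v[8];
	tf->hob_lbam = (uint8_t)v[9];
	tf->hob_lbah = (uint8_t)v[10];
	tf->device = (uint8_t)v[11];
	return 0;
}

/* 28-bit LBA: low nibble of device, then lbah, lbam, lbal. */
uint32_t tf_lba28(const struct tf_fields *tf)
{
	return ((uint32_t)(tf->device & 0x0f) << 24) |
	       ((uint32_t)tf->lbah << 16) |
	       ((uint32_t)tf->lbam << 8) |
	       (uint32_t)tf->lbal;
}
```

Read this way, the failing command is 0xc8 (READ DMA) for 2 sectors, which
agrees with "dma 1024 in" at 512 bytes per sector, and the status byte 0x40
in the res line agrees with "status: { DRDY }".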


Thanks,
Nick


* Re: patch "libata-sff: Fix oops for pnp devices with no ctl" causes regression
  2008-06-05  5:09 patch "libata-sff: Fix oops for pnp devices with no ctl" causes regression Nick Piggin
@ 2008-06-05  9:24 ` Alan Cox
  2008-06-05 10:05   ` Jeff Garzik
  2008-06-05 10:21   ` Nick Piggin
  0 siblings, 2 replies; 6+ messages in thread
From: Alan Cox @ 2008-06-05  9:24 UTC
  To: Nick Piggin; +Cc: jgarzik, linux-ide

> And when booting with the commit applied, I instead get a whole lot of
> messages like this (this is the first one, copied by hand):
> 
> ata2.00: exception Emask 0x0 SAct 0x0 Serr 0x10000000 action 0x6 frozen
> ata2: SError: { }
> ata2.00: cmd c8/00:02:42:08:20/00:00:00:00:00/e0 tag 0 dma 1024 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata2.00: status: { DRDY }
> ata2: hard resetting link
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: configured for UDMA/133
> ata2: EH complete

Well I've been over the patch twice now and I cannot see a single point
at which the sequence of code that *should* be executed is any different.

Stick wmb(); rmb(); (or similar barriers against compiler optimisation,
plus I/O fencing) at the start and end of your ata_sff_altstatus() and see
what happens. If it suddenly decides to behave, or if forcing it not to be
inlined makes it behave, then that would be useful info.



* Re: patch "libata-sff: Fix oops for pnp devices with no ctl" causes regression
  2008-06-05  9:24 ` Alan Cox
@ 2008-06-05 10:05   ` Jeff Garzik
  2008-06-05 10:21   ` Nick Piggin
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Garzik @ 2008-06-05 10:05 UTC
  To: Alan Cox; +Cc: Nick Piggin, jgarzik, linux-ide

Alan Cox wrote:
>> And when booting with the commit applied, I instead get a whole lot of
>> messages like this (this is the first one, copied by hand):
>>
>> ata2.00: exception Emask 0x0 SAct 0x0 Serr 0x10000000 action 0x6 frozen
>> ata2: SError: { }
>> ata2.00: cmd c8/00:02:42:08:20/00:00:00:00:00/e0 tag 0 dma 1024 in
>>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> ata2.00: status: { DRDY }
>> ata2: hard resetting link
>> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> ata2.00: configured for UDMA/133
>> ata2: EH complete
> 
> Well I've been over the patch twice now and I cannot see a single point
> at which the sequence of code that *should* be executed is any different.
> 
> Stick wmb();rmb(); (or similar barriers to compiler optimisation and I/O
> fencing) at the start and end of your ata_sff_altstatus() and see what
> happens, if it suddenly decides to behave or forcing it no inline makes
> it behave then that would be useful info.

If he's getting a timeout, I wonder if that points to
ata_sff_irq_status() ...



* Re: patch "libata-sff: Fix oops for pnp devices with no ctl" causes regression
  2008-06-05  9:24 ` Alan Cox
  2008-06-05 10:05   ` Jeff Garzik
@ 2008-06-05 10:21   ` Nick Piggin
  2008-06-05 10:31     ` Nick Piggin
  1 sibling, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2008-06-05 10:21 UTC
  To: Alan Cox; +Cc: jgarzik, linux-ide

On Thu, Jun 05, 2008 at 10:24:24AM +0100, Alan Cox wrote:
> > And when booting with the commit applied, I instead get a whole lot of
> > messages like this (this is the first one, copied by hand):
> > 
> > ata2.00: exception Emask 0x0 SAct 0x0 Serr 0x10000000 action 0x6 frozen
> > ata2: SError: { }
> > ata2.00: cmd c8/00:02:42:08:20/00:00:00:00:00/e0 tag 0 dma 1024 in
> >          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> > ata2.00: status: { DRDY }
> > ata2: hard resetting link
> > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > ata2.00: configured for UDMA/133
> > ata2: EH complete
> 
> Well I've been over the patch twice now and I cannot see a single point
> at which the sequence of code that *should* be executed is any different.

If it is of any help to you, doing this:
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 90d20c6..f6cb8d7 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -1583,6 +1583,12 @@ inline unsigned int ata_sff_host_intr(struct ata_port *ap
        if (status & ATA_BUSY)
                goto idle_irq;

+       /* check main status, clearing INTRQ */
+       status = ap->ops->sff_check_status(ap);
+       if (unlikely(status & ATA_BUSY))
+               goto idle_irq;
+
+
        /* ack bmdma irq events */
        ap->ops->sff_irq_clear(ap);

Gets it working again...


> Stick wmb();rmb(); (or similar barriers to compiler optimisation and I/O
> fencing) at the start and end of your ata_sff_altstatus() and see what
> happens, if it suddenly decides to behave or forcing it no inline makes
> it behave then that would be useful info.

Will give that a try next



* Re: patch "libata-sff: Fix oops for pnp devices with no ctl" causes regression
  2008-06-05 10:21   ` Nick Piggin
@ 2008-06-05 10:31     ` Nick Piggin
  2008-06-05 11:19       ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2008-06-05 10:31 UTC
  To: Alan Cox; +Cc: jgarzik, linux-ide

On Thu, Jun 05, 2008 at 12:21:29PM +0200, Nick Piggin wrote:
> On Thu, Jun 05, 2008 at 10:24:24AM +0100, Alan Cox wrote:
> > Stick wmb();rmb(); (or similar barriers to compiler optimisation and I/O
> > fencing) at the start and end of your ata_sff_altstatus() and see what
> > happens, if it suddenly decides to behave or forcing it no inline makes
> > it behave then that would be useful info.
> 
> Will give that a try next

And doing this made no difference:

diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 90d20c6..db6be15 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -249,10 +249,18 @@ u8 ata_sff_check_status(struct ata_port *ap)
  */
 static u8 ata_sff_altstatus(struct ata_port *ap)
 {
+       u8 ret;
+
+       mb();
+
        if (ap->ops->sff_check_altstatus)
-               return ap->ops->sff_check_altstatus(ap);
+               ret = ap->ops->sff_check_altstatus(ap);
+
+       ret = ioread8(ap->ioaddr.altstatus_addr);

-       return ioread8(ap->ioaddr.altstatus_addr);
+       mb();
+
+       return ret;
 }

 /**
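
One thing to note about the test patch above: the ioread8() is no longer in
an either/or with the sff_check_altstatus method, so when a driver supplies
that method its result is overwritten by the direct register read. A minimal
user-space sketch of the either/or form with the barriers kept, assuming
hypothetical stand-in names (struct port, struct port_ops, read_altstatus())
and the GCC/Clang builtin __sync_synchronize() in place of the kernel's mb():

```c
#include <stdint.h>

struct port;

/* Hypothetical stand-in for the per-driver operations table. */
struct port_ops {
	uint8_t (*check_altstatus)(struct port *p); /* may be NULL */
};

struct port {
	const struct port_ops *ops;
	volatile uint8_t altstatus_reg; /* stands in for the mapped register */
};

/* Use the driver method when present, otherwise read the register. */
uint8_t read_altstatus(struct port *p)
{
	uint8_t ret;

	__sync_synchronize(); /* user-space stand-in for mb() */

	if (p->ops->check_altstatus)
		ret = p->ops->check_altstatus(p);
	else
		ret = p->altstatus_reg;

	__sync_synchronize();

	return ret;
}

/* Example driver method that reports a fixed value. */
uint8_t method_altstatus(struct port *p)
{
	(void)p;
	return 0x50;
}
```

Whether this changes the outcome of the experiment on this hardware is not
something the thread establishes; the sketch only shows the dispatch shape.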



* Re: patch "libata-sff: Fix oops for pnp devices with no ctl" causes regression
  2008-06-05 10:31     ` Nick Piggin
@ 2008-06-05 11:19       ` Alan Cox
  0 siblings, 0 replies; 6+ messages in thread
From: Alan Cox @ 2008-06-05 11:19 UTC
  To: Nick Piggin; +Cc: jgarzik, linux-ide

> > @@ -1583,6 +1583,12 @@ inline unsigned int ata_sff_host_intr(struct ata_port *ap
> >         if (status & ATA_BUSY)
> >                 goto idle_irq;
> > 
> > +       /* check main status, clearing INTRQ */
> > +       status = ap->ops->sff_check_status(ap);
> > +       if (unlikely(status & ATA_BUSY))
> > +               goto idle_irq;
> > +
> > +

Aha, all is revealed. And yes, it would only show up on a few boxes.

libata-sff: Don't assume that check_status is an SFF read

From: Alan Cox <alan@redhat.com>

For a few controllers the SFF check status is actually a driver method, not
a plain I/O access, so we must call the method when doing the IRQ check.
---

 drivers/ata/libata-sff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Signed-off-by: Alan Cox <alan@redhat.com>

diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 90d20c6..215d186 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -278,7 +278,7 @@ static u8 ata_sff_irq_status(struct ata_port *ap)
 		    	return status;
 	}
 	/* Clear INTRQ latch */
-	status = ata_sff_check_status(ap);
+	status = ap->ops->sff_check_status(ap);
 	return status;
 }
 
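
The difference the one-line fix makes can be sketched in user space. This is
a simplified model, not the libata code: struct port, struct port_ops,
generic_check_status() and custom_check_status() are hypothetical stand-ins,
and the intrq_latched field only simulates a controller whose pending
interrupt is cleared by its own status method rather than by the generic
register read.

```c
#include <stdint.h>

struct port;

/* Hypothetical, simplified per-driver operations table. */
struct port_ops {
	uint8_t (*check_status)(struct port *p);
};

struct port {
	const struct port_ops *ops;
	uint8_t status;    /* simulated Status register contents */
	int intrq_latched; /* simulated pending-interrupt latch */
};

/* Generic helper: a plain read that this simulated controller ignores. */
uint8_t generic_check_status(struct port *p)
{
	return p->status;
}

/* Driver method: the access this simulated controller needs to clear INTRQ. */
uint8_t custom_check_status(struct port *p)
{
	p->intrq_latched = 0;
	return p->status;
}

/* Before the fix: the generic helper is called directly. */
uint8_t irq_status_direct(struct port *p)
{
	return generic_check_status(p);
}

/* After the fix: the call goes through the driver's method. */
uint8_t irq_status_via_ops(struct port *p)
{
	return p->ops->check_status(p);
}
```

In this model, irq_status_direct() returns a status byte but leaves the latch
set, while irq_status_via_ops() clears it, which is the kind of difference
that would only show up on drivers that override the method.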

