From: Gary Hade <garyhade@us.ibm.com>
To: Gary Hade <garyhade@us.ibm.com>
Cc: Tejun Heo <htejun@gmail.com>,
Kovid Goyal <kovid@theory.caltech.edu>,
linux-ide@vger.kernel.org, lcm@us.ibm.com,
Jeff Garzik <jgarzik@pobox.com>,
konradr@us.ibm.com
Subject: Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W)
Date: Fri, 16 Feb 2007 16:34:54 -0800
Message-ID: <20070217003454.GA25571@us.ibm.com>
In-Reply-To: <20070130233735.GA7483@us.ibm.com>
On Tue, Jan 30, 2007 at 03:37:36PM -0800, Gary Hade wrote:
> On Tue, Jan 30, 2007 at 04:32:34PM +0900, Tejun Heo wrote:
> > Hello, Gary.
> >
> > Gary Hade wrote:
> > >>> If they verify your fix (ie,
> > >>> GoVault sometimes takes more than 150ms to transmit the first D2H Reg FIS
> > >>> after SRST), I'll push similar patch upstream.
> > >> Thanks. If you think that changes to increase the delays are
> > >> the way to go (at least until we can find a better solution)
> > >> I can provide patches.
> > >
> > > Tejun,
> > > I haven't heard anything from you on this so I'm including a delay
> > > increase patch against 2.6.20-rc6 for the 'ata-piix' case below.
> > > I hope that you, Jeff, and others find this acceptable.
> >
> > Sorry about being unresponsive. The thing is that the change adds
> > unnecessary 2 secs of delay to a lot of other normal device-not-present
> > cases, so I was hesitant to ack the patch. I'll give it more thoughts
> > (and respond timely this time :-)
>
> Thanks! My followup was untimely so we're even. :-)
>
> Some of my random thoughts:
> There does appear to be this invalid assumption that 0xFF status
> always implies device-not-present. The status register access
> restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The
> contents of this register, except for BSY, shall be ignored when
> BSY is set to one." which the code does not honor. There is apparently
> past experience that 0xFF status implies device-not-present for some
> controllers (the odd clowns :) but I have no idea how common these are.
> We obviously can't get rid of the check but since we cannot clear
> the read-only status register and there appears to be no specification
> dictated upper limit on how long it should take for a software reset to
> complete it just seems like we need to wait long enough to support the
> slowest known device which may be the GoVault.
>
> >
> > > With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is
> > > useable following boot although the below messages are being logged
> > > during initialization. Please let me know if you have any thoughts
> > > on this.
> > > scsi1 : ahci
> > > ata2: softreset failed (port busy but CLO unavailable)
> > > ata2: softreset failed, retrying in 5 secs
> > > ata2: port is slow to respond, please be patient (Status 0x80)
> > > ata2: port failed to respond (30 secs, Status 0x80)
> > > ata2: COMRESET failed (device not ready)
> > > ata2: hardreset failed, retrying in 5 secs
> > > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > ata2.00: ATAPI, max UDMA/66
> > > ata2.00: configured for UDMA/66
> >
> > The above should have been fixed in 2.6.20-rc6. Please test it. It was
> > caused by the ahci driver incorrectly clearing ahci CAP register and
> > fixed recently.
>
> I'm clearly seeing this with 2.6.20-rc6 but unlike the ata-piix
> issue it does not appear to be dependent on the port to which the
> device is attached. I've been playing around with this today and
> found that it could be solved by inserting a delay between the
> ahci_stop_engine() call and BSY/DRQ check.
>
> This change:
> --- linux-2.6.20-rc6/drivers/ata/ahci.c.orig 2007-01-30 11:01:20.000000000 -0800
> +++ linux-2.6.20-rc6/drivers/ata/ahci.c 2007-01-30 12:59:38.000000000 -0800
> @@ -804,6 +804,19 @@ static int ahci_softreset(struct ata_por
> goto fail_restart;
> }
>
> + {
> + int delay;
> + u8 stat;
> + for (delay = 0; delay < 2000; delay+=100) {
> + if (!(ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)))
> + break;
> + msleep(100);
> + stat = ahci_check_status(ap);
> + ata_port_printk(ap, KERN_INFO, "delay=%d BSY=%d DRQ=%d\n",
> + delay, (stat & ATA_BUSY)?1:0, (stat & ATA_DRQ)?1:0);
> + }
> + }
> +
> /* check BUSY/DRQ, perform Command List Override if necessary */
> if (ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)) {
> rc = ahci_clo(ap);
>
> Yielded this output both with and without the RDC inserted:
> scsi1 : ahci
> ata2: delay=0 BSY=1 DRQ=0
> ata2: delay=100 BSY=1 DRQ=0
> ata2: delay=200 BSY=1 DRQ=0
> ata2: delay=300 BSY=1 DRQ=0
> ata2: delay=400 BSY=1 DRQ=0
> ata2: delay=500 BSY=1 DRQ=0
> ata2: delay=600 BSY=1 DRQ=0
> ata2: delay=700 BSY=1 DRQ=0
> ata2: delay=800 BSY=1 DRQ=0
> ata2: delay=900 BSY=1 DRQ=0
> ata2: delay=1000 BSY=1 DRQ=0
> ata2: delay=1100 BSY=1 DRQ=0
> ata2: delay=1200 BSY=0 DRQ=0
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: ATAPI, max UDMA/66
> ata2.00: configured for UDMA/66
>
> So it appears that we may also have a similar device slowness issue
> with this driver.

Tejun,
I instrumented the code and found that for the SATA hard drive BSY was set
just before the call to ahci_init_port() from ahci_port_start() and clear
after the return from ahci_init_port(). For the GoVault, BSY was still set
after the return from ahci_init_port() and remained set for almost 2 seconds.
The patch below, which gives BSY some extra time to clear, fixes the problem.
Unlike the extra delay that the GoVault needs for ata-piix, I believe this
delay will only be incurred for attached devices that need it. Please let me
know what you think.

Thanks.
Gary
--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc
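
For readers following the status values in this thread: "Status 0x80" in the log messages is a status byte with only BSY set. The sketch below is a minimal, standalone illustration of how such a byte can be classified. It is not code from the ahci driver. The bit values match ATA_BUSY and ATA_DRQ in the kernel's include/linux/ata.h, while the enum and classify_status() are hypothetical names introduced only for this example. The 0xFF case reflects the device-not-present heuristic discussed in the quoted text above, which is a driver convention and not a guarantee.

```c
#include <stdint.h>

/* Bit values as defined in the kernel's include/linux/ata.h. */
#define ATA_BUSY 0x80u /* bit 7: device is busy */
#define ATA_DRQ  0x08u /* bit 3: data request */

/* Hypothetical classification used only in this sketch. */
enum port_state {
	PORT_READY,        /* BSY and DRQ both clear */
	PORT_BUSY,         /* BSY set; other bits must be ignored */
	PORT_DATA_REQUEST, /* BSY clear, DRQ set */
	PORT_MAYBE_ABSENT  /* 0xFF: often, but not always, no device */
};

static enum port_state classify_status(uint8_t status)
{
	/* Heuristic from the thread: 0xFF is treated as device-not-present. */
	if (status == 0xFF)
		return PORT_MAYBE_ABSENT;

	/* While BSY is set, the rest of the register carries no meaning. */
	if (status & ATA_BUSY)
		return PORT_BUSY;

	if (status & ATA_DRQ)
		return PORT_DATA_REQUEST;

	return PORT_READY;
}
```

With this classification, 0x80 maps to PORT_BUSY, which is the state the GoVault stays in for almost 2 seconds.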

We encountered a problem where the BSY status bit is still
set on entry to the 'ahci' error handler during initialization
of the Quantum GoVault when attached to an ICH6R/ICH6RW controller.
The failed BSY/DRQ check caused the software reset to fail,
forcing a hard reset, with the following messages logged:
ata1: softreset failed (port busy but CLO unavailable)
ata1: softreset failed, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ata1: COMRESET failed (device not ready)
ata1: hardreset failed, retrying in 5 secs
It was taking almost 2 seconds for BSY to clear following the
return from ahci_init_port() in ahci_port_start(), so this patch
gives BSY up to 3 seconds of extra time to clear, which eliminates
the problem.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
--- linux-2.6.20-rc7/drivers/ata/ahci.c.orig 2007-02-16 10:11:21.000000000 -0800
+++ linux-2.6.20-rc7/drivers/ata/ahci.c 2007-02-16 13:23:04.000000000 -0800
@@ -1423,6 +1423,8 @@ static int ahci_port_start(struct ata_po
 	void *mem;
 	dma_addr_t mem_dma;
 	int rc;
+	u8 status;
+	unsigned long timeout;

 	pp = kmalloc(sizeof(*pp), GFP_KERNEL);
 	if (!pp)
@@ -1477,6 +1479,17 @@ static int ahci_port_start(struct ata_po
 	/* initialize port */
 	ahci_init_port(port_mmio, hpriv->cap, pp->cmd_slot_dma, pp->rx_fis_dma);

+	status = ahci_check_status(ap);
+
+	/* for some devices we need to delay to allow BSY to clear */
+	if (status & ATA_BUSY) {
+		timeout = jiffies + 3*HZ;
+		while ((status & ATA_BUSY) && time_before(jiffies, timeout)) {
+			msleep(50);
+			status = ahci_check_status(ap);
+		}
+	}
+
 	return 0;
 }
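
The wait added by the patch above is a bounded poll: read the status, and while BSY is set and the time budget is not used up, sleep briefly and read again. The following is a minimal userspace sketch of that pattern, written so it can be compiled and exercised without a kernel. Everything in it other than the ATA_BUSY bit value is an assumption made for illustration: wait_bsy_clear(), the callback types, and the simulated fake_dev are hypothetical names, and simulated time stands in for jiffies and msleep(). Unlike the patch, which returns 0 whether or not BSY cleared, the sketch reports a timeout to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define ATA_BUSY 0x80u /* same bit value as in include/linux/ata.h */

/* Hypothetical callbacks standing in for ahci_check_status() and msleep(). */
typedef uint8_t (*read_status_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned ms);

/*
 * Poll until BSY clears or budget_ms has been spent sleeping.
 * Returns the number of milliseconds waited, or -1 if BSY is still set.
 * A device that is already idle costs one status read and no sleep.
 */
static int wait_bsy_clear(read_status_fn read_status, sleep_ms_fn sleep_ms,
			  void *ctx, unsigned budget_ms, unsigned step_ms)
{
	unsigned waited = 0;
	uint8_t status;

	assert(read_status && sleep_ms && step_ms > 0);

	status = read_status(ctx);
	while ((status & ATA_BUSY) && waited < budget_ms) {
		sleep_ms(ctx, step_ms);
		waited += step_ms;
		status = read_status(ctx);
	}

	return (status & ATA_BUSY) ? -1 : (int)waited;
}

/* Simulated device: BSY stays set until clear_at_ms of simulated time. */
struct fake_dev {
	unsigned now_ms;
	unsigned clear_at_ms;
};

static uint8_t fake_read_status(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->now_ms < d->clear_at_ms ? ATA_BUSY : 0x00;
}

static void fake_sleep_ms(void *ctx, unsigned ms)
{
	struct fake_dev *d = ctx;

	d->now_ms += ms; /* advance simulated time instead of sleeping */
}
```

Calling wait_bsy_clear(fake_read_status, fake_sleep_ms, &dev, 3000, 50) with dev.clear_at_ms set to 1900 models a device that holds BSY for almost 2 seconds under the patch's 3 second budget and 50 ms step. A device whose BSY is already clear returns immediately, which matches the expectation that only devices needing the delay pay for it.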