From: Gary Hade <garyhade@us.ibm.com>
To: Gary Hade <garyhade@us.ibm.com>
Cc: Tejun Heo <htejun@gmail.com>,
	Kovid Goyal <kovid@theory.caltech.edu>,
	linux-ide@vger.kernel.org, lcm@us.ibm.com,
	Jeff Garzik <jgarzik@pobox.com>,
	konradr@us.ibm.com
Subject: Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W)
Date: Fri, 16 Feb 2007 16:34:54 -0800
Message-ID: <20070217003454.GA25571@us.ibm.com>
In-Reply-To: <20070130233735.GA7483@us.ibm.com>

On Tue, Jan 30, 2007 at 03:37:36PM -0800, Gary Hade wrote:
> On Tue, Jan 30, 2007 at 04:32:34PM +0900, Tejun Heo wrote:
> > Hello, Gary.
> > 
> > Gary Hade wrote:
> > >>> If they verify your fix (ie,
> > >>> GoVault sometimes takes more than 150ms to transmit the first D2H Reg FIS
> > >>> after SRST), I'll push similar patch upstream.
> > >> Thanks.  If you think that changes to increase the delays are
> > >> the way to go (at least until we can find a better solution)
> > >> I can provide patches.
> > > 
> > > Tejun, 
> > > I haven't heard anything from you on this so I'm including a delay
> > > increase patch against 2.6.20-rc6 for the 'ata-piix' case below.  
> > > I hope that you, Jeff, and others find this acceptable.
> > 
> > Sorry about being unresponsive.  The thing is that the change adds
> > unnecessary 2 secs of delay to a lot of other normal device-not-present
> > cases, so I was hesitant to ack the patch.  I'll give it more thought
> > (and respond in a timely manner this time :-)
> 
> Thanks!  My followup was untimely so we're even. :-)
> 
> Some of my random thoughts:
> There does appear to be this invalid assumption that 0xFF status 
> always implies device-not-present.  The status register access 
> restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The 
> contents of this register, except for BSY, shall be ignored when 
> BSY is set to one." which the code does not honor.  There is apparently 
> past experience that 0xFF status implies device-not-present for some
> controllers (the odd clowns :) but I have no idea how common these are.
> We obviously can't get rid of the check, but since we cannot clear
> the read-only status register, and there appears to be no upper limit
> in the specification on how long a software reset may take to complete,
> it seems that we need to wait long enough to support the slowest known
> device, which may be the GoVault.
> 
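
To make the BSY rule quoted above concrete, here is a minimal userspace
sketch (not driver code) of a status decode that checks BSY first and
ignores every other bit while it is set. The names classify_status(),
enum port_state, and the STATUS_* macros are hypothetical and exist only
for this illustration; the bit values match ATA_BUSY and ATA_DRQ in
<linux/ata.h>.

```c
#include <stdint.h>

/* Status register bit masks (same values as ATA_BUSY and ATA_DRQ). */
#define STATUS_BSY 0x80u
#define STATUS_DRQ 0x08u

/* Hypothetical classification used only in this sketch. */
enum port_state { PORT_BUSY, PORT_DATA_REQUEST, PORT_IDLE };

/*
 * Classify a raw status byte.  BSY is tested first, and while it is set
 * no other bit is consulted, following the access restriction quoted
 * from ATA/ATAPI-7 V1 5.14.2.
 */
static enum port_state classify_status(uint8_t status)
{
	if (status & STATUS_BSY)
		return PORT_BUSY;
	if (status & STATUS_DRQ)
		return PORT_DATA_REQUEST;
	return PORT_IDLE;
}
```

Read this way, a status of 0xFF is simply "busy", because BSY is set and
the remaining bits carry no meaning.  Treating 0xFF as device-not-present
is a separate heuristic for particular controllers, which this sketch
deliberately leaves out.
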
> > 
> > > With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is 
> > > useable following boot although the below messages are being logged 
> > > during initialization.  Please let me know if you have any thoughts 
> > > on this.  
> > >   scsi1 : ahci
> > >   ata2: softreset failed (port busy but CLO unavailable)
> > >   ata2: softreset failed, retrying in 5 secs
> > >   ata2: port is slow to respond, please be patient (Status 0x80)
> > >   ata2: port failed to respond (30 secs, Status 0x80)
> > >   ata2: COMRESET failed (device not ready)
> > >   ata2: hardreset failed, retrying in 5 secs
> > >   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > >   ata2.00: ATAPI, max UDMA/66
> > >   ata2.00: configured for UDMA/66
> > 
> > The above should have been fixed in 2.6.20-rc6.  Please test it.  It was
> > caused by the ahci driver incorrectly clearing ahci CAP register and
> > fixed recently.
> 
> I'm clearly seeing this with 2.6.20-rc6 but unlike the ata-piix
> issue it does not appear to be dependent on the port to which the
> device is attached.  I've been playing around with this today and
> found that it could be solved by inserting a delay between the 
> ahci_stop_engine() call and BSY/DRQ check.
> 
> This change:
> --- linux-2.6.20-rc6/drivers/ata/ahci.c.orig	2007-01-30 11:01:20.000000000 -0800
> +++ linux-2.6.20-rc6/drivers/ata/ahci.c	2007-01-30 12:59:38.000000000 -0800
> @@ -804,6 +804,19 @@ static int ahci_softreset(struct ata_por
>  		goto fail_restart;
>  	}
> 
> +	{
> +		int delay;
> +		u8 stat;
> +		for (delay = 0; delay < 2000; delay+=100) {
> +			if (!(ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)))
> +				break;
> +			msleep(100);
> +			stat = ahci_check_status(ap);
> +			ata_port_printk(ap, KERN_INFO, "delay=%d BSY=%d DRQ=%d\n",
> +				delay, (stat & ATA_BUSY)?1:0, (stat & ATA_DRQ)?1:0);
> +		}
> +	}
> +
>  	/* check BUSY/DRQ, perform Command List Override if necessary */
>  	if (ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)) {
>  		rc = ahci_clo(ap);
> 
> Yielded this output both with and without the RDC inserted:
> scsi1 : ahci
> ata2: delay=0 BSY=1 DRQ=0
> ata2: delay=100 BSY=1 DRQ=0
> ata2: delay=200 BSY=1 DRQ=0
> ata2: delay=300 BSY=1 DRQ=0
> ata2: delay=400 BSY=1 DRQ=0
> ata2: delay=500 BSY=1 DRQ=0
> ata2: delay=600 BSY=1 DRQ=0
> ata2: delay=700 BSY=1 DRQ=0
> ata2: delay=800 BSY=1 DRQ=0
> ata2: delay=900 BSY=1 DRQ=0
> ata2: delay=1000 BSY=1 DRQ=0
> ata2: delay=1100 BSY=1 DRQ=0
> ata2: delay=1200 BSY=0 DRQ=0
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: ATAPI, max UDMA/66
> ata2.00: configured for UDMA/66
> 
> So it appears that we may also have a similar device slowness issue 
> with this driver.

Tejun,
I instrumented the code and found that, for the SATA hard drive, BSY was set
just before the call to ahci_init_port() from ahci_port_start() and clear
after ahci_init_port() returned.  For the GoVault, BSY was still set after
ahci_init_port() returned and remained set for almost 2 seconds.

The patch below, which gives BSY some extra time to clear, fixes the problem.
Unlike the extra delay that the GoVault needs for ata-piix, I believe this
delay will only be incurred for attached devices that need it.  Please let me
know what you think.
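
The wait-for-BSY pattern described above can be sketched outside the
kernel as follows.  This is an illustration under stated assumptions, not
the patch itself: it counts sleep intervals instead of comparing jiffies,
and wait_bsy_clear(), struct fake_dev, fake_read() and fake_sleep() are
hypothetical names invented for the example.  The fake device stands in
for ahci_check_status() and reports BSY for a fixed number of reads.

```c
#include <stdint.h>

#define STATUS_BSY 0x80u	/* same value as ATA_BUSY */

typedef uint8_t (*read_status_fn)(void *ctx);
typedef void (*sleep_ms_fn)(unsigned int ms);

/*
 * Poll until BSY clears or the time budget is used up, and return the
 * last status read.  A device that is already idle costs one read and
 * no sleep, so only slow devices pay for the wait.
 */
static uint8_t wait_bsy_clear(read_status_fn read_status, void *ctx,
			      sleep_ms_fn sleep_ms,
			      unsigned int timeout_ms,
			      unsigned int interval_ms)
{
	unsigned int waited = 0;
	uint8_t status = read_status(ctx);

	if (interval_ms == 0)
		interval_ms = 1;	/* avoid a loop that never advances */

	while ((status & STATUS_BSY) && waited < timeout_ms) {
		sleep_ms(interval_ms);
		waited += interval_ms;
		status = read_status(ctx);
	}
	return status;
}

/* Hypothetical stand-in for a device: busy for a fixed number of reads. */
struct fake_dev {
	unsigned int busy_reads;
};

static uint8_t fake_read(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->busy_reads > 0) {
		dev->busy_reads--;
		return STATUS_BSY;
	}
	return 0x50;	/* BSY and DRQ both clear */
}

/* No-op sleep so the sketch runs instantly; real code would block here. */
static void fake_sleep(unsigned int ms)
{
	(void)ms;
}
```

Calling wait_bsy_clear(fake_read, &dev, fake_sleep, 3000, 50) with
dev.busy_reads set to 24 returns a status with BSY clear after 24 sleep
intervals, while a device that never clears BSY gives up once the
3000 ms budget is spent and returns with BSY still set.
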

Thanks.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@us.ibm.com
http://www.ibm.com/linux/ltc


We encountered a problem where the BSY status bit is still
set on entry to the 'ahci' error handler during initialization
of the Quantum GoVault when attached to an ICH6R/ICH6RW controller.
The software reset then fails its BSY/DRQ check, forcing a hard
reset, and the following messages are logged:
  ata1: softreset failed (port busy but CLO unavailable)
  ata1: softreset failed, retrying in 5 secs
  ata1: port is slow to respond, please be patient (Status 0x80)
  ata1: port failed to respond (30 secs, Status 0x80)
  ata1: COMRESET failed (device not ready)
  ata1: hardreset failed, retrying in 5 secs

BSY was taking almost 2 seconds to clear after the return from
ahci_init_port() in ahci_port_start(), so this patch gives BSY
up to 3 extra seconds to clear, which eliminates the problem.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>

--- linux-2.6.20-rc7/drivers/ata/ahci.c.orig	2007-02-16 10:11:21.000000000 -0800
+++ linux-2.6.20-rc7/drivers/ata/ahci.c	2007-02-16 13:23:04.000000000 -0800
@@ -1423,6 +1423,8 @@ static int ahci_port_start(struct ata_po
 	void *mem;
 	dma_addr_t mem_dma;
 	int rc;
+	u8 status;
+	unsigned long timeout;
 
 	pp = kmalloc(sizeof(*pp), GFP_KERNEL);
 	if (!pp)
@@ -1477,6 +1479,17 @@ static int ahci_port_start(struct ata_po
 	/* initialize port */
 	ahci_init_port(port_mmio, hpriv->cap, pp->cmd_slot_dma, pp->rx_fis_dma);
 
+	status = ahci_check_status(ap);
+
+	/* for some devices we need to delay to allow BSY to clear */
+	if (status & ATA_BUSY) {
+		timeout = jiffies + 3*HZ;
+		while ((status & ATA_BUSY) && time_before(jiffies, timeout)) {
+			msleep(50);
+			status = ahci_check_status(ap);
+		}
+	}
+
 	return 0;
 }
 

