ahci_start_engine compliance with AHCI spec

linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* ahci_start_engine compliance with AHCI spec
@ 2011-07-08 23:01 Brian Norris
  2011-07-13 13:14 ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Brian Norris @ 2011-07-08 23:01 UTC (permalink / raw)
  To: linux-ide
  Cc: Tejun Heo, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee,
	Brian Norris

Hello,

I am looking into a problem similar to one Jian Peng had, where my
AHCI controller cannot handle ahci_start_engine() requests when its
ports are in the wrong states (BSY/DRQ/PxSSTS.DET==0). As this is
partly an issue of compliance with the AHCI specification, I would
like to find a good fix for this problem that is valid on most
controllers, not requiring a special flag to enable a workaround as
was suggested earlier.

See Jian's patch:
https://lkml.org/lkml/2011/4/23/9

And the regression it caused:
https://lkml.org/lkml/2011/5/11/472

I am able to reproduce the regressions seen by Rafael and Michael on
my Dell Latitude E6410 laptop, in case that's helpful.

I haven't been able to come up with a good generic solution to bring
the driver in line with the AHCI specification. Any comments on this
issue would be helpful, as I'm fairly new to the ATA/AHCI driver
subsystem. I'm looking mainly at the device initialization for AHCI,
via ahci_init_one(), and the eventual ahci_start_engine() call.

What I've found so far:

It seems that at first device initialization on either my Dell E6410
or my special controller, the ahci_start_engine will invariably be
called with the either the BSY or DRQ bit set, depending on whether or
not there is an actual device on the affected port (see stack trace
below). I'm not sure what is causing the device to be requesting data
or busy at this point, but whatever it is causes the device
initialization process to fail (links are *not* up). Instead, we rely
on ahci_error_handler to clean this up, where after a hard reset, the

- ahci_init_one
-- ata_host_activate
--- ata_host_start
---- ahci_port_start
---- ahci_port_resume
----- ahci_start_port
------ ahci_power_up [1]
------ ahci_start_engine (DRQ or BSY *will* be active) [2]

and later

- scsi_error_handler
-- ata_scsi_error
--- ata_scsi_port_error_handler
---- ahci_error_handler
----- sata_pmp_error_handler
------ ata_eh_recover
------- ata_eh_reset
-------- ata_do_reset
--------- ahci_hardreset
---------- ahci_start_engine (DRQ, BSY cleared, link up)

I'm not sure if the "error_handler" and "hard reset" processes are
intended for initialization...as I said I'm a little new!

I have a few other questions:

What operation could be putting devices in DRQ or BSY states during
initialization but before ahci_start_engine?

How much of section 10.1 of the AHCI 1.3 spec applies to our AHCI
driver? Just 10.1.2 or do we have to do the "firmware initialization"
in 10.1.1 as well? Either way, it seems that section 10.10.2 implies
that we need to do some of the "firmware initialization" (because we
use staggered spin-up in ahci_power_up):
"In order to spin up the devices attached to the HBA, software should
perform the procedure outlined in section 10.1.1 for staggered
spin-up."

Then the applicable step from 10.10.1 (step 5):
"Wait for a positive indication that a device is attached to the port
(the maximum amount of time to wait for presence indication is
specified in the Serial ATA Revision 2.6 specification). This is done
by polling PxSSTS.DET. If PxSSTS.DET returns a value of 1h or 3h when
read, then system software shall continue to the next step, otherwise
if the polling process times out system software moves to the next
implemented port and returns to step 1."

I bring up all of this because it seems that if I put some amount of
"wait time" between [1] and [2] above, then my system transitions from
DRQ to BSY and its link is connected (PxSSTS.DET == 0x3). I still
don't know why the device is BSY, but at least it solves my problem...
Perhaps I will try implementing the wait with ata_wait_register (or
maybe ata_wait_ready + ata_phys_link_online) on the PxSSTS.DET flags
and send a patch.

Sorry if this e-mail is too complicated or disorganized. I've been
racking my brain on this one for a few weeks now, and I've only come
up with a few half answers and some more questions. Feel free to ask
for more explanation, but don't worry if I don't respond immediately,
as I am on vacation for all of next week. If I don't get to them
before I leave, I will get to your replies when I return.

Thanks,
Brian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ahci_start_engine compliance with AHCI spec
  2011-07-08 23:01 ahci_start_engine compliance with AHCI spec Brian Norris
@ 2011-07-13 13:14 ` Tejun Heo
  2011-07-18 18:40   ` Brian Norris
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2011-07-13 13:14 UTC (permalink / raw)
  To: Brian Norris
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

Hello,

On Fri, Jul 08, 2011 at 04:01:17PM -0700, Brian Norris wrote:
> I am able to reproduce the regressions seen by Rafael and Michael on
> my Dell Latitude E6410 laptop, in case that's helpful.

Oh yes, it is.

> - ahci_init_one
> -- ata_host_activate
> --- ata_host_start
> ---- ahci_port_start
> ---- ahci_port_resume
> ----- ahci_start_port
> ------ ahci_power_up [1]
> ------ ahci_start_engine (DRQ or BSY *will* be active) [2]
> 
> and later
> 
> - scsi_error_handler
> -- ata_scsi_error
> --- ata_scsi_port_error_handler
> ---- ahci_error_handler
> ----- sata_pmp_error_handler
> ------ ata_eh_recover
> ------- ata_eh_reset
> -------- ata_do_reset
> --------- ahci_hardreset
> ---------- ahci_start_engine (DRQ, BSY cleared, link up)
> 
> I'm not sure if the "error_handler" and "hard reset" processes are
> intended for initialization...as I said I'm a little new!

That's how it's supposed to work.  EH is integral part of probing
sequence.

> I have a few other questions:
> 
> What operation could be putting devices in DRQ or BSY states during
> initialization but before ahci_start_engine?

Hmmm... I have no idea, maybe it has something to do with the first
D2H Reg FIS device sends after link gets reset during controller init?

> I bring up all of this because it seems that if I put some amount of
> "wait time" between [1] and [2] above, then my system transitions from
> DRQ to BSY and its link is connected (PxSSTS.DET == 0x3). I still
> don't know why the device is BSY, but at least it solves my problem...
> Perhaps I will try implementing the wait with ata_wait_register (or
> maybe ata_wait_ready + ata_phys_link_online) on the PxSSTS.DET flags
> and send a patch.

Hmmm... what happens if you don't comment out ahci_start_engine() call
from ahci_start_port()?

> Sorry if this e-mail is too complicated or disorganized. I've been
> racking my brain on this one for a few weeks now, and I've only come
> up with a few half answers and some more questions. Feel free to ask
> for more explanation, but don't worry if I don't respond immediately,
> as I am on vacation for all of next week. If I don't get to them
> before I leave, I will get to your replies when I return.

Is this the same IP block that Jian Peng was using?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ahci_start_engine compliance with AHCI spec
  2011-07-13 13:14 ` Tejun Heo
@ 2011-07-18 18:40   ` Brian Norris
  2011-07-21  8:49     ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Brian Norris @ 2011-07-18 18:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

Hi Tejun,

On Wed, Jul 13, 2011 at 6:14 AM, Tejun Heo <tj@kernel.org> wrote:
> On Fri, Jul 08, 2011 at 04:01:17PM -0700, Brian Norris wrote:
>> I'm not sure if the "error_handler" and "hard reset" processes are
>> intended for initialization...as I said I'm a little new!
>
> That's how it's supposed to work.  EH is integral part of probing
> sequence.

I began to suspect that was the case. It just seemed awkward that the
expected operation includes a code path named "error".

>> I have a few other questions:
>>
>> What operation could be putting devices in DRQ or BSY states during
>> initialization but before ahci_start_engine?
>
> Hmmm... I have no idea, maybe it has something to do with the first
> D2H Reg FIS device sends after link gets reset during controller init?

OK, I'll try to track that one down, but I think that DRQ is set much
earlier in the initialization than that and won't be cleared until
between ahci_power_up() and ahci_start_engine() - I tried
wait-and-poll at several different points in the process to understand
the HBA/device initialization process.

> Hmmm... what happens if you don't comment out ahci_start_engine() call
> from ahci_start_port()?

I wasn't commenting out the ahci_start_engine() from
ahci_start_port(). Can you clarify what you mean?

> Is this the same IP block that Jian Peng was using?

Yes, it is. I'm taking over some of his work.

Thanks for the responses.

Brian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ahci_start_engine compliance with AHCI spec
  2011-07-18 18:40   ` Brian Norris
@ 2011-07-21  8:49     ` Tejun Heo
  2011-07-21 17:13       ` Brian Norris
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2011-07-21  8:49 UTC (permalink / raw)
  To: Brian Norris
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

Hello,

On Mon, Jul 18, 2011 at 11:40:17AM -0700, Brian Norris wrote:
> On Wed, Jul 13, 2011 at 6:14 AM, Tejun Heo <tj@kernel.org> wrote:
> > On Fri, Jul 08, 2011 at 04:01:17PM -0700, Brian Norris wrote:
> >> I'm not sure if the "error_handler" and "hard reset" processes are
> >> intended for initialization...as I said I'm a little new!
> >
> > That's how it's supposed to work.  EH is integral part of probing
> > sequence.
> 
> I began to suspect that was the case. It just seemed awkward that the
> expected operation includes a code path named "error".

Heh, well, SCSI called it error_handler but inside ATA I usually call
it exception handler and probing / hot plugging / etc are exceptional
cases compared to boring sending and receiving of bits to already
attached devices. ;)

> >> I have a few other questions:
> >>
> >> What operation could be putting devices in DRQ or BSY states during
> >> initialization but before ahci_start_engine?
> >
> > Hmmm... I have no idea, maybe it has something to do with the first
> > D2H Reg FIS device sends after link gets reset during controller init?
> 
> OK, I'll try to track that one down, but I think that DRQ is set much
> earlier in the initialization than that and won't be cleared until
> between ahci_power_up() and ahci_start_engine() - I tried
> wait-and-poll at several different points in the process to understand
> the HBA/device initialization process.

I see.

> > Hmmm... what happens if you don't comment out ahci_start_engine() call
> > from ahci_start_port()?
> 
> I wasn't commenting out the ahci_start_engine() from
> ahci_start_port(). Can you clarify what you mean?

Oh, I meant "what if you comment out..."  I wrote that sentence in
negative and then switched but forgot removing "don't".

> > Is this the same IP block that Jian Peng was using?
> 
> Yes, it is. I'm taking over some of his work.

Is there any way to detect that particular IP block.  It's the only
one with this problem so maybe we just should treat it specially.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ahci_start_engine compliance with AHCI spec
  2011-07-21  8:49     ` Tejun Heo
@ 2011-07-21 17:13       ` Brian Norris
  2011-07-22  9:03         ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Brian Norris @ 2011-07-21 17:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

On Thu, Jul 21, 2011 at 1:49 AM, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Jul 18, 2011 at 11:40:17AM -0700, Brian Norris wrote:
>> On Wed, Jul 13, 2011 at 6:14 AM, Tejun Heo <tj@kernel.org> wrote:
>> > Hmmm... what happens if you don't comment out ahci_start_engine() call
>> > from ahci_start_port()?
>>
>> I wasn't commenting out the ahci_start_engine() from
>> ahci_start_port(). Can you clarify what you mean?
>
> Oh, I meant "what if you comment out..."  I wrote that sentence in
> negative and then switched but forgot removing "don't".

OK, well I tried simply commenting out that ahci_start_engine() on
both my special controller and on the Dell E6410 laptop and it worked
just fine (solved my issues and didn't cause any issues on the Dell).
Is this safe? It seems like we end up calling ahci_start_engine() at
the end of the error handling process anyway, so maybe this call is
not really necessary in the first place?

Anyway, I also tried my own fix for this: adding a small delay to wait
for some link recognition at the end of ahci_power_up(). I'm not sure
if this is the greatest, but it also works for both systems I'm
testing. I included the test patch here (based on linux-2.6). BTW, I'm
not sure my mail will be formatted perfectly here. I can resend with
my other mailer if needed.

>> > Is this the same IP block that Jian Peng was using?
>>
>> Yes, it is. I'm taking over some of his work.
>
> Is there any way to detect that particular IP block.  It's the only
> one with this problem so maybe we just should treat it specially.

I'm not sure how easy it would be to detect this particular version.
Besides, that is not our first choice solution, as our special-case
fix would not be very well tested for logic bugs.

Brian

---
 drivers/ata/libahci.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 41223c7..0c0c444 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -657,6 +657,19 @@ static void ahci_power_up(struct ata_port *ap)

        /* wake up link */
        writel(cmd | PORT_CMD_ICC_ACTIVE, port_mmio + PORT_CMD);
+
+       /*
+        * Wait for PxSSTS.DET == 1h or 3h, step 5 of 10.1.1 in
+        * AHCI 1.3 spec. See 10.10.2 for spin-up procedure.
+        */
+       if (ata_wait_register(ap, port_mmio + ahci_scr_offset(ap, SCR_STATUS),
+                               0x01, 0x00, 1, 10)) {
+               /* also wait for DRQ to be cleared */
+               ata_wait_register(ap, port_mmio + PORT_TFDATA, ATA_DRQ,
+                               ATA_DRQ, 1, 10);
+
+               /* proceed with step 6, 7 of 10.1.1 in AHCI 1.3 spec? */
+       }
 }

 static int ahci_set_lpm(struct ata_link *link, enum ata_lpm_policy policy,

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: ahci_start_engine compliance with AHCI spec
  2011-07-21 17:13       ` Brian Norris
@ 2011-07-22  9:03         ` Tejun Heo
  2011-07-22  9:41           ` [PATCH #upstream] ahci: start engine only during soft/hard resets Tejun Heo
  2011-08-03  0:06           ` ahci_start_engine compliance with AHCI spec Brian Norris
  0 siblings, 2 replies; 11+ messages in thread
From: Tejun Heo @ 2011-07-22  9:03 UTC (permalink / raw)
  To: Brian Norris
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

Hello, Brian.

On Thu, Jul 21, 2011 at 10:13:16AM -0700, Brian Norris wrote:
> On Thu, Jul 21, 2011 at 1:49 AM, Tejun Heo <tj@kernel.org> wrote:
> > On Mon, Jul 18, 2011 at 11:40:17AM -0700, Brian Norris wrote:
> >> On Wed, Jul 13, 2011 at 6:14 AM, Tejun Heo <tj@kernel.org> wrote:
> >> > Hmmm... what happens if you don't comment out ahci_start_engine() call
> >> > from ahci_start_port()?
> >>
> >> I wasn't commenting out the ahci_start_engine() from
> >> ahci_start_port(). Can you clarify what you mean?
> >
> > Oh, I meant "what if you comment out..."  I wrote that sentence in
> > negative and then switched but forgot removing "don't".
> 
> OK, well I tried simply commenting out that ahci_start_engine() on
> both my special controller and on the Dell E6410 laptop and it worked
> just fine (solved my issues and didn't cause any issues on the Dell).
> Is this safe? It seems like we end up calling ahci_start_engine() at
> the end of the error handling process anyway, so maybe this call is
> not really necessary in the first place?

Yes, I believe so.

> Anyway, I also tried my own fix for this: adding a small delay to wait
> for some link recognition at the end of ahci_power_up(). I'm not sure
> if this is the greatest, but it also works for both systems I'm
> testing. I included the test patch here (based on linux-2.6). BTW, I'm
> not sure my mail will be formatted perfectly here. I can resend with
> my other mailer if needed.

The problem is that both my and your approach aren't ultimately safe
on this particular IP block.  I don't think it's possible make things
completely safe for it.  There's no mutual exclusion against PHY
events - be it flaky signal, power surge or actual hotplug - and
driver operation.  No matter how careful the driver behaves, if PHY
events happen after the last check before starting DMA engine, DRQ may
be set by the time driver gets to it.

The IP block you're dealing with is inherently buggy.  What the spec
means, I think, is the DMA engine might not start or behave properly
if enabled while DRQ is set, which is fine.  Driver will notice that,
reset stuff and retry.  It is *completely* different from "the
controller becomes brick until power cycled if that happens".  So, we
can work around all we want but that is one buggy controller.  If
possible, please tell the manufacturer or licensor to fix it.

For now, let's first try removing ahci_start_engine() call from
port_start and see how that goes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH #upstream] ahci: start engine only during soft/hard resets
  2011-07-22  9:03         ` Tejun Heo
@ 2011-07-22  9:41           ` Tejun Heo
  2011-07-22 20:41             ` Brian Norris
  2011-07-23 22:12             ` Jeff Garzik
  2011-08-03  0:06           ` ahci_start_engine compliance with AHCI spec Brian Norris
  1 sibling, 2 replies; 11+ messages in thread
From: Tejun Heo @ 2011-07-22  9:41 UTC (permalink / raw)
  To: Jeff Garzik, Brian Norris
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Michael Leun,
	linux-kernel, Jian Peng, Kevin Cernekee

This is another attempt at fixing the same problem that 270dac35c2
(libata: ahci_start_engine compliant to AHCI spec) tried to solve.
Unfortunately, 270dac35c2 created regressions for a lot more common
controllers and got reverted.

This specific AHCI IP block becomes a brick if the DMA engine is
started while DRQ is set.  It is not possible to avoid the condition
completely but the most common occurrence is caused by spurious use of
ahci_start_engine() from ahci_start_port() during init sequence.

DMA engine is started after both soft and hard resets and
ahci_start_port() is always followed by resets, so there is no reason
to start DMA engine from ahci_start_port().

This patch removes ahci_start_engine() invocation from
ahci_start_port().  This change makes failure path of
ahci_port_suspend() leave engine stopped without following resets.
This is resolved by replacing ahci_start_port() call with
ata_port_freeze() which forces resets afterwards, which is the better
behavior anyway.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Brian Norris <computersforpeace@gmail.com>
Reported-by: Jian Peng <jipeng2005@gmail.com>
---
Jeff, this needs to be tested for a while in linux-next before sending
it mainline.  Please apply it for 3.1 merge window and let's see
whether anything explodes.

Thanks.

 drivers/ata/libahci.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 41223c7..d57f4cf 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -745,9 +745,6 @@ static void ahci_start_port(struct ata_port *ap)
 	/* enable FIS reception */
 	ahci_start_fis_rx(ap);

-	/* enable DMA */
-	ahci_start_engine(ap);
-
 	/* turn on LEDs */
 	if (ap->flags & ATA_FLAG_EM) {
 		ata_for_each_link(link, ap, EDGE) {
@@ -1976,7 +1973,7 @@ static int ahci_port_suspend(struct ata_port *ap, pm_message_t mesg)
 		ahci_power_down(ap);
 	else {
 		ata_port_printk(ap, KERN_ERR, "%s (%d)\n", emsg, rc);
-		ahci_start_port(ap);
+		ata_port_freeze(ap);
 	}

 	return rc;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH #upstream] ahci: start engine only during soft/hard resets
  2011-07-22  9:41           ` [PATCH #upstream] ahci: start engine only during soft/hard resets Tejun Heo
@ 2011-07-22 20:41             ` Brian Norris
  2011-07-23 22:12             ` Jeff Garzik
  1 sibling, 0 replies; 11+ messages in thread
From: Brian Norris @ 2011-07-22 20:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Garzik, linux-ide, Valdis.Kletnieks, Rafael J. Wysocki,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

On Fri, Jul 22, 2011 at 2:41 AM, Tejun Heo <tj@kernel.org> wrote:
> This is another attempt at fixing the same problem that 270dac35c2
> (libata: ahci_start_engine compliant to AHCI spec) tried to solve.
> Unfortunately, 270dac35c2 created regressions for a lot more common
> controllers and got reverted.

I've tested this on my problem controller with success so far. Also
tried on Dell Latitude E6410 (with various boot, suspend, resume,
unplug power, etc. cycles) with no problem. Thanks Tejun!

In case it's useful I'll give a tested-by:

Tested-by: Brian Norris <computersforpeace@gmail.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH #upstream] ahci: start engine only during soft/hard resets
  2011-07-22  9:41           ` [PATCH #upstream] ahci: start engine only during soft/hard resets Tejun Heo
  2011-07-22 20:41             ` Brian Norris
@ 2011-07-23 22:12             ` Jeff Garzik
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff Garzik @ 2011-07-23 22:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Brian Norris, linux-ide, Valdis.Kletnieks, Rafael J. Wysocki,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

On 07/22/2011 05:41 AM, Tejun Heo wrote:
> This is another attempt at fixing the same problem that 270dac35c2
> (libata: ahci_start_engine compliant to AHCI spec) tried to solve.
> Unfortunately, 270dac35c2 created regressions for a lot more common
> controllers and got reverted.
>
> This specific AHCI IP block becomes a brick if the DMA engine is
> started while DRQ is set.  It is not possible to avoid the condition
> completely but the most common occurrence is caused by spurious use of
> ahci_start_engine() from ahci_start_port() during init sequence.
>
> DMA engine is started after both soft and hard resets and
> ahci_start_port() is always followed by resets, so there is no reason
> to start DMA engine from ahci_start_port().
>
> This patch removes ahci_start_engine() invocation from
> ahci_start_port().  This change makes failure path of
> ahci_port_suspend() leave engine stopped without following resets.
> This is resolved by replacing ahci_start_port() call with
> ata_port_freeze() which forces resets afterwards, which is the better
> behavior anyway.
>
> Signed-off-by: Tejun Heo<tj@kernel.org>
> Reported-by: Brian Norris<computersforpeace@gmail.com>
> Reported-by: Jian Peng<jipeng2005@gmail.com>
> ---
> Jeff, this needs to be tested for a while in linux-next before sending
> it mainline.  Please apply it for 3.1 merge window and let's see
> whether anything explodes.

Stuffed this into branch #upstream-next, which appears periodically for 
situations like this.  #upstream-next will become #upstream after the 
merge window closes.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ahci_start_engine compliance with AHCI spec
  2011-07-22  9:03         ` Tejun Heo
  2011-07-22  9:41           ` [PATCH #upstream] ahci: start engine only during soft/hard resets Tejun Heo
@ 2011-08-03  0:06           ` Brian Norris
  2011-08-04  9:44             ` Tejun Heo
  1 sibling, 1 reply; 11+ messages in thread
From: Brian Norris @ 2011-08-03  0:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

Hi Tejun,

I wanted to follow up a bit on your last comments.

On Fri, Jul 22, 2011 at 2:03 AM, Tejun Heo <tj@kernel.org> wrote:
> The problem is that both my and your approach aren't ultimately safe
> on this particular IP block.  I don't think it's possible make things
> completely safe for it.  There's no mutual exclusion against PHY
> events - be it flaky signal, power surge or actual hotplug - and
> driver operation.  No matter how careful the driver behaves, if PHY
> events happen after the last check before starting DMA engine, DRQ may
> be set by the time driver gets to it.

Can DRQ be set from 0->1 without a software-initiated action? I didn't
think it was directly tied to PHY events, and so we can fairly well
predict that it will remain 0.

On the other hand, PxSSTS.DET can be affected by PHY, but I don't
believe DET != 3 directly triggers this hardware bug.

> The IP block you're dealing with is inherently buggy.  What the spec
> means, I think, is the DMA engine might not start or behave properly
> if enabled while DRQ is set, which is fine.  Driver will notice that,
> reset stuff and retry.  It is *completely* different from "the
> controller becomes brick until power cycled if that happens".  So, we
> can work around all we want but that is one buggy controller.  If
> possible, please tell the manufacturer or licensor to fix it.

Yes, I believe the hardware designers know how buggy this is...but
it's still worth some effort to fix the software as well as possible
for current hardware behavior.

> For now, let's first try removing ahci_start_engine() call from
> port_start and see how that goes.

Thanks for the help, and happy testing!

Brian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ahci_start_engine compliance with AHCI spec
  2011-08-03  0:06           ` ahci_start_engine compliance with AHCI spec Brian Norris
@ 2011-08-04  9:44             ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2011-08-04  9:44 UTC (permalink / raw)
  To: Brian Norris
  Cc: linux-ide, Valdis.Kletnieks, Rafael J. Wysocki, Jeff Garzik,
	Michael Leun, linux-kernel, Jian Peng, Kevin Cernekee

Hello,

On Tue, Aug 02, 2011 at 05:06:13PM -0700, Brian Norris wrote:
> Can DRQ be set from 0->1 without a software-initiated action? I didn't
> think it was directly tied to PHY events, and so we can fairly well
> predict that it will remain 0.

Sure it can.  It's updated by Register D2H FISes sent by the device.
It can change after a PHY event; otherwise, I don't think it would get
set during init without any command in flight, right?

> Yes, I believe the hardware designers know how buggy this is...but
> it's still worth some effort to fix the software as well as possible
> for current hardware behavior.

Yeap, sure, let's get it working for majority of cases.  I just wanted
to point out that the hardware eventually needs to be fixed.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-08-04  9:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-07-08 23:01 ahci_start_engine compliance with AHCI spec Brian Norris
2011-07-13 13:14 ` Tejun Heo
2011-07-18 18:40   ` Brian Norris
2011-07-21  8:49     ` Tejun Heo
2011-07-21 17:13       ` Brian Norris
2011-07-22  9:03         ` Tejun Heo
2011-07-22  9:41           ` [PATCH #upstream] ahci: start engine only during soft/hard resets Tejun Heo
2011-07-22 20:41             ` Brian Norris
2011-07-23 22:12             ` Jeff Garzik
2011-08-03  0:06           ` ahci_start_engine compliance with AHCI spec Brian Norris
2011-08-04  9:44             ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).