Re: sata_nv + ADMA + Samsung disk problem

linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Robert Hancock <hancockr@shaw.ca>
To: Tejun Heo <htejun@gmail.com>
Cc: Gabor Gombas <gombasg@sztaki.hu>,
	linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
	kluo@nvidia.com, AMartin@nvidia.com, pchen@nvidia.com
Subject: Re: sata_nv + ADMA + Samsung disk problem
Date: Tue, 01 Jan 2008 22:03:09 -0600	[thread overview]
Message-ID: <477B0CFD.1030603@shaw.ca> (raw)
In-Reply-To: <477B0429.7040909@gmail.com>

Tejun Heo wrote:
> [cc'ing Robert Hancock and NVidia people]
> 
> Whole thread can be read from the following URL.
> 
>   http://thread.gmane.org/gmane.linux.ide/21710
> 
> In a nutshell, with ADMA enabled, FLUSH_EXT occasionally times out.  I
> first suspected faulty disk (reallocation failure on flush) but SMART
> reports nothing suspicious and w/ ADMA disabled, the drive works just fine.
> 
> On a side note, on 2.6.22.1, SMART fails from time to time but the
> problem went away on 2.6.24-rc6.  This was apparently fixed during that
> period.  I guess we can ignore this for now.
> 
> Thanks.

This is kind of a longstanding problem which has been partially worked 
around, but it seems not entirely. This is what I had diagnosed some 
time ago:

"recently, some issues cropped up with command timeouts when a cache 
flush command was immediately followed by an NCQ write. In this case, 
sometimes when the NCQ write was issued, the status register changed 
from 0x500 (Stopped and Idle) to 0x400 (Stopped) as it normally appears 
to, however it seems like the controller would get hung in that state, 
and we would time out with no notifiers set, the gen_ctl register not 
indicating interrupt status, and the CPB response flags still 0 as we 
left them, seemingly indicating the controller hasn't done anything with 
it. Then, when the error handler kicks in we clear the GO bit to put it 
back into register mode, but the Legacy flag in the status register 
doesn't get set (or at least it takes longer than 1 microsecond). 
Finally when we do an ADMA channel reset that seems to get it responding 
again, until this happens the next time.

 From some experimentation, I found that when we are issuing a NCQ
command when the last command was non-NCQ, or vice versa, if I added in
a delay of 20 microseconds between setting up the CPB and writing to the
append register, the problem appeared to go away. Problem is I don't
know if that's because it actually needs this delay, or because it
changes the timing so that it happens to work even though we're doing
something wrong, there's some event we're not waiting for, etc.

I've now verified that no switches between ADMA and register mode occur 
near the time of these timeouts. Neither are we reading or writing any 
of the ATA shadow registers while we're in ADMA mode."

It seems likely that this is what is happening here (a switch from an 
NCQ command to a non-NCQ command, then the non-NCQ times out). It could 
be in some cases the 20 microsecond delay is not enough. But it seems 
bogus that we should need such an arbitrary delay in the first place.

The question I had for NVIDIA regarding this that I never got answered 
was, is there any reason why we would need a delay when switching 
between NCQ and non-NCQ commands on ADMA, and if not, is there any known 
cause that could cause the controller to get into this seemingly 
locked-up state?

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

next prev parent reply	other threads:[~2008-01-02  4:03 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-08 12:08 sata_nv + ADMA + Samsung disk problem Gabor Gombas
2007-08-14  9:30 ` Tejun Heo
2007-08-14 12:02   ` Gabor Gombas
2007-08-16 16:06   ` Gabor Gombas
2007-08-16 18:45     ` Jim Paris
2008-01-01 16:44 ` Gabor Gombas
2008-01-02  3:25   ` Tejun Heo
2008-01-02  4:03     ` Robert Hancock [this message]
2008-01-02  4:20       ` Robert Hancock
2008-01-02  4:25         ` Tejun Heo
2008-01-02  6:19           ` Jeff Garzik
2008-01-02  6:39             ` Robert Hancock
2008-01-02  6:55               ` Tejun Heo
2008-01-03  0:27                 ` Robert Hancock
2008-01-02 17:23       ` Allen Martin
2008-01-02 18:57         ` Jeff Garzik
2008-01-02 23:23           ` Allen Martin
2008-01-03  0:21             ` Robert Hancock
2008-01-03  4:14               ` Mark Lord
2008-01-03  4:17               ` Mark Lord
2008-01-03  4:54                 ` Robert Hancock
2008-01-03 15:44                   ` Mark Lord
2008-01-03 15:47                     ` Mark Lord
2008-01-03 21:13                       ` Benjamin Herrenschmidt
2008-01-04  1:43                         ` Robert Hancock
2008-01-04  5:51                           ` Benjamin Herrenschmidt
2008-01-04  0:41                   ` Allen Martin
2008-01-04  2:51                     ` Robert Hancock
2008-01-08  0:10                     ` Robert Hancock
2008-01-11 23:18                       ` Gabor Gombas
2008-01-12  1:10                         ` Robert Hancock

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=477B0CFD.1030603@shaw.ca \
    --to=hancockr@shaw.ca \
    --cc=AMartin@nvidia.com \
    --cc=gombasg@sztaki.hu \
    --cc=htejun@gmail.com \
    --cc=kluo@nvidia.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pchen@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).