linux-ide.vger.kernel.org archive mirror
* Re: sata_svw data corruption, strange problems
       [not found] <20080617093602.GA28140@elf.ucw.cz>
@ 2008-06-23  0:37 ` Tejun Heo
  2008-06-23  8:20   ` Pavel Machek
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2008-06-23  0:37 UTC
  To: Pavel Machek; +Cc: kernel list, benh, jgarzik, IDE/ATA development list

Hello,

Pavel Machek wrote:
> I see strange problems on a machine with sata_svw. The machine seems to
> corrupt data every few days (ext3 error, dir index corrupted), and has
> some other very strange problems (keyboard misbehaves, pulling out the
> SATA disk cures it, see
> https://bugzilla.novell.com/show_bug.cgi?id=400772).
> 
> Then I got to the comment 
> 
>         writeb(dmactl | ATA_DMA_START, mmio + ATA_DMA_CMD);
>         /* There is a race condition in certain SATA controllers
> that can be seen when the r/w command is given to the controller
> before the host DMA is started. On a Read command, the controller
> would initiate the command to the drive even before it sees the DMA
> start. When there are very fast drives connected to the controller,
> or when the data request hits in the drive cache, there is the
> possibility that the drive returns a part or all of the requested
> data to the controller before the DMA start is issued.  In this
> case, the controller would become confused as to what to do with the
> data.  In the worst case when all the data is returned back to the
> controller, the controller could hang. In other cases it could
> return partial data returning in data corruption. This problem has
> been seen in PPC systems and can also appear on an system with very
> fast disks, where the SATA controller is sitting behind a number of
> bridges, and hence there is significant latency between the r/w
> command and the start command. */
>         /* issue r/w command if the access is to ATA*/
>         if (qc->tf.protocol == ATA_PROT_DMA)
> 
> ...and that would certainly explain what we are seeing. Are
> ServerWorks controllers broken by design?

The comment looks like a warning to me: the DMA engine is started
before the command is issued to the drive, as explained in the next
comment.
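A minimal sketch of the two orderings, assuming the libata SFF hook layout of that era: the generic bmdma setup hook ends by issuing the command and the bmdma start hook then sets the start bit, while sata_svw's k2 hooks hold back an ATA DMA (ATA_PROT_DMA) command until after the start bit has been written. The enum, struct, and function names below are hypothetical; they only record the order of steps and are not libata code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the steps of one DMA command; not libata names. */
enum step { LOAD_PRD, SET_DIRECTION, EXEC_COMMAND, DMA_START };

#define MAX_STEPS 8

struct trace {
	enum step steps[MAX_STEPS];
	int n;
};

static void record(struct trace *t, enum step s)
{
	assert(t->n < MAX_STEPS);
	t->steps[t->n++] = s;
}

/* Generic SFF order: the setup hook ends by issuing the command and
 * the start hook then sets the DMA start bit. */
static void generic_issue(struct trace *t)
{
	record(t, LOAD_PRD);       /* setup hook */
	record(t, SET_DIRECTION);
	record(t, EXEC_COMMAND);
	record(t, DMA_START);      /* start hook */
}

/* sata_svw (k2) order: an ATA DMA command is held back until the start
 * hook has set the start bit; other protocols are issued from setup. */
static void k2_issue(struct trace *t, bool is_ata_dma)
{
	record(t, LOAD_PRD);       /* setup hook */
	record(t, SET_DIRECTION);
	if (!is_ata_dma)
		record(t, EXEC_COMMAND);
	record(t, DMA_START);      /* start hook */
	if (is_ata_dma)
		record(t, EXEC_COMMAND);
}

/* Position of the first occurrence of s in the trace, or -1. */
static int index_of(const struct trace *t, enum step s)
{
	for (int i = 0; i < t->n; i++)
		if (t->steps[i] == s)
			return i;
	return -1;
}

/* True if the start bit was written before the command was issued. */
static bool dma_started_first(const struct trace *t)
{
	return index_of(t, DMA_START) < index_of(t, EXEC_COMMAND);
}
```

In this model only the k2 path with an ATA DMA command sets the start bit first; every other path issues the command first, as the generic code does.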

-- 
tejun


* Re: sata_svw data corruption, strange problems
  2008-06-23  0:37 ` sata_svw data corruption, strange problems Tejun Heo
@ 2008-06-23  8:20   ` Pavel Machek
  2008-06-23  8:22     ` Tejun Heo
  2008-06-23  8:39     ` Andreas Schwab
  0 siblings, 2 replies; 14+ messages in thread
From: Pavel Machek @ 2008-06-23  8:20 UTC
  To: Tejun Heo
  Cc: kernel list, benh, jgarzik, IDE/ATA development list,
	Trivial patch monkey

Hi!

> > I see strange problems on a machine with sata_svw. The machine seems to
> > corrupt data every few days (ext3 error, dir index corrupted), and has
> > some other very strange problems (keyboard misbehaves, pulling out the
> > SATA disk cures it, see
> > https://bugzilla.novell.com/show_bug.cgi?id=400772).
> > 
> > Then I got to the comment 
> > 
> >         writeb(dmactl | ATA_DMA_START, mmio + ATA_DMA_CMD);
> >         /* There is a race condition in certain SATA controllers
> > that can be seen when the r/w command is given to the controller
> > before the host DMA is started. On a Read command, the controller
...
> > ...and that would certainly explain what we are seeing. Are
> > ServerWorks controllers broken by design?
> 
> The comment looks like a warning to me: the DMA engine is started
> before the command is issued to the drive, as explained in the next
> comment.

Ok, what about this?

---

Clarify data corruption comment.

Signed-off-by: Pavel Machek <pavel@suse.cz>

---
commit a362f8903eb0cdbc2ea06e0e249c97f1a64c7e1e
tree 1bfcbf9ad1b55811b71cdeb1868a41cf6b058c5d
parent 91e95912b1b48a279d0231b5c21b82388ade249e
author Pavel <pavel@amd.ucw.cz> Mon, 23 Jun 2008 10:12:47 +0200
committer Pavel <pavel@amd.ucw.cz> Mon, 23 Jun 2008 10:12:47 +0200

 drivers/ata/sata_svw.c |   38 +++++++++++++++++++++++---------------
 1 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/ata/sata_svw.c b/drivers/ata/sata_svw.c
index 16aa683..fb13b82 100644
--- a/drivers/ata/sata_svw.c
+++ b/drivers/ata/sata_svw.c
@@ -253,21 +253,29 @@ static void k2_bmdma_start_mmio(struct a
 	/* start host DMA transaction */
 	dmactl = readb(mmio + ATA_DMA_CMD);
 	writeb(dmactl | ATA_DMA_START, mmio + ATA_DMA_CMD);
-	/* There is a race condition in certain SATA controllers that can
-	   be seen when the r/w command is given to the controller before the
-	   host DMA is started. On a Read command, the controller would initiate
-	   the command to the drive even before it sees the DMA start. When there
-	   are very fast drives connected to the controller, or when the data request
-	   hits in the drive cache, there is the possibility that the drive returns a part
-	   or all of the requested data to the controller before the DMA start is issued.
-	   In this case, the controller would become confused as to what to do with the data.
-	   In the worst case when all the data is returned back to the controller, the
-	   controller could hang. In other cases it could return partial data returning
-	   in data corruption. This problem has been seen in PPC systems and can also appear
-	   on an system with very fast disks, where the SATA controller is sitting behind a
-	   number of bridges, and hence there is significant latency between the r/w command
-	   and the start command. */
-	/* issue r/w command if the access is to ATA*/
+	/* This works around possible data corruption.
+
+	   On certain SATA controllers that can be seen when the r/w
+	   command is given to the controller before the host DMA is
+	   started.
+
+	   On a Read command, the controller would initiate the
+	   command to the drive even before it sees the DMA
+	   start. When there are very fast drives connected to the
+	   controller, or when the data request hits in the drive
+	   cache, there is the possibility that the drive returns a
+	   part or all of the requested data to the controller before
+	   the DMA start is issued.  In this case, the controller
+	   would become confused as to what to do with the data.  In
+	   the worst case when all the data is returned back to the
+	   controller, the controller could hang. In other cases it
+	   could return partial data returning in data
+	   corruption. This problem has been seen in PPC systems and
+	   can also appear on an system with very fast disks, where
+	   the SATA controller is sitting behind a number of bridges,
+	   and hence there is significant latency between the r/w
+	   command and the start command. */
+	/* issue r/w command if the access is to ATA */
 	if (qc->tf.protocol == ATA_PROT_DMA)
 		ap->ops->sff_exec_command(ap, &qc->tf);
 }


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: sata_svw data corruption, strange problems
  2008-06-23  8:20   ` Pavel Machek
@ 2008-06-23  8:22     ` Tejun Heo
  2008-06-23  8:39     ` Andreas Schwab
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2008-06-23  8:22 UTC
  To: Pavel Machek
  Cc: kernel list, benh, jgarzik, IDE/ATA development list,
	Trivial patch monkey

Pavel Machek wrote:
> Hi!
> 
>>> I see strange problems on a machine with sata_svw. The machine seems to
>>> corrupt data every few days (ext3 error, dir index corrupted), and has
>>> some other very strange problems (keyboard misbehaves, pulling out the
>>> SATA disk cures it, see
>>> https://bugzilla.novell.com/show_bug.cgi?id=400772).
>>>
>>> Then I got to the comment 
>>>
>>>         writeb(dmactl | ATA_DMA_START, mmio + ATA_DMA_CMD);
>>>         /* There is a race condition in certain SATA controllers
>>> that can be seen when the r/w command is given to the controller
>>> before the host DMA is started. On a Read command, the controller
> ...
>>> ...and that would certainly explain what we are seeing. Are
>>> ServerWorks controllers broken by design?
>> The comment looks like a warning to me: the DMA engine is started
>> before the command is issued to the drive, as explained in the next
>> comment.
> 
> Ok, what about this?
> 
> ---
> 
> Clarify data corruption comment.
> 
> Signed-off-by: Pavel Machek <pavel@suse.cz>

Acked-by: Tejun Heo <tj@kernel.org>

-- 
tejun


* Re: sata_svw data corruption, strange problems
  2008-06-23  8:20   ` Pavel Machek
  2008-06-23  8:22     ` Tejun Heo
@ 2008-06-23  8:39     ` Andreas Schwab
  2008-06-23  8:53       ` Pavel Machek
  1 sibling, 1 reply; 14+ messages in thread
From: Andreas Schwab @ 2008-06-23  8:39 UTC
  To: Pavel Machek
  Cc: Tejun Heo, kernel list, benh, jgarzik, IDE/ATA development list,
	Trivial patch monkey

Pavel Machek <pavel@suse.cz> writes:

> +	   controller, the controller could hang. In other cases it
> +	   could return partial data returning in data
> +	   corruption. This problem has been seen in PPC systems and

s/returning/resulting/ ?

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: sata_svw data corruption, strange problems
  2008-06-23  8:39     ` Andreas Schwab
@ 2008-06-23  8:53       ` Pavel Machek
  2008-06-23  8:56         ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2008-06-23  8:53 UTC
  To: Andreas Schwab
  Cc: Tejun Heo, kernel list, benh, jgarzik, IDE/ATA development list,
	Trivial patch monkey

On Mon 2008-06-23 10:39:40, Andreas Schwab wrote:
> Pavel Machek <pavel@suse.cz> writes:
> 
> > +	   controller, the controller could hang. In other cases it
> > +	   could return partial data returning in data
> > +	   corruption. This problem has been seen in PPC systems and
> 
> s/returning/resulting/ ?

Fix thinko in sata_svw comment.

Signed-off-by: Pavel Machek <pavel@suse.cz>

diff --git a/drivers/ata/sata_svw.c b/drivers/ata/sata_svw.c
index fb13b82..d6313f1 100644
--- a/drivers/ata/sata_svw.c
+++ b/drivers/ata/sata_svw.c
@@ -269,7 +269,7 @@ static void k2_bmdma_start_mmio(struct a
 	   would become confused as to what to do with the data.  In
 	   the worst case when all the data is returned back to the
 	   controller, the controller could hang. In other cases it
-	   could return partial data returning in data
+	   could return partial data resulting in data
 	   corruption. This problem has been seen in PPC systems and
 	   can also appear on an system with very fast disks, where
 	   the SATA controller is sitting behind a number of bridges,




* Re: sata_svw data corruption, strange problems
  2008-06-23  8:53       ` Pavel Machek
@ 2008-06-23  8:56         ` Tejun Heo
  2008-06-23  9:01           ` Pavel Machek
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2008-06-23  8:56 UTC
  To: Pavel Machek
  Cc: Andreas Schwab, kernel list, benh, jgarzik,
	IDE/ATA development list, Trivial patch monkey

Pavel Machek wrote:
> On Mon 2008-06-23 10:39:40, Andreas Schwab wrote:
>> Pavel Machek <pavel@suse.cz> writes:
>>
>>> +	   controller, the controller could hang. In other cases it
>>> +	   could return partial data returning in data
>>> +	   corruption. This problem has been seen in PPC systems and
>> s/returning/resulting/ ?
> 
> Fix thinko in sata_svw comment.
> 
> Signed-off-by: Pavel Machek <pavel@suse.cz>

Please collapse into one patch.  Thanks.

-- 
tejun


* Re: sata_svw data corruption, strange problems
  2008-06-23  8:56         ` Tejun Heo
@ 2008-06-23  9:01           ` Pavel Machek
  2008-06-23  9:04             ` Benjamin Herrenschmidt
  2008-06-27  6:41             ` Jeff Garzik
  0 siblings, 2 replies; 14+ messages in thread
From: Pavel Machek @ 2008-06-23  9:01 UTC
  To: Tejun Heo
  Cc: Andreas Schwab, kernel list, benh, jgarzik,
	IDE/ATA development list, Trivial patch monkey

On Mon 2008-06-23 17:56:32, Tejun Heo wrote:
> Pavel Machek wrote:
> > On Mon 2008-06-23 10:39:40, Andreas Schwab wrote:
> >> Pavel Machek <pavel@suse.cz> writes:
> >>
> >>> +	   controller, the controller could hang. In other cases it
> >>> +	   could return partial data returning in data
> >>> +	   corruption. This problem has been seen in PPC systems and
> >> s/returning/resulting/ ?
> > 
> > Fix thinko in sata_svw comment.
> > 
> > Signed-off-by: Pavel Machek <pavel@suse.cz>
> 
> Please collapse into one patch.  Thanks.

--- 

Clarify comment in sata_svw.c.

Signed-off-by: Pavel Machek <pavel@suse.cz>

diff --git a/drivers/ata/sata_svw.c b/drivers/ata/sata_svw.c
index 16aa683..fb13b82 100644
--- a/drivers/ata/sata_svw.c
+++ b/drivers/ata/sata_svw.c
@@ -253,21 +253,29 @@ static void k2_bmdma_start_mmio(struct a
 	/* start host DMA transaction */
 	dmactl = readb(mmio + ATA_DMA_CMD);
 	writeb(dmactl | ATA_DMA_START, mmio + ATA_DMA_CMD);
-	/* There is a race condition in certain SATA controllers that can
-	   be seen when the r/w command is given to the controller before the
-	   host DMA is started. On a Read command, the controller would initiate
-	   the command to the drive even before it sees the DMA start. When there
-	   are very fast drives connected to the controller, or when the data request
-	   hits in the drive cache, there is the possibility that the drive returns a part
-	   or all of the requested data to the controller before the DMA start is issued.
-	   In this case, the controller would become confused as to what to do with the data.
-	   In the worst case when all the data is returned back to the controller, the
-	   controller could hang. In other cases it could return partial data returning
-	   in data corruption. This problem has been seen in PPC systems and can also appear
-	   on an system with very fast disks, where the SATA controller is sitting behind a
-	   number of bridges, and hence there is significant latency between the r/w command
-	   and the start command. */
-	/* issue r/w command if the access is to ATA*/
+	/* This works around possible data corruption.
+
+	   On certain SATA controllers that can be seen when the r/w
+	   command is given to the controller before the host DMA is
+	   started.
+
+	   On a Read command, the controller would initiate the
+	   command to the drive even before it sees the DMA
+	   start. When there are very fast drives connected to the
+	   controller, or when the data request hits in the drive
+	   cache, there is the possibility that the drive returns a
+	   part or all of the requested data to the controller before
+	   the DMA start is issued.  In this case, the controller
+	   would become confused as to what to do with the data.  In
+	   the worst case when all the data is returned back to the
+	   controller, the controller could hang. In other cases it
+	   could return partial data returning in data
+	   corruption. This problem has been seen in PPC systems and
+	   can also appear on an system with very fast disks, where
+	   the SATA controller is sitting behind a number of bridges,
+	   and hence there is significant latency between the r/w
+	   command and the start command. */
+	/* issue r/w command if the access is to ATA */
 	if (qc->tf.protocol == ATA_PROT_DMA)
 		ap->ops->sff_exec_command(ap, &qc->tf);
 }




* Re: sata_svw data corruption, strange problems
  2008-06-23  9:01           ` Pavel Machek
@ 2008-06-23  9:04             ` Benjamin Herrenschmidt
  2008-06-23  9:26               ` Pavel Machek
  2008-06-23  9:48               ` Tejun Heo
  2008-06-27  6:41             ` Jeff Garzik
  1 sibling, 2 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2008-06-23  9:04 UTC
  To: Pavel Machek
  Cc: Tejun Heo, Andreas Schwab, kernel list, jgarzik,
	IDE/ATA development list, Trivial patch monkey

On Mon, 2008-06-23 at 11:01 +0200, Pavel Machek wrote:
> On Mon 2008-06-23 17:56:32, Tejun Heo wrote:
> > Pavel Machek wrote:
> > > On Mon 2008-06-23 10:39:40, Andreas Schwab wrote:
> > >> Pavel Machek <pavel@suse.cz> writes:
> > >>
> > >>> +	   controller, the controller could hang. In other cases it
> > >>> +	   could return partial data returning in data
> > >>> +	   corruption. This problem has been seen in PPC systems and
> > >> s/returning/resulting/ ?
> > > 
> > > Fix thinko in sata_svw comment.
> > > 
> > > Signed-off-by: Pavel Machek <pavel@suse.cz>
> > 
> > Please collapse into one patch.  Thanks.

Am I the only one to find Pavel's variant almost as obscure as
the original one? :-)

It should explain precisely what the workaround is, i.e. to start the
DMA there instead of where it is normally started, which is the
bmdma_setup() function.

BTW, Tejun, I suppose that starting DMA after issuing the command is
the usual practice for legacy/SFF-type controllers? Or is it just
because that's how Linux did it until now?

Ben.




* Re: sata_svw data corruption, strange problems
  2008-06-23  9:04             ` Benjamin Herrenschmidt
@ 2008-06-23  9:26               ` Pavel Machek
  2008-06-23  9:48               ` Tejun Heo
  1 sibling, 0 replies; 14+ messages in thread
From: Pavel Machek @ 2008-06-23  9:26 UTC
  To: Benjamin Herrenschmidt
  Cc: Tejun Heo, Andreas Schwab, kernel list, jgarzik,
	IDE/ATA development list, Trivial patch monkey

On Mon 2008-06-23 19:04:19, Benjamin Herrenschmidt wrote:
> On Mon, 2008-06-23 at 11:01 +0200, Pavel Machek wrote:
> > On Mon 2008-06-23 17:56:32, Tejun Heo wrote:
> > > Pavel Machek wrote:
> > > > On Mon 2008-06-23 10:39:40, Andreas Schwab wrote:
> > > >> Pavel Machek <pavel@suse.cz> writes:
> > > >>
> > > >>> +	   controller, the controller could hang. In other cases it
> > > >>> +	   could return partial data returning in data
> > > >>> +	   corruption. This problem has been seen in PPC systems and
> > > >> s/returning/resulting/ ?
> > > > 
> > > > Fix thinko in sata_svw comment.
> > > > 
> > > > Signed-off-by: Pavel Machek <pavel@suse.cz>
> > > 
> > > Please collapse into one patch.  Thanks.
> 
> Am I the only one to find Pavel's variant almost as obscure as
> the original one? :-)

At least it makes it clear that the bug should not bite in new
releases.

I'm neither a native English speaker nor the author of this driver, so if
you'd like to improve the comment, please do so. (I spent quite a long
time wondering whether we work around that hw problem or not; that's why
I'm trying to clarify the comment.)
								Pavel


* Re: sata_svw data corruption, strange problems
  2008-06-23  9:48               ` Tejun Heo
@ 2008-06-23  9:42                 ` Alan Cox
  2008-06-23 10:23                   ` Benjamin Herrenschmidt
  2008-06-23 13:05                   ` Tejun Heo
  0 siblings, 2 replies; 14+ messages in thread
From: Alan Cox @ 2008-06-23  9:42 UTC
  To: Tejun Heo
  Cc: benh, Pavel Machek, Andreas Schwab, kernel list, jgarzik,
	IDE/ATA development list, Trivial patch monkey

> > BTW, Tejun, I suppose that starting DMA after issuing the command is
> > the usual practice for legacy/SFF-type controllers? Or is it just
> > because that's how Linux did it until now?
> 
> It's how the standard says it should be programmed.  Please take a look
> at section 3 of the following document.
> 
> http://www.centrillium-it.com/Projects/idems100.pdf
> 
> It's a non-issue for PATA ones as the host is responsible for running

It's very much an issue for PATA. If you start the DMA too early, things
go wrong. The DMA has to start after the command is issued (or, for ATAPI,
after the command and the CDB are issued). Various ATAPI devices get
quite cross if you mess this up.

In some cases the driver code also depends upon this, as we drive the
data clocks in software and so have to reprogram them after command issue
and before data transfer begins.
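Alan's ordering rule can be written as a check over a recorded sequence of host actions for one command. The step names and the order_ok() helper are hypothetical; the rule they encode is the one stated above: the DMA start must follow the command write and, for ATAPI, the CDB as well (which itself follows the PACKET command).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for host actions during one PATA DMA command. */
enum pata_step { P_EXEC_COMMAND, P_SEND_CDB, P_DMA_START };

/* True if the CDB (ATAPI only) follows the command write, and the DMA
 * start follows the command and, for ATAPI, the CDB. */
static bool order_ok(const enum pata_step *seq, size_t n, bool is_atapi)
{
	bool cmd_seen = false, cdb_seen = false;

	assert(seq != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (seq[i]) {
		case P_EXEC_COMMAND:
			cmd_seen = true;
			break;
		case P_SEND_CDB:
			if (!cmd_seen)
				return false; /* CDB before the PACKET command */
			cdb_seen = true;
			break;
		case P_DMA_START:
			if (!cmd_seen || (is_atapi && !cdb_seen))
				return false; /* DMA started too early */
			break;
		}
	}
	return true;
}
```

With is_atapi set, a sequence that starts DMA between the PACKET command and the CDB is rejected, while a plain ATA sequence only needs the command write to come first.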


* Re: sata_svw data corruption, strange problems
  2008-06-23  9:04             ` Benjamin Herrenschmidt
  2008-06-23  9:26               ` Pavel Machek
@ 2008-06-23  9:48               ` Tejun Heo
  2008-06-23  9:42                 ` Alan Cox
  1 sibling, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2008-06-23  9:48 UTC
  To: benh
  Cc: Pavel Machek, Andreas Schwab, kernel list, jgarzik,
	IDE/ATA development list, Trivial patch monkey

Benjamin Herrenschmidt wrote:
> Am I the only one to find Pavel's variant almost as obscure as
> the original one? :-)
> 
> It should explain precisely what the workaround is, i.e. to start the
> DMA there instead of where it is normally started, which is the
> bmdma_setup() function.

Well, it's better than the original, which kind of pointed the other
way.  :-)

> BTW, Tejun, I suppose that starting DMA after issuing the command is
> the usual practice for legacy/SFF-type controllers? Or is it just
> because that's how Linux did it until now?

It's how the standard says it should be programmed.  Please take a look
at section 3 of the following document.

http://www.centrillium-it.com/Projects/idems100.pdf

It's a non-issue for PATA ones as the host is responsible for running
the clock and transferring data after the drive has indicated readiness,
so the worst that can happen by starting the DMA engine after issuing the
command is the drive waiting in the ready state.

For SATA, it should work the same.  The host should hold the transfer by
not acking the data transfer request (or prefetch the data if it feels
smart and brave).  So, it's something sata_svw screwed up.
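A toy model of the flow-control point, under hypothetical assumptions: the drive tries to return the whole transfer the moment it sees the command (a cache hit), a conforming host withholds acknowledgement while its DMA engine is stopped, and a broken host accepts the bytes anyway and loses them. None of the names are real controller registers; the model only shows why issuing the command first is harmless for the first host and destructive for the second, and why starting the engine first hides the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define XFER_LEN 4

/* Toy host controller; names are illustrative, not real registers. */
struct host {
	bool dma_running;            /* DMA engine start bit set? */
	bool holds_until_started;    /* true: withhold ack while stopped */
	unsigned char mem[XFER_LEN]; /* destination buffer */
	int delivered;               /* bytes that reached mem */
	int pending;                 /* bytes the drive still has to send */
	const unsigned char *src;    /* next byte the drive will send */
};

/* The drive sends whatever is pending as fast as the host lets it. */
static void drive_push(struct host *h)
{
	while (h->pending > 0) {
		if (!h->dma_running) {
			if (h->holds_until_started)
				return; /* no ack: drive waits in ready state */
			h->src++;       /* broken host: byte accepted, then lost */
			h->pending--;
			continue;
		}
		assert(h->delivered < XFER_LEN);
		h->mem[h->delivered++] = *h->src++;
		h->pending--;
	}
}

static void issue_command(struct host *h, const unsigned char *data)
{
	h->src = data;
	h->pending = XFER_LEN;
	drive_push(h);          /* fast drive: data is ready immediately */
}

static void start_dma(struct host *h)
{
	h->dma_running = true;
	drive_push(h);          /* resume a transfer the host was holding */
}

/* One read with the given host type and ordering; returns bytes delivered. */
static int run_read(bool holds, bool dma_first, unsigned char out[XFER_LEN])
{
	static const unsigned char disk[XFER_LEN] = { 'd', 'a', 't', 'a' };
	struct host h = { .holds_until_started = holds };

	if (dma_first) {
		start_dma(&h);
		issue_command(&h, disk);
	} else {
		issue_command(&h, disk);
		start_dma(&h);
	}
	memcpy(out, h.mem, XFER_LEN);
	return h.delivered;
}
```

Only run_read(false, false, buf) loses the data (it returns 0); the conforming host delivers all four bytes in either order, and the broken host does too once the engine is started first. The real failure described in the driver comment also included partial data and controller hangs, which this toy does not attempt to model.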

-- 
tejun


* Re: sata_svw data corruption, strange problems
  2008-06-23  9:42                 ` Alan Cox
@ 2008-06-23 10:23                   ` Benjamin Herrenschmidt
  2008-06-23 13:05                   ` Tejun Heo
  1 sibling, 0 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2008-06-23 10:23 UTC
  To: Alan Cox
  Cc: Tejun Heo, Pavel Machek, Andreas Schwab, kernel list, jgarzik,
	IDE/ATA development list, Trivial patch monkey

On Mon, 2008-06-23 at 10:42 +0100, Alan Cox wrote:
> > > BTW, Tejun, I suppose that starting DMA after issuing the command is
> > > the usual practice for legacy/SFF-type controllers? Or is it just
> > > because that's how Linux did it until now?
> > 
> > It's how the standard says it should be programmed.  Please take a look
> > at section 3 of the following document.
> > 
> > http://www.centrillium-it.com/Projects/idems100.pdf
> > 
> > It's a non-issue for PATA ones as the host is responsible for running
> 
> It's very much an issue for PATA. If you start the DMA too early, things
> go wrong. The DMA has to start after the command is issued (or, for ATAPI,
> after the command and the CDB are issued). Various ATAPI devices get
> quite cross if you mess this up.
> 
> In some cases the driver code also depends upon this, as we drive the
> data clocks in software and so have to reprogram them after command issue
> and before data transfer begins.

Might explain why those Broadcom chipsets are also allergic to ATAPI
DMA :-)

Ben.




* Re: sata_svw data corruption, strange problems
  2008-06-23  9:42                 ` Alan Cox
  2008-06-23 10:23                   ` Benjamin Herrenschmidt
@ 2008-06-23 13:05                   ` Tejun Heo
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2008-06-23 13:05 UTC
  To: Alan Cox
  Cc: benh, Pavel Machek, Andreas Schwab, kernel list, jgarzik,
	IDE/ATA development list, Trivial patch monkey

Alan Cox wrote:
>>> BTW, Tejun, I suppose that starting DMA after issuing the command is
>>> the usual practice for legacy/SFF-type controllers? Or is it just
>>> because that's how Linux did it until now?
>> It's how the standard says it should be programmed.  Please take a look
>> at section 3 of the following document.
>>
>> http://www.centrillium-it.com/Projects/idems100.pdf
>>
>> It's a non-issue for PATA ones as the host is responsible for running
> 
> It's very much an issue for PATA. If you start the DMA too early, things
> go wrong. The DMA has to start after the command is issued (or, for ATAPI,
> after the command and the CDB are issued). Various ATAPI devices get
> quite cross if you mess this up.
> 
> In some cases the driver code also depends upon this, as we drive the
> data clocks in software and so have to reprogram them after command issue
> and before data transfer begins.

I was saying that the drive getting ready before command issue was a non-issue.

-- 
tejun


* Re: sata_svw data corruption, strange problems
  2008-06-23  9:01           ` Pavel Machek
  2008-06-23  9:04             ` Benjamin Herrenschmidt
@ 2008-06-27  6:41             ` Jeff Garzik
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff Garzik @ 2008-06-27  6:41 UTC
  To: Pavel Machek
  Cc: Tejun Heo, Andreas Schwab, kernel list, benh,
	IDE/ATA development list, Trivial patch monkey

Pavel Machek wrote:
> Clarify comment in sata_svw.c.
> 
> Signed-off-by: Pavel Machek <pavel@suse.cz>

applied



